Successful Install on Watchguard Firebox X700!
-
Folks,
I've been following this large thread for quite some time now. I have a few posts a few pages back, where I commented on the fact that I acquired one of these boxes and tried pfsense. Of course, I got the timeout errors.
Today, I decided to try again with the latest 2.0 build (1/1/11). Lo and behold, no timeouts!! The box has had an uptime of 9 hours with 4 ports (interfaces) configured as well as 3 or 4 IPSEC tunnels. It's also being used in production with no timeouts showing in system.log. The only issue I had was that I needed to remove the crypto card for IPSEC traffic to pass. No idea why, however I'm not too bothered as 1.2GHz is plenty for me.
Have the watchdog timeouts been fixed, and are these cheap boxes excellent little pfsense gems?
-
Have the watchdog timeouts been fixed, and are these cheap boxes excellent little pfsense gems?
If that's true then it's great news. However I wouldn't get your hopes up just yet. Reading back through this and other threads on this issue, people have seemingly solved the timeout problem before only for it to come back after some time.
Have there been any changes to the re driver recently?
Steve
-
If that's true then it's great news. However I wouldn't get your hopes up just yet. Reading back through this and other threads on this issue, people have seemingly solved the timeout problem before only for it to come back after some time.
Have there been any changes to the re driver recently?
Steve
There have not been changes to the driver, but rather in the way that pfsense 2.0 works.
By disabling device polling, hardware checksum offload, hardware TCP segmentation offload and hardware large receive offload, as well as changing the system tunables net.inet.tcp.tso and hw.bce.tso_enable to 0, watchdog timeouts are a thing of the past.
Except for one situation: when accessing the webgui on a MacBook Pro over a 2.4GHz wireless N connection coming from a first generation Apple Time Capsule, timeouts are thrown up.
Attempts to replicate this through other wireless base stations, different connections and different devices have failed, which leads me to believe that this is a different issue entirely.
That laptop is never used for accessing the webgui, so it is irrelevant to me.
As usual, ymmv of course.
Edit: typo, shuffle sections
-
There have not been changes to the driver, but rather in the way that pfsense 2.0 works.
By disabling device polling, hardware checksum offload, hardware tcp segmentation offload and hardware large receive offload, as well as changing the system tunables net.inet.tcp.tso and hw.bce.tso_enable to 0, watchdog timeouts are a thing of the past.
Are you saying that these things are changed by default? I haven't touched any of those settings.
BTW, I don't have access to the serial console. If timeouts were being thrown, where would I see them? In system.log?
I've had the traffic graph up all night and no matter what I do, I have yet to see one timeout with this build. Even Windows CIFS transfers work between interfaces
-
Are you saying that these things are changed by default? I haven't touched any of those settings.
BTW, I don't have access to the serial console. If timeouts were being thrown, where would I see them? In system.log?
A few settings have been changed by default, but I changed them all manually a while ago, just to be sure.
Timeouts are seen in console, system.log and felt by having a non-responsive internet connection / webgui.
-
Which build are you running?
-
Which build are you running?
At the moment, my pfsense version is 2.0-BETA5 (i386) built on Sat Jan 1 17:53:01 EST 2.
I usually update once a week on Saturday.
-
At the moment, my pfsense version is 2.0-BETA5 (i386) built on Sat Jan 1 17:53:01 EST 2.
I usually update once a week on saturday.
I'm not far from you:
2.0-BETA5 (i386) built on Sat Jan 1 19:56:40 EST 2011
Have you seen timeouts at all with this current build?
-
I'm not far from you:
2.0-BETA5 (i386) built on Sat Jan 1 19:56:40 EST 2011
Have you seen timeouts at all with this current build?
No.
In fact, I haven't seen any timeouts whatsoever using any 2.0b4 build (ignoring an odd issue with a macbook pro) since this post:
http://forum.pfsense.org/index.php/topic,25870.msg147085.html#msg147085
-
In fact, I haven't seen any timeouts whatsoever using any 2.0b4 build (ignoring an odd issue with a macbook pro) since this post:
http://forum.pfsense.org/index.php/topic,25870.msg147085.html#msg147085
Did the MBP cause timeouts on the build you're using today?
Also, reading your other post, I do experience the "went backwards" error at bootup, however it doesn't stop anything from working.
I'm using an HP Procurve switch between my pfsense and machines. In my initial testing, I did have my laptop plugged directly into the FB, but still no timeouts were seen.
-
Just to keep everyone updated, I ran some Windows CIFS tests with my laptop connected directly to a port on the Firebox. The CIFS server is connected to another interface; however, there is a switch between the server and the FB.
My first test was 5 or 6 files totalling around 1GB. My second test was lots of smallish (30MB) files totalling around 200MB. During the tests, I had a Traffic Graph open in Firefox on my desktop machine (connected to same interface as CIFS server).
Not a single watchdog timeout happened. I have yet to see any timeouts on my current build (2.0-BETA5 (i386) built on Sat Jan 1 19:56:40 EST 2011), and the box has had an uptime of 1 day, 03:19 with 4 interfaces activated (5 during my CIFS tests).
The only things I noticed during my CIFS tests were that throughput was capped at around 60Mbps, and that when I removed the network cable from my laptop after the 2nd test (after a few hours of inactivity), "check_reload_status: Linkup starting re5" was displayed in system.log, but this is probably normal.
The capping issue could be due to some default config change that has potentially stopped the timeouts. But that's ok as within our company, I have designated these boxes for use in "Medium Traffic Sites", or at least I will once we've had a few weeks of no timeouts.
(Btw, the "Low Traffic Sites" have ALIX 2D3 and the "High Traffic Sites" have Supermicro Servers)
What do you all think?
Thanks
-
It wouldn't surprise me to find that the 60Mbps cap is a result of the low quality Realtek NICs, especially since all the offloading options have been disabled.
Edit: Thinking about it, the offloading options are supposed to free up the cpu, not the NIC, so in fact, unless the cpu is maxed out, this may be the faster setup.
-
It wouldn't surprise me to find that the 60Mbps cap is a result of the low quality Realtek NICs, especially since all the offloading options have been disabled.
Yeah, you're probably right. The ALIX boards which are good quality parts cap at around 80Mbps, so 60Mbps on low quality hardware seems ok. I'm just interested to hear from anyone who is having timeout issues with the current 2.0 build. I may buy another x700 and deploy it at another office so at least we will have 2 real-life tests going on
-
I'm really trying to get to the bottom of why things are beginning to work now. Can any devs comment on if any defaults have changed? Looking at the FreeBSD code:
http://svn.freebsd.org/viewvc/base/stable/8/sys/dev/re/if_re.c?view=log
you will notice that the Firebox NICs (8139) haven't been mentioned since April 2009 (a change which does in fact address the timeout issue on the FB NICs). So either the Apr 09 fix didn't make it into pfSense until recently, or a more recent driver change has fixed it (i.e. the Apr 09 fix did nothing), or the pfsense default tuning has changed.
I'm clueless as to how the FreeBSD svn code relates to what goes into pfSense beta snapshots.
Any clues are appreciated
Cheers
-
Did the MBP cause timeouts on the build you're using
Honestly, it only happens when browsing pfsense's webgui, so I don't know.
The machine has been used for all kinds of other Internet related activities, so I'm confident that it doesn't cause timeouts during normal use.
Also, regarding the speed of the firebox:
I use the firebox to firewall a 120/10 connection and it reaches a sustained 98.8 mbit down while uploading at 9.89 mbit with around 48% CPU usage (tends to fluctuate up and down a lot).
When I only just got the fireboxes I ran some tests using a 100/100 connection and while the machine still threw up watchdog timeouts back then (February or March) it was able to firewall 98/98 mbit when testing with FTP transfers.
-
I'm really trying to get to the bottom of why things are beginning to work now. Can any devs comment on if any defaults have changed? Looking at the FreeBSD code:
The change was not in FreeBSD but in pfsense. The defaults were changed such that all the cpu offloading was turned off. You can still turn it back on manually.
Also, regarding the speed of the firebox:
I use the firebox to firewall a 120/10 connection and it reaches a sustained 98.8 mbit down while uploading at 9.89 mbit with around 48% CPU usage (tends to fluctuate up and down a lot).
When I only just got the fireboxes I ran some tests using a 100/100 connection and while the machine still threw up watchdog timeouts back then (February or March) it was able to firewall 98/98 mbit when testing with FTP transfers.
That's interesting. It ties in better with Watchguard's claimed 275/300Mbps 'firewall throughput', under their linux based OS.
Steve
-
@iFloris, those are interesting stats regarding your throughput. It could just be my CIFS server or something. I guess I need to set up an HTTP server locally and test via that. However, 60Mbps is ok for me as my fastest WAN connection is 15/1.
Now for some more test results:
This time, I transferred a whole directory of files from a CIFS server to my laptop which is plugged directly into the FB. The CIFS server goes via an HP Procurve switch. I also ran "cat /dev/random > /dev/null" from an SSH shell. I was also viewing the RRD graph. The whole test lasted about 6 or 7 minutes.
Not a single timeout :D
Pics attached. You'll notice a dip in traffic in the RRD graph. Not sure what this was about (Probably just Windows CIFS being silly). You'll notice not a single timeout in system.log (also attached)
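For anyone who would rather count timeouts than eyeball the log, here is a minimal Python sketch. It assumes the log has already been dumped to plain text lines and that the driver message looks like "re0: watchdog timeout"; the sample lines and the function name are made up for illustration and are not taken from the attached log.

```python
import re
from collections import Counter

# Assumed message shape: "<iface>: watchdog timeout", e.g. "re0: watchdog timeout".
WATCHDOG = re.compile(r"\b(re\d+): watchdog timeout")


def count_watchdog_timeouts(lines):
    """Return a Counter mapping interface name to number of watchdog timeouts."""
    counts = Counter()
    for line in lines:
        match = WATCHDOG.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Invented sample lines, only to show the idea.
sample = [
    "Jan  2 10:15:01 pfsense kernel: re2: watchdog timeout",
    "Jan  2 10:15:02 pfsense kernel: re2: link state changed to DOWN",
    "Jan  2 10:15:05 pfsense check_reload_status: Linkup starting re5",
    "Jan  2 11:40:33 pfsense kernel: re0: watchdog timeout",
    "Jan  2 11:41:10 pfsense kernel: re2: watchdog timeout",
]

if __name__ == "__main__":
    for iface, hits in sorted(count_watchdog_timeouts(sample).items()):
        print(iface, hits)
```

An empty result means no line matched the assumed pattern, which is only as good as the pattern itself, so it is worth checking it against a known timeout line first.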
-
That's interesting. It ties in better with Watchguard's claimed 275/300Mbps 'firewall throughput', under their linux based OS.
That's interesting indeed, I never knew that Watchguard claimed such a throughput.
This morning I had to get a large file for a project and thought I'd post the throughput and cpu use as a reference.
It would seem that the cpu usage / speed ratio I reported earlier either changed somewhat in the past months or that there is some process going on that I don't know about causing a few percent of cpu usage on the firebox.
See attached image, sorry about all the white space.
Pfsense reports a speed in of 98.47 and out of 3.47 mbit at a cpu usage of around 50% with less than 500 states.
This is with a minimal amount of firewall rules, as I only have a few port forwards configured.
Packages installed are minimal; OpenVPN Client Export Utility, RRD Summary, Unbound, arpwatch, ifBWStats, phpSysInfo and vnstat2.
Also, I hadn't noticed before that the cpu graphs to the above right show something quite different from what the bar in system information shows.
-
That's really interesting. There are always people asking what throughput different hardware is capable of, at last here's some numbers! :)
Watchguard's specs are here: http://www.watchguard.com/products/x2500.asp
You have to look at the top X-Core model as all the others are software restricted.
You've inspired me to do some testing. However, to do this I'd have to swap out my box as my wan connection is only ~10Mbps. Would a test between two of the other ports be equivalent? I can't see why not; they are still firewalled.
Then I'm possibly up against streaming a file at a sufficient speed. There must be a software package for doing this that doesn't rely on disk speed. Can anyone recommend anything?
Steve
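One way to take disk speed out of the picture is to send data straight from memory over TCP and count bytes at the other end. The following is a rough Python sketch of that idea, not a replacement for a proper benchmarking tool; the function names, the chunk size and the loopback demo are all illustrative assumptions. For a real test the receiver and sender would run on hosts attached to two different firewall ports.

```python
import socket
import threading
import time

CHUNK = 64 * 1024  # bytes per send call; an arbitrary choice


def receive_all(server_sock, result):
    """Accept one connection and count bytes until the sender closes it."""
    conn, _ = server_sock.accept()
    total = 0
    with conn:
        while True:
            data = conn.recv(CHUNK)
            if not data:
                break
            total += len(data)
    result["bytes"] = total


def send_from_memory(host, port, total_bytes):
    """Send total_bytes of zeros held in RAM; return elapsed seconds."""
    payload = bytes(CHUNK)
    sent = 0
    start = time.perf_counter()
    with socket.create_connection((host, port)) as sock:
        while sent < total_bytes:
            n = min(CHUNK, total_bytes - sent)
            sock.sendall(payload[:n])
            sent += n
    return time.perf_counter() - start


def mbit_per_s(num_bytes, seconds):
    """Convert a byte count and a duration to megabits per second."""
    return num_bytes * 8 / seconds / 1_000_000


def loopback_demo(total_bytes=8 * 1024 * 1024):
    """Run sender and receiver on 127.0.0.1 just to show the mechanics."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]
    result = {}
    worker = threading.Thread(target=receive_all, args=(server, result))
    worker.start()
    elapsed = send_from_memory("127.0.0.1", port, total_bytes)
    worker.join()
    server.close()
    return result["bytes"], elapsed


if __name__ == "__main__":
    received, elapsed = loopback_demo()
    print(f"{received} bytes, {mbit_per_s(received, elapsed):.1f} Mbit/s")
```

The timing here is taken on the sender side only, so short runs will overstate throughput because of socket buffering; longer transfers give a more believable figure.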
-
So here is my dreaded post. I finally managed to make my x700 timeout :(
It happens when I VNC from my LAN (re2 in my case) to a server on the other side of an IPSEC tunnel.
Does anyone have any clues on what I can try to do to fix this?