Attention Firebox X Series Users - Testing Needed
-
I have to say, ever since this thread was started I've been having more and more watchdog timeouts.
In previous builds I would get them very seldom (not even once a day), but since whatever code was changed to fix this I am seeing them, on average, 5 times a day, often for about a minute each time.
Nothing has changed configuration-wise between builds: I'm running WAN on re0, LAN on re1, DMZ on re2 and Wifi on re4/5. From what I can tell I never have any timeouts on re0/1.
In any case, I'll give it a few more builds, but unless we can get it fixed I'll have to roll back to the earlier build as the system, as it stands today, is getting to be unusable (and yes, I know, don't run v2.0 in production :p).
-
Only 2 patches have even made it into the publicly downloadable builds, and 1 of them went in very late yesterday. My guess is that you are seeing more watchdog timeouts because something changed in your environment rather than in pfSense: the Realtek interface code has only changed twice, and one of those changes was for the better. The other patch, which went in yesterday, will likely be rolled back since it did not seem to improve things (and may actually have made them worse). Working with the driver maintainer is challenging since there is a 17 hour time difference between him and me, plus I need to have his patches incorporated and wait for a new build to test before I can get back to him.
Like I said, stay tuned. When I've worked out something that appears to have solved it, I will need people like you to beat it up--originally, I thought I had it licked (since it solved the particular problem that I was causing), but there are still others present.
-
Hello Dimitri
got a firebox x500 off of ebay and was hoping i wouldn't run into the watchdog errors with the Realtek network cards, but i wasn't that lucky.
gave the pfSense-1.2.3-20090708-1942 snapshot a test tonight and i am able to reproduce the watchdog errors with the cat /dev/urandom test, or even by installing the NUT package and then going to "Services -> NUT" in the web interface. about 5-8 secs after the page starts to load i get the following error in the console:
re2: watchdog timeout
once this error pops up in the console screen i am unable to ping to/from that interface until i hard power off the x500. From the console, if i hit 5 "Reboot System" or type reboot, pfsense starts running the shutdown process but then stops at the "Rebooting…" message.
re2:watchdog timeout
re2:watchdog timeout
re2:watchdog timeout
# reboot
pflog0: promiscuous mode disabled
TWaiting (max 60 seconds) for system process `vnlru' to stop...done
Waiting (max 60 seconds) for system process `bufdaemon' to stop...done
Waiting (max 60 seconds) for system process `syncer' to stop...
Syncing disks, vnodes remaining...4 2 0 0 done
All buffers synced.
Uptime: 38m24s
Rebooting...
Another oddity
Running "halt system" from the console menu works until i hit the "press any key to reboot" part. As soon as i hit the "AnyKey" the speaker on the x500 screams like crazy. Both "reboot" and "halt/reboot" work just fine until the watchdog errors start to pop up. Any other debugging i can do on my end to help?
-loki
pciconf -lcv
hostb0@pci0:0:0:0: class=0x060000 card=0x11308086 chip=0x11308086 rev=0x04 hdr=0x00
    class = bridge
    subclass = HOST-PCI
    cap 09[88] = vendor (length 4) Intel cap 14 version 1
    cap 02[a0] = AGP 2x 1x SBA disabled
pcib1@pci0:0:1:0: class=0x060400 card=0x00000000 chip=0x11318086 rev=0x04 hdr=0x01
    class = bridge
    subclass = PCI-PCI
pcib2@pci0:0:30:0: class=0x060400 card=0x00000000 chip=0x244e8086 rev=0x05 hdr=0x01
    class = bridge
    subclass = PCI-PCI
isab0@pci0:0:31:0: class=0x060100 card=0x00000000 chip=0x24408086 rev=0x05 hdr=0x00
    class = bridge
    subclass = PCI-ISA
atapci0@pci0:0:31:1: class=0x010180 card=0x24408086 chip=0x244b8086 rev=0x05 hdr=0x00
    class = mass storage
    subclass = ATA
ral0@pci0:2:6:0: class=0x028000 card=0x3c421186 chip=0x02011814 rev=0x01 hdr=0x00
    class = network
    cap 01[40] = powerspec 2 supports D0 D3 current D0
re0@pci0:2:9:0: class=0x020000 card=0x813910ec chip=0x813910ec rev=0x20 hdr=0x00
    class = network
    subclass = ethernet
    cap 01[50] = powerspec 2 supports D0 D1 D2 D3 current D0
re1@pci0:2:10:0: class=0x020000 card=0x813910ec chip=0x813910ec rev=0x20 hdr=0x00
    class = network
    subclass = ethernet
    cap 01[50] = powerspec 2 supports D0 D1 D2 D3 current D0
re2@pci0:2:11:0: class=0x020000 card=0x813910ec chip=0x813910ec rev=0x20 hdr=0x00
    class = network
    subclass = ethernet
    cap 01[50] = powerspec 2 supports D0 D1 D2 D3 current D0
re3@pci0:2:12:0: class=0x020000 card=0x813910ec chip=0x813910ec rev=0x20 hdr=0x00
    class = network
    subclass = ethernet
    cap 01[50] = powerspec 2 supports D0 D1 D2 D3 current D0
re4@pci0:2:13:0: class=0x020000 card=0x813910ec chip=0x813910ec rev=0x20 hdr=0x00
    class = network
    subclass = ethernet
    cap 01[50] = powerspec 2 supports D0 D1 D2 D3 current D0
re5@pci0:2:14:0: class=0x020000 card=0x813910ec chip=0x813910ec rev=0x20 hdr=0x00
    class = network
    subclass = ethernet
    cap 01[50] = powerspec 2 supports D0 D1 D2 D3 current D0
dmesg
Copyright (c) 1992-2009 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
    The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 7.2-RELEASE-p2 #0: Wed Jul 8 19:39:37 EDT 2009
    sullrich@FreeBSD-7_2-RELENG_1_2-snapshots.pfsense.org:/usr/obj.pfSense/usr/pfSensesrc/src/sys/pfSense_SMP.7
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) Celeron(TM) CPU 1200MHz (1202.73-MHz 686-class CPU)
  Origin = "GenuineIntel" Id = 0x6b4 Stepping = 4
  Features=0x383f9ff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE>
real memory = 536870912 (512 MB)
avail memory = 508604416 (485 MB)
wlan: mac acl policy registered
kbd1 at kbdmux0
cryptosoft0: <software crypto> on motherboard
pcib0: <Intel 82815 (i815 GMCH) Host To Hub bridge> pcibus 0 on motherboard
pir0: <PCI Interrupt Routing Table: 11 Entries> on motherboard
$PIR: Using invalid BIOS IRQ 9 from 2.13.INTA for link 0x63
pci0: <PCI bus> on pcib0
agp0: <Intel 82815 (i815 GMCH) host to PCI bridge> on hostb0
pcib1: <PCI-PCI bridge> at device 1.0 on pci0
pci1: <PCI bus> on pcib1
pcib2: <PCIBIOS PCI-PCI bridge> at device 30.0 on pci0
pci2: <PCI bus> on pcib2
ral0: <Ralink Technology RT2560> mem 0xefefe000-0xefefffff irq 3 at device 6.0 on pci2
ral0: MAC/BBP RT2560 (rev 0x04), RF RT2525
ral0: Ethernet address: 00:0f:a3:74:4a:7a
ral0: [ITHREAD]
re0: <RealTek 8139C+ 10/100BaseTX> port 0xd500-0xd5ff mem 0xefefa000-0xefefa1ff irq 10 at device 9.0 on pci2
re0: Chip rev. 0x74800000
re0: MAC rev. 0x00000000
miibus0: <MII bus> on re0
rlphy0: <RealTek internal media interface> PHY 0 on miibus0
rlphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
re0: Ethernet address: 00:90:7f:2f:1a:63
re0: [FILTER]
re1: <RealTek 8139C+ 10/100BaseTX> port 0xd600-0xd6ff mem 0xefefb000-0xefefb1ff irq 5 at device 10.0 on pci2
re1: Chip rev. 0x74800000
re1: MAC rev. 0x00000000
miibus1: <MII bus> on re1
rlphy1: <RealTek internal media interface> PHY 0 on miibus1
rlphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
re1: Ethernet address: 00:90:7f:2f:1a:64
re1: [FILTER]
re2: <RealTek 8139C+ 10/100BaseTX> port 0xd900-0xd9ff mem 0xefefc000-0xefefc1ff irq 11 at device 11.0 on pci2
re2: Chip rev. 0x74800000
re2: MAC rev. 0x00000000
miibus2: <MII bus> on re2
rlphy2: <RealTek internal media interface> PHY 0 on miibus2
rlphy2: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
re2: Ethernet address: 00:90:7f:2f:1a:65
re2: [FILTER]
re3: <RealTek 8139C+ 10/100BaseTX> port 0xda00-0xdaff mem 0xefefd000-0xefefd1ff irq 12 at device 12.0 on pci2
re3: Chip rev. 0x74800000
re3: MAC rev. 0x00000000
miibus3: <MII bus> on re3
rlphy3: <RealTek internal media interface> PHY 0 on miibus3
rlphy3: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
re3: Ethernet address: 00:90:7f:2f:1a:66
re3: [FILTER]
re4: <RealTek 8139C+ 10/100BaseTX> port 0xdd00-0xddff irq 9 at device 13.0 on pci2
re4: Chip rev. 0x74800000
re4: MAC rev. 0x00000000
miibus4: <MII bus> on re4
rlphy4: <RealTek internal media interface> PHY 0 on miibus4
rlphy4: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
re4: Ethernet address: 00:90:7f:2f:1a:67
re4: [FILTER]
re5: <RealTek 8139C+ 10/100BaseTX> port 0xde00-0xdeff irq 6 at device 14.0 on pci2
re5: Chip rev. 0x74800000
re5: MAC rev. 0x00000000
miibus5: <MII bus> on re5
rlphy5: <RealTek internal media interface> PHY 0 on miibus5
rlphy5: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
re5: Ethernet address: 00:90:7f:2f:1a:68
re5: [FILTER]
isab0: <PCI-ISA bridge> at device 31.0 on pci0
isa0: <ISA bus> on isab0
atapci0: <Intel ICH2 UDMA100 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xff00-0xff0f at device 31.1 on pci0
ata0: <ATA channel 0> on atapci0
ata0: [ITHREAD]
ata1: <ATA channel 1> on atapci0
ata1: [ITHREAD]
cpu0 on motherboard
pmtimer0 on isa0
orm0: <ISA Option ROM> at iomem 0xe0000-0xe0fff pnpid ORM0000 on isa0
atkbdc0: <Keyboard controller (i8042)> at port 0x60,0x64 on isa0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
atkbd0: [ITHREAD]
ppc0: <Parallel port> at port 0x378-0x37f irq 7 on isa0
ppc0: Generic chipset (ECP/PS2/NIBBLE) in COMPATIBLE mode
ppc0: FIFO with 16/16/16 bytes threshold
ppbus0: <Parallel port bus> on ppc0
ppbus0: [ITHREAD]
plip0: <PLIP network interface> on ppbus0
plip0: WARNING: using obsoleted IFF_NEEDSGIANT flag
lpt0: <Printer> on ppbus0
lpt0: Interrupt-driven port
ppi0: <Parallel I/O> on ppbus0
ppc0: [GIANT-LOCKED]
ppc0: [ITHREAD]
sio0 at port 0x3f8-0x3ff irq 4 flags 0x10 on isa0
sio0: type 16550A, console
sio0: [FILTER]
sio1: configured irq 3 not in bitmap of probed irqs 0
sio1: port may not be enabled
unknown: <PNP0c01> can't assign resources (memory)
unknown: <PNP0303> can't assign resources (port)
speaker0: <PC speaker> at port 0x61 pnpid PNP0800 on isa0
unknown: <PNP0501> can't assign resources (port)
unknown: <PNP0401> can't assign resources (port)
RTC BIOS diagnostic error 20<config_unit>
Timecounter "TSC" frequency 1202733008 Hz quality 800
Timecounters tick every 1.000 msec
IPsec: Initialized Security Association Processing.
ad2: DMA limited to UDMA33, controller found non-ATA66 cable
ad2: 57231MB <IC25N060ATMR04 0 MO3OAD5A> at ata1-master UDMA33
GEOM: ad2: partition 1 does not start on a track boundary.
GEOM: ad2: partition 1 does not end on a track boundary.
Trying to mount root from ufs:/dev/ad2s1a
re2: link state changed to UP
re2: link state changed to DOWN
re0: link state changed to UP
re0: link state changed to DOWN
re2: link state changed to UP
re0: link state changed to UP
re1: link state changed to DOWN
re3: link state changed to DOWN
re4: link state changed to DOWN
re5: link state changed to DOWN
pflog0: promiscuous mode enabled
vmstat -i
interrupt                total     rate
irq0: clk              2445041     1000
irq4: sio0                 736        0
irq7: ppbus0 ppc0            1        0
irq8: rtc               312912      127
irq10: re0                 849        0
irq11: re2                3204        1
irq15: ata1              10586        4
Total                  2773329     1134
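For anyone who wants to compare interrupt counters between boxes or before and after a timeout, here is a rough Python sketch that parses the vmstat -i output above and keeps only the re interfaces. The function names (parse_vmstat_i, nic_rows) are made up for illustration, and the sample text is simply the output from this post with the whitespace collapsed.

```python
import re

# Sample copied from the vmstat -i output in this post (whitespace collapsed).
SAMPLE = """\
interrupt total rate
irq0: clk 2445041 1000
irq4: sio0 736 0
irq7: ppbus0 ppc0 1 0
irq8: rtc 312912 127
irq10: re0 849 0
irq11: re2 3204 1
irq15: ata1 10586 4
Total 2773329 1134
"""

# One row looks like "irq7: ppbus0 ppc0   1   0": irq, device list, total, rate.
ROW = re.compile(r"^(irq\d+):\s+(.+?)\s+(\d+)\s+(\d+)\s*$")


def parse_vmstat_i(text):
    """Map each irq to (device names, total interrupts, rate per second)."""
    rows = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            irq, devices, total, rate = match.groups()
            rows[irq] = (devices.split(), int(total), int(rate))
    return rows


def nic_rows(rows, driver="re"):
    """Keep only irqs whose device list includes an interface of the given driver."""
    nic = re.compile(rf"^{re.escape(driver)}\d+$")
    return {irq: row for irq, row in rows.items()
            if any(nic.match(dev) for dev in row[0])}


if __name__ == "__main__":
    for irq, (devices, total, rate) in nic_rows(parse_vmstat_i(SAMPLE)).items():
        print(irq, ",".join(devices), total, rate)
```

In this capture only re0 and re2 have interrupt rows at all, which lines up with the link state messages in the dmesg output.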
-
Not yet – Pyun has a couple of paid projects so the progress on this issue is at a bit of a standstill.
I will preface this by saying that what I am about to suggest may be completely ridiculous, but my understanding is that the WatchGuard OS is based on Linux. If that's true, then perhaps someone can look at the Linux driver and compare it to the BSD one? Obviously they are structured differently and this may not make sense, but when the WatchGuard box is running the WatchGuard software, the firebox is a very stable unit, so someone knows how to make these Realtek chips work!
-
Not yet – Pyun has a couple of paid projects so the progress on this issue is at a bit of a standstill.
ok thanks for the update.
then perhaps someone can look at the Linux driver and compare it to the BSD one?
I will try a debian install on the X500 some time this weekend.
Think the linux driver might have the same issue with the realtek cards. it's hard to find out if the issue was ever fixed or if people just started using other network cards.
google around for:
"8139c problem oversized ethernet frame"
"realtec 8139c Abnormal interrupt"
http://www.joshua.raleigh.nc.us/docs/linux-2.4.10_html/286454.html
http://article.gmane.org/gmane.linux.drivers.realtek.devel/420
The X500 does have a pci slot, gonna try using an old sun pci quad port 10/100 network card which works in another pc running the 1.2.3.rc2 version of pfsense. At least this should prove the X500 motherboard doesn't have issues controlling acpi/dma/interrupts of network cards.
Here is a pic of the sun card
http://www.sun.com/products/networking/ethernet/sunquadfastethernet/images/I1_hw_quadfastether_pci_i.jpg
-
Debian net install works on the X500, now just need to find a way to overlay all the pfsense extras on the base Debian install :)
Using the same switch/cable/client the debian network driver seems to provide higher throughput.
Debian
iperf -c 192.168.100.2 -p 5010 -t 60
------------------------------------------------------------
[ ID] Interval       Transfer     Bandwidth
[108]  0.0-60.0 sec   602 MBytes  84.1 Mbits/sec
Freebsd7.2/Pfsense 1.2.3rc2
iperf -c 192.168.100.2 -p 5010 -t 60
------------------------------------------------------------
[ ID] Interval       Transfer     Bandwidth
[108]  0.0-60.0 sec   465 MBytes  65.0 Mbits/sec
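To put a number on "higher throughput", here is a small Python sketch of the arithmetic. It assumes iperf counts a MByte as 2**20 bytes and a Mbit as 10**6 bits, which is consistent with the two results above; the helper name mbits_per_sec is made up for illustration.

```python
def mbits_per_sec(mbytes, seconds):
    """Convert an iperf transfer (MBytes of 2**20 bytes) to Mbits/sec (10**6 bits)."""
    return mbytes * 2**20 * 8 / seconds / 1e6


# Bandwidth figures reported by the two 60 second runs above.
DEBIAN_MBITS = 84.1
FREEBSD_MBITS = 65.0

# Relative gain of the Debian run over the FreeBSD/pfSense run, in percent.
gain_percent = (DEBIAN_MBITS - FREEBSD_MBITS) / FREEBSD_MBITS * 100

if __name__ == "__main__":
    print(f"Debian, recomputed:  {mbits_per_sec(602, 60):.1f} Mbits/sec")
    print(f"FreeBSD, recomputed: {mbits_per_sec(465, 60):.1f} Mbits/sec")
    print(f"Debian is about {gain_percent:.0f}% faster in this single test")
```

That is one run on each OS, so treat the difference as a rough indication rather than a benchmark.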
Wish i knew why this watchdog issue happens on some X500 devices more than others.
-
I'm having exactly the same on my X500 too.
re0: watchdog timeout
re0: watchdog timeout
re0: watchdog timeout
re0: watchdog timeout
re0: watchdog timeout
I get them on all ports, internal or external, whether connected to a switch, cable modem etc. Nothing makes a difference. It's a shame, the Firebox running pfSense is really good except for the watchdog timeouts!
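Since people in this thread report timeouts on different ports, a per-interface count from a saved console or system log makes the reports easier to compare. Below is a minimal Python sketch; the function name count_watchdog_timeouts is made up, and the pattern only assumes the two message spellings seen in this thread ("re0: watchdog timeout" and "re2:watchdog timeout").

```python
import re
from collections import Counter

# Matches "re0: watchdog timeout" as well as "re2:watchdog timeout".
TIMEOUT = re.compile(r"\b(re\d+):\s*watchdog timeout")


def count_watchdog_timeouts(console_text):
    """Count watchdog timeout messages per re interface."""
    return Counter(TIMEOUT.findall(console_text))


# Sample copied from the console output in this post.
SAMPLE = """\
re0: watchdog timeout
re0: watchdog timeout
re0: watchdog timeout
re0: watchdog timeout
re0: watchdog timeout
"""

if __name__ == "__main__":
    for interface, count in sorted(count_watchdog_timeouts(SAMPLE).items()):
        print(interface, count)
```

Paste your own console capture in place of SAMPLE, or read it from a file, to get the counts for your box.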
-
i only get the odd occasional timeout on my x500 since i've upgraded. is the updated code in the new embedded version? i'm running embedded 1.2.3-rc2 and fancy moving over to the new embedded but don't want to wreck what appears to be a stable install.
of the timeouts i get, they are generally when i'm playing about in the web interface. there's no timeouts if i leave it alone
-
spoke too soon. still getting them but nowhere near as much. does the new nanobsd embedded have the patch installed?
-
I am planning to install debian on my x500 and use it as a "LAMP" server.
How was the net install conducted? Did you manage to get a keyboard to work as I'm having no luck following diagrams on another topic.
Cheers,
Andy
-
Wondering if there's any progress/updates here? I've got two different Firebox x700s that both display the watchdog timeouts on re0 (my LAN port). I was originally running 1.2.3 RC2 and upgraded to the latest firmware in the 1.2.X snapshots.
loki - care to elaborate on how you prepped your firebox for a netboot install of debian?
-
loki - care to elaborate on how you prepped your firebox for a netboot install of debian?
install a base debian from a net install cd on a normal pc. edited /etc/fstab and set the serial port for console access, pop the drive back into the firebox.
Overall wasn't very happy with the older firebox hardware, the network cards just don't seem to have great support with bsd.
I am now running the following jetway with 2g of mem and 1.2.3rc2, pretty happy with it.
xxxx://www.newegg.com/Product/Product.aspx?Item=N82E16856107059
-
I know it's been a while, but is there any progress on this?
-
I've been getting the same errors as everyone in this thread, using two fireboxes, an x500 as transparent firewall and an x700 as router/firewall. Like Spy Alelo, I'm also curious to see if there has been any progress on this and if there is perhaps something new that we can test/patch.
-
Call me crazy, but I removed the crypto card to test a mini PCI WiFi card, and have not had a single timeout while messing with the GUI. I removed the WiFi card anyway, since it wasn't supported, and still no timeouts. I have not upgraded the firmware (still using 1.2.3 release) or changed any settings.
Again, it may have been a fluke, but I will keep testing. The only thing that may make any sense, is that the crypto card was in some way being used for SSL on the WebGUI (for which I do have SSL enabled), and there may be some compatibility issue between it and the Realtek interfaces. I mean, seriously, I download over 60GB of data a month using torrents, not a single issue. Also use a VoIP phone non-stop sustaining a VPN connection while using a web based ticketing system 5 days a week, 8 hours a day and never get a single drop or a timeout. It only happens when I access the WebGUI within the first two minutes. And not a single timeout after removing the card? Can anyone else experiment and confirm this?
-
@Spy Alelo, did you ever find out if it indeed was the crypto card that was causing the timeouts?
-
It did timeout, eventually. I found an easy way to make it timeout, and that is to just download some MP3s from my local webserver using its external DNS name over HTTP. That only happens locally, since from the internet, that issue is not present.
It just doesn't make sense, if you ask me.
-
I'm still having the same issue on my x500, although it's not as bad as it used to be, but who knows. It's easy to reproduce: just have any traffic going through it and start hitting the web interface; usually listing the states will do it. I know it was worse when HTTPS was enabled. I've tried checking "disable hardware checksum", I've run ifconfig re0 -tso, and I've played with the ACPI settings in device.hints, but to no avail. I'm on 1.2.3-release now. One option I've seen is to disable ACPI in the BIOS, but that involves the weird connector and finding a pci video card (man, I threw away a whole box of those a while back) so I haven't done it.
It hasn't ever done it on its own, it only happens for me when I hit the web interface, so maybe it's not such a problem, but it is annoying when you are trying to debug something and the whole thing locks up. Has anyone made any progress on this?
-
I am still having the same issue with 2.x. The horrible timeouts are with the 2.x versions of pfSense. Turning TSO off there gives a major improvement, but the timeouts still show up under heavy hits on the webgui, just like in 1.2.3-release.
I don't know if this will ever be fixed, since a lot of the BSD developers think of Realtek NICs as crap and refuse to do anything about it. They just recommend using Intel or something else, which we obviously can't do.
-
Anyone else tried this?
I have two X500's as firewalls/VPN gateways and was having the timeout problem (one was worse than the other – different HW revisions?)
None of the options on System > Advanced > Networking did anything for me, but setting TCP Offload Engine (not the BCE one) in System > Advanced > Tunables to 0 (disabled) has allowed them both to run without issue for over a month now. Even the cat /dev/random over SSH doesn't make it hiccup.
The snapshot I'm running is almost a month old now, but if the current builds still do this out of the box, it may be worth a shot.