Attention Firebox X Series Users - Testing Needed
-
Also, if you are able to repro on a fresh flash, can you try a capture?
tcpdump -s 0 -w /tmp/re1.pcap -ni re1
(Replace re1 with re0, re2, etc., whichever interface is giving you timeouts.)
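If you are capturing on more than one interface, a tiny helper saves retyping. This is only a sketch: capture_cmd is a made-up name, and it prints the command instead of running it so you can check it first. The /tmp/<interface>.pcap naming just follows the example above.

```shell
#!/bin/sh
# Hypothetical helper (not part of pfSense): print the capture command
# for one interface instead of running it.
capture_cmd() {
    iface="$1"
    # -s 0 captures full packets, -w writes a pcap file,
    # -n skips name lookups, -i selects the interface.
    printf 'tcpdump -s 0 -w /tmp/%s.pcap -ni %s\n' "$iface" "$iface"
}

capture_cmd re2
```

Copy the printed line into the shell on the firebox once it looks right.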
-
I noticed in your dmesg output:
----
ACPI Error (tbxfroot-0308): A valid RSDP was not found [20070320]
ACPI: Table initialisation failed: AE_NOT_FOUND
ACPI: Try disabling either ACPI or apic support.
----
I don't have this in my dmesg output. What's in your /boot/loader.conf?
-
Defaults, plus disable DMA for my CF card (wouldn't work otherwise) and a commented out disable ACPI (tried on 1.2.2 to stop watchdog errors).
cat /boot/loader.conf
autoboot_delay="1"
vm.kmem_size="435544320"
vm.kmem_size_max="535544320"
kern.ipc.nmbclusters="0"
hw.ata.ata_dma=0
#hint.acpi.0.disabled=1
-
From a standard 2.0 flash, /boot/loader.conf contains:
hw.ata.atapi_dma="0"
hw.ata.ata_dma="0"
loader_color="NO"
console=comconsole
autoboot_delay="5"
hw.ata.wc="0"
kern.ipc.nmbclusters="0"
beastie_disable="YES"
vm.kmem_size="435544320"
vm.kmem_size_max="535544320"
-
If you are able to get into the BIOS on your firebox, can you make sure you are testing with ACPI enabled? The error on your dmesg indicates that ACPI might be turned off in the BIOS.
-
If you are able to get into the BIOS on your firebox, can you make sure you are testing with ACPI enabled? The error on your dmesg indicates that ACPI might be turned off in the BIOS.
If I can find the PS2 port pinout for the motherboard, a PS2 connector I can repurpose, a PCI video card, and the time, I'll see what I can manage. Unless there is an easier way I'm missing?
-
I just tried turning on TSO on re1 (the interface I always see watchdog timeouts on) and it definitely increased the time before cat'ing /dev/random caused an error. It still happened, but it took 1-2 minutes instead of 15 seconds. While I was cat'ing /dev/random, I did notice a performance hit: I ran a speed test before and during, and got 6.5mbit before and 4mbit during (consistently). Still not fixed, but this seems like progress… I'll still try to get a clean image on at some point in the next few days to see if that helps.
-
I know I've never been in the BIOS of my fireboxes, just seems strange to see those odd messages in your dmesg output.
Don't tear up a connector yet. Just do a fresh flash to 2.0, test that way, and let me know.
-
Just FYI, I still haven't been able to reproduce timeouts on 2.0 using cat /dev/random. As you can see, about 40GB of traffic has come out of that interface, and there has not been a single watchdog timeout on 2.0.
LAN interface (re2)
Status up
MAC address 00:90:7f:32:8a:94
IP address 192.168.1.1
Subnet mask 255.255.255.0
Media 100baseTX <full-duplex>
In/out packets 41004200/41004169 (2.63 GB/41.24 GB)
In/out packets (pass) 41004169/64873580 (2.63 GB/41.24 GB)
In/out packets (block) 31/0 (3 KB/0 bytes)
In/out errors 0/0
Collisions 0
Have you tried a fresh default flash of 2.0 yet?
-
The problem still exists after a clean reinstall. Tested against 2009-04-25 17:12 build, with the following configuration:
- Added to /boot/loader.conf:
hw.ata.ata_dma=0
- Configured interfaces (set WAN=re0, set LAN=re1, configured IP/netmask on re1)
- Configured DHCP (reservations, address range, domain name, NTP server)
Cat'ing /dev/random still causes watchdog errors within 15-20 seconds. ACPI errors still show in dmesg. I also reset BIOS settings to defaults (via front LCD panel), with no noticeable changes.
-
Any particular reason you're editing loader.conf and adding hw.ata.ata_dma=0? Like I said, I'm using all defaults. The only other difference that I can see between you and me (besides the ACPI error) is that you are using re1 for LAN and I am using re2 (normally I leave re0 and re1 free in case I have multiple WANs; my re1 is empty currently).
What is your LAN interface plugged into (a switch I presume? what kind)? My re2 (LAN) is plugged directly into my laptop with a crossover cable. I have also changed speed/duplex settings on my laptop during tests last week just to be sure there isn't a problem with a particular speed, and there is not.
Would it be possible to plug your LAN interface directly into a computer (via crossover) to take the switch out of the equation?
I am still puzzled by the ACPI errors in your dmesg output-- I have 3 firebox x5/7/1000 series units, and none of them have that error during bootup.
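For comparing two loader.conf files like the ones posted earlier, something along these lines can help. It is a rough sketch, not a pfSense tool: normalize_conf and only_differences are made-up names, and the sample settings are an abbreviated subset of what was posted. Quotes and comments are stripped so that hw.ata.ata_dma=0 and hw.ata.ata_dma="0" count as the same setting.

```shell
#!/bin/sh
# Hypothetical helper: normalize loader.conf text read from stdin so two
# files can be compared line by line.
normalize_conf() {
    # Drop comment lines and blank lines, strip double quotes, then sort.
    sed -e '/^[[:space:]]*#/d' -e '/^[[:space:]]*$/d' -e 's/"//g' | LC_ALL=C sort
}

# Hypothetical helper: print lines that appear in only one of two sets.
only_differences() {
    printf '%s\n%s\n' "$1" "$2" | LC_ALL=C sort | uniq -u
}

# Abbreviated sample settings taken from the two files in this thread.
mine=$(printf '%s\n' 'autoboot_delay="1"' 'hw.ata.ata_dma=0' \
    '#hint.acpi.0.disabled=1' | normalize_conf)
stock=$(printf '%s\n' 'hw.ata.ata_dma="0"' 'autoboot_delay="5"' \
    'hw.ata.wc="0"' | normalize_conf)

only_differences "$mine" "$stock"
```

On a real box you would feed it the files instead, e.g. normalize_conf < /boot/loader.conf, and compare that against a copy taken from a stock flash.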
-
Ok, so I just flashed a 2.0 image and changed my LAN interface to be re1 instead of re2. Guess what? Watchdog timeouts.
-
That is odd, because I changed my LAN interface to re2 and still had watchdog timeouts. I haven't yet had a chance to connect directly without a switch, to see if that makes any difference. I'm currently running a Linksys SD2008, which is a 8-port unmanaged gig switch - By this weekend I'll probably have a Cisco 2950-24 sitting inline between the Firebox and Linksys. I'll update this thread with the results of direct vs Cisco vs Linksys, once I get to test.
-
For those of you watching this thread: stay tuned. I am still working with Pyun. I have another patch to test!
-
If you are able to get into the BIOS on your firebox, can you make sure you are testing with ACPI enabled? The error on your dmesg indicates that ACPI might be turned off in the BIOS.
If I can find the PS2 port pinout for the motherboard, a PS2 connector I can repurpose, a PCI video card, and the time, I'll see what I can manage. Unless there is an easier way I'm missing?
See my keyboard hack here: http://forum.pfsense.org/index.php/topic,7458.msg84324.html#msg84324
-
I have to say, ever since this thread was started I've been having more and more watchdog timeouts.
In previous builds I would get them very seldom (not even once a day), but since whatever code was changed to fix this I am seeing them, on average, 5 times a day, often for about a minute each time.
Nothing has changed configuration wise between builds, I'm running WAN on re0, LAN on re1, DMZ on re2 and Wifi on re4/5. From what I can tell I never have any timeouts on re0/1.
In any case, I'll give it a few more builds, but unless we can get it fixed I'll have to roll back to the earlier build as the system, as it stands today, is getting to be unusable (and yes, I know, don't run v2.0 in production :p).
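To put numbers on which interfaces are affected, the timeouts can be tallied per interface from the kernel messages. A minimal sketch, assuming the lines look like "re2: watchdog timeout" as quoted in this thread; count_watchdog is a made-up name, and the sample input is illustrative, not a real log.

```shell
#!/bin/sh
# Hypothetical helper: read kernel messages on stdin and count
# "watchdog timeout" lines per interface.
count_watchdog() {
    # Split on ":" so the first field is the interface name, tally matches,
    # then sort the result so the output order is stable.
    awk -F: '/watchdog timeout/ { n[$1]++ } END { for (i in n) print i, n[i] }' |
        LC_ALL=C sort
}

# Illustrative sample input.
printf '%s\n' \
    're2: watchdog timeout' \
    're1: link state changed to UP' \
    're2: watchdog timeout' \
    're4: watchdog timeout' | count_watchdog
```

On the firewall itself this would be run as dmesg | count_watchdog, keeping in mind that dmesg only holds what is still in the kernel message buffer.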
-
There have only been 2 patches that have even made it into the publicly downloadable builds, and 1 of them went in very late yesterday. My guess is that you are seeing more watchdog timeouts because something changed in your environment rather than in pfSense: the Realtek interface code has only changed twice, and one of those changes was for the better. The other patch, which just got put in yesterday, will likely be rolled back since it did not seem to improve things (and it actually may have made it worse). Working with the driver maintainer is challenging since there is a 17 hour time difference between him and me, plus I need to have his patches incorporated and wait for a new build to test before I can get back to him.
Like I said, stay tuned. When I've worked out something that appears to have solved it, I will need people like you to beat it up--originally, I thought I had it licked (since it solved the particular problem that I was causing), but there are still others present.
-
Hello Dimitri
Got a firebox x500 off of ebay and was hoping I wouldn't run into the watchdog errors with the Realtek network cards, but I wasn't that lucky.
Gave the pfSense-1.2.3-20090708-1942 snapshot a test tonight and I am able to reproduce the watchdog errors with the cat /dev/urandom test, or even by installing the NUT package and then going to the web interface "Services -> NUT". About 5-8 secs after the page starts to load I get the following error in the console:
re2: watchdog timeout
Once this error pops up in the console screen I am unable to ping to/from that interface until I hard power off the x500 device. From the console, if I hit 5 "Reboot System" or type reboot, pfsense starts running the shutdown process but then stops at the "Rebooting…" message.
re2:watchdog timeout
re2:watchdog timeout
re2:watchdog timeout
# reboot
pflog0: promiscuous mode disabled
TWaiting (max 60 seconds) for system process `vnlru' to stop...done
Waiting (max 60 seconds) for system process `bufdaemon' to stop...done
Waiting (max 60 seconds) for system process `syncer' to stop...
Syncing disks, vnodes remaining...4 2 0 0 done
All buffers synced.
Uptime: 38m24s
Rebooting...
Another oddity
Running "halt system" from the console menu works until i hit the "press any key to reboot" part. As soon as i hit the "AnyKey" the speaker on the x500 screams like crazy. Both "reboot" and "halt/reboot" work just fine until the watchdog errors starts to pop up. Any other debugging i can do on my end to help?
-loki
pciconf -lcv
hostb0@pci0:0:0:0: class=0x060000 card=0x11308086 chip=0x11308086 rev=0x04 hdr=0x00
    class = bridge
    subclass = HOST-PCI
    cap 09[88] = vendor (length 4) Intel cap 14 version 1
    cap 02[a0] = AGP 2x 1x SBA disabled
pcib1@pci0:0:1:0: class=0x060400 card=0x00000000 chip=0x11318086 rev=0x04 hdr=0x01
    class = bridge
    subclass = PCI-PCI
pcib2@pci0:0:30:0: class=0x060400 card=0x00000000 chip=0x244e8086 rev=0x05 hdr=0x01
    class = bridge
    subclass = PCI-PCI
isab0@pci0:0:31:0: class=0x060100 card=0x00000000 chip=0x24408086 rev=0x05 hdr=0x00
    class = bridge
    subclass = PCI-ISA
atapci0@pci0:0:31:1: class=0x010180 card=0x24408086 chip=0x244b8086 rev=0x05 hdr=0x00
    class = mass storage
    subclass = ATA
ral0@pci0:2:6:0: class=0x028000 card=0x3c421186 chip=0x02011814 rev=0x01 hdr=0x00
    class = network
    cap 01[40] = powerspec 2 supports D0 D3 current D0
re0@pci0:2:9:0: class=0x020000 card=0x813910ec chip=0x813910ec rev=0x20 hdr=0x00
    class = network
    subclass = ethernet
    cap 01[50] = powerspec 2 supports D0 D1 D2 D3 current D0
re1@pci0:2:10:0: class=0x020000 card=0x813910ec chip=0x813910ec rev=0x20 hdr=0x00
    class = network
    subclass = ethernet
    cap 01[50] = powerspec 2 supports D0 D1 D2 D3 current D0
re2@pci0:2:11:0: class=0x020000 card=0x813910ec chip=0x813910ec rev=0x20 hdr=0x00
    class = network
    subclass = ethernet
    cap 01[50] = powerspec 2 supports D0 D1 D2 D3 current D0
re3@pci0:2:12:0: class=0x020000 card=0x813910ec chip=0x813910ec rev=0x20 hdr=0x00
    class = network
    subclass = ethernet
    cap 01[50] = powerspec 2 supports D0 D1 D2 D3 current D0
re4@pci0:2:13:0: class=0x020000 card=0x813910ec chip=0x813910ec rev=0x20 hdr=0x00
    class = network
    subclass = ethernet
    cap 01[50] = powerspec 2 supports D0 D1 D2 D3 current D0
re5@pci0:2:14:0: class=0x020000 card=0x813910ec chip=0x813910ec rev=0x20 hdr=0x00
    class = network
    subclass = ethernet
    cap 01[50] = powerspec 2 supports D0 D1 D2 D3 current D0
dmesg
Copyright (c) 1992-2009 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 7.2-RELEASE-p2 #0: Wed Jul 8 19:39:37 EDT 2009
sullrich@FreeBSD-7_2-RELENG_1_2-snapshots.pfsense.org:/usr/obj.pfSense/usr/pfSensesrc/src/sys/pfSense_SMP.7
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) Celeron(TM) CPU 1200MHz (1202.73-MHz 686-class CPU)
  Origin = "GenuineIntel" Id = 0x6b4 Stepping = 4
  Features=0x383f9ff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE>
real memory = 536870912 (512 MB)
avail memory = 508604416 (485 MB)
wlan: mac acl policy registered
kbd1 at kbdmux0
cryptosoft0: <software crypto> on motherboard
pcib0: <Intel 82815 (i815 GMCH) Host To Hub bridge> pcibus 0 on motherboard
pir0: <PCI Interrupt Routing Table: 11 Entries> on motherboard
$PIR: Using invalid BIOS IRQ 9 from 2.13.INTA for link 0x63
pci0: <PCI bus> on pcib0
agp0: <Intel 82815 (i815 GMCH) Host To PCI bridge> on hostb0
pcib1: <PCI-PCI bridge> at device 1.0 on pci0
pci1: <PCI bus> on pcib1
pcib2: <PCIBIOS PCI-PCI bridge> at device 30.0 on pci0
pci2: <PCI bus> on pcib2
ral0: <Ralink Technology RT2560> mem 0xefefe000-0xefefffff irq 3 at device 6.0 on pci2
ral0: MAC/BBP RT2560 (rev 0x04), RF RT2525
ral0: Ethernet address: 00:0f:a3:74:4a:7a
ral0: [ITHREAD]
re0: <RealTek 8139C+ 10/100BaseTX> port 0xd500-0xd5ff mem 0xefefa000-0xefefa1ff irq 10 at device 9.0 on pci2
re0: Chip rev. 0x74800000
re0: MAC rev. 0x00000000
miibus0: <MII bus> on re0
rlphy0: <RealTek internal media interface> PHY 0 on miibus0
rlphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
re0: Ethernet address: 00:90:7f:2f:1a:63
re0: [FILTER]
re1: <RealTek 8139C+ 10/100BaseTX> port 0xd600-0xd6ff mem 0xefefb000-0xefefb1ff irq 5 at device 10.0 on pci2
re1: Chip rev. 0x74800000
re1: MAC rev. 0x00000000
miibus1: <MII bus> on re1
rlphy1: <RealTek internal media interface> PHY 0 on miibus1
rlphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
re1: Ethernet address: 00:90:7f:2f:1a:64
re1: [FILTER]
re2: <RealTek 8139C+ 10/100BaseTX> port 0xd900-0xd9ff mem 0xefefc000-0xefefc1ff irq 11 at device 11.0 on pci2
re2: Chip rev. 0x74800000
re2: MAC rev. 0x00000000
miibus2: <MII bus> on re2
rlphy2: <RealTek internal media interface> PHY 0 on miibus2
rlphy2: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
re2: Ethernet address: 00:90:7f:2f:1a:65
re2: [FILTER]
re3: <RealTek 8139C+ 10/100BaseTX> port 0xda00-0xdaff mem 0xefefd000-0xefefd1ff irq 12 at device 12.0 on pci2
re3: Chip rev. 0x74800000
re3: MAC rev. 0x00000000
miibus3: <MII bus> on re3
rlphy3: <RealTek internal media interface> PHY 0 on miibus3
rlphy3: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
re3: Ethernet address: 00:90:7f:2f:1a:66
re3: [FILTER]
re4: <RealTek 8139C+ 10/100BaseTX> port 0xdd00-0xddff irq 9 at device 13.0 on pci2
re4: Chip rev. 0x74800000
re4: MAC rev. 0x00000000
miibus4: <MII bus> on re4
rlphy4: <RealTek internal media interface> PHY 0 on miibus4
rlphy4: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
re4: Ethernet address: 00:90:7f:2f:1a:67
re4: [FILTER]
re5: <RealTek 8139C+ 10/100BaseTX> port 0xde00-0xdeff irq 6 at device 14.0 on pci2
re5: Chip rev. 0x74800000
re5: MAC rev. 0x00000000
miibus5: <MII bus> on re5
rlphy5: <RealTek internal media interface> PHY 0 on miibus5
rlphy5: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
re5: Ethernet address: 00:90:7f:2f:1a:68
re5: [FILTER]
isab0: <PCI-ISA bridge> at device 31.0 on pci0
isa0: <ISA bus> on isab0
atapci0: <Intel ICH2 UDMA100 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xff00-0xff0f at device 31.1 on pci0
ata0: <ATA channel 0> on atapci0
ata0: [ITHREAD]
ata1: <ATA channel 1> on atapci0
ata1: [ITHREAD]
cpu0 on motherboard
pmtimer0 on isa0
orm0: <ISA Option ROM> at iomem 0xe0000-0xe0fff pnpid ORM0000 on isa0
atkbdc0: <Keyboard controller (i8042)> at port 0x60,0x64 on isa0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
atkbd0: [ITHREAD]
ppc0: <Parallel port> at port 0x378-0x37f irq 7 on isa0
ppc0: Generic chipset (ECP/PS2/NIBBLE) in COMPATIBLE mode
ppc0: FIFO with 16/16/16 bytes threshold
ppbus0: <Parallel port bus> on ppc0
ppbus0: [ITHREAD]
plip0: <PLIP network interface> on ppbus0
plip0: WARNING: using obsoleted IFF_NEEDSGIANT flag
lpt0: <Printer> on ppbus0
lpt0: Interrupt-driven port
ppi0: <Parallel I/O> on ppbus0
ppc0: [GIANT-LOCKED]
ppc0: [ITHREAD]
sio0 at port 0x3f8-0x3ff irq 4 flags 0x10 on isa0
sio0: type 16550A, console
sio0: [FILTER]
sio1: configured irq 3 not in bitmap of probed irqs 0
sio1: port may not be enabled
unknown: <PNP0c01> can't assign resources (memory)
unknown: <PNP0303> can't assign resources (port)
speaker0: <PC speaker> at port 0x61 pnpid PNP0800 on isa0
unknown: <PNP0501> can't assign resources (port)
unknown: <PNP0401> can't assign resources (port)
RTC BIOS diagnostic error 20<config_unit>
Timecounter "TSC" frequency 1202733008 Hz quality 800
Timecounters tick every 1.000 msec
IPsec: Initialized Security Association Processing.
ad2: DMA limited to UDMA33, controller found non-ATA66 cable
ad2: 57231MB <IC25N060ATMR04 0 MO3OAD5A> at ata1-master UDMA33
GEOM: ad2: partition 1 does not start on a track boundary.
GEOM: ad2: partition 1 does not end on a track boundary.
Trying to mount root from ufs:/dev/ad2s1a
re2: link state changed to UP
re2: link state changed to DOWN
re0: link state changed to UP
re0: link state changed to DOWN
re2: link state changed to UP
re0: link state changed to UP
re1: link state changed to DOWN
re3: link state changed to DOWN
re4: link state changed to DOWN
re5: link state changed to DOWN
pflog0: promiscuous mode enabled
vmstat -i
interrupt                total   rate
irq0: clk              2445041   1000
irq4: sio0                 736      0
irq7: ppbus0 ppc0            1      0
irq8: rtc               312912    127
irq10: re0                 849      0
irq11: re2                3204      1
irq15: ata1              10586      4
Total                  2773329   1134
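In case it helps anyone line up the vmstat -i rows with the IRQs assigned at boot, here is a rough sketch that pulls the re interface IRQ numbers out of dmesg probe lines. re_irqs is a made-up name, and the pattern assumes probe lines of the form "reN: <...> port ... irq N at device ...", which is how they appear in the dmesg above.

```shell
#!/bin/sh
# Hypothetical helper: read dmesg text on stdin and print "interface irq"
# for each re(4) probe line.
re_irqs() {
    awk '/^re[0-9]+: .* irq [0-9]+ at device/ {
        sub(":", "", $1)                 # "re0:" -> "re0"
        for (i = 2; i < NF; i++)         # find the word "irq"
            if ($i == "irq") print $1, $(i + 1)
    }'
}

# Two probe lines and one unrelated line as sample input.
printf '%s\n' \
    're0: <RealTek 8139C+ 10/100BaseTX> port 0xd500-0xd5ff mem 0xefefa000-0xefefa1ff irq 10 at device 9.0 on pci2' \
    're0: Chip rev. 0x74800000' \
    're4: <RealTek 8139C+ 10/100BaseTX> port 0xdd00-0xddff irq 9 at device 13.0 on pci2' | re_irqs
```

On the box this would be dmesg | re_irqs. vmstat -i above only lists re0 and re2, so the other interfaces would have to be checked against the dmesg side.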
-
Not yet. Pyun has a couple of paid projects, so the progress on this issue is at a bit of a standstill.
I will preface this by saying that it may be completely ridiculous, but my understanding is that the WatchGuard OS is based on Linux. If that's true, then perhaps someone can look at the Linux driver and compare it to the BSD one? Obviously they are structured differently and this may not make sense, but when the WatchGuard box is running the WatchGuard software, the firebox is a very stable unit, so someone knows how to make these realtek chips work!
-
Not yet – Pyun has a couple of paid projects so the progress on this issue is at a bit of a standstill.
ok thanks for the update.
then perhaps someone can look at the Linux driver and compare it to the BSD one?
I will try a debian install on the X500 some time this weekend.
I think the linux driver might have the same issue with the realtek chips. It's hard to find out if the issue was ever fixed or if people just started using other network cards.
google around for:
"8139c problem oversized ethernet frame"
"realtec 8139c Abnormal interrupt"
http://www.joshua.raleigh.nc.us/docs/linux-2.4.10_html/286454.html
http://article.gmane.org/gmane.linux.drivers.realtek.devel/420
The X500 does have a pci slot, so I'm gonna try using an old sun pci quad port 10/100 network card which works in another pc running the 1.2.3.rc2 version of pfsense. At least this should prove the X500 motherboard doesn't have issues controlling acpi/dma/interrupts of network cards.
Here is a pic of the sun card
http://www.sun.com/products/networking/ethernet/sunquadfastethernet/images/I1_hw_quadfastether_pci_i.jpg