Attention Firebox X Series Users - Testing Needed
-
Attention Firebox X500/700/1000 Users using pfSense:
Watchdog timeouts gettin’ you down? Thinkin’ about throwin’ that old Firebox into the fireplace? Don’t do that just yet! :)
Thanks to the pfSense devs and Pyun YongHyeon, the maintainer of the FreeBSD Realtek network driver, it appears that we may have solved the watchdog timeout issue on the Realtek 8139C+ chips used in these units. I have worked with Pyun for the past couple of days, and yesterday he sent me a patch. The pfSense devs committed it to both the 1.2.3 and the 2.0 alpha snapshot builds, so it is part of any snapshot built yesterday (4/17) at 2pm Eastern time or later.
Snapshot builds can be downloaded from
http://snapshots.pfsense.org/FreeBSD7/RELENG_1_2/
or
http://snapshots.pfsense.org/FreeBSD7/HEAD/
I have been testing a build with this patch since yesterday, and have yet to see a single watchdog timeout on my interfaces, with no modifications made to loader.conf. This is a default install; no special options have been set anywhere.
If at all possible, please try to install a recent snapshot build on your firebox units (those of you that have them) and test this patch. If you do still receive watchdog timeouts, please let me know either on this list, or off-list. Either way, please try to detail what you were doing when the watchdog timeout occurred so that we can try to reproduce it, and Pyun can fix it.
Thanks to all that have helped, and thanks to those that are willing to test!
Dimitri Rodis
Integrita Systems LLC
http://www.integritasystems.com
-
Running 1.2.3 built at Tue Apr 21 23:12:33 EDT 2009 on an X500, and I am still receiving watchdog timeouts. I experienced 6 watchdog timeouts after approximately 5 hours of uptime, at a time when the box would have been sitting basically idle (everyone in the house was asleep). I had previously disabled ACPI in loader.conf; I will try running with ACPI re-enabled now to see if that makes a difference.
Based on the RRD graphs, there was a small spike in CPU to about 10% during the event, prior to which CPU was at 1-2%. Also, there was a jump from 50 to 100 states that lasted for about 20 minutes prior, as well as a small increase in throughput for about 5 minutes prior. The increase in throughput can be explained by an SSH brute-force attempt directed at a server on the LAN (WAN TCP 22 port forwarded -> LAN) at a rate of 100 attempts over 5 minutes (04:59:52 - 05:04:29). Below is a snippet of the system.log/filter.log from before/during/after the event…
@system.log:
Apr 22 04:38:29 gateway dhclient[304]: DHCPREQUEST on re0 to x.x.x.x port 67
Apr 22 04:38:29 gateway dhclient[304]: SENDING DIRECT
Apr 22 04:38:29 gateway dhclient[304]: DHCPACK from x.x.x.x
Apr 22 04:38:29 gateway dhclient[304]: bound to x.x.x.x – renewal in 1800 seconds.
Apr 22 05:04:46 gateway kernel: re1: watchdog timeout
Apr 22 05:05:09 gateway kernel: re1: watchdog timeout
Apr 22 05:05:45 gateway kernel: re1: watchdog timeout
Apr 22 05:05:47 gateway miniupnpd[1081]: SUBSCRIBE not implemented. ENABLE_EVENTS compile option disabled
Apr 22 05:05:47 gateway last message repeated 2 times
Apr 22 05:06:36 gateway kernel: re1: watchdog timeout
Apr 22 05:07:01 gateway kernel: re1: watchdog timeout
Apr 22 05:07:15 gateway miniupnpd[1081]: SUBSCRIBE not implemented. ENABLE_EVENTS compile option disabled
Apr 22 05:07:15 gateway last message repeated 2 times
Apr 22 05:08:09 gateway kernel: re1: watchdog timeout
Apr 22 05:08:29 gateway dhclient[304]: DHCPREQUEST on re0 to x.x.x.x port 67
Apr 22 05:08:29 gateway dhclient[304]: SENDING DIRECT
Apr 22 05:08:29 gateway dhclient[304]: DHCPACK from x.x.x.x
Apr 22 05:08:29 gateway dhclient[304]: bound to x.x.x.x – renewal in 1800 seconds.
@filter.log:
Apr 22 05:03:00 gateway pf: 579. 531199 rule 220/0(match): block in on re0: (tos 0x0, ttl 105, id 4509, offset 0, flags [none], proto ICMP (1), length 61) re.mo.te.ip > w.a.n.ip: ICMP echo request, id 512, seq 41905, length 41
Apr 22 05:03:02 gateway pf: 2. 079394 rule 220/0(match): block in on re0: (tos 0x0, ttl 105, id 35051, offset 0, flags [none], proto ICMP (1), length 61) re.mo.te.ip > w.a.n.ip: ICMP echo request, id 512, seq 36274, length 41
Apr 22 05:03:15 gateway pf: 12. 819850 rule 220/0(match): block in on re0: (tos 0x0, ttl 98, id 1460, offset 0, flags [none], proto TCP (6), length 40) re.mo.te.ip.6000 > w.a.n.ip.139: S, cksum 0x57a6 (correct), 800129024:800129024(0) win 16384
Apr 22 05:05:03 gateway pf: 108. 115100 rule 220/0(match): block in on re0: (tos 0x0, ttl 44, id 52961, offset 0, flags [DF], proto UDP (17), length 597) re.mo.te.ip.50803 > w.a.n.ip.1026: UDP, length 569
Apr 22 05:08:29 gateway pf: 205. 927414 rule 220/0(match): block in on re0: (tos 0x0, ttl 250, id 8836, offset 0, flags [DF], proto UDP (17), length 328) re.mo.te.ip.67 > w.a.n.ip.68: BOOTP/DHCP, Reply, length 300, xid 0xa192aa09, Flags [none]
Apr 22 05:08:29 gateway pf: Client-IP w.a.n.ip
Apr 22 05:08:29 gateway pf: Your-IP w.a.n.ip
Apr 22 05:08:29 gateway pf: Client-Ethernet-Address 00ad:be:ef:00 [|bootp]
Apr 22 05:10:39 gateway pf: 129. 668942 rule 220/0(match): block in on re0: (tos 0x0, ttl 108, id 21133, offset 0, flags [none], proto UDP (17), length 78) re.mo.te.ip.41344 > w.a.n.ip.137: NBT UDP PACKET(137): QUERY; REQUEST; BROADCAST
Let me know if there is anything else that I can provide to help track this down.
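For anyone else collecting data, tallying the timeouts per interface makes reports easier to compare. The following is only a rough sketch: it assumes syslog lines in the same format as the system.log excerpt above, and the function name `summarize_watchdogs` is made up for illustration.

```python
import re
from collections import Counter

# Matches kernel lines such as:
#   Apr 22 05:04:46 gateway kernel: re1: watchdog timeout
WATCHDOG_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+kernel:\s+"
    r"(?P<ifname>\w+):\s*watchdog timeout"
)


def summarize_watchdogs(lines):
    """Return {interface: (count, first_stamp, last_stamp)} for watchdog timeouts."""
    counts = Counter()
    first, last = {}, {}
    for line in lines:
        match = WATCHDOG_RE.match(line)
        if not match:
            continue  # not a watchdog timeout line
        ifname = match.group("ifname")
        counts[ifname] += 1
        first.setdefault(ifname, match.group("stamp"))
        last[ifname] = match.group("stamp")
    return {name: (counts[name], first[name], last[name]) for name in counts}


# A few lines taken from the system.log excerpt above.
SAMPLE = """\
Apr 22 05:04:46 gateway kernel: re1: watchdog timeout
Apr 22 05:05:09 gateway kernel: re1: watchdog timeout
Apr 22 05:05:47 gateway miniupnpd[1081]: SUBSCRIBE not implemented. ENABLE_EVENTS compile option disabled
Apr 22 05:08:09 gateway kernel: re1: watchdog timeout
""".splitlines()

for ifname, (count, first_seen, last_seen) in summarize_watchdogs(SAMPLE).items():
    print(f"{ifname}: {count} timeouts between {first_seen} and {last_seen}")
```

Feeding it the full log text instead of `SAMPLE` would give one summary line per affected interface.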
-
Try a 2.0 build if possible. My experience is that there are still watchdog timeouts in 1.2.3, as you said. I think there might be something not quite right with the build process for 1.2.3, as 1.2.3 and 2.0 are supposed to be using the same base OS and patch set currently, and it does not appear that that is currently the case.
I'm working through this with the devs currently.
-
It happened again, with ACPI enabled - I had the dashboard up on 1.2.3 and walked away for a bit, came back to 6 or so watchdog errors. So far, 1.2.2 seemed more stable than 1.2.3 in regards to watchdog errors. I'm upgrading to the latest 2.0 build now, and crossing my fingers that everything I need is stable enough (and that the upgrade works, don't feel like pulling my cf card for a fresh install).
-
Just happened again on 2.0 (Wed Apr 22 06:33:58 EDT 2009 build) when I clicked the "System Tunables" tab under System, Advanced. Unable to reproduce following the same steps.
-
Found a way to recreate the watchdog errors… ssh to your Firebox running PFS and cat /dev/random.
-
Can you post your output from:
pciconf -lcv
dmesg
vmstat -i
Thanks for testing!
-
After you do that, can you try a fresh flash? I've only tried fresh flashes, and have still not been able to repro watchdogs on 2.0 (even with 3 cat /dev/random running).
-
@pciconf:
hostb0@pci0:0:0:0: class=0x060000 card=0x11308086 chip=0x11308086 rev=0x04 hdr=0x00
class = bridge
subclass = HOST-PCI
cap 09[88] = vendor (length 4) Intel cap 14 version 1
cap 02[a0] = AGP 2x 1x SBA disabled
pcib1@pci0:0:1:0: class=0x060400 card=0x00000000 chip=0x11318086 rev=0x04 hdr=0x01
class = bridge
subclass = PCI-PCI
pcib2@pci0:0:30:0: class=0x060400 card=0x00000000 chip=0x244e8086 rev=0x05 hdr=0x01
class = bridge
subclass = PCI-PCI
isab0@pci0:0:31:0: class=0x060100 card=0x00000000 chip=0x24408086 rev=0x05 hdr=0x00
class = bridge
subclass = PCI-ISA
atapci0@pci0:0:31:1: class=0x010180 card=0x24408086 chip=0x244b8086 rev=0x05 hdr=0x00
class = mass storage
subclass = ATA
safe0@pci0:2:6:0: class=0xff0000 card=0x00010001 chip=0x114116ae rev=0x01 hdr=0x00
re0@pci0:2:9:0: class=0x020000 card=0x813910ec chip=0x813910ec rev=0x20 hdr=0x00
class = network
subclass = ethernet
cap 01[50] = powerspec 2 supports D0 D1 D2 D3 current D0
re1@pci0:2:10:0: class=0x020000 card=0x813910ec chip=0x813910ec rev=0x20 hdr=0x00
class = network
subclass = ethernet
cap 01[50] = powerspec 2 supports D0 D1 D2 D3 current D0
re2@pci0:2:11:0: class=0x020000 card=0x813910ec chip=0x813910ec rev=0x20 hdr=0x00
class = network
subclass = ethernet
cap 01[50] = powerspec 2 supports D0 D1 D2 D3 current D0
re3@pci0:2:12:0: class=0x020000 card=0x813910ec chip=0x813910ec rev=0x20 hdr=0x00
class = network
subclass = ethernet
cap 01[50] = powerspec 2 supports D0 D1 D2 D3 current D0
re4@pci0:2:13:0: class=0x020000 card=0x813910ec chip=0x813910ec rev=0x20 hdr=0x00
class = network
subclass = ethernet
cap 01[50] = powerspec 2 supports D0 D1 D2 D3 current D0
re5@pci0:2:14:0: class=0x020000 card=0x813910ec chip=0x813910ec rev=0x20 hdr=0x00
class = network
subclass = ethernet
cap 01[50] = powerspec 2 supports D0 D1 D2 D3 current D0
@dmesg:
Copyright 1992-2009 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 7.1-RELEASE-p5 #0: Wed Apr 22 13:05:32 EDT 2009
sullrich@RELENG_2_0-snapshots.pfsense.org:/usr/obj.pfSense/usr/pfSensesrc/src/sys/pfSense_wrap.7
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) Celeron(TM) CPU 1200MHz (1202.74-MHz 686-class CPU)
Origin = "GenuineIntel" Id = 0x6b4 Stepping = 4
Features=0x383f9ff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE>
real memory = 268435456 (256 MB)
avail memory = 248643584 (237 MB)
wlan: mac acl policy registered
ACPI Error (tbxfroot-0308): A valid RSDP was not found [20070320]
ACPI: Table initialisation failed: AE_NOT_FOUND
ACPI: Try disabling either ACPI or apic support.
cryptosoft0: <software crypto> on motherboard
pcib0: <Intel 82815 (i815 GMCH) Host To Hub bridge> pcibus 0 on motherboard
pir0: <PCI Interrupt Routing Table: 11 Entries> on motherboard
$PIR: Using invalid BIOS IRQ 9 from 2.13.INTA for link 0x63
pci0: <PCI bus> on pcib0
pcib1: <PCI-PCI bridge> at device 1.0 on pci0
pci1: <PCI bus> on pcib1
pcib2: <PCIBIOS PCI-PCI bridge> at device 30.0 on pci0
pci2: <PCI bus> on pcib2
safe0 mem 0xe7bfe000-0xe7bfffff irq 3 at device 6.0 on pci2
safe0: [ITHREAD]
safe0: SafeNet SafeXcel-1141 rng des/3des aes md5 sha1 null
re0: <RealTek 8139C+ 10/100BaseTX> port 0xd500-0xd5ff mem 0xefefa000-0xefefa1ff irq 10 at device 9.0 on pci2
re0: Chip rev. 0x74800000
re0: MAC rev. 0x00000000
miibus0: <MII bus> on re0
rlphy0: <RealTek internal media interface> PHY 0 on miibus0
rlphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
re0: Ethernet address: 00:90:7f:30:d6:1d
re0: [FILTER]
re1: <RealTek 8139C+ 10/100BaseTX> port 0xd600-0xd6ff mem 0xefefb000-0xefefb1ff irq 5 at device 10.0 on pci2
re1: Chip rev. 0x74800000
re1: MAC rev. 0x00000000
miibus1: <MII bus> on re1
rlphy1: <RealTek internal media interface> PHY 0 on miibus1
rlphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
re1: Ethernet address: 00:90:7f:30:d6:1e
re1: [FILTER]
re2: <RealTek 8139C+ 10/100BaseTX> port 0xd900-0xd9ff mem 0xefefc000-0xefefc1ff irq 11 at device 11.0 on pci2
re2: Chip rev. 0x74800000
re2: MAC rev. 0x00000000
miibus2: <MII bus> on re2
rlphy2: <RealTek internal media interface> PHY 0 on miibus2
rlphy2: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
re2: Ethernet address: 00:90:7f:30:d6:1f
re2: [FILTER]
re3: <RealTek 8139C+ 10/100BaseTX> port 0xda00-0xdaff mem 0xefefd000-0xefefd1ff irq 12 at device 12.0 on pci2
re3: Chip rev. 0x74800000
re3: MAC rev. 0x00000000
miibus3: <MII bus> on re3
rlphy3: <RealTek internal media interface> PHY 0 on miibus3
rlphy3: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
re3: Ethernet address: 00:90:7f:30:d6:20
re3: [FILTER]
re4: <RealTek 8139C+ 10/100BaseTX> port 0xdd00-0xddff mem 0xefefe000-0xefefe1ff irq 9 at device 13.0 on pci2
re4: Chip rev. 0x74800000
re4: MAC rev. 0x00000000
miibus4: <MII bus> on re4
rlphy4: <RealTek internal media interface> PHY 0 on miibus4
rlphy4: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
re4: Ethernet address: 00:90:7f:30:d6:21
re4: [FILTER]
re5: <RealTek 8139C+ 10/100BaseTX> port 0xde00-0xdeff mem 0xefeff000-0xefeff1ff irq 6 at device 14.0 on pci2
re5: Chip rev. 0x74800000
re5: MAC rev. 0x00000000
miibus5: <MII bus> on re5
rlphy5: <RealTek internal media interface> PHY 0 on miibus5
rlphy5: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
re5: Ethernet address: 00:90:7f:30:d6:22
re5: [FILTER]
isab0: <PCI-ISA bridge> at device 31.0 on pci0
isa0: <ISA bus> on isab0
atapci0: <Intel ICH2 UDMA100 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xff00-0xff0f at device 31.1 on pci0
ata0: <ATA channel 0> on atapci0
ata0: [ITHREAD]
ata1: <ATA channel 1> on atapci0
ata1: [ITHREAD]
cpu0 on motherboard
orm0: <ISA Option ROM> at iomem 0xe0000-0xe0fff pnpid ORM0000 on isa0
ppc0: <Parallel port> at port 0x378-0x37f irq 7 on isa0
ppc0: Generic chipset (ECP/PS2/NIBBLE) in COMPATIBLE mode
ppc0: FIFO with 16/16/16 bytes threshold
ppbus0: <Parallel port bus> on ppc0
ppbus0: [ITHREAD]
ppi0: <Parallel I/O> on ppbus0
ppc0: [GIANT-LOCKED]
ppc0: [ITHREAD]
sio0 at port 0x3f8-0x3ff irq 4 flags 0x10 on isa0
sio0: type 16550A, console
sio0: [FILTER]
sio1: configured irq 3 not in bitmap of probed irqs 0
sio1: port may not be enabled
unknown: <PNP0c01> can't assign resources (memory)
speaker0: <PC speaker> at port 0x61 pnpid PNP0800 on isa0
unknown: <PNP0501> can't assign resources (port)
unknown: <PNP0401> can't assign resources (port)
RTC BIOS diagnostic error 20<config_unit>
Timecounter "TSC" frequency 1202735037 Hz quality 800
Timecounters tick every 10.000 msec
IPsec: Initialized Security Association Processing.
ad0: FAILURE - SET_MULTI status=51<READY,DSC,ERROR> error=4<ABORTED>
ad0: 3887MB <cf4ghs 20080116> at ata0-master PIO4
Trying to mount root from ufs:/dev/ad0s1a
re0: link state changed to UP
re0: link state changed to DOWN
re0: link state changed to UP
re1: link state changed to DOWN
re2: link state changed to UP
re2: link state changed to DOWN
bridge0: Ethernet address: 1e:64:49:eb:aa:05
re2: promiscuous mode enabled
re1: promiscuous mode enabled
re3: link state changed to DOWN
re4: link state changed to DOWN
re5: link state changed to DOWN
pflog0: promiscuous mode enabled
re1: link state changed to UP
re2: link state changed to UP
re1: watchdog timeout
re1: watchdog timeout
re1: watchdog timeout
re1: watchdog timeout
re1: watchdog timeout
re1: watchdog timeout
re1: watchdog timeout
re1: watchdog timeout
re1: watchdog timeout
re1: watchdog timeout
re1: watchdog timeout
re1: watchdog timeout
re1: watchdog timeout
re1: watchdog timeout
re1: watchdog timeout
re1: watchdog timeout
@vmstat:
interrupt total rate
irq0: clk 1206531 99
irq3: safe0 1 0
irq4: sio0 293 0
irq5: re1 142810 11
irq7: ppbus0 ppc0 1 0
irq8: rtc 1544290 127
irq10: re0 150821 12
irq11: re2 15686 1
irq14: ata0 1742049 144
Total 4802482 398
-
After you do that, can you try a fresh flash? I've only tried fresh flashes, and have still not been able to repro watchdogs on 2.0 (even with 3 cat /dev/random running).
I'll see if I can get a chance for a fresh flash this weekend. A single cat /dev/random will make my box fire off a few watchdog errors, become unresponsive for 2-3 minutes, and eventually end with a dead ssh session, every time I've tried.
-
Also, if you are able to repro on a fresh flash, can you try a capture?
tcpdump -s 0 -w /tmp/re1.pcap -ni re1
(Replace re1 with re0, re2, etc., whichever interface is giving you timeouts.)
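Once a capture exists, one way to narrow down when the interface stalled is to look for long silences between packets. This is a minimal sketch, assuming the file is a classic little-endian pcap with microsecond timestamps; the helper names and the threshold value are illustrative assumptions, not part of any pfSense tooling.

```python
import struct

# Classic pcap layout: a 24-byte global header, then a 16-byte header per packet.
PCAP_GLOBAL_HDR = struct.Struct("<IHHiIII")  # magic, major, minor, thiszone, sigfigs, snaplen, linktype
PCAP_RECORD_HDR = struct.Struct("<IIII")     # ts_sec, ts_usec, incl_len, orig_len
PCAP_MAGIC = 0xA1B2C3D4


def packet_times(data):
    """Yield the timestamp (seconds, as float) of each packet in pcap bytes."""
    if len(data) < PCAP_GLOBAL_HDR.size:
        raise ValueError("file too short to be a pcap capture")
    if PCAP_GLOBAL_HDR.unpack_from(data, 0)[0] != PCAP_MAGIC:
        raise ValueError("not a little-endian microsecond pcap file")
    offset = PCAP_GLOBAL_HDR.size
    while offset + PCAP_RECORD_HDR.size <= len(data):
        ts_sec, ts_usec, incl_len, _orig_len = PCAP_RECORD_HDR.unpack_from(data, offset)
        yield ts_sec + ts_usec / 1_000_000
        offset += PCAP_RECORD_HDR.size + incl_len  # skip over the packet bytes


def find_gaps(times, threshold):
    """Return (before, after) timestamp pairs that are more than threshold seconds apart."""
    times = list(times)
    return [(a, b) for a, b in zip(times, times[1:]) if b - a > threshold]


def gaps_in_file(path, threshold):
    """Convenience wrapper: read a capture from disk and report its gaps."""
    with open(path, "rb") as handle:
        return find_gaps(packet_times(handle.read()), threshold)
```

Something like `gaps_in_file("/tmp/re1.pcap", 5.0)` would then list the silent periods, which can be lined up against the watchdog timestamps in system.log. The 5 second threshold is an arbitrary starting point.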
-
I noticed in your dmesg output:
-----
ACPI Error (tbxfroot-0308): A valid RSDP was not found [20070320]
ACPI: Table initialisation failed: AE_NOT_FOUND
ACPI: Try disabling either ACPI or apic support.
-----
I don't have this in my dmesg output. What's in your /boot/loader.conf?
-
Defaults, plus disable DMA for my CF card (wouldn't work otherwise) and a commented out disable ACPI (tried on 1.2.2 to stop watchdog errors).
cat /boot/loader.conf
autoboot_delay="1"
vm.kmem_size="435544320"
vm.kmem_size_max="535544320"
kern.ipc.nmbclusters="0"
hw.ata.ata_dma=0
#hint.acpi.0.disabled=1
-
From a standard 2.0 flash, /boot/loader.conf contains:
hw.ata.atapi_dma="0"
hw.ata.ata_dma="0"
loader_color="NO"
console=comconsole
autoboot_delay="5"
hw.ata.wc="0"
kern.ipc.nmbclusters="0"
beastie_disable="YES"
vm.kmem_size="435544320"
vm.kmem_size_max="535544320"
-
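The two loader.conf listings above can also be compared mechanically rather than by eye. Here is a rough sketch, assuming simple `key=value` lines with optional double quotes; the function names are hypothetical and the two strings are copied from the posts above.

```python
def parse_loader_conf(text):
    """Parse loader.conf-style text into a dict, skipping blank and commented lines."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip().strip('"')
    return settings


def diff_settings(mine, reference):
    """Return (only_in_mine, only_in_reference, changed) for two settings dicts."""
    only_mine = {k: mine[k] for k in mine.keys() - reference.keys()}
    only_ref = {k: reference[k] for k in reference.keys() - mine.keys()}
    changed = {
        k: (mine[k], reference[k])
        for k in mine.keys() & reference.keys()
        if mine[k] != reference[k]
    }
    return only_mine, only_ref, changed


# The loader.conf from the box showing timeouts, as posted above.
MINE = '''
autoboot_delay="1"
vm.kmem_size="435544320"
vm.kmem_size_max="535544320"
kern.ipc.nmbclusters="0"
hw.ata.ata_dma=0
#hint.acpi.0.disabled=1
'''

# The loader.conf from a standard 2.0 flash, as posted above.
STOCK = '''
hw.ata.atapi_dma="0"
hw.ata.ata_dma="0"
loader_color="NO"
console=comconsole
autoboot_delay="5"
hw.ata.wc="0"
kern.ipc.nmbclusters="0"
beastie_disable="YES"
vm.kmem_size="435544320"
vm.kmem_size_max="535544320"
'''

only_mine, only_stock, changed = diff_settings(parse_loader_conf(MINE), parse_loader_conf(STOCK))
print("only in mine:", sorted(only_mine))
print("only in stock:", sorted(only_stock))
print("different values:", changed)
```

Quotes are stripped before comparing, so `hw.ata.ata_dma=0` and `hw.ata.ata_dma="0"` count as the same setting.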
If you are able to get into the BIOS on your firebox, can you make sure you are testing with ACPI enabled? The error on your dmesg indicates that ACPI might be turned off in the BIOS.
-
If you are able to get into the BIOS on your firebox, can you make sure you are testing with ACPI enabled? The error on your dmesg indicates that ACPI might be turned off in the BIOS.
If I can find the PS2 port pinout for the motherboard, a PS2 connector I can repurpose, a PCI video card, and the time, I'll see what I can manage. Unless there is an easier way I'm missing?
-
I just tried turning on TSO on re1 (the interface I always see watchdogs on) and it definitely increased the time before cat'ing /dev/random caused an error. It still happened, but it took 1-2 minutes instead of 15 seconds. While I was cat'ing /dev/random, I did notice a performance hit: I ran a speed test before and during, and got 6.5 Mbit before and a consistent 4 Mbit during. Still not fixed, but this seems like progress… I'll still try to get a clean image on at some point in the next few days to see if that helps.
-
I know I've never been in the BIOS of my fireboxes, just seems strange to see those odd messages in your dmesg output.
Don't tear up a connector yet. Just do a fresh flash to 2.0, test that way, and let me know.
-
Just FYI, I still haven't been able to reproduce timeouts on 2.0 using cat /dev/random. As you can see, about 40GB of traffic has come out of that interface, with not a single watchdog timeout on 2.0.
LAN interface (re2)
Status up
MAC address 00:90:7f:32:8a:94
IP address 192.168.1.1
Subnet mask 255.255.255.0
Media 100baseTX <full-duplex>
In/out packets 41004200/41004169 (2.63 GB/41.24 GB)
In/out packets (pass) 41004169/64873580 (2.63 GB/41.24 GB)
In/out packets (block) 31/0 (3 KB/0 bytes)
In/out errors 0/0
Collisions 0
Have you tried a fresh default flash of 2.0 yet?
-
The problem still exists after a clean reinstall. Tested against 2009-04-25 17:12 build, with the following configuration:
- Added to /boot/loader.conf:
hw.ata.ata_dma=0
- Configured interfaces (set WAN=re0, set LAN=re1, configured IP/netmask on re1)
- Configured DHCP (reservations, address range, domain name, NTP server)
Cat'ing /dev/random still causes watchdog errors within 15-20 seconds. ACPI errors still show in dmesg. I also reset BIOS settings to defaults (via front LCD panel), with no noticeable changes.
-
Any particular reason you're editing loader.conf and adding hw.ata.ata_dma=0? Like I said, I'm using all defaults. The only other difference that I can see between you and me (besides the ACPI error) is that you are using re1 for LAN, and I am using re2 (normally I leave re0 and re1 free in case I have multiple WANs; my re1 is empty currently).
What is your LAN interface plugged into (a switch I presume? what kind)? My re2 (LAN) is plugged directly into my laptop with a crossover cable. I have also changed speed/duplex settings on my laptop during tests last week just to be sure there isn't a problem with a particular speed, and there is not.
Would it be possible to plug your LAN interface directly into a computer (via crossover) to take the switch out of the equation?
I am still puzzled by the ACPI errors in your dmesg output-- I have 3 firebox x5/7/1000 series units, and none of them have that error during bootup.
-
Ok, so I just flashed a 2.0 image and changed my LAN interface to be re1 instead of re2. Guess what? Watchdog timeouts.
-
That is odd, because I changed my LAN interface to re2 and still had watchdog timeouts. I haven't yet had a chance to connect directly without a switch to see if that makes any difference. I'm currently running a Linksys SD2008, which is an 8-port unmanaged gig switch. By this weekend I'll probably have a Cisco 2950-24 sitting inline between the Firebox and Linksys. I'll update this thread with the results of direct vs Cisco vs Linksys once I get to test.
-
For those of you watching this thread: stay tuned. I am still working with Pyun. I have another patch to test!
-
If you are able to get into the BIOS on your firebox, can you make sure you are testing with ACPI enabled? The error on your dmesg indicates that ACPI might be turned off in the BIOS.
If I can find the PS2 port pinout for the motherboard, a PS2 connector I can repurpose, a PCI video card, and the time, I'll see what I can manage. Unless there is an easier way I'm missing?
See my keyboard hack here: http://forum.pfsense.org/index.php/topic,7458.msg84324.html#msg84324
-
I have to say, ever since this thread was started I've been having more and more watchdog timeouts.
In previous builds I would get them very seldom (not even once a day), but since whatever code was changed to fix this I am seeing them, on average, 5 times a day, often for about a minute each time.
Nothing has changed configuration wise between builds, I'm running WAN on re0, LAN on re1, DMZ on re2 and Wifi on re4/5. From what I can tell I never have any timeouts on re0/1.
In any case, I'll give it a few more builds, but unless we can get it fixed I'll have to roll back to the earlier build as the system, as it stands today, is getting to be unusable (and yes, I know, don't run v2.0 in production :p).
-
There have only been 2 patches that have even made it into the publicly downloadable builds, and 1 of them went in late yesterday. My guess is that you are seeing more watchdog timeouts due to something changing in your environment rather than changes in pfSense. The Realtek interface code has only changed twice: one of those changes was for the better, and the other patch, which just went in yesterday, will likely be rolled back since it did not seem to improve things (and may actually have made them worse). Working with the driver maintainer is challenging since there is a 17 hour time difference between him and me, plus I need to have his patches incorporated and wait for a new build to test before I can get back to him.
Like I said, stay tuned. When I've worked out something that appears to have solved it, I will need people like you to beat it up--originally, I thought I had it licked (since it solved the particular problem that I was causing), but there are still others present.
-
Hello Dimitri
Got a Firebox X500 off eBay and was hoping I wouldn't run into the watchdog errors with the Realtek network cards, but I wasn't that lucky.
Gave the pfSense-1.2.3-20090708-1942 snapshot a test tonight, and I am able to reproduce the watchdog errors with the cat /dev/urandom test, or even by installing the NUT package and then going to "Services -> NUT" in the web interface. About 5-8 seconds after the page starts to load, I get the following error on the console:
re2: watchdog timeout
Once this error pops up on the console, I am unable to ping to or from that interface until I hard power off the X500. From the console, if I choose option 5 "Reboot System" or type reboot, pfSense starts running the shutdown process but then stops at the "Rebooting…" message.
re2:watchdog timeout
re2:watchdog timeout
re2:watchdog timeout
# reboot
pflog0: promiscuous mode disabled
TWaiting (max 60 seconds) for system process `vnlru' to stop...done
Waiting (max 60 seconds) for system process `bufdaemon' to stop...done
Waiting (max 60 seconds) for system process `syncer' to stop...
Syncing disks, vnodes remaining...4 2 0 0 done
All buffers synced.
Uptime: 38m24s
Rebooting...
Another oddity
Running "halt system" from the console menu works until I hit the "press any key to reboot" part. As soon as I hit a key, the speaker on the X500 screams like crazy. Both "reboot" and "halt/reboot" work just fine until the watchdog errors start to pop up. Is there any other debugging I can do on my end to help?
-loki
pciconf -lcv
hostb0@pci0:0:0:0: class=0x060000 card=0x11308086 chip=0x11308086 rev=0x04 hdr=0x00
class = bridge
subclass = HOST-PCI
cap 09[88] = vendor (length 4) Intel cap 14 version 1
cap 02[a0] = AGP 2x 1x SBA disabled
pcib1@pci0:0:1:0: class=0x060400 card=0x00000000 chip=0x11318086 rev=0x04 hdr=0x01
class = bridge
subclass = PCI-PCI
pcib2@pci0:0:30:0: class=0x060400 card=0x00000000 chip=0x244e8086 rev=0x05 hdr=0x01
class = bridge
subclass = PCI-PCI
isab0@pci0:0:31:0: class=0x060100 card=0x00000000 chip=0x24408086 rev=0x05 hdr=0x00
class = bridge
subclass = PCI-ISA
atapci0@pci0:0:31:1: class=0x010180 card=0x24408086 chip=0x244b8086 rev=0x05 hdr=0x00
class = mass storage
subclass = ATA
ral0@pci0:2:6:0: class=0x028000 card=0x3c421186 chip=0x02011814 rev=0x01 hdr=0x00
class = network
cap 01[40] = powerspec 2 supports D0 D3 current D0
re0@pci0:2:9:0: class=0x020000 card=0x813910ec chip=0x813910ec rev=0x20 hdr=0x00
class = network
subclass = ethernet
cap 01[50] = powerspec 2 supports D0 D1 D2 D3 current D0
re1@pci0:2:10:0: class=0x020000 card=0x813910ec chip=0x813910ec rev=0x20 hdr=0x00
class = network
subclass = ethernet
cap 01[50] = powerspec 2 supports D0 D1 D2 D3 current D0
re2@pci0:2:11:0: class=0x020000 card=0x813910ec chip=0x813910ec rev=0x20 hdr=0x00
class = network
subclass = ethernet
cap 01[50] = powerspec 2 supports D0 D1 D2 D3 current D0
re3@pci0:2:12:0: class=0x020000 card=0x813910ec chip=0x813910ec rev=0x20 hdr=0x00
class = network
subclass = ethernet
cap 01[50] = powerspec 2 supports D0 D1 D2 D3 current D0
re4@pci0:2:13:0: class=0x020000 card=0x813910ec chip=0x813910ec rev=0x20 hdr=0x00
class = network
subclass = ethernet
cap 01[50] = powerspec 2 supports D0 D1 D2 D3 current D0
re5@pci0:2:14:0: class=0x020000 card=0x813910ec chip=0x813910ec rev=0x20 hdr=0x00
class = network
subclass = ethernet
cap 01[50] = powerspec 2 supports D0 D1 D2 D3 current D0
dmesg
Copyright (c) 1992-2009 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 7.2-RELEASE-p2 #0: Wed Jul 8 19:39:37 EDT 2009
sullrich@FreeBSD-7_2-RELENG_1_2-snapshots.pfsense.org:/usr/obj.pfSense/usr/pfSensesrc/src/sys/pfSense_SMP.7
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) Celeron(TM) CPU 1200MHz (1202.73-MHz 686-class CPU)
Origin = "GenuineIntel" Id = 0x6b4 Stepping = 4
Features=0x383f9ff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE>
real memory = 536870912 (512 MB)
avail memory = 508604416 (485 MB)
wlan: mac acl policy registered
kbd1 at kbdmux0
cryptosoft0: <software crypto> on motherboard
pcib0: <Intel 82815 (i815 GMCH) Host To Hub bridge> pcibus 0 on motherboard
pir0: <PCI Interrupt Routing Table: 11 Entries> on motherboard
$PIR: Using invalid BIOS IRQ 9 from 2.13.INTA for link 0x63
pci0: <PCI bus> on pcib0
agp0: <Intel 82815 (i815 GMCH) host to PCI bridge> on hostb0
pcib1: <PCI-PCI bridge> at device 1.0 on pci0
pci1: <PCI bus> on pcib1
pcib2: <PCIBIOS PCI-PCI bridge> at device 30.0 on pci0
pci2: <PCI bus> on pcib2
ral0: <Ralink Technology RT2560> mem 0xefefe000-0xefefffff irq 3 at device 6.0 on pci2
ral0: MAC/BBP RT2560 (rev 0x04), RF RT2525
ral0: Ethernet address: 00:0f:a3:74:4a:7a
ral0: [ITHREAD]
re0: <RealTek 8139C+ 10/100BaseTX> port 0xd500-0xd5ff mem 0xefefa000-0xefefa1ff irq 10 at device 9.0 on pci2
re0: Chip rev. 0x74800000
re0: MAC rev. 0x00000000
miibus0: <MII bus> on re0
rlphy0: <RealTek internal media interface> PHY 0 on miibus0
rlphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
re0: Ethernet address: 00:90:7f:2f:1a:63
re0: [FILTER]
re1: <RealTek 8139C+ 10/100BaseTX> port 0xd600-0xd6ff mem 0xefefb000-0xefefb1ff irq 5 at device 10.0 on pci2
re1: Chip rev. 0x74800000
re1: MAC rev. 0x00000000
miibus1: <MII bus> on re1
rlphy1: <RealTek internal media interface> PHY 0 on miibus1
rlphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
re1: Ethernet address: 00:90:7f:2f:1a:64
re1: [FILTER]
re2: <RealTek 8139C+ 10/100BaseTX> port 0xd900-0xd9ff mem 0xefefc000-0xefefc1ff irq 11 at device 11.0 on pci2
re2: Chip rev. 0x74800000
re2: MAC rev. 0x00000000
miibus2: <MII bus> on re2
rlphy2: <RealTek internal media interface> PHY 0 on miibus2
rlphy2: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
re2: Ethernet address: 00:90:7f:2f:1a:65
re2: [FILTER]
re3: <RealTek 8139C+ 10/100BaseTX> port 0xda00-0xdaff mem 0xefefd000-0xefefd1ff irq 12 at device 12.0 on pci2
re3: Chip rev. 0x74800000
re3: MAC rev. 0x00000000
miibus3: <MII bus> on re3
rlphy3: <RealTek internal media interface> PHY 0 on miibus3
rlphy3: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
re3: Ethernet address: 00:90:7f:2f:1a:66
re3: [FILTER]
re4: <RealTek 8139C+ 10/100BaseTX> port 0xdd00-0xddff irq 9 at device 13.0 on pci2
re4: Chip rev. 0x74800000
re4: MAC rev. 0x00000000
miibus4: <MII bus> on re4
rlphy4: <RealTek internal media interface> PHY 0 on miibus4
rlphy4: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
re4: Ethernet address: 00:90:7f:2f:1a:67
re4: [FILTER]
re5: <RealTek 8139C+ 10/100BaseTX> port 0xde00-0xdeff irq 6 at device 14.0 on pci2
re5: Chip rev. 0x74800000
re5: MAC rev. 0x00000000
miibus5: <MII bus> on re5
rlphy5: <RealTek internal media interface> PHY 0 on miibus5
rlphy5: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
re5: Ethernet address: 00:90:7f:2f:1a:68
re5: [FILTER]
isab0: <PCI-ISA bridge> at device 31.0 on pci0
isa0: <ISA bus> on isab0
atapci0: <Intel ICH2 UDMA100 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xff00-0xff0f at device 31.1 on pci0
ata0: <ATA channel 0> on atapci0
ata0: [ITHREAD]
ata1: <ATA channel 1> on atapci0
ata1: [ITHREAD]
cpu0 on motherboard
pmtimer0 on isa0
orm0: <ISA Option ROM> at iomem 0xe0000-0xe0fff pnpid ORM0000 on isa0
atkbdc0: <Keyboard controller (i8042)> at port 0x60,0x64 on isa0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
atkbd0: [ITHREAD]
ppc0: <Parallel port> at port 0x378-0x37f irq 7 on isa0
ppc0: Generic chipset (ECP/PS2/NIBBLE) in COMPATIBLE mode
ppc0: FIFO with 16/16/16 bytes threshold
ppbus0: <Parallel port bus> on ppc0
ppbus0: [ITHREAD]
plip0: <PLIP network interface> on ppbus0
plip0: WARNING: using obsoleted IFF_NEEDSGIANT flag
lpt0: <Printer> on ppbus0
lpt0: Interrupt-driven port
ppi0: <Parallel I/O> on ppbus0
ppc0: [GIANT-LOCKED]
ppc0: [ITHREAD]
sio0 at port 0x3f8-0x3ff irq 4 flags 0x10 on isa0
sio0: type 16550A, console
sio0: [FILTER]
sio1: configured irq 3 not in bitmap of probed irqs 0
sio1: port may not be enabled
unknown: <PNP0c01> can't assign resources (memory)
unknown: <PNP0303> can't assign resources (port)
speaker0: <PC speaker> at port 0x61 pnpid PNP0800 on isa0
unknown: <PNP0501> can't assign resources (port)
unknown: <PNP0401> can't assign resources (port)
RTC BIOS diagnostic error 20<config_unit>
Timecounter "TSC" frequency 1202733008 Hz quality 800
Timecounters tick every 1.000 msec
IPsec: Initialized Security Association Processing.
ad2: DMA limited to UDMA33, controller found non-ATA66 cable
ad2: 57231MB <ic25n060atmr04 0 mo3oad5a> at ata1-master UDMA33
GEOM: ad2: partition 1 does not start on a track boundary.
GEOM: ad2: partition 1 does not end on a track boundary.
Trying to mount root from ufs:/dev/ad2s1a
re2: link state changed to UP
re2: link state changed to DOWN
re0: link state changed to UP
re0: link state changed to DOWN
re2: link state changed to UP
re0: link state changed to UP
re1: link state changed to DOWN
re3: link state changed to DOWN
re4: link state changed to DOWN
re5: link state changed to DOWN
pflog0: promiscuous mode enabled
vmstat -i
interrupt                          total       rate
irq0: clk                        2445041       1000
irq4: sio0                           736          0
irq7: ppbus0 ppc0                      1          0
irq8: rtc                         312912        127
irq10: re0                           849          0
irq11: re2                          3204          1
irq15: ata1                        10586          4
Total                            2773329       1134
-
Not yet– Pyun has a couple of paid projects so the progress on this issue is at a bit of a standstill.
I will preface this by saying that what follows may be completely ridiculous, but my understanding is that the WatchGuard OS is based on Linux. If that's true, then perhaps someone can look at the Linux driver and compare it to the BSD one? Obviously they are structured differently, and this may not make sense, but when the WatchGuard box is running the WatchGuard software, the Firebox is a very stable unit, so someone knows how to make these Realtek chips work!
-
Not yet – Pyun has a couple of paid projects so the progress on this issue is at a bit of a standstill.
ok thanks for the update.
then perhaps someone can look at the Linux driver and compare it to the BSD one?
I will try a Debian install on the X500 some time this weekend.
I think the Linux driver might have the same issue with the Realtek chips. It's hard to find out whether the issue was ever fixed or whether people just started using other network cards.
Google around for:
"8139c problem oversized ethernet frame"
"realtec 8139c Abnormal interrupt"
http://www.joshua.raleigh.nc.us/docs/linux-2.4.10_html/286454.html
http://article.gmane.org/gmane.linux.drivers.realtek.devel/420
The X500 does have a PCI slot, so I'm going to try an old Sun PCI quad-port 10/100 network card, which works in another PC running pfSense 1.2.3-RC2. At least this should prove that the X500 motherboard doesn't have issues controlling ACPI/DMA/interrupts for network cards.
Here is a pic of the Sun card:
http://www.sun.com/products/networking/ethernet/sunquadfastethernet/images/I1_hw_quadfastether_pci_i.jpg
-
The Debian net install works on the X500; now I just need to find a way to overlay all the pfSense extras on the base Debian install :)
Using the same switch/cable/client, the Debian network driver seems to provide higher throughput.
Debian
iperf -c 192.168.100.2 -p 5010 -t 60
------------------------------------------------------------
[ ID] Interval       Transfer     Bandwidth
[108]  0.0-60.0 sec   602 MBytes  84.1 Mbits/sec
FreeBSD 7.2 / pfSense 1.2.3rc2
iperf -c 192.168.100.2 -p 5010 -t 60
------------------------------------------------------------
[ ID] Interval       Transfer     Bandwidth
[108]  0.0-60.0 sec   465 MBytes  65.0 Mbits/sec
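To put the two runs side by side, here is a rough sketch of the arithmetic in Python. The two bandwidth values are simply the iperf figures reported above; the variable and function names are made up for illustration.

```python
# Bandwidth figures copied from the two iperf runs above (Mbits/sec).
debian_mbps = 84.1
pfsense_mbps = 65.0


def percent_faster(fast, slow):
    """Return how much faster `fast` is than `slow`, as a percentage of `slow`."""
    return (fast - slow) / slow * 100.0


gain = percent_faster(debian_mbps, pfsense_mbps)
print(f"Debian: {debian_mbps} Mbits/sec, pfSense: {pfsense_mbps} Mbits/sec")
print(f"Debian is roughly {gain:.0f}% faster on this link")
```

This is a single 60-second run on each OS, so treat the percentage as a rough indication rather than a benchmark.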
I wish I knew why this watchdog issue happens on some X500 devices more than others.
-
I'm having exactly the same on my X500 too.
re0: watchdog timeout
re0: watchdog timeout
re0: watchdog timeout
re0: watchdog timeout
re0: watchdog timeout
I get them on all ports, internal and external, whether connected to a switch, a cable modem, etc.; nothing makes a difference. It's a shame, because the Firebox running pfSense is really good except for the watchdog timeouts!
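For anyone who wants to report how often this happens per port, a minimal Python sketch for tallying the messages might look like the following. It assumes the log lines have the same form as the ones pasted above (`reN: watchdog timeout`); the sample text and the function name are hypothetical, and you would feed it your own dmesg or system log output instead.

```python
import re
from collections import Counter

# Matches kernel messages such as "re0: watchdog timeout".
WATCHDOG_RE = re.compile(r"\b(re\d+): watchdog timeout")


def count_watchdog_timeouts(log_text):
    """Count watchdog timeout messages per interface in a block of log text."""
    return Counter(WATCHDOG_RE.findall(log_text))


# Made-up sample for illustration only; replace with real log output.
sample = """\
re0: watchdog timeout
re0: watchdog timeout
re2: link state changed to UP
re0: watchdog timeout
"""

counts = count_watchdog_timeouts(sample)
for iface, n in sorted(counts.items()):
    print(f"{iface}: {n} watchdog timeout(s)")
```

Posting per-interface counts along with uptime and what you were doing at the time should make the reports easier to compare.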
-
I only get the odd occasional timeout on my X500 since I upgraded. Is the updated code in the new embedded version? I'm running embedded 1.2.3-RC2 and fancy moving over to the new embedded, but I don't want to wreck what appears to be a stable install.
Of the timeouts I get, they generally happen when I'm playing about in the web interface. There are no timeouts if I leave it alone.
-
Spoke too soon. I'm still getting them, but nowhere near as many. Does the new nanobsd embedded have the patch installed?
-
I am planning to install Debian on my X500 and use it as a "LAMP" server.
How was the net install conducted? Did you manage to get a keyboard to work? I'm having no luck following the diagrams in another topic.
Cheers,
Andy
-
Wondering if there's any progress/updates here? I've got two different Firebox x700s that both display the watchdog timeouts on re0 (my LAN port). I was originally running 1.2.3 RC2 and upgraded to the latest firmware in the 1.2.X snapshots.
loki - care to elaborate on how you prepped your firebox for a netboot install of debian?
-
loki - care to elaborate on how you prepped your firebox for a netboot install of debian?
Install a base Debian from a net install CD on a normal PC, edit /etc/fstab and set up the serial port for console access, then pop the drive back into the Firebox.
Overall I wasn't very happy with the older Firebox hardware; the network cards just don't seem to have great support in BSD.
I am now running the following Jetway with 2 GB of memory and 1.2.3rc2, and I'm pretty happy with it.
xxxx://www.newegg.com/Product/Product.aspx?Item=N82E16856107059
-
I know it's been a while, but is there any progress on this? ???
-
I've been getting the same errors as everyone in this thread, using two fireboxes, an x500 as transparent firewall and an x700 as router/firewall. Like Spy Alelo, I'm also curious to see if there has been any progress on this and if there is perhaps something new that we can test/patch.
-
Call me crazy, but I removed the crypto card to test a mini PCI WiFi card, and have not had a single timeout while messing with the GUI. I removed the WiFi card anyway, since it wasn't supported, and still no timeouts. I have not upgraded the firmware (still using 1.2.3 release), nor changed any settings.
Again, it may have been a fluke, but I will keep testing. The only explanation that makes any sense to me is that the crypto card was in some way being used for SSL on the WebGUI (for which I do have SSL enabled), and there may be some compatibility issue between it and the Realtek interfaces. I mean, seriously, I download over 60GB of data a month using torrents without a single issue. I also use a VoIP phone non-stop, sustaining a VPN connection while using a web-based ticketing system 5 days a week, 8 hours a day, and never get a single drop or timeout. It only happens when I access the WebGUI, within the first two minutes. And not a single timeout after removing the card? Can anyone else experiment and confirm this?