Attention Firebox X Series Users - Testing Needed
-
It happened again, with ACPI enabled. I had the dashboard up on 1.2.3 and walked away for a bit, then came back to 6 or so watchdog errors. So far, 1.2.2 has seemed more stable than 1.2.3 with regard to watchdog errors. I'm upgrading to the latest 2.0 build now, and crossing my fingers that everything I need is stable enough (and that the upgrade works; I don't feel like pulling my CF card for a fresh install).
-
Just happened again on 2.0 (Wed Apr 22 06:33:58 EDT 2009 build) when I clicked the "System Tunables" tab under System, Advanced. Unable to reproduce following the same steps.
-
Found a way to recreate the watchdog errors… ssh to your Firebox running PFS and cat /dev/random.
-
Can you post your output from:
pciconf -lcv
dmesg
vmstat -i
Thanks for testing!
-
After you do that, can you try a fresh flash? I've only tried fresh flashes, and have still not been able to repro watchdogs on 2.0 (even with 3 cat /dev/random running).
-
@pciconf:
hostb0@pci0:0:0:0: class=0x060000 card=0x11308086 chip=0x11308086 rev=0x04 hdr=0x00
class = bridge
subclass = HOST-PCI
cap 09[88] = vendor (length 4) Intel cap 14 version 1
cap 02[a0] = AGP 2x 1x SBA disabled
pcib1@pci0:0:1:0: class=0x060400 card=0x00000000 chip=0x11318086 rev=0x04 hdr=0x01
class = bridge
subclass = PCI-PCI
pcib2@pci0:0:30:0: class=0x060400 card=0x00000000 chip=0x244e8086 rev=0x05 hdr=0x01
class = bridge
subclass = PCI-PCI
isab0@pci0:0:31:0: class=0x060100 card=0x00000000 chip=0x24408086 rev=0x05 hdr=0x00
class = bridge
subclass = PCI-ISA
atapci0@pci0:0:31:1: class=0x010180 card=0x24408086 chip=0x244b8086 rev=0x05 hdr=0x00
class = mass storage
subclass = ATA
safe0@pci0:2:6:0: class=0xff0000 card=0x00010001 chip=0x114116ae rev=0x01 hdr=0x00
re0@pci0:2:9:0: class=0x020000 card=0x813910ec chip=0x813910ec rev=0x20 hdr=0x00
class = network
subclass = ethernet
cap 01[50] = powerspec 2 supports D0 D1 D2 D3 current D0
re1@pci0:2:10:0: class=0x020000 card=0x813910ec chip=0x813910ec rev=0x20 hdr=0x00
class = network
subclass = ethernet
cap 01[50] = powerspec 2 supports D0 D1 D2 D3 current D0
re2@pci0:2:11:0: class=0x020000 card=0x813910ec chip=0x813910ec rev=0x20 hdr=0x00
class = network
subclass = ethernet
cap 01[50] = powerspec 2 supports D0 D1 D2 D3 current D0
re3@pci0:2:12:0: class=0x020000 card=0x813910ec chip=0x813910ec rev=0x20 hdr=0x00
class = network
subclass = ethernet
cap 01[50] = powerspec 2 supports D0 D1 D2 D3 current D0
re4@pci0:2:13:0: class=0x020000 card=0x813910ec chip=0x813910ec rev=0x20 hdr=0x00
class = network
subclass = ethernet
cap 01[50] = powerspec 2 supports D0 D1 D2 D3 current D0
re5@pci0:2:14:0: class=0x020000 card=0x813910ec chip=0x813910ec rev=0x20 hdr=0x00
class = network
subclass = ethernet
cap 01[50] = powerspec 2 supports D0 D1 D2 D3 current D0
@dmesg:
Copyright 1992-2009 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 7.1-RELEASE-p5 #0: Wed Apr 22 13:05:32 EDT 2009
sullrich@RELENG_2_0-snapshots.pfsense.org:/usr/obj.pfSense/usr/pfSensesrc/src/sys/pfSense_wrap.7
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) Celeron(TM) CPU 1200MHz (1202.74-MHz 686-class CPU)
Origin = "GenuineIntel" Id = 0x6b4 Stepping = 4
Features=0x383f9ff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE>
real memory = 268435456 (256 MB)
avail memory = 248643584 (237 MB)
wlan: mac acl policy registered
ACPI Error (tbxfroot-0308): A valid RSDP was not found [20070320]
ACPI: Table initialisation failed: AE_NOT_FOUND
ACPI: Try disabling either ACPI or apic support.
cryptosoft0: <software crypto> on motherboard
pcib0: <Intel 82815 (i815 GMCH) Host To Hub bridge> pcibus 0 on motherboard
pir0: <PCI Interrupt Routing Table: 11 Entries> on motherboard
$PIR: Using invalid BIOS IRQ 9 from 2.13.INTA for link 0x63
pci0: <PCI bus> on pcib0
pcib1: <PCI-PCI bridge> at device 1.0 on pci0
pci1: <PCI bus> on pcib1
pcib2: <PCIBIOS PCI-PCI bridge> at device 30.0 on pci0
pci2: <PCI bus> on pcib2
safe0 mem 0xe7bfe000-0xe7bfffff irq 3 at device 6.0 on pci2
safe0: [ITHREAD]
safe0: SafeNet SafeXcel-1141 rng des/3des aes md5 sha1 null
re0: <RealTek 8139C+ 10/100BaseTX> port 0xd500-0xd5ff mem 0xefefa000-0xefefa1ff irq 10 at device 9.0 on pci2
re0: Chip rev. 0x74800000
re0: MAC rev. 0x00000000
miibus0: <MII bus> on re0
rlphy0: <RealTek internal media interface> PHY 0 on miibus0
rlphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
re0: Ethernet address: 00:90:7f:30:d6:1d
re0: [FILTER]
re1: <RealTek 8139C+ 10/100BaseTX> port 0xd600-0xd6ff mem 0xefefb000-0xefefb1ff irq 5 at device 10.0 on pci2
re1: Chip rev. 0x74800000
re1: MAC rev. 0x00000000
miibus1: <MII bus> on re1
rlphy1: <RealTek internal media interface> PHY 0 on miibus1
rlphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
re1: Ethernet address: 00:90:7f:30:d6:1e
re1: [FILTER]
re2: <RealTek 8139C+ 10/100BaseTX> port 0xd900-0xd9ff mem 0xefefc000-0xefefc1ff irq 11 at device 11.0 on pci2
re2: Chip rev. 0x74800000
re2: MAC rev. 0x00000000
miibus2: <MII bus> on re2
rlphy2: <RealTek internal media interface> PHY 0 on miibus2
rlphy2: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
re2: Ethernet address: 00:90:7f:30:d6:1f
re2: [FILTER]
re3: <RealTek 8139C+ 10/100BaseTX> port 0xda00-0xdaff mem 0xefefd000-0xefefd1ff irq 12 at device 12.0 on pci2
re3: Chip rev. 0x74800000
re3: MAC rev. 0x00000000
miibus3: <MII bus> on re3
rlphy3: <RealTek internal media interface> PHY 0 on miibus3
rlphy3: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
re3: Ethernet address: 00:90:7f:30:d6:20
re3: [FILTER]
re4: <RealTek 8139C+ 10/100BaseTX> port 0xdd00-0xddff mem 0xefefe000-0xefefe1ff irq 9 at device 13.0 on pci2
re4: Chip rev. 0x74800000
re4: MAC rev. 0x00000000
miibus4: <MII bus> on re4
rlphy4: <RealTek internal media interface> PHY 0 on miibus4
rlphy4: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
re4: Ethernet address: 00:90:7f:30:d6:21
re4: [FILTER]
re5: <RealTek 8139C+ 10/100BaseTX> port 0xde00-0xdeff mem 0xefeff000-0xefeff1ff irq 6 at device 14.0 on pci2
re5: Chip rev. 0x74800000
re5: MAC rev. 0x00000000
miibus5: <MII bus> on re5
rlphy5: <RealTek internal media interface> PHY 0 on miibus5
rlphy5: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
re5: Ethernet address: 00:90:7f:30:d6:22
re5: [FILTER]
isab0: <PCI-ISA bridge> at device 31.0 on pci0
isa0: <ISA bus> on isab0
atapci0: <Intel ICH2 UDMA100 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xff00-0xff0f at device 31.1 on pci0
ata0: <ATA channel 0> on atapci0
ata0: [ITHREAD]
ata1: <ATA channel 1> on atapci0
ata1: [ITHREAD]
cpu0 on motherboard
orm0: <ISA Option ROM> at iomem 0xe0000-0xe0fff pnpid ORM0000 on isa0
ppc0: <Parallel port> at port 0x378-0x37f irq 7 on isa0
ppc0: Generic chipset (ECP/PS2/NIBBLE) in COMPATIBLE mode
ppc0: FIFO with 16/16/16 bytes threshold
ppbus0: <Parallel port bus> on ppc0
ppbus0: [ITHREAD]
ppi0: <Parallel I/O> on ppbus0
ppc0: [GIANT-LOCKED]
ppc0: [ITHREAD]
sio0 at port 0x3f8-0x3ff irq 4 flags 0x10 on isa0
sio0: type 16550A, console
sio0: [FILTER]
sio1: configured irq 3 not in bitmap of probed irqs 0
sio1: port may not be enabled
unknown: <PNP0c01> can't assign resources (memory)
speaker0: <PC speaker> at port 0x61 pnpid PNP0800 on isa0
unknown: <PNP0501> can't assign resources (port)
unknown: <PNP0401> can't assign resources (port)
RTC BIOS diagnostic error 20<config_unit>
Timecounter "TSC" frequency 1202735037 Hz quality 800
Timecounters tick every 10.000 msec
IPsec: Initialized Security Association Processing.
ad0: FAILURE - SET_MULTI status=51<READY,DSC,ERROR> error=4<ABORTED>
ad0: 3887MB <CF4GHS 20080116> at ata0-master PIO4
Trying to mount root from ufs:/dev/ad0s1a
re0: link state changed to UP
re0: link state changed to DOWN
re0: link state changed to UP
re1: link state changed to DOWN
re2: link state changed to UP
re2: link state changed to DOWN
bridge0: Ethernet address: 1e:64:49:eb:aa:05
re2: promiscuous mode enabled
re1: promiscuous mode enabled
re3: link state changed to DOWN
re4: link state changed to DOWN
re5: link state changed to DOWN
pflog0: promiscuous mode enabled
re1: link state changed to UP
re2: link state changed to UP
re1: watchdog timeout
re1: watchdog timeout
re1: watchdog timeout
re1: watchdog timeout
re1: watchdog timeout
re1: watchdog timeout
re1: watchdog timeout
re1: watchdog timeout
re1: watchdog timeout
re1: watchdog timeout
re1: watchdog timeout
re1: watchdog timeout
re1: watchdog timeout
re1: watchdog timeout
re1: watchdog timeout
re1: watchdog timeout
@vmstat:
interrupt                total     rate
irq0: clk              1206531       99
irq3: safe0                  1        0
irq4: sio0                 293        0
irq5: re1               142810       11
irq7: ppbus0 ppc0            1        0
irq8: rtc              1544290      127
irq10: re0              150821       12
irq11: re2               15686        1
irq14: ata0            1742049      144
Total                  4802482      398
-
After you do that, can you try a fresh flash? I've only tried fresh flashes, and have still not been able to repro watchdogs on 2.0 (even with 3 cat /dev/random running).
I'll see if I can get a chance for a fresh flash this weekend. A single cat /dev/random will make my box fire off a few watchdog errors, become unresponsive for 2-3 minutes, and eventually end with a dead ssh session, every time I've tried.
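
For anyone comparing runs, here is a minimal Python sketch for tallying the "watchdog timeout" lines in saved dmesg output per interface. It assumes the output was copied to a workstation with Python 3 (it is not something shipped with pfSense), and the function name count_watchdog_timeouts is made up for this example.

```python
import re
from collections import Counter

# Matches kernel log lines such as "re1: watchdog timeout".
WATCHDOG_RE = re.compile(r"^(\w+): watchdog timeout\s*$")


def count_watchdog_timeouts(dmesg_text):
    """Return a Counter mapping interface name to its number of watchdog timeouts."""
    counts = Counter()
    for line in dmesg_text.splitlines():
        match = WATCHDOG_RE.match(line.strip())
        if match:
            counts[match.group(1)] += 1
    return counts
```

Feeding it the text of a saved dmesg dump gives one count per interface, which makes it easier to compare builds or settings than eyeballing the log.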
-
Also, if you are able to repro on a fresh flash, can you try a capture?
tcpdump -s 0 -w /tmp/re1.pcap -ni re1
(Replace re1 with re0, re2, etc., whichever interface is giving you timeouts.)
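
Once a capture exists, one rough way to look for the hang is to search it for long silences between packets. The Python sketch below reads the classic pcap format that tcpdump -w writes (microsecond timestamps) and reports gaps longer than a threshold. That a watchdog timeout shows up as a gap is only an assumption, and the function names and the 5-second default are made up for this example.

```python
import struct


def packet_timestamps(data):
    """Yield each packet's timestamp (seconds, as float) from classic pcap bytes."""
    magic = struct.unpack_from("<I", data, 0)[0]
    if magic == 0xA1B2C3D4:
        endian = "<"  # file written little-endian
    elif magic == 0xD4C3B2A1:
        endian = ">"  # file written big-endian
    else:
        raise ValueError("not a classic microsecond-resolution pcap file")
    offset = 24  # the global file header is 24 bytes
    while offset + 16 <= len(data):
        # Per-packet header: ts_sec, ts_usec, captured length, original length.
        ts_sec, ts_usec, incl_len, _orig_len = struct.unpack_from(
            endian + "IIII", data, offset
        )
        yield ts_sec + ts_usec / 1_000_000
        offset += 16 + incl_len


def find_gaps(data, threshold=5.0):
    """Return (previous, next) timestamp pairs separated by more than threshold seconds."""
    gaps = []
    previous = None
    for ts in packet_timestamps(data):
        if previous is not None and ts - previous > threshold:
            gaps.append((previous, ts))
        previous = ts
    return gaps
```

To use it, read the capture with open("/tmp/re1.pcap", "rb").read() and pass the bytes to find_gaps; any pairs returned can then be lined up against the times of the watchdog messages.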
-
I noticed in your dmesg output:
-----
ACPI Error (tbxfroot-0308): A valid RSDP was not found [20070320]
ACPI: Table initialisation failed: AE_NOT_FOUND
ACPI: Try disabling either ACPI or apic support.
-----
I don't have this in my dmesg output. What's in your /boot/loader.conf?
-
Defaults, plus disable DMA for my CF card (wouldn't work otherwise) and a commented out disable ACPI (tried on 1.2.2 to stop watchdog errors).
cat /boot/loader.conf
autoboot_delay="1"
vm.kmem_size="435544320"
vm.kmem_size_max="535544320"
kern.ipc.nmbclusters="0"
hw.ata.ata_dma=0
#hint.acpi.0.disabled=1
-
From a standard 2.0 flash, /boot/loader.conf contains:
hw.ata.atapi_dma="0"
hw.ata.ata_dma="0"
loader_color="NO"
console=comconsole
autoboot_delay="5"
hw.ata.wc="0"
kern.ipc.nmbclusters="0"
beastie_disable="YES"
vm.kmem_size="435544320"
vm.kmem_size_max="535544320"
-
If you are able to get into the BIOS on your firebox, can you make sure you are testing with ACPI enabled? The error on your dmesg indicates that ACPI might be turned off in the BIOS.
-
If I can find the PS2 port pinout for the motherboard, a PS2 connector I can repurpose, a PCI video card, and the time, I'll see what I can manage. Unless there is an easier way I'm missing?
-
I just tried turning on TSO on re1 (the interface I always see watchdogs on) and it definitely increased the time before cat'ing /dev/random caused an error. It still happened, but it took 1-2 minutes instead of 15 seconds. While I was cat'ing /dev/random, I did notice a performance hit: I ran a speed test before and during, and got 6.5 Mbit before and 4 Mbit during (consistently). Still not fixed, but this seems like progress… I'll still try to get a clean image on at some point in the next few days to see if that helps.
-
I know I've never been in the BIOS of my fireboxes, just seems strange to see those odd messages in your dmesg output.
Don't tear up a connector yet. Just do a fresh flash to 2.0, test that way, and let me know.
-
Just FYI, I still haven't been able to reproduce timeouts on 2.0 using cat /dev/random. As you can see, about 40GB of traffic has come out of that interface, with not a single watchdog timeout on 2.0.
LAN interface (re2)
Status up
MAC address 00:90:7f:32:8a:94
IP address 192.168.1.1
Subnet mask 255.255.255.0
Media 100baseTX <full-duplex>
In/out packets 41004200/41004169 (2.63 GB/41.24 GB)
In/out packets (pass) 41004169/64873580 (2.63 GB/41.24 GB)
In/out packets (block) 31/0 (3 KB/0 bytes)
In/out errors 0/0
Collisions 0
Have you tried a fresh default flash of 2.0 yet?
-
The problem still exists after a clean reinstall. Tested against 2009-04-25 17:12 build, with the following configuration:
- Added to /boot/loader.conf:
hw.ata.ata_dma=0
- Configured interfaces (set WAN=re0, set LAN=re1, configured IP/netmask on re1)
- Configured DHCP (reservations, address range, domain name, NTP server)
Cat'ing /dev/random still causes watchdog errors within 15-20 seconds. ACPI errors still show in dmesg. I also reset BIOS settings to defaults (via front LCD panel), with no noticeable changes.
-
Any particular reason you're editing loader.conf and adding hw.ata.ata_dma=0? Like I said, I'm using all defaults. The only other difference I can see between you and me (besides the ACPI error) is that you are using re1 for LAN, and I am using re2 (normally I leave re0 and re1 free in case I have multiple WANs; my re1 is currently empty).
What is your LAN interface plugged into (a switch I presume? what kind)? My re2 (LAN) is plugged directly into my laptop with a crossover cable. I have also changed speed/duplex settings on my laptop during tests last week just to be sure there isn't a problem with a particular speed, and there is not.
Would it be possible to plug your LAN interface directly into a computer (via crossover) to take the switch out of the equation?
I am still puzzled by the ACPI errors in your dmesg output. I have 3 Firebox X5/7/1000 series units, and none of them show that error during bootup.
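
Since the question is what differs between the two setups, here is a minimal Python sketch for comparing two loader.conf files key by key. It only handles simple key=value lines with optional double quotes, and the names parse_loader_conf and diff_settings are made up for this example.

```python
def parse_loader_conf(text):
    """Parse loader.conf-style text into a dict, skipping blank lines and # comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Treat hw.ata.ata_dma=0 and hw.ata.ata_dma="0" as the same value.
        settings[key.strip()] = value.strip().strip('"')
    return settings


def diff_settings(mine, reference):
    """Return {key: (my_value, reference_value)} for differing keys; None marks a missing key."""
    diff = {}
    for key in sorted(set(mine) | set(reference)):
        a, b = mine.get(key), reference.get(key)
        if a != b:
            diff[key] = (a, b)
    return diff
```

Run against the two loader.conf listings posted earlier in the thread, this reports autoboot_delay as "1" versus "5" and five keys that appear only in the standard 2.0 file (beastie_disable, console, hw.ata.atapi_dma, hw.ata.wc, loader_color). hw.ata.ata_dma comes out identical in both.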
-
OK, so I just flashed a 2.0 image and changed my LAN interface to re1 instead of re2. Guess what? Watchdog timeouts.
-
That is odd, because I changed my LAN interface to re2 and still had watchdog timeouts. I haven't yet had a chance to connect directly without a switch to see if that makes any difference. I'm currently running a Linksys SD2008, which is an 8-port unmanaged gigabit switch. By this weekend I'll probably have a Cisco 2950-24 sitting inline between the Firebox and the Linksys. I'll update this thread with the results of direct vs. Cisco vs. Linksys once I get to test.