Attention Firebox X Series Users - Testing Needed



  • Attention Firebox X500/700/1000 Users using pfSense:

    Watchdog timeouts gettin’ you down? Thinkin’ about throwin’ that old Firebox into the fireplace? Don’t do that just yet!  :)

    Thanks to the pfSense devs, along with Pyun YongHyeon, the maintainer of the FreeBSD Realtek network driver, it appears that we may have solved the watchdog timeouts on the Realtek 8139C+ chips used in these units. For the past couple of days I have been working with Pyun, and yesterday he sent me a patch. The pfSense devs committed that patch to both the 1.2.3 and the 2.0 alpha snapshot builds, so it is part of any snapshot built yesterday (4/17) at 2pm Eastern time or later.

    Snapshot builds can be downloaded from
    http://snapshots.pfsense.org/FreeBSD7/RELENG_1_2/
    or
    http://snapshots.pfsense.org/FreeBSD7/HEAD/

    I have been testing a build with this patch since yesterday and have yet to see a single watchdog timeout on my interfaces, with no modifications made to loader.conf. This is a default install; no special options have been set anywhere.

    If at all possible, please install a recent snapshot build on your Firebox units (for those of you who have them) and test this patch. If you still receive watchdog timeouts, please let me know either on this list or off-list. Either way, please try to describe what you were doing when the watchdog timeout occurred so that we can try to reproduce it and Pyun can fix it.

    Thanks to all that have helped, and thanks to those that are willing to test!

    Dimitri Rodis
    Integrita Systems LLC
    http://www.integritasystems.com



  • Running the 1.2.3 build from Tue Apr 21 23:12:33 EDT 2009 on an X500, and I am still receiving watchdog timeouts. I experienced 6 watchdog timeouts after approximately 5 hours of uptime, at a time when the box would have been sitting basically idle (everyone in the house was asleep). I had previously disabled ACPI in loader.conf; I will now try running with ACPI re-enabled to see if that makes a difference.

    Based on the RRD graphs, there was a small spike in CPU usage to about 10% during the event; before that, CPU usage was 1-2%. There was also a jump from 50 to 100 states that lasted about 20 minutes beforehand, as well as a small increase in throughput for about 5 minutes beforehand. The increase in throughput can be explained by an SSH brute-force attempt directed at a server on the LAN (WAN TCP port 22 forwarded -> LAN) at a rate of 100 attempts over 5 minutes (04:59:52 - 05:04:29). Below is a snippet of system.log and filter.log from before, during, and after the event:

    @system.log:

    Apr 22 04:38:29 gateway dhclient[304]: DHCPREQUEST on re0 to x.x.x.x port 67                                     
    Apr 22 04:38:29 gateway dhclient[304]: SENDING DIRECT                                                                 
    Apr 22 04:38:29 gateway dhclient[304]: DHCPACK from x.x.x.x                                                     
    Apr 22 04:38:29 gateway dhclient[304]: bound to x.x.x.x -- renewal in 1800 seconds.
    Apr 22 05:04:46 gateway kernel: re1: watchdog timeout                                                                 
    Apr 22 05:05:09 gateway kernel: re1: watchdog timeout                                                                 
    Apr 22 05:05:45 gateway kernel: re1: watchdog timeout                                                                 
    Apr 22 05:05:47 gateway miniupnpd[1081]: SUBSCRIBE not implemented. ENABLE_EVENTS compile option disabled             
    Apr 22 05:05:47 gateway last message repeated 2 times                                                                 
    Apr 22 05:06:36 gateway kernel: re1: watchdog timeout                                                                 
    Apr 22 05:07:01 gateway kernel: re1: watchdog timeout                                                                 
    Apr 22 05:07:15 gateway miniupnpd[1081]: SUBSCRIBE not implemented. ENABLE_EVENTS compile option disabled             
    Apr 22 05:07:15 gateway last message repeated 2 times                                                                 
    Apr 22 05:08:09 gateway kernel: re1: watchdog timeout                                                                 
    Apr 22 05:08:29 gateway dhclient[304]: DHCPREQUEST on re0 to x.x.x.x port 67                                     
    Apr 22 05:08:29 gateway dhclient[304]: SENDING DIRECT                                                                 
    Apr 22 05:08:29 gateway dhclient[304]: DHCPACK from x.x.x.x                                                     
    Apr 22 05:08:29 gateway dhclient[304]: bound to x.x.x.x -- renewal in 1800 seconds.

    @filter.log:

    Apr 22 05:03:00 gateway pf: 579.531199 rule 220/0(match): block in on re0: (tos 0x0, ttl 105, id 4509, offset 0, flags [none], proto ICMP (1), length 61) re.mo.te.ip > w.a.n.ip: ICMP echo request, id 512, seq 41905, length 41
    Apr 22 05:03:02 gateway pf: 2.079394 rule 220/0(match): block in on re0: (tos 0x0, ttl 105, id 35051, offset 0, flags [none], proto ICMP (1), length 61) re.mo.te.ip > w.a.n.ip: ICMP echo request, id 512, seq 36274, length 41
    Apr 22 05:03:15 gateway pf: 12.819850 rule 220/0(match): block in on re0: (tos 0x0, ttl 98, id 1460, offset 0, flags [none], proto TCP (6), length 40) re.mo.te.ip.6000 > w.a.n.ip.139: S, cksum 0x57a6 (correct), 800129024:800129024(0) win 16384
    Apr 22 05:05:03 gateway pf: 108.115100 rule 220/0(match): block in on re0: (tos 0x0, ttl 44, id 52961, offset 0, flags [DF], proto UDP (17), length 597) re.mo.te.ip.50803 > w.a.n.ip.1026: UDP, length 569
    Apr 22 05:08:29 gateway pf: 205.927414 rule 220/0(match): block in on re0: (tos 0x0, ttl 250, id 8836, offset 0, flags [DF], proto UDP (17), length 328) re.mo.te.ip.67 > w.a.n.ip.68: BOOTP/DHCP, Reply, length 300, xid 0xa192aa09, Flags [none]
    Apr 22 05:08:29 gateway pf:       Client-IP w.a.n.ip
    Apr 22 05:08:29 gateway pf:       Your-IP w.a.n.ip
    Apr 22 05:08:29 gateway pf:       Client-Ethernet-Address 00:de:ad:be:ef:00 [|bootp]
    Apr 22 05:10:39 gateway pf: 129.668942 rule 220/0(match): block in on re0: (tos 0x0, ttl 108, id 21133, offset 0, flags [none], proto UDP (17), length 78) re.mo.te.ip.41344 > w.a.n.ip.137: NBT UDP PACKET(137): QUERY; REQUEST; BROADCAST

    Let me know if there is anything else that I can provide to help track this down.
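    To line the timeouts up against the RRD graphs, here is a minimal Python sketch, meant to run on a workstation against a plain-text copy of system.log such as the excerpt above, that groups the "watchdog timeout" kernel lines by interface and reports how many there were and over what window. The function names are hypothetical, and because syslog lines carry no year, the `year` parameter is an assumption supplied by the caller.

```python
import re
import sys
from collections import defaultdict
from datetime import datetime

# Matches syslog kernel lines such as
#   "Apr 22 05:04:46 gateway kernel: re1: watchdog timeout"
# Leading whitespace is allowed so lines pasted from a post still match.
WATCHDOG_RE = re.compile(
    r"^\s*(?P<ts>[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}) \S+ kernel: "
    r"(?P<ifname>[a-z]+\d+): watchdog timeout\s*$"
)


def watchdog_events(lines, year=2009):
    """Group watchdog-timeout timestamps by interface name.

    Syslog omits the year, so `year` is an assumption; it only has to be
    the same for every line for the time differences to be correct.
    """
    events = defaultdict(list)
    for line in lines:
        m = WATCHDOG_RE.match(line)
        if m:
            stamp = " ".join(m.group("ts").split())  # collapse "Apr  2" padding
            events[m.group("ifname")].append(
                datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S"))
    return dict(events)


def summarize(events):
    """Return {ifname: (count, first HH:MM:SS, last HH:MM:SS, span in seconds)}."""
    summary = {}
    for ifname, stamps in events.items():
        stamps = sorted(stamps)
        span = (stamps[-1] - stamps[0]).total_seconds()
        summary[ifname] = (len(stamps), stamps[0].strftime("%H:%M:%S"),
                           stamps[-1].strftime("%H:%M:%S"), span)
    return summary


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file name): python3 watchdogs.py system.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for ifname, info in sorted(summarize(watchdog_events(log)).items()):
            print(ifname, *info)
```

    Fed the system.log lines above, this should report six re1 timeouts between 05:04:46 and 05:08:09, a window of 203 seconds that starts about 17 seconds after the last brute-force attempt at 05:04:29.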



  • Try a 2.0 build if possible. My experience is that there are still watchdog timeouts in 1.2.3, as you said. I think something might not be quite right with the 1.2.3 build process: 1.2.3 and 2.0 are currently supposed to use the same base OS and patch set, and that does not appear to be the case.

    I'm working through this with the devs currently.



  • It happened again with ACPI enabled: I had the dashboard up on 1.2.3, walked away for a bit, and came back to 6 or so watchdog errors. So far, 1.2.2 has seemed more stable than 1.2.3 with regard to watchdog errors. I'm upgrading to the latest 2.0 build now and crossing my fingers that everything I need is stable enough (and that the upgrade works; I don't feel like pulling my CF card for a fresh install).



  • Just happened again on 2.0 (Wed Apr 22 06:33:58 EDT 2009 build) when I clicked the "System Tunables" tab under System, Advanced. Unable to reproduce following the same steps.



  • Found a way to reproduce the watchdog errors: SSH to your Firebox running pfSense and cat /dev/random.
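    To put numbers on this reproduction (throughput, and how long the stream stalls before the session recovers or dies), a rough Python sketch can run the same `cat /dev/random` over SSH from a client machine and time the reads. The host argument and function name are hypothetical, and the sketch assumes an `ssh` client with key or agent authentication on the machine running it; nothing runs on the Firebox except the original command.

```python
import subprocess
import sys
import time


def measure_stream(stream, max_bytes, chunk_size=64 * 1024):
    """Read up to max_bytes from a binary stream and time it.

    Returns (bytes_read, elapsed_seconds, longest_gap_seconds), where the
    longest gap is the longest single wait for a read to return. A stall on
    the firewall's NIC shows up as a large gap; a session that dies shows up
    as EOF before max_bytes has been read.
    """
    total = 0
    longest_gap = 0.0
    start = last = time.monotonic()
    while total < max_bytes:
        data = stream.read(min(chunk_size, max_bytes - total))
        now = time.monotonic()
        longest_gap = max(longest_gap, now - last)
        last = now
        if not data:  # EOF: remote command or SSH session ended
            break
        total += len(data)
    return total, last - start, longest_gap


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical target, e.g. "admin@192.168.1.1"; the remote command is
    # the same cat /dev/random used to reproduce the timeouts.
    host = sys.argv[1]
    proc = subprocess.Popen(["ssh", host, "cat /dev/random"], stdout=subprocess.PIPE)
    try:
        n, secs, gap = measure_stream(proc.stdout, 200 * 1024 * 1024)
    finally:
        proc.kill()
        proc.wait()
    mbit = n * 8 / max(secs, 1e-9) / 1e6
    print(f"{n} bytes in {secs:.1f} s ({mbit:.2f} Mbit/s), longest stall {gap:.1f} s")
```

    Running it a few times per build gives comparable figures for "time until the stall" instead of eyeballing the SSH window.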



  • Can you post your output from:
    pciconf -lcv
    dmesg
    vmstat -i

    Thanks for testing!



  • After you do that, can you try a fresh flash? I've only tried fresh flashes, and have still not been able to repro watchdogs on 2.0 (even with 3 cat /dev/random running).



  • @pciconf:

    hostb0@pci0:0:0:0:      class=0x060000 card=0x11308086 chip=0x11308086 rev=0x04 hdr=0x00
        class      = bridge                                                               
        subclass  = HOST-PCI                                                             
        cap 09[88] = vendor (length 4) Intel cap 14 version 1                             
        cap 02[a0] = AGP 2x 1x SBA disabled                                               
    pcib1@pci0:0:1:0:      class=0x060400 card=0x00000000 chip=0x11318086 rev=0x04 hdr=0x01
        class      = bridge                                                               
        subclass  = PCI-PCI                                                               
    pcib2@pci0:0:30:0:      class=0x060400 card=0x00000000 chip=0x244e8086 rev=0x05 hdr=0x01
        class      = bridge                                                               
        subclass  = PCI-PCI                                                               
    isab0@pci0:0:31:0:      class=0x060100 card=0x00000000 chip=0x24408086 rev=0x05 hdr=0x00
        class      = bridge                                                               
        subclass  = PCI-ISA                                                               
    atapci0@pci0:0:31:1:    class=0x010180 card=0x24408086 chip=0x244b8086 rev=0x05 hdr=0x00
        class      = mass storage                                                         
        subclass  = ATA                                                                   
    safe0@pci0:2:6:0:      class=0xff0000 card=0x00010001 chip=0x114116ae rev=0x01 hdr=0x00
    re0@pci0:2:9:0: class=0x020000 card=0x813910ec chip=0x813910ec rev=0x20 hdr=0x00       
        class      = network                                                               
        subclass  = ethernet                                                             
        cap 01[50] = powerspec 2  supports D0 D1 D2 D3  current D0                         
    re1@pci0:2:10:0:        class=0x020000 card=0x813910ec chip=0x813910ec rev=0x20 hdr=0x00
        class      = network                                                               
        subclass  = ethernet                                                             
        cap 01[50] = powerspec 2  supports D0 D1 D2 D3  current D0                         
    re2@pci0:2:11:0:        class=0x020000 card=0x813910ec chip=0x813910ec rev=0x20 hdr=0x00
        class      = network                                                               
        subclass  = ethernet                                                             
        cap 01[50] = powerspec 2  supports D0 D1 D2 D3  current D0                         
    re3@pci0:2:12:0:        class=0x020000 card=0x813910ec chip=0x813910ec rev=0x20 hdr=0x00
        class      = network                                                               
        subclass  = ethernet                                                             
        cap 01[50] = powerspec 2  supports D0 D1 D2 D3  current D0                         
    re4@pci0:2:13:0:        class=0x020000 card=0x813910ec chip=0x813910ec rev=0x20 hdr=0x00
        class      = network                                                               
        subclass  = ethernet                                                             
        cap 01[50] = powerspec 2  supports D0 D1 D2 D3  current D0                         
    re5@pci0:2:14:0:        class=0x020000 card=0x813910ec chip=0x813910ec rev=0x20 hdr=0x00
        class      = network                                                               
        subclass  = ethernet                                                             
        cap 01[50] = powerspec 2  supports D0 D1 D2 D3  current D0

    @dmesg:

    Copyright (c) 1992-2009 The FreeBSD Project.
    Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994                                             
            The Regents of the University of California. All rights reserved.                                           
    FreeBSD is a registered trademark of The FreeBSD Foundation.                                                         
    FreeBSD 7.1-RELEASE-p5 #0: Wed Apr 22 13:05:32 EDT 2009                                                             
        sullrich@RELENG_2_0-snapshots.pfsense.org:/usr/obj.pfSense/usr/pfSensesrc/src/sys/pfSense_wrap.7                 
    Timecounter "i8254" frequency 1193182 Hz quality 0                                                                   
    CPU: Intel(R) Celeron(TM) CPU                1200MHz (1202.74-MHz 686-class CPU)                                     
      Origin = "GenuineIntel"  Id = 0x6b4  Stepping = 4                                                                 
      Features=0x383f9ff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE>
    real memory  = 268435456 (256 MB)
    avail memory = 248643584 (237 MB)                                                                                   
    wlan: mac acl policy registered                                                                                     
    ACPI Error (tbxfroot-0308): A valid RSDP was not found [20070320]                                                   
    ACPI: Table initialisation failed: AE_NOT_FOUND                                                                     
    ACPI: Try disabling either ACPI or apic support.                                                                     
    cryptosoft0: <software crypto> on motherboard
    pcib0: <Intel 82815 (i815 GMCH) Host To Hub bridge> pcibus 0 on motherboard
    pir0: <PCI Interrupt Routing Table: 11 Entries> on motherboard
    $PIR: Using invalid BIOS IRQ 9 from 2.13.INTA for link 0x63
    pci0: <PCI bus> on pcib0
    pcib1: <PCI-PCI bridge> at device 1.0 on pci0
    pci1: <PCI bus> on pcib1
    pcib2: <PCIBIOS PCI-PCI bridge> at device 30.0 on pci0
    pci2: <PCI bus> on pcib2
    safe0 mem 0xe7bfe000-0xe7bfffff irq 3 at device 6.0 on pci2                                                         
    safe0: [ITHREAD]                                                                                                     
    safe0: SafeNet SafeXcel-1141 rng des/3des aes md5 sha1 null                                                         
    re0: <RealTek 8139C+ 10/100BaseTX> port 0xd500-0xd5ff mem 0xefefa000-0xefefa1ff irq 10 at device 9.0 on pci2
    re0: Chip rev. 0x74800000
    re0: MAC rev. 0x00000000
    miibus0: <MII bus> on re0
    rlphy0: <RealTek internal media interface> PHY 0 on miibus0
    rlphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto                                                       
    re0: Ethernet address: 00:90:7f:30:d6:1d                                                                             
    re0: [FILTER]                                                                                                       
    re1: <RealTek 8139C+ 10/100BaseTX> port 0xd600-0xd6ff mem 0xefefb000-0xefefb1ff irq 5 at device 10.0 on pci2
    re1: Chip rev. 0x74800000
    re1: MAC rev. 0x00000000
    miibus1: <MII bus> on re1
    rlphy1: <RealTek internal media interface> PHY 0 on miibus1
    rlphy1:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto                                                       
    re1: Ethernet address: 00:90:7f:30:d6:1e                                                                             
    re1: [FILTER]                                                                                                       
    re2: <RealTek 8139C+ 10/100BaseTX> port 0xd900-0xd9ff mem 0xefefc000-0xefefc1ff irq 11 at device 11.0 on pci2
    re2: Chip rev. 0x74800000
    re2: MAC rev. 0x00000000
    miibus2: <MII bus> on re2
    rlphy2: <RealTek internal media interface> PHY 0 on miibus2
    rlphy2:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto                                                       
    re2: Ethernet address: 00:90:7f:30:d6:1f                                                                             
    re2: [FILTER]                                                                                                       
    re3: <RealTek 8139C+ 10/100BaseTX> port 0xda00-0xdaff mem 0xefefd000-0xefefd1ff irq 12 at device 12.0 on pci2
    re3: Chip rev. 0x74800000
    re3: MAC rev. 0x00000000
    miibus3: <MII bus> on re3
    rlphy3: <RealTek internal media interface> PHY 0 on miibus3
    rlphy3:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto                                                       
    re3: Ethernet address: 00:90:7f:30:d6:20                                                                             
    re3: [FILTER]                                                                                                       
    re4: <RealTek 8139C+ 10/100BaseTX> port 0xdd00-0xddff mem 0xefefe000-0xefefe1ff irq 9 at device 13.0 on pci2
    re4: Chip rev. 0x74800000
    re4: MAC rev. 0x00000000
    miibus4: <MII bus> on re4
    rlphy4: <RealTek internal media interface> PHY 0 on miibus4
    rlphy4:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto                                                       
    re4: Ethernet address: 00:90:7f:30:d6:21                                                                             
    re4: [FILTER]                                                                                                       
    re5: <RealTek 8139C+ 10/100BaseTX> port 0xde00-0xdeff mem 0xefeff000-0xefeff1ff irq 6 at device 14.0 on pci2
    re5: Chip rev. 0x74800000
    re5: MAC rev. 0x00000000
    miibus5: <MII bus> on re5
    rlphy5: <RealTek internal media interface> PHY 0 on miibus5
    rlphy5:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto                                                       
    re5: Ethernet address: 00:90:7f:30:d6:22                                                                             
    re5: [FILTER]                                                                                                       
    isab0: <PCI-ISA bridge> at device 31.0 on pci0
    isa0: <ISA bus> on isab0
    atapci0: <Intel ICH2 UDMA100 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xff00-0xff0f at device 31.1 on pci0
    ata0: <ATA channel 0> on atapci0
    ata0: [ITHREAD]
    ata1: <ATA channel 1> on atapci0
    ata1: [ITHREAD]
    cpu0 on motherboard
    orm0: <ISA Option ROM> at iomem 0xe0000-0xe0fff pnpid ORM0000 on isa0
    ppc0: <Parallel port> at port 0x378-0x37f irq 7 on isa0
    ppc0: Generic chipset (ECP/PS2/NIBBLE) in COMPATIBLE mode
    ppc0: FIFO with 16/16/16 bytes threshold
    ppbus0: <Parallel port bus> on ppc0
    ppbus0: [ITHREAD]
    ppi0: <Parallel I/O> on ppbus0
    ppc0: [GIANT-LOCKED]                                                                                                 
    ppc0: [ITHREAD]                                                                                                     
    sio0 at port 0x3f8-0x3ff irq 4 flags 0x10 on isa0                                                                   
    sio0: type 16550A, console                                                                                           
    sio0: [FILTER]                                                                                                       
    sio1: configured irq 3 not in bitmap of probed irqs 0                                                               
    sio1: port may not be enabled                                                                                       
    unknown: <PNP0c01> can't assign resources (memory)
    speaker0: <PC speaker> at port 0x61 pnpid PNP0800 on isa0
    unknown: <PNP0501> can't assign resources (port)
    unknown: <PNP0401> can't assign resources (port)
    RTC BIOS diagnostic error 20<config_unit>
    Timecounter "TSC" frequency 1202735037 Hz quality 800
    Timecounters tick every 10.000 msec
    IPsec: Initialized Security Association Processing.
    ad0: FAILURE - SET_MULTI status=51<READY,DSC,ERROR> error=4<ABORTED>
    ad0: 3887MB <CF4GHS 20080116> at ata0-master PIO4
    Trying to mount root from ufs:/dev/ad0s1a                                                                           
    re0: link state changed to UP                                                                                       
    re0: link state changed to DOWN                                                                                     
    re0: link state changed to UP                                                                                       
    re1: link state changed to DOWN                                                                                     
    re2: link state changed to UP                                                                                       
    re2: link state changed to DOWN                                                                                     
    bridge0: Ethernet address: 1e:64:49:eb:aa:05                                                                         
    re2: promiscuous mode enabled                                                                                       
    re1: promiscuous mode enabled
    re3: link state changed to DOWN
    re4: link state changed to DOWN
    re5: link state changed to DOWN
    pflog0: promiscuous mode enabled
    re1: link state changed to UP
    re2: link state changed to UP
    re1: watchdog timeout
    re1: watchdog timeout
    re1: watchdog timeout
    re1: watchdog timeout
    re1: watchdog timeout
    re1: watchdog timeout
    re1: watchdog timeout
    re1: watchdog timeout
    re1: watchdog timeout
    re1: watchdog timeout
    re1: watchdog timeout
    re1: watchdog timeout
    re1: watchdog timeout
    re1: watchdog timeout
    re1: watchdog timeout
    re1: watchdog timeout
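    With six identical re(4) ports it is easy to lose track of which one sits on which IRQ and slot. A small Python sketch (hypothetical names, standard library only) can pull that mapping out of a pasted dmesg together with a per-port count of watchdog messages. On the dmesg above it would show, for instance, that re1 (the port logging the timeouts) attached on IRQ 5 at device 10.0, while re4 at device 13.0 on pci2 got IRQ 9, the IRQ the `$PIR` line calls invalid for 2.13.INTA; whether that matters for re1 is not established here.

```python
import re
from collections import Counter

# Attach lines look like (the description text between < > may vary):
#   "re1: <RealTek 8139C+ 10/100BaseTX> port 0xd600-0xd6ff mem ... irq 5 at device 10.0 on pci2"
ATTACH_RE = re.compile(
    r"^\s*(?P<dev>re\d+): <[^>]*>.*\birq (?P<irq>\d+) at device (?P<slot>\d+\.\d+)"
)
WATCHDOG_RE = re.compile(r"^\s*(?P<dev>re\d+): watchdog timeout\s*$")


def parse_dmesg(text):
    """Return ({device: (irq, slot)}, Counter of watchdog timeouts per device)."""
    attached = {}
    timeouts = Counter()
    for line in text.splitlines():
        m = ATTACH_RE.match(line)
        if m:
            attached[m.group("dev")] = (int(m.group("irq")), m.group("slot"))
            continue
        m = WATCHDOG_RE.match(line)
        if m:
            timeouts[m.group("dev")] += 1
    return attached, timeouts
```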

    @vmstat:

    interrupt                          total      rate
    irq0: clk                        1206531        99
    irq3: safe0                            1          0
    irq4: sio0                          293          0
    irq5: re1                        142810        11
    irq7: ppbus0 ppc0                      1          0
    irq8: rtc                        1544290        127
    irq10: re0                        150821        12
    irq11: re2                        15686          1
    irq14: ata0                      1742049        144
    Total                            4802482        398
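    A quick sanity check on this table: the per-source totals should add up to the Total row, and because the rate column is each total divided by the seconds since boot (rounded down), Total divided by its rate gives a rough uptime. The Python sketch below parses `vmstat -i` text into that shape; the function name is hypothetical. For the table above the totals sum exactly to 4802482, and 4802482 / 398 is a little over 12,000 seconds, about three and a third hours of uptime.

```python
def parse_vmstat_i(text):
    """Parse `vmstat -i` output.

    Returns (rows, total) where rows maps the interrupt label (e.g.
    "irq5: re1") to (count, rate) and total is the (count, rate) of the
    Total row, or None if it is missing.
    """
    rows = {}
    total = None
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 3 or parts[0] == "interrupt":
            continue  # skip blank lines and the header
        count, rate = int(parts[-2]), int(parts[-1])
        name = " ".join(parts[:-2])  # labels may contain spaces ("ppbus0 ppc0")
        if name == "Total":
            total = (count, rate)
        else:
            rows[name] = (count, rate)
    return rows, total
```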



  • @DimitriRodis:

    After you do that, can you try a fresh flash? I've only tried fresh flashes, and have still not been able to repro watchdogs on 2.0 (even with 3 cat /dev/random running).

    I'll see if I can get a chance for a fresh flash this weekend. A single cat /dev/random will make my box fire off a few watchdog errors, become unresponsive for 2-3 minutes, and eventually end with a dead ssh session, every time I've tried.



  • Also, if you are able to repro on a fresh flash, can you try a capture?

    tcpdump -s 0 -w /tmp/re1.pcap -ni re1

    (Replace re1 with re0, re2, etc., whichever interface is giving you timeouts.)
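    Once a capture like /tmp/re1.pcap has been copied off the box, a minimal Python sketch (standard library only, hypothetical function name) can read the classic libpcap file format that `tcpdump -w` writes and report the packet count, the captured bytes, and the longest gap between consecutive packets; on a busy interface, a long gap may point at where a stall happened. It assumes microsecond timestamps (magic number 0xa1b2c3d4, in either byte order) and does not handle the newer pcapng format.

```python
import struct

PCAP_MAGIC_USEC = 0xA1B2C3D4  # classic libpcap, microsecond timestamps


def summarize_pcap(data):
    """Summarize a classic pcap capture held in memory as bytes.

    Returns (packet_count, captured_bytes, longest_gap_seconds).
    """
    if len(data) < 24:
        raise ValueError("too short for a pcap global header")
    for endian in ("<", ">"):  # the writer's native byte order is used
        (magic,) = struct.unpack(endian + "I", data[:4])
        if magic == PCAP_MAGIC_USEC:
            break
    else:
        raise ValueError("not a classic microsecond pcap file (pcapng is not handled)")
    offset = 24  # skip the 24-byte global header
    count = captured = 0
    longest_gap = 0.0
    prev = None
    while offset + 16 <= len(data):
        # Per-record header: ts_sec, ts_usec, incl_len, orig_len
        ts_sec, ts_usec, incl_len, _orig_len = struct.unpack(
            endian + "IIII", data[offset:offset + 16])
        if offset + 16 + incl_len > len(data):
            break  # truncated final record, e.g. tcpdump killed mid-write
        ts = ts_sec + ts_usec / 1e6
        if prev is not None:
            longest_gap = max(longest_gap, ts - prev)
        prev = ts
        count += 1
        captured += incl_len
        offset += 16 + incl_len
    return count, captured, longest_gap
```

    Usage would be along the lines of `with open("re1.pcap", "rb") as f: print(summarize_pcap(f.read()))`; a full Wireshark look at the capture is still the better tool for seeing what was on the wire.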



  • I noticed in your dmesg output:
    ----
    ACPI Error (tbxfroot-0308): A valid RSDP was not found [20070320]
    ACPI: Table initialisation failed: AE_NOT_FOUND
    ACPI: Try disabling either ACPI or apic support.
    ----

    I don't have this in my dmesg output. What's in your /boot/loader.conf?



  • Defaults, plus a line disabling DMA for my CF card (it wouldn't work otherwise) and a commented-out line disabling ACPI (which I tried on 1.2.2 to stop the watchdog errors).

    cat /boot/loader.conf

    autoboot_delay="1"
    vm.kmem_size="435544320"
    vm.kmem_size_max="535544320"
    kern.ipc.nmbclusters="0"
    hw.ata.ata_dma=0
    #hint.acpi.0.disabled=1



  • From a standard 2.0 flash, /boot/loader.conf contains:

    hw.ata.atapi_dma="0"
    hw.ata.ata_dma="0"
    loader_color="NO"
    console=comconsole
    autoboot_delay="5"
    hw.ata.wc="0"
    kern.ipc.nmbclusters="0"
    beastie_disable="YES"
    vm.kmem_size="435544320"
    vm.kmem_size_max="535544320"
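    Comparing two loader.conf listings by eye is error-prone, so here is a minimal Python sketch (hypothetical names) that parses each file into name/value pairs, skips comments (including commented-out settings), and strips surrounding double quotes so that `hw.ata.ata_dma=0` and `hw.ata.ata_dma="0"` compare equal. That normalization is a simplifying assumption, and so is treating everything after a `#` as a comment. Run against the two files quoted in this thread, it reports that the customized file differs from the stock one only in `autoboot_delay` ("1" vs "5") and in lacking `hw.ata.atapi_dma`, `loader_color`, `console`, `hw.ata.wc` and `beastie_disable`; the `hw.ata.ata_dma` setting is already present in the stock file.

```python
def parse_loader_conf(text):
    """Parse loader.conf-style `name=value` lines into a dict.

    Simplification: anything after '#' is treated as a comment, and
    surrounding double quotes are stripped from values.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        name, value = line.split("=", 1)
        settings[name.strip()] = value.strip().strip('"')
    return settings


def diff_loader_conf(mine, reference):
    """Return (only_in_mine, only_in_reference, changed) between two configs."""
    a, b = parse_loader_conf(mine), parse_loader_conf(reference)
    only_mine = {k: a[k] for k in a.keys() - b.keys()}
    only_ref = {k: b[k] for k in b.keys() - a.keys()}
    changed = {k: (a[k], b[k]) for k in a.keys() & b.keys() if a[k] != b[k]}
    return only_mine, only_ref, changed
```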



  • If you are able to get into the BIOS on your firebox, can you make sure you are testing with ACPI enabled? The error on your dmesg indicates that ACPI might be turned off in the BIOS.



  • @DimitriRodis:

    If you are able to get into the BIOS on your firebox, can you make sure you are testing with ACPI enabled? The error on your dmesg indicates that ACPI might be turned off in the BIOS.

    If I can find the PS/2 port pinout for the motherboard, a PS/2 connector I can repurpose, a PCI video card, and the time, I'll see what I can manage. Unless there is an easier way I'm missing?



  • I just tried turning on TSO on re1 (the interface I always see watchdogs on), and it definitely increased the time before cat'ing /dev/random caused an error. It still happened, but it took 1-2 minutes instead of 15 seconds. While I was cat'ing /dev/random I did notice a clear performance hit: a speed test gave 6.5 Mbit/s before and a consistent 4 Mbit/s during. Still not fixed, but this seems like progress… I'll still try to get a clean image on at some point in the next few days to see if that helps.



  • I've never been in the BIOS of my Fireboxes; it just seems strange to see those odd messages in your dmesg output.

    Don't tear up a connector yet; just do a fresh flash to 2.0, test that way, and let me know.



  • Just FYI, I still haven't been able to reproduce timeouts on 2.0 using cat /dev/random. As you can see, about 40 GB of traffic has gone out of that interface, and not a single watchdog timeout on 2.0.

    LAN interface (re2) 
    Status up 
    MAC address 00:90:7f:32:8a:94 
    IP address 192.168.1.1   
    Subnet mask 255.255.255.0 
    Media 100baseTX <full-duplex> 
    In/out packets 41004200/41004169 (2.63 GB/41.24 GB) 
    In/out packets (pass) 41004169/64873580 (2.63 GB/41.24 GB) 
    In/out packets (block) 31/0 (3 KB/0 bytes) 
    In/out errors 0/0 
    Collisions 0

    Have you tried a fresh default flash of 2.0 yet?



  • The problem still exists after a clean reinstall. I tested against the 2009-04-25 17:12 build with the following configuration:

    • Added to /boot/loader.conf:
      hw.ata.ata_dma=0
    • Configured interfaces (set WAN=re0, set LAN=re1, configured IP/netmask on re1)
    • Configured DHCP (reservations, address range, domain name, NTP server)

    Cat'ing /dev/random still causes watchdog errors within 15-20 seconds. The ACPI errors still show in dmesg. I also reset the BIOS settings to defaults (via the front LCD panel), with no noticeable change.
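    Since one open question in this exchange is whether each box under test is really running defaults, a quick comparison is to list the tunables set in /boot/loader.conf. A minimal Python sketch, assuming only the simple `name=value` / `name="value"` form with `#` comments; it does not cover the loader's full syntax, and `parse_loader_conf` is a hypothetical helper.

```python
def parse_loader_conf(text: str) -> dict:
    """Parse simple name=value lines from a loader.conf-style file.

    Skips blank lines and '#' comments and strips optional double quotes
    around values. Simplified: a '#' inside a quoted value is not handled.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip().strip('"')
    return settings

conf = parse_loader_conf("# added for testing\nhw.ata.ata_dma=0\n")
print(conf)
```

    An empty result would indicate a stock loader.conf in the sense discussed here; anything else, such as the `hw.ata.ata_dma` tunable, is a difference worth mentioning when reporting results.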



  • Any particular reason you're editing loader.conf and adding hw.ata.ata_dma=0? Like I said, I'm using all defaults. The only other difference I can see between your setup and mine (besides the ACPI error) is that you are using re1 for LAN and I am using re2 (normally I leave re0 and re1 free in case I have multiple WANs; my re1 is currently empty).

    What is your LAN interface plugged into (a switch, I presume? What kind)? My re2 (LAN) is plugged directly into my laptop with a crossover cable. I also changed the speed/duplex settings on my laptop during tests last week, just to be sure there isn't a problem with a particular speed, and there isn't.

    Would it be possible to plug your LAN interface directly into a computer (via crossover) to take the switch out of the equation?

    I am still puzzled by the ACPI errors in your dmesg output; I have three Firebox X500/700/1000 series units, and none of them shows that error during bootup.



  • OK, so I just flashed a 2.0 image and changed my LAN interface to re1 instead of re2. Guess what? Watchdog timeouts.



  • That is odd, because I changed my LAN interface to re2 and still had watchdog timeouts. I haven't yet had a chance to connect directly without a switch to see if that makes any difference. I'm currently running a Linksys SD2008, an 8-port unmanaged gigabit switch. By this weekend I'll probably have a Cisco 2950-24 sitting inline between the Firebox and the Linksys. I'll update this thread with the results of direct vs. Cisco vs. Linksys once I get to test.



  • For those of you watching this thread: stay tuned. I am still working with Pyun, and I have another patch to test!



  • @rewt:

    @DimitriRodis:

    If you are able to get into the BIOS on your firebox, can you make sure you are testing with ACPI enabled? The error on your dmesg indicates that ACPI might be turned off in the BIOS.

    If I can find the PS/2 port pinout for the motherboard, a PS/2 connector I can repurpose, a PCI video card, and the time, I'll see what I can manage. Unless there is an easier way I'm missing?

    See my keyboard hack here: http://forum.pfsense.org/index.php/topic,7458.msg84324.html#msg84324



  • I have to say, ever since this thread was started I've been having more and more watchdog timeouts.

    In previous builds I got them very seldom (not even once a day), but since whatever code was changed to fix this, I am seeing them on average five times a day, often lasting about a minute each time.

    Nothing has changed configuration-wise between builds. I'm running WAN on re0, LAN on re1, DMZ on re2, and Wi-Fi on re4/re5. From what I can tell, I never have any timeouts on re0/re1.

    In any case, I'll give it a few more builds, but unless we can get it fixed I'll have to roll back to the earlier build, as the system as it stands today is getting to be unusable (and yes, I know, don't run v2.0 in production :p).



  • Only two patches have made it into the publicly downloadable builds, and one of them went in late yesterday. My guess is that you are seeing more watchdog timeouts because something changed in your environment rather than in pfSense: the Realtek interface code has only changed twice, and one of those changes was for the better. The other patch, which just went in yesterday, will likely be rolled back, since it did not seem to improve things (and it may actually have made them worse). Working with the driver maintainer is challenging, since there is a 17-hour time difference between him and me, plus I need to have his patches incorporated and wait for a new build to test before I can get back to him.

    Like I said, stay tuned. When I've worked out something that appears to have solved it, I will need people like you to beat it up. Originally I thought I had it licked (since it solved the particular problem that I was causing), but other problems are still present.



  • Hello Dimitri

    I got a Firebox X500 off eBay and was hoping I wouldn't run into the watchdog errors with the Realtek network cards, but I wasn't that lucky.

    I gave the pfSense-1.2.3-20090708-1942 snapshot a test tonight, and I am able to reproduce the watchdog errors with the cat /dev/urandom test, or even by installing the NUT package and then going to "Services -> NUT" in the web interface. About 5-8 seconds after the page starts to load, I get the following error on the console:

    
     re2: watchdog timeout
    
    

    Once this error pops up on the console, I am unable to ping to/from that interface until I hard power off the X500. From the console, if I choose option 5 "Reboot System" or type reboot, pfSense starts the shutdown process but then stops at the "Rebooting..." message.

    
    re2:watchdog timeout
    re2:watchdog timeout
    re2:watchdog timeout
     # reboot
     pflog0: promiscuous mode disabled
     TWaiting (max 60 seconds) for system process `vnlru' to stop...done 
     Waiting (max 60 seconds) for system process `bufdaemon' to stop...done 
     Waiting (max 60 seconds) for system process `syncer' to stop...
     Syncing disks, vnodes remaining...4 2 0 0 done
     All buffers synced.
     Uptime: 38m24s
     Rebooting...
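    For reports like this one, a per-interface tally of watchdog messages makes it easier to compare builds and ports. A minimal Python sketch, assuming the console or dmesg output has been saved as plain text; note that the spacing after the colon varies in the excerpt above. `count_watchdogs` is a hypothetical helper.

```python
import re
from collections import Counter

# Console excerpt from the post above.
CONSOLE = """\
 re2: watchdog timeout
re2:watchdog timeout
re2:watchdog timeout
re2:watchdog timeout
"""

# Interface name, a colon, optional whitespace, then the driver's message.
WATCHDOG = re.compile(r"\b(re\d+):\s*watchdog timeout")

def count_watchdogs(text: str) -> Counter:
    """Count 're<N>: watchdog timeout' messages per interface."""
    return Counter(m.group(1) for m in WATCHDOG.finditer(text))

print(count_watchdogs(CONSOLE))
```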
    
    

    Another oddity

    Running "halt system" from the console menu works until I hit the "press any key to reboot" part. As soon as I hit a key, the speaker on the X500 screams like crazy. Both "reboot" and "halt/reboot" work just fine until the watchdog errors start to pop up. Is there any other debugging I can do on my end to help?

    -loki

    pciconf -lcv

    
    hostb0@pci0:0:0:0:      class=0x060000 card=0x11308086 chip=0x11308086 rev=0x04 hdr=0x00
        class      = bridge
        subclass   = HOST-PCI
        cap 09[88] = vendor (length 4) Intel cap 14 version 1
        cap 02[a0] = AGP 2x 1x SBA disabled
    pcib1@pci0:0:1:0:       class=0x060400 card=0x00000000 chip=0x11318086 rev=0x04 hdr=0x01
        class      = bridge
        subclass   = PCI-PCI
    pcib2@pci0:0:30:0:      class=0x060400 card=0x00000000 chip=0x244e8086 rev=0x05 hdr=0x01
        class      = bridge
        subclass   = PCI-PCI
    isab0@pci0:0:31:0:      class=0x060100 card=0x00000000 chip=0x24408086 rev=0x05 hdr=0x00
        class      = bridge
        subclass   = PCI-ISA
    atapci0@pci0:0:31:1:    class=0x010180 card=0x24408086 chip=0x244b8086 rev=0x05 hdr=0x00
        class      = mass storage
        subclass   = ATA
    ral0@pci0:2:6:0:        class=0x028000 card=0x3c421186 chip=0x02011814 rev=0x01 hdr=0x00
        class      = network
        cap 01[40] = powerspec 2  supports D0 D3  current D0
    re0@pci0:2:9:0: class=0x020000 card=0x813910ec chip=0x813910ec rev=0x20 hdr=0x00
        class      = network
        subclass   = ethernet
        cap 01[50] = powerspec 2  supports D0 D1 D2 D3  current D0
    re1@pci0:2:10:0:        class=0x020000 card=0x813910ec chip=0x813910ec rev=0x20 hdr=0x00
        class      = network
        subclass   = ethernet
        cap 01[50] = powerspec 2  supports D0 D1 D2 D3  current D0
    re2@pci0:2:11:0:        class=0x020000 card=0x813910ec chip=0x813910ec rev=0x20 hdr=0x00
        class      = network
        subclass   = ethernet
        cap 01[50] = powerspec 2  supports D0 D1 D2 D3  current D0
    re3@pci0:2:12:0:        class=0x020000 card=0x813910ec chip=0x813910ec rev=0x20 hdr=0x00
        class      = network
        subclass   = ethernet
        cap 01[50] = powerspec 2  supports D0 D1 D2 D3  current D0
    re4@pci0:2:13:0:        class=0x020000 card=0x813910ec chip=0x813910ec rev=0x20 hdr=0x00
        class      = network
        subclass   = ethernet
        cap 01[50] = powerspec 2  supports D0 D1 D2 D3  current D0
    re5@pci0:2:14:0:        class=0x020000 card=0x813910ec chip=0x813910ec rev=0x20 hdr=0x00
        class      = network
        subclass   = ethernet
        cap 01[50] = powerspec 2  supports D0 D1 D2 D3  current D0
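    The `chip=` field in this listing packs two IDs: the low 16 bits are the PCI vendor ID and the high 16 bits the device ID. All six re(4) ports therefore report vendor 0x10ec (Realtek) and device 0x8139, each with rev=0x20. A small Python sketch to decode it; `decode_chip` is a hypothetical helper.

```python
import re

# Two lines taken from the `pciconf -lcv` output above.
LINES = [
    "hostb0@pci0:0:0:0:      class=0x060000 card=0x11308086 chip=0x11308086 rev=0x04 hdr=0x00",
    "re0@pci0:2:9:0: class=0x020000 card=0x813910ec chip=0x813910ec rev=0x20 hdr=0x00",
]

def decode_chip(line: str) -> tuple:
    """Return (device name, vendor ID, device ID) from a `pciconf -l` line.

    The chip field holds the 16-bit device ID in its high half and the
    16-bit vendor ID in its low half.
    """
    name = line.split("@", 1)[0]
    chip = int(re.search(r"chip=0x([0-9a-f]{8})", line).group(1), 16)
    return name, chip & 0xFFFF, chip >> 16

for line in LINES:
    name, vendor, device = decode_chip(line)
    print(f"{name}: vendor 0x{vendor:04x} device 0x{device:04x}")
```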
    
    

    dmesg

    
    Copyright (c) 1992-2009 The FreeBSD Project.
    Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
            The Regents of the University of California. All rights reserved.
    FreeBSD is a registered trademark of The FreeBSD Foundation.
    FreeBSD 7.2-RELEASE-p2 #0: Wed Jul  8 19:39:37 EDT 2009
        sullrich@FreeBSD-7_2-RELENG_1_2-snapshots.pfsense.org:/usr/obj.pfSense/usr/pfSensesrc/src/sys/pfSense_SMP.7
    Timecounter "i8254" frequency 1193182 Hz quality 0
    CPU: Intel(R) Celeron(TM) CPU                1200MHz (1202.73-MHz 686-class CPU)
      Origin = "GenuineIntel"  Id = 0x6b4  Stepping = 4
      Features=0x383f9ff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE>
    real memory  = 536870912 (512 MB)
    avail memory = 508604416 (485 MB)
    wlan: mac acl policy registered
    kbd1 at kbdmux0
    cryptosoft0: <software crypto> on motherboard
    pcib0: <Intel 82815 (i815 GMCH) Host To Hub bridge> pcibus 0 on motherboard
    pir0: <PCI Interrupt Routing Table: 11 Entries> on motherboard
    $PIR: Using invalid BIOS IRQ 9 from 2.13.INTA for link 0x63
    pci0: <PCI bus> on pcib0
    agp0: <Intel 82815 (i815 GMCH) host to PCI bridge> on hostb0
    pcib1: <PCI-PCI bridge> at device 1.0 on pci0
    pci1: <PCI bus> on pcib1
    pcib2: <PCIBIOS PCI-PCI bridge> at device 30.0 on pci0
    pci2: <PCI bus> on pcib2
    ral0: <Ralink Technology RT2560> mem 0xefefe000-0xefefffff irq 3 at device 6.0 on pci2
    ral0: MAC/BBP RT2560 (rev 0x04), RF RT2525
    ral0: Ethernet address: 00:0f:a3:74:4a:7a
    ral0: [ITHREAD]
    re0: <RealTek 8139C+ 10/100BaseTX> port 0xd500-0xd5ff mem 0xefefa000-0xefefa1ff irq 10 at device 9.0 on pci2
    re0: Chip rev. 0x74800000
    re0: MAC rev. 0x00000000
    miibus0: <MII bus> on re0
    rlphy0: <RealTek internal media interface> PHY 0 on miibus0
    rlphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
    re0: Ethernet address: 00:90:7f:2f:1a:63
    re0: [FILTER]
    re1: <RealTek 8139C+ 10/100BaseTX> port 0xd600-0xd6ff mem 0xefefb000-0xefefb1ff irq 5 at device 10.0 on pci2
    re1: Chip rev. 0x74800000
    re1: MAC rev. 0x00000000
    miibus1: <MII bus> on re1
    rlphy1: <RealTek internal media interface> PHY 0 on miibus1
    rlphy1:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
    re1: Ethernet address: 00:90:7f:2f:1a:64
    re1: [FILTER]
    re2: <RealTek 8139C+ 10/100BaseTX> port 0xd900-0xd9ff mem 0xefefc000-0xefefc1ff irq 11 at device 11.0 on pci2
    re2: Chip rev. 0x74800000
    re2: MAC rev. 0x00000000
    miibus2: <MII bus> on re2
    rlphy2: <RealTek internal media interface> PHY 0 on miibus2
    rlphy2:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
    re2: Ethernet address: 00:90:7f:2f:1a:65
    re2: [FILTER]
    re3: <RealTek 8139C+ 10/100BaseTX> port 0xda00-0xdaff mem 0xefefd000-0xefefd1ff irq 12 at device 12.0 on pci2
    re3: Chip rev. 0x74800000
    re3: MAC rev. 0x00000000
    miibus3: <MII bus> on re3
    rlphy3: <RealTek internal media interface> PHY 0 on miibus3
    rlphy3:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
    re3: Ethernet address: 00:90:7f:2f:1a:66
    re3: [FILTER]
    re4: <RealTek 8139C+ 10/100BaseTX> port 0xdd00-0xddff irq 9 at device 13.0 on pci2
    re4: Chip rev. 0x74800000
    re4: MAC rev. 0x00000000
    miibus4: <MII bus> on re4
    rlphy4: <RealTek internal media interface> PHY 0 on miibus4
    rlphy4:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
    re4: Ethernet address: 00:90:7f:2f:1a:67
    re4: [FILTER]
    re5: <RealTek 8139C+ 10/100BaseTX> port 0xde00-0xdeff irq 6 at device 14.0 on pci2
    re5: Chip rev. 0x74800000
    re5: MAC rev. 0x00000000
    miibus5: <MII bus> on re5
    rlphy5: <RealTek internal media interface> PHY 0 on miibus5
    rlphy5:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
    re5: Ethernet address: 00:90:7f:2f:1a:68
    re5: [FILTER]
    isab0: <PCI-ISA bridge> at device 31.0 on pci0
    isa0: <ISA bus> on isab0
    atapci0: <Intel ICH2 UDMA100 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xff00-0xff0f at device 31.1 on pci0
    ata0: <ATA channel 0> on atapci0
    ata0: [ITHREAD]
    ata1: <ATA channel 1> on atapci0
    ata1: [ITHREAD]
    cpu0 on motherboard
    pmtimer0 on isa0
    orm0: <ISA Option ROM> at iomem 0xe0000-0xe0fff pnpid ORM0000 on isa0
    atkbdc0: <Keyboard controller (i8042)> at port 0x60,0x64 on isa0
    atkbd0: <AT Keyboard> irq 1 on atkbdc0
    kbd0 at atkbd0
    atkbd0: [GIANT-LOCKED]
    atkbd0: [ITHREAD]
    ppc0: <Parallel port> at port 0x378-0x37f irq 7 on isa0
    ppc0: Generic chipset (ECP/PS2/NIBBLE) in COMPATIBLE mode
    ppc0: FIFO with 16/16/16 bytes threshold
    ppbus0: <Parallel port bus> on ppc0
    ppbus0: [ITHREAD]
    plip0: <PLIP network interface> on ppbus0
    plip0: WARNING: using obsoleted IFF_NEEDSGIANT flag
    lpt0: <Printer> on ppbus0
    lpt0: Interrupt-driven port
    ppi0: <Parallel I/O> on ppbus0
    ppc0: [GIANT-LOCKED]
    ppc0: [ITHREAD]
    sio0 at port 0x3f8-0x3ff irq 4 flags 0x10 on isa0
    sio0: type 16550A, console
    sio0: [FILTER]
    sio1: configured irq 3 not in bitmap of probed irqs 0
    sio1: port may not be enabled
    unknown: <PNP0c01> can't assign resources (memory)
    unknown: <PNP0303> can't assign resources (port)
    speaker0: <PC speaker> at port 0x61 pnpid PNP0800 on isa0
    unknown: <PNP0501> can't assign resources (port)
    unknown: <PNP0401> can't assign resources (port)
    RTC BIOS diagnostic error 20<config_unit>
    Timecounter "TSC" frequency 1202733008 Hz quality 800
    Timecounters tick every 1.000 msec
    IPsec: Initialized Security Association Processing.
    ad2: DMA limited to UDMA33, controller found non-ATA66 cable
    ad2: 57231MB <IC25N060ATMR04 0 MO3OAD5A> at ata1-master UDMA33
    GEOM: ad2: partition 1 does not start on a track boundary.
    GEOM: ad2: partition 1 does not end on a track boundary.
    Trying to mount root from ufs:/dev/ad2s1a
    re2: link state changed to UP
    re2: link state changed to DOWN
    re0: link state changed to UP
    re0: link state changed to DOWN
    re2: link state changed to UP
    re0: link state changed to UP
    re1: link state changed to DOWN
    re3: link state changed to DOWN
    re4: link state changed to DOWN
    re5: link state changed to DOWN
    pflog0: promiscuous mode enabled
    

    vmstat -i

    
    interrupt                          total       rate
    irq0: clk                        2445041       1000
    irq4: sio0                           736          0
    irq7: ppbus0 ppc0                      1          0
    irq8: rtc                         312912        127
    irq10: re0                           849          0
    irq11: re2                          3204          1
    irq15: ata1                        10586          4
    Total                            2773329       1134
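    To line these two listings up, the IRQ each re(4) port attached on (from dmesg) can be joined with its interrupt count (from `vmstat -i`). In this output only re0 and re2, the two ports whose link came up in dmesg, appear in `vmstat -i`. A minimal Python sketch over an excerpt of the output above; the helper names are hypothetical.

```python
import re

# Attach lines from the dmesg above (trimmed) and matching `vmstat -i` lines.
DMESG = """\
re0: <RealTek 8139C+ 10/100BaseTX> port 0xd500-0xd5ff mem 0xefefa000-0xefefa1ff irq 10 at device 9.0 on pci2
re1: <RealTek 8139C+ 10/100BaseTX> port 0xd600-0xd6ff mem 0xefefb000-0xefefb1ff irq 5 at device 10.0 on pci2
re2: <RealTek 8139C+ 10/100BaseTX> port 0xd900-0xd9ff mem 0xefefc000-0xefefc1ff irq 11 at device 11.0 on pci2
"""
VMSTAT = """\
irq10: re0                           849          0
irq11: re2                          3204          1
"""

def irq_map(dmesg: str) -> dict:
    """Map each re(4) interface to the IRQ it attached on."""
    return {m.group(1): int(m.group(2))
            for m in re.finditer(r"^(re\d+): .*? irq (\d+) at device", dmesg, re.M)}

def interrupt_totals(vmstat: str) -> dict:
    """Map each device name on an `irqN:` line to its total interrupt count."""
    totals = {}
    for m in re.finditer(r"^irq\d+: (.+?)\s+(\d+)\s+(\d+)$", vmstat, re.M):
        for dev in m.group(1).split():  # shared IRQs list several devices
            totals[dev] = int(m.group(2))
    return totals

irqs, totals = irq_map(DMESG), interrupt_totals(VMSTAT)
for dev, irq in sorted(irqs.items()):
    print(f"{dev}: irq {irq}, interrupts so far {totals.get(dev, 0)}")
```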
    
    


  • Not yet. Pyun has a couple of paid projects, so progress on this issue is at a bit of a standstill.

    I will preface this by saying that what I am about to say may be completely ridiculous, but my understanding is that the WatchGuard OS is based on Linux. If that's true, then perhaps someone can look at the Linux driver and compare it to the BSD one? Obviously they are structured differently and this may not make sense, but when the Firebox is running the WatchGuard software it is a very stable unit, so someone knows how to make these Realtek chips work!



  • Not yet – Pyun has a couple of paid projects so the progress on this issue is at a bit of a standstill.

    OK, thanks for the update.

    then perhaps someone can look at the Linux driver and compare it to the BSD one?

    I will try a Debian install on the X500 sometime this weekend.

    I think the Linux driver might have the same issue with these Realtek chips; it's hard to find out whether the issue was ever fixed or people just started using other network cards.
    Google around for:
    "8139c problem oversized ethernet frame"
    "realtek 8139c Abnormal interrupt"

    http://www.joshua.raleigh.nc.us/docs/linux-2.4.10_html/286454.html
    http://article.gmane.org/gmane.linux.drivers.realtek.devel/420

    The X500 does have a PCI slot, so I'm going to try an old Sun PCI quad-port 10/100 network card, which works in another PC running pfSense 1.2.3-RC2. At least this should show whether the X500 motherboard has issues handling ACPI/DMA/interrupts for network cards.

    Here is a picture of the Sun card:
    http://www.sun.com/products/networking/ethernet/sunquadfastethernet/images/I1_hw_quadfastether_pci_i.jpg



  • The Debian net install works on the X500; now I just need to find a way to overlay all the pfSense extras on the base Debian install :)

    Using the same switch/cable/client, the Debian network driver seems to provide higher throughput:

    Debian
    iperf -c 192.168.100.2 -p 5010 -t 60

    
    ------------------------------------------------------------
    [ ID] Interval       Transfer     Bandwidth
    [108]  0.0-60.0 sec   602 MBytes  84.1 Mbits/sec
    
    

    FreeBSD 7.2/pfSense 1.2.3-RC2

    iperf -c 192.168.100.2 -p 5010  -t 60

    
    ------------------------------------------------------------
    [ ID] Interval       Transfer     Bandwidth
    [108]  0.0-60.0 sec   465 MBytes  65.0 Mbits/sec
    
    

    I wish I knew why this watchdog issue affects some X500 devices more than others.
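    The two iperf runs above can be cross-checked by hand. Assuming iperf's usual units for this kind of report (transfer in MBytes of 2^20 bytes, bandwidth in Mbit/s of 10^6 bits), the transfer totals reproduce the reported bandwidths to within rounding. A small Python sketch; `mbit_per_sec` is a hypothetical helper.

```python
MIB = 1024 * 1024  # assumed: iperf's "MBytes" means 2**20 bytes

def mbit_per_sec(mbytes: float, seconds: float) -> float:
    """Average bandwidth in Mbit/s (10**6 bits) for a transfer reported in MBytes."""
    return mbytes * MIB * 8 / seconds / 1e6

debian = mbit_per_sec(602, 60.0)
freebsd = mbit_per_sec(465, 60.0)

print(f"Debian: {debian:.1f} Mbit/s, FreeBSD/pfSense: {freebsd:.1f} Mbit/s")
print(f"pfSense run is {(1 - freebsd / debian) * 100:.0f}% lower")
```

    This gives about 84.2 and 65.0 Mbit/s (iperf's 84.1 reflects the already-rounded 602 MBytes), so the pfSense run moved roughly 23% less data over the same 60 seconds.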



  • I'm having exactly the same on my X500 too.

    re0: watchdog timeout
    re0: watchdog timeout
    re0: watchdog timeout
    re0: watchdog timeout
    re0: watchdog timeout

    I get them on all ports, internal and external, whether connected to a switch, a cable modem, etc.; nothing makes a difference. It's a shame, since the Firebox running pfSense is really good except for the watchdog timeouts!



  • I only get the odd occasional timeout on my X500 since I upgraded. Is the updated code in the new embedded version? I'm running embedded 1.2.3-RC2 and fancy moving over to the new embedded build, but I don't want to wreck what appears to be a stable install.
    The timeouts I do get generally happen when I'm playing about in the web interface. There are no timeouts if I leave it alone.



  • Spoke too soon. I'm still getting them, but nowhere near as many. Does the new NanoBSD embedded build have the patch installed?



  • I am planning to install Debian on my X500 and use it as a "LAMP" server.

    How was the net install conducted? Did you manage to get a keyboard to work? I'm having no luck following the diagrams in another topic.

    Cheers,

    Andy



  • Is there any progress or update here? I've got two different Firebox X700s that both show the watchdog timeouts on re0 (my LAN port). I was originally running 1.2.3-RC2 and upgraded to the latest firmware in the 1.2.x snapshots.

    loki, care to elaborate on how you prepped your Firebox for a netboot install of Debian?



  • loki - care to elaborate on how you prepped your firebox for a netboot install of debian?

    I installed a base Debian from a net-install CD on a normal PC, edited /etc/fstab, set up the serial port for console access, and popped the drive back into the Firebox.

    Overall, I wasn't very happy with the older Firebox hardware; the network cards just don't seem to have great support in BSD.

    I am now running the following Jetway with 2 GB of memory and 1.2.3-RC2, and I'm pretty happy with it:

    xxxx://www.newegg.com/Product/Product.aspx?Item=N82E16856107059



  • I know it's been a while, but is there any progress on this?



  • I've been getting the same errors as everyone else in this thread, using two Fireboxes: an X500 as a transparent firewall and an X700 as a router/firewall. Like Spy Alelo, I'm also curious whether there has been any progress on this, and whether there is something new that we can test or patch.



  • Call me crazy, but I removed the crypto card to test a mini PCI Wi-Fi card, and I have not had a single timeout while messing with the GUI. I removed the Wi-Fi card anyway, since it wasn't supported, and still no timeouts. I have not upgraded the firmware (still using the 1.2.3 release), nor changed any settings.

    Again, it may have been a fluke, but I will keep testing. The only explanation that makes any sense to me is that the crypto card was somehow being used for SSL on the WebGUI (I do have SSL enabled), and there may be a compatibility issue between it and the Realtek interfaces. I mean, seriously, I download over 60 GB of data a month using torrents without a single issue. I also use a VoIP phone non-stop over a sustained VPN connection while using a web-based ticketing system 5 days a week, 8 hours a day, and never get a single drop or timeout. It only happens when I access the WebGUI within the first two minutes. And not a single timeout after removing the card? Can anyone else experiment and confirm this?

