LAN adapter watchdog timeout when running heavy load on pfSense



  • Hi

    Given that pfSense is a really great product, we decided to switch all our firewalls from IPCop and MonoWall to pfSense. The installation went smoothly!

    Version: 1.0.1

    Environment:

    VIA EPIA-EK1000G Mini ITX with two integrated LAN cards (vr0 and vr1)
    one additional Linksys LNE100TX (dc0)

    Problem:

    When running heavier traffic (e.g. a file transfer) across the firewall (LAN<->WAN, about 40 Mbps), we suddenly get:

    #kernel vr0: watchdog timeout

    The problem can occur after 5 seconds or after 20 minutes.

    Test 1:

    I swapped vr0 (LAN) for the dc0 card.

    Result: #kernel dc0: watchdog timeout

    Test 2:

    I assumed that the LAN environment could be causing the error, so I ran the load test several times.

    Result: #kernel vr1: watchdog timeout

    That means the WAN side went down.

    Test 3:

    Using only 2 external LAN cards (with an adapter to the Mini ITX board, for a total of 4 LAN cards).

    Result: #kernel em0: watchdog timeout

    Additional hints:

    • We have had this problem ever since we started using pfSense a few months ago with RC 1.
    • We had this problem as well with the older version of the VIA Mini ITX board (800 MHz).

    It would be great if anybody had a hint for me!

    regards

    Günther



  • Hi again

    Could I get at least a little hint about what I should test, replace, or update?
    Has nobody ever had such problems?

    regards

    Günther



  • I haven't noticed this yet, not even with a WRAP when heavily overloading it. What CPU load do you see during that load test? Is the webGUI/SSH still responsive during the test? Also, do problems appear when these messages show up, or does everything still work as expected apart from the log entries?



  • Found a post on this http://forum.ev1servers.net/showthread.php?t=55633
    It suggests the following.

    a) turn ACPI support OFF
    b) turn PNP OS support OFF [in bios]

    if possible

    c) change the NIC to some other brand like 3com or Intel.



  • work like expected besides the log entries?

    The system does fail and has to be rebooted. If the WAN network card fails, it is still possible to access the firewall via the web GUI or SSH. The console works in all cases.

    a) turn ACPI support OFF
    b) turn PNP OS support OFF [in bios]
    if possible
    c) change the NIC to some other brand like 3com or Intel.

    Thank you for the link. At the end of that thread I found:

    Disabling ACPI did not help.
    Dedicating IRQ for the NIC did not help.
    Changing the hardware (nic/mb) did help, havent seen the error for a week now. So 'watchdog timeouts' is mostly hardware related problem.

    I think I will give the last solution a try.

    regards

    Günther



  • @gschoch:

    Problem:

    when running a larger traffic (e.g. Filetransfer) across the firewall (LAN<->WAN = about 40Mbps) than we get suddently

    #kernel vr0: watchdog timeout

    The problem can occur after 5 Seconds or 20 Minutes

    I'm having problems similar to this. K6-500, 415 MB RAM. Regardless of which NICs I try (a Compaq dual-port server NIC or generic cheap ones), pfSense locks up when I get close to 10-15 Mbps, and occasionally at lower speeds under heavy use such as uTorrent. I suspect it may be a problem with the actual computer running pfSense (bad RAM, bad CPU?). I also ran pfSense under VMware on a 1.8 GHz "3000+" AMD computer. Throughput was no better than on the K6-500, but I haven't been able to reproduce any of the crashing I was experiencing.

    gschoch, do you have any way of monitoring temperatures (CPU, memory, etc.) in the routers? I've had issues before with other PCs that weren't getting good ventilation and were hanging/crashing when getting well over 60-70+C.



  • …, do you have any way of monitoring temperatures (CPU, memory etc) in the routers?
    Ive had issues before with other PC's that wernt getting good ventilation,
    and hanging/crashing when getting well over 60-70+C.

    Thank you for the answer. Yes, we monitored them. We have about 6 pfSense boxes (but not of the same age, cooling, or memory). What they all have in common is that they are based on VIA Mini ITX boards. All other components are different. But in all cases we end up with these timeout errors. We will now test another set of NICs, and if this does not help, the Mini ITX boards will have to be replaced with another product.

    regards

    Günther



  • Hi

    just for others with the same problem:

    We switched to a D-Link DFE-580TX network adapter with 4 ports, and since then we have not had any more watchdog timeouts.

    regards

    Günther



  • This is a FreeBSD 6.1 problem.
    I'm using two 3Com 3C905CX-TX-M cards. I have disabled ACPI, disabled the Sound Blaster, changed network cards, disabled PNP OS, disabled COM and LPT, and changed the VGA card.
    Nothing changed.

    I'm using pfSense v. 1.0.1:
    (filtered…)

    Dec 8 05:24:45 kernel: xl1: watchdog timeout
    Dec 7 19:26:13 kernel: xl1: watchdog timeout
    Dec 7 16:03:00 kernel: xl1: watchdog timeout
    Dec 7 11:15:14 kernel: xl1: watchdog timeout
    Dec 7 09:18:26 kernel: xl1: watchdog timeout
    Dec 7 03:43:00 kernel: xl1: watchdog timeout
    Dec 7 03:40:29 kernel: xl1: watchdog timeout
    Dec 7 03:35:51 kernel: xl1: watchdog timeout
    Dec 6 20:34:35 kernel: xl1: watchdog timeout
    Dec 6 18:29:09 kernel: xl1: watchdog timeout
    Dec 6 17:15:44 kernel: xl1: watchdog timeout
    Dec 6 16:27:28 kernel: xl0: watchdog timeout
    Dec 6 13:36:52 kernel: xl1: watchdog timeout
    Dec 5 17:29:39 kernel: xl1: watchdog timeout
    Dec 5 15:12:16 kernel: xl1: watchdog timeout
    Dec 5 12:34:30 kernel: xl1: watchdog timeout
    Dec 5 10:43:28 kernel: xl1: watchdog timeout
    Dec 5 10:39:41 kernel: xl1: watchdog timeout
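
    For anyone comparing logs like the one above, here is a minimal sketch in Python that tallies watchdog timeouts per interface. The function name and the regular expression are illustrative assumptions based only on the line format shown in this thread. They are not part of pfSense or FreeBSD.

```python
import re
from collections import Counter

# Assumed line format, taken from the log excerpts in this thread, e.g.
# "Dec 8 05:24:45 kernel: xl1: watchdog timeout"
WATCHDOG_RE = re.compile(r"kernel: (\w+): watchdog timeout")


def count_watchdog_timeouts(lines):
    """Return a Counter mapping interface name -> number of watchdog timeouts."""
    counts = Counter()
    for line in lines:
        match = WATCHDOG_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# A few lines copied from the log above; unrelated lines are simply ignored.
sample = [
    "Dec 8 05:24:45 kernel: xl1: watchdog timeout",
    "Dec 6 16:27:28 kernel: xl0: watchdog timeout",
    "Dec 6 13:36:52 kernel: xl1: watchdog timeout",
    "Dec 9 14:57:03 kernel: sk0: link state changed to UP",
]

for interface, count in sorted(count_watchdog_timeouts(sample).items()):
    print(interface, count)
```

    Feeding it the full system log would show at a glance whether one interface (here mostly xl1) is responsible for most of the timeouts.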



  • For reference, Broadcom-based integrated chips tend to hit watchdog timeouts under VERY heavy loads, but they reset and go on with life. I've seen this happen on AMD Opteron and Intel Xeon platforms with multiple Broadcom chips. I just use Intel NICs for everything critical and I don't have issues.

    As for 3Com, the 3c905b's were perhaps the best 10/100 NICs ever built, in my opinion. But when they did the die shrink and built the 3c905c's, they were terrible. I even have problems with those in Windows boxes and Linux boxes!

    My recommendation would be to stay away from newer 3Com cards, period. There's a reason Intel has outsold them and blown them out of the water in the NIC market. Besides, Intel continuously contributes code for their NIC drivers. So they just 'work' T.M.



  • Upgrade to http://www.pfsense.com/~sullrich/1.0.1-SNAPSHOT-12-06-2006/ and see if the problems persist.



  • I had the same problem earlier today.

    Dec 9 14:57:03 kernel: sk0: link state changed to UP
    Dec 9 14:33:46 kernel: arplookup 231.57.128.57 failed: host is not on local network
    Dec 9 14:08:17 php: : Hotplug event detected for sk0 but ignoring since interface is not set for DHCP
    Dec 9 14:08:17 check_reload_status: rc.linkup starting
    Dec 9 14:08:14 kernel: sk0: link state changed to DOWN
    Dec 9 14:08:14 kernel: sk0: watchdog timeout
    Dec 9 13:59:43 kernel: arp: 172.17.17.254 is on sk0 but got reply from 00:0b:db:65:8a:91 on em0
    Dec 9 12:58:09 kernel: arp: 172.17.17.241 is on sk0 but got reply from 00:11:43:11:92:8b on em0
    Dec 9 12:38:17 last message repeated 4 times
    Dec 9 12:37:53 kernel: arp: 172.17.17.252 is on sk0 but got reply from 00:13:20:00:cb:cf on em0
    Dec 9 12:36:57 last message repeated 9 times
    Dec 9 12:36:37 kernel: arp: 172.17.17.241 is on sk0 but got reply from 00:11:43:11:92:8b on em0
    Dec 9 12:21:55 kernel: em0: watchdog timeout – resetting
    Dec 9 12:21:33 last message repeated 9 times
    Dec 9 12:19:44 kernel: arp: 172.17.17.241 is on sk0 but got reply from 00:11:43:11:92:8b on em0
    Dec 9 12:02:09 last message repeated 4 times
    Dec 9 12:02:01 kernel: arp: 172.17.17.241 is on sk0 but got reply from 00:11:43:11:92:8b on em0
    Dec 9 11:46:48 last message repeated 5 times
    Dec 9 11:46:23 kernel: arp: 172.17.17.252 is on sk0 but got reply from 00:13:20:00:cb:cf on em0
    Dec 9 11:45:39 kernel: arp: 172.17.17.247 is on sk0 but got reply from 00:06:5b:f3:2a:4b on em0

    Intel Pro/1000 MT and 3COM 3C2000-T NICs

    I have just updated my box with your snapshot link and also went into the BIOS and disabled all unnecessary hardware. I hope this fixes it.



  • I upgraded pfSense 1.0.1 to 1.0.1-SNAPSHOT-12-08-2006, and also tried a fresh install of 1.0.1-SNAPSHOT-12-08-2006, but I have the same problem.

    …filtered...
    Dec 10 17:48:43 kernel: xl1: watchdog timeout
    Dec 10 17:19:51 kernel: xl0: watchdog timeout
    Dec 10 17:01:23 kernel: xl1: watchdog timeout
    Dec 10 16:36:45 kernel: xl1: watchdog timeout
    Dec 10 15:13:29 kernel: xl1: watchdog timeout
    Dec 10 12:28:23 kernel: xl1: watchdog timeout
    Dec 10 12:18:24 kernel: xl1: watchdog timeout



  • If you start receiving these messages regularly, read below:

    FreeBSD network device drivers (dc, xl, sk etc.) utilize watchdog timers to keep some statistics about the device driver and to tackle deadlock situations that may arise because of hardware issues. This timer is set to some specific value by the device driver and decremented by one once a second. If the timer expires and the network adapter has not finished its job, the watchdog routine for this adapter is run.

    The watchdog timer routine simply prints a diagnostic message that is visible through /var/log/messages and restarts the network adapter.
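
    The mechanism described above can be sketched as a small simulation in Python. This is a simplified model under stated assumptions, not the actual FreeBSD driver code. The class name, method names, and the 5-second timeout are hypothetical and chosen only for illustration.

```python
class WatchdogTimer:
    """Toy model of a per-interface watchdog timer (not real driver code)."""

    def __init__(self, name):
        self.name = name
        self.timer = 0   # 0 means the watchdog is disarmed
        self.resets = 0  # how many times the adapter was restarted

    def start_transmit(self, timeout=5):
        # The driver arms the timer when it hands work to the adapter.
        # The value 5 is an assumed example, not taken from any driver.
        self.timer = timeout

    def transmit_done(self):
        # The adapter finished its job in time, so the timer is disarmed.
        self.timer = 0

    def tick(self):
        """Called once a second: decrement the timer and fire if it expires."""
        if self.timer == 0:
            return
        self.timer -= 1
        if self.timer == 0:
            # Mirrors the log message seen in this thread, then "restarts".
            print(f"{self.name}: watchdog timeout")
            self.resets += 1


# Healthy case: the adapter completes before the timer runs out.
nic = WatchdogTimer("xl1")
nic.start_transmit(timeout=3)
nic.tick()
nic.transmit_done()
nic.tick()

# Stuck case: the adapter never completes, so the watchdog fires.
nic.start_transmit(timeout=2)
nic.tick()
nic.tick()
```

    In the second case the simulated adapter never signals completion, which is the situation a hung or misbehaving NIC would produce.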

    Briefly, you can try the following to get rid of the problem:

    Many PCI network adapters require a PCI slot that supports Bus Mastering. Some old motherboards
    have this feature only on their first PCI slot (pci0). So, plug your network adapter into the first
    PCI slot on your mainboard and see if that helps.



  • Hi, does the D-Link DFE-580TX network card support the traffic shaper?



  • Don't hijack a topic.
    Post it in the traffic shaper section of the forum.
    If you had searched there, you would have found this post, which tells you which interfaces support ALTQ:
    http://forum.pfsense.org/index.php/topic,1686.msg9789.html#msg9789



  • Hi, is there a patch for FreeBSD 6.1 for the watchdog timeouts (PCI bus mastering)?



  • Hi, I changed the 3Com 3C905X card to a Planet card with a Realtek chipset (ENW-9503A) and disabled ACPI in pfSense v. 1.0.1 (hint.acpi.0.disabled=1), and it works perfectly!

