Re0: Watchdog timeout ONLY on WAN interface



  • Hello,

    I have a PCIe x1 dual Realtek 8111E card, in addition to an onboard Realtek 10/100 NIC, in my pfSense router. One of the 8111Es and the 10/100 are in a bridge as LAN; the other 8111E is WAN. Every so often the WAN interface (re0, the first of the dual 8111Es) has a watchdog timeout and goes down for about 30 seconds. EVERY time, it happens only to re0, not to any of the other NICs, not even the other 8111E on the same PCB. I tried disabling all sorts of options in the GUI, for hardware checksums and anything else that would compensate for a cruddy driver.

    Assuming a hardware issue, I swapped re0 (WAN, 8111E #1) with re1 (LAN, 8111E #2), so re1 is the new WAN and re0 is LAN. After a few hours, I got a watchdog timeout ON RE1. Meaning it's not the physical NIC, it's whichever interface I assign to WAN. I don't think it's a matter of too much traffic funnelling through the WAN, since I can have 110MB/s transfers on LAN without a hitch, or 16-player LAN games. I posted the log from when it drops (it's basically the same every time). There are ~200 "arpresolve: can't allocate llinfo" messages in the middle; I left most of them out. If anybody knows what the issue is, I'll give you a kidney.

    Mar 14 21:25:01 kernel: re1: watchdog timeout
    Mar 14 21:25:01 kernel: re1: link state changed to DOWN
    Mar 14 21:25:01 check_reload_status: Linkup starting re1
    Mar 14 21:25:02 php-fpm[19392]: /rc.linkup: DEVD Ethernet detached event for wan
    Mar 14 21:25:04 kernel: arpresolve: can't allocate llinfo for 99.253.44.1 on re1
    Mar 14 21:25:04 kernel: arpresolve: can't allocate llinfo for 99.253.44.1 on re1
    Mar 14 21:25:04 kernel: arpresolve: can't allocate llinfo for 99.253.44.1 on re1
    Mar 14 21:25:04 check_reload_status: Linkup starting re1
    Mar 14 21:25:04 kernel: arpresolve: can't allocate llinfo for 99.253.44.1 on re1
    Mar 14 21:25:04 kernel: arpresolve: can't allocate llinfo for 99.253.44.1 on re1
    Mar 14 21:25:04 kernel: re1: link state changed to UP
    Mar 14 21:25:04 kernel: arpresolve: can't allocate llinfo for 99.253.44.1 on re1
    Mar 14 21:25:05 kernel: arpresolve: can't allocate llinfo for 99.253.44.1 on re1
    Mar 14 21:25:05 php-fpm[57751]: /rc.linkup: DEVD Ethernet attached event for wan
    Mar 14 21:25:05 kernel: arpresolve: can't allocate llinfo for 99.253.44.1 on re1
    Mar 14 21:25:05 kernel: arpresolve: can't allocate llinfo for 99.253.44.1 on re1
    Mar 14 21:25:05 php-fpm[57751]: /rc.linkup: HOTPLUG: Configuring interface wan
    Mar 14 21:25:05 kernel: arpresolve: can't allocate llinfo for 99.253.44.1 on re1
    Mar 14 21:25:06 kernel: arpresolve: can't allocate llinfo for 99.253.44.1 on re1
    Mar 14 21:25:06 check_reload_status: rc.newwanip starting re1
    Mar 14 21:25:06 php-fpm[57751]: /rc.linkup: ROUTING: setting default route to 99.253.44.1
    Mar 14 21:25:06 check_reload_status: Restarting ipsec tunnels
    Mar 14 21:25:07 php-fpm[66203]: /rc.newwanip: rc.newwanip: Info: starting on re1.
    Mar 14 21:25:07 php-fpm[66203]: /rc.newwanip: rc.newwanip: on (IP address: 99.253.44.162) (interface: WAN[wan]) (real interface: re1).
    Mar 14 21:25:09 php-fpm[57751]: /rc.linkup: The command '/usr/local/sbin/dhcpd -user dhcpd -group _dhcp -chroot /var/dhcpd -cf /etc/dhcpd.conf -pf /var/run/dhcpd.pid bridge0' returned exit code '1', the output was 'Internet Systems Consortium DHCP Server 4.2.6 Copyright 2004-2014 Internet Systems Consortium. All rights reserved. For info, please visit https://www.isc.org/software/dhcp/ Wrote 0 deleted host decls to leases file. Wrote 0 new dynamic host decls to leases file. Wrote 9 leases to leases file. Listening on BPF/bridge0/02:f3:40:d2:5b:00/192.168.1.0/24 Sending on BPF/bridge0/02:f3:40:d2:5b:00/192.168.1.0/24 Can't bind to dhcp address: Address already in use Please make sure there is no other dhcp server running and that there's no entry for dhcp or bootp in /etc/inetd.conf. Also make sure you are not running HP JetAdmin software, which includes a bootp server. If you did not get this software from ftp.isc.org, please get the latest from ftp.isc.org and install that before requesting help. I
    Mar 14 21:25:09 check_reload_status: updating dyndns wan
    Mar 14 21:25:09 php-fpm[66203]: /rc.newwanip: ROUTING: setting default route to 99.253.44.1
    Mar 14 21:25:11 php-fpm[66203]: /rc.newwanip: The command '/usr/local/sbin/dhcpd -user dhcpd -group _dhcp -chroot /var/dhcpd -cf /etc/dhcpd.conf -pf /var/run/dhcpd.pid bridge0' returned exit code '1', the output was 'Internet Systems Consortium DHCP Server 4.2.6 Copyright 2004-2014 Internet Systems Consortium. All rights reserved. For info, please visit https://www.isc.org/software/dhcp/ Wrote 0 deleted host decls to leases file. Wrote 0 new dynamic host decls to leases file. Wrote 9 leases to leases file. Listening on BPF/bridge0/02:f3:40:d2:5b:00/192.168.1.0/24 Sending on BPF/bridge0/02:f3:40:d2:5b:00/192.168.1.0/24 Can't bind to dhcp address: Address already in use Please make sure there is no other dhcp server running and that there's no entry for dhcp or bootp in /etc/inetd.conf. Also make sure you are not running HP JetAdmin software, which includes a bootp server. If you did not get this software from ftp.isc.org, please get the latest from ftp.isc.org and install that before requesting help.
    Mar 14 21:25:11 php-fpm[66203]: /rc.newwanip: Resyncing OpenVPN instances for interface WAN.
    Mar 14 21:25:11 php-fpm[66203]: /rc.newwanip: Creating rrd update script
    Mar 14 21:25:13 php-fpm[66203]: /rc.newwanip: pfSense package system has detected an IP change or dynamic WAN reconnection - 99.253.44.162 -> 99.253.44.162 - Restarting packages.
    Mar 14 21:25:13 check_reload_status: Starting packages
    Mar 14 21:25:14 php-fpm[66584]: /rc.start_packages: Restarting/Starting all packages.
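
For anyone wanting to check whether the timeouts really follow the WAN assignment rather than a physical port, here is a minimal sketch that tallies "watchdog timeout" entries per interface. It assumes the log lines have the same shape as the excerpt above; the function name `count_watchdog_timeouts` and the `sample` list are made up for illustration, and how you export the system log to text on your install is left to you.

```python
import re
from collections import Counter

# Matches kernel lines shaped like the excerpt above, e.g.
# "Mar 14 21:25:01 kernel: re1: watchdog timeout"
WATCHDOG = re.compile(r"kernel: (\w+): watchdog timeout")


def count_watchdog_timeouts(lines):
    """Return a Counter mapping interface name -> number of watchdog timeouts."""
    counts = Counter()
    for line in lines:
        match = WATCHDOG.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# A few lines copied from the log above (hypothetical variable name).
sample = [
    "Mar 14 21:25:01 kernel: re1: watchdog timeout",
    "Mar 14 21:25:01 kernel: re1: link state changed to DOWN",
    "Mar 14 21:25:04 kernel: arpresolve: can't allocate llinfo for 99.253.44.1 on re1",
    "Mar 14 21:25:04 kernel: re1: link state changed to UP",
]

for interface, hits in count_watchdog_timeouts(sample).items():
    print(interface, hits)
```

Run over a few days of logs, before and after swapping the WAN assignment, this would show whether every timeout lands on whichever interface is currently WAN.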



  • bump this.

    have been getting "kernel: arpresolve: can't allocate llinfo for" logs as well.
    Tried reinstalling, reverting to 2.1.5, and clearing the state table.

    all in vain

    Arthur



  • Here is an interesting thread from FreeNAS about the Realtek 8111 stuff:
    https://bugs.freenas.org/issues/1850
    At the end someone has reported a driver put out by Realtek themselves that they say works.
    That thread has reports from various versions of FreeNAS on FreeBSD. To see if any of that applies, someone would have to get the Realtek driver code for FreeBSD 8.3 (pfSense 2.1.5) or FreeBSD 10.1 (pfSense 2.2), build it on the matching FreeBSD version/architecture, transfer the results to their pfSense and see if things improve.

    And if Realtek has provided a working driver themselves, why is it not contributed to and incorporated in FreeBSD, instead of whatever driver is there now?



  • @phil.davis:

    Here is an interesting thread from FreeNAS about the Realtek 8111 stuff:
    https://bugs.freenas.org/issues/1850
    At the end someone has reported a driver put out by Realtek themselves that they say works.
    That thread has reports from various versions of FreeNAS on FreeBSD. To see if any of that applies, someone would have to get the Realtek driver code for FreeBSD 8.3 (pfSense 2.1.5) or FreeBSD 10.1 (pfSense 2.2), build it on the matching FreeBSD version/architecture, transfer the results to their pfSense and see if things improve.

    And if Realtek has provided a working driver themselves, why is it not contributed to and incorporated in FreeBSD, instead of whatever driver is there now?

    I also tried the ndisgen method in 12.5 of the FreeBSD handbook to convert Windows drivers to FreeBSD drivers. I tried it with the latest Windows XP x64, Vista x64, and 8.1 x64 drivers from the Realtek site. It recognizes them as Windows PE drivers, then gives a syntax error at different lines when it tries to compile.

    The linked driver (which I tried before) was made from the Realtek driver on the website, which only says it supports up to FreeBSD 9.x. I put the if_re.ko file in /boot/kernel and added the command to loader.conf, but I don't get the confirmation in dmesg that the post mentioned. I reinstalled, and we'll see at the next reboot. I tried running kldload on if_re.ko as suggested in the ndisgen guide, but it complains of a kernel mismatch.

    I wonder if pfSense is using old drivers or something. I'm installing re via pkg on my router atm, we'll see how that goes.



    I just logged in to post about getting watchdog timeouts. I have Intel adapters, but I am also randomly getting watchdog timeouts. One of the NICs is onboard, one is PCIe.

    Once I get the Watchdog Timeout, my filter resets, and I temporarily lose connection. I have it set to not reset states, but it does anyhow.



  • @shanis42:

    I just logged in to post about getting watchdog timeouts. I have Intel adapters, but I am also randomly getting watchdog timeouts. One of the NICs is onboard, one is PCIe.

    Once I get the Watchdog Timeout, my filter resets, and I temporarily lose connection. I have it set to not reset states, but it does anyhow.

    Same deal here, if there's a checkbox or option in the gui, I've toggled it and still get the same results.

    I ran "pkg install re" and it may have installed/updated drivers for realtek AUDIO chipsets, so I don't know how effective it will be. I rebooted and didn't get anything in dmesg to indicate the if_re.ko module is running. If somebody can dig up a FreeBSD Realtek driver for version 10+ from a repo, that's the next thing I can think of.



    Given that mine is Intel, I am going to post separately. I don't want to hijack this thread with my issues/logs when it could be something very different.



  • @kewiha:

    I also tried the ndisgen method

    I wouldn't bother with that for any serious usage, it's highly unlikely to be production-grade.

    @kewiha:

    I wonder if pfSense is using old drivers or something. I'm installing re via pkg on my router atm, we'll see how that goes.

    The re driver we include is what's in stock FreeBSD 10.1. It's actively maintained and current.

    The problems with Realtek NICs and issues along these lines tend to be the driver not having a workaround/"proper" handling for something broken in hardware. Or it could be something unrelated to the NIC itself, like on occasion a USB controller sharing an IRQ with a NIC can cause problems, or BIOS bugs can result in watchdog timeouts. My best guess is the same issue noted in FreeNAS bug 1850, which appears to be a re driver issue of some sort.

    The problem with searching for "watchdog timeout" is that you'll come across a slew of old reports that were driver bugs in years-old versions. As you search on this, it's safe to disregard anything more than a year or two old. Stick with reports on FreeBSD 9.x and 10.x versions, ideally 10.x only, though this dates back to earlier versions if it's the same as FreeNAS bug 1850.



  • I literally haven't had a single watchdog timeout on an interface that wasn't set to WAN. Both interfaces I tested are on the same card, but that doesn't explain why they stop misbehaving IFF they aren't WAN. I'll try another mobo when I get the chance, but it's odd that only the WAN interface complains. I'll try setting the 10/100 NIC as WAN too, 100Mbit is better than nothing!

