Em0 goes down, then I get Watchdog timeout with filter reset



  • I saw the post from someone having Watchdog timeouts with Realtek adapters. I am getting similar issues, but with Intel NICs. I have an onboard NIC that I am using for WAN, and a PCIe card for the LAN side.

    I am running 2.2-RELEASE, fully updated.

    This log is from half an hour ago (newest on top):

    Mar 17 00:35:08 php-fpm[20878]: /index.php: Successful login for user 'admin' from: 192.168.0.12
    Mar 17 00:35:08 php-fpm[20878]: /index.php: Successful login for user 'admin' from: 192.168.0.12
    Mar 17 00:35:02 php-fpm[20878]: /index.php: webConfigurator authentication error for 'admin' from 192.168.0.12
    Mar 17 00:35:02 php-fpm[20878]: /index.php: webConfigurator authentication error for 'admin' from 192.168.0.12
    Mar 17 00:21:08 check_reload_status: Reloading filter
    Mar 17 00:21:08 php-fpm[74788]: /rc.newwanip: rc.newwanip: on (IP address: x.x.x.x) (interface: WAN[wan]) (real interface: em0).
    Mar 17 00:21:08 php-fpm[74788]: /rc.newwanip: rc.newwanip: Info: starting on em0.
    Mar 17 00:21:07 check_reload_status: rc.newwanip starting em0
    Mar 17 00:21:07 php-fpm[74788]: /rc.linkup: Hotplug event detected for WAN(wan) but ignoring since interface is configured with static IP (x.x.x.x )
    Mar 17 00:21:06 kernel: em0: link state changed to UP
    Mar 17 00:21:06 check_reload_status: Linkup starting em0
    Mar 17 00:21:05 php-fpm[74788]: /rc.linkup: Hotplug event detected for WAN(wan) but ignoring since interface is configured with static IP (x.x.x.x )
    Mar 17 00:21:04 check_reload_status: Linkup starting em0
    Mar 17 00:21:04 kernel: em0: link state changed to DOWN
    Mar 17 00:21:04 kernel: em0: TX(0) desc avail = 31,Next TX to Clean = 325
    Mar 17 00:21:04 kernel: em0: Queue(0) tdh = 325, hw tdt = 294
    Mar 17 00:21:04 kernel: em0: Watchdog timeout -- resetting
    Mar 17 00:20:47 bandwidthd: Previouse graphing run not complete... Skipping current run
    Mar 17 00:20:47 bandwidthd: Previouse graphing run not complete... Skipping current run
    Mar 17 00:19:09 bandwidthd: DNS timeout for 192.168.0.11: This problem reduces graphing performance
    Mar 17 00:19:08 bandwidthd: DNS timeout for 192.168.0.11: This problem reduces graphing performance
    Mar 17 00:17:37 bandwidthd: DNS timeout for 192.168.0.11: This problem reduces graphing performance
    Mar 17 00:17:36 bandwidthd: DNS timeout for 192.168.0.11: This problem reduces graphing performance
    Mar 17 00:16:44 php-fpm[24944]: /interfaces.php: Creating rrd update script
    Mar 17 00:16:44 check_reload_status: Reloading filter
    Mar 17 00:16:43 check_reload_status: Reloading filter
    Mar 17 00:16:43 php-fpm[74788]: /rc.newwanip: rc.newwanip: on (IP address: x.x.x.x) (interface: WAN[wan]) (real interface: em0).
    Mar 17 00:16:43 php-fpm[74788]: /rc.newwanip: rc.newwanip: Info: starting on em0.
    Mar 17 00:16:42 check_reload_status: rc.newwanip starting em0
    Mar 17 00:16:42 php-fpm[74788]: /rc.linkup: Hotplug event detected for WAN(wan) but ignoring since interface is configured with static IP (x.x.x.x )
    Mar 17 00:16:41 kernel: em0: link state changed to UP
    Mar 17 00:16:41 check_reload_status: Linkup starting em0
    Mar 17 00:16:40 check_reload_status: updating dyndns wan
    Mar 17 00:16:39 php-fpm[74788]: /rc.linkup: Hotplug event detected for WAN(wan) but ignoring since interface is configured with static IP (x.x.x.x )
    Mar 17 00:16:38 check_reload_status: Restarting ipsec tunnels
    Mar 17 00:16:38 php-fpm[24944]: /interfaces.php: ROUTING: setting default route to x.x.x.y
    Mar 17 00:16:38 kernel: em0: link state changed to DOWN
    Mar 17 00:16:38 check_reload_status: Linkup starting em0

    Then while trying to post this I got (again newest on top):

    Mar 17 00:50:56 check_reload_status: Reloading filter
    Mar 17 00:50:56 php-fpm[2495]: /rc.newwanip: rc.newwanip: on (IP address: 97.76.50.156) (interface: WAN[wan]) (real interface: em0).
    Mar 17 00:50:56 php-fpm[2495]: /rc.newwanip: rc.newwanip: Info: starting on em0.
    Mar 17 00:50:55 check_reload_status: rc.newwanip starting em0
    Mar 17 00:50:55 php-fpm[2495]: /rc.linkup: Hotplug event detected for WAN(wan) but ignoring since interface is configured with static IP (97.76.50.156 )
    Mar 17 00:50:54 kernel: em0: link state changed to UP
    Mar 17 00:50:54 check_reload_status: Linkup starting em0
    Mar 17 00:50:53 php-fpm[2495]: /rc.linkup: Hotplug event detected for WAN(wan) but ignoring since interface is configured with static IP (97.76.50.156 )
    Mar 17 00:50:52 check_reload_status: Linkup starting em0
    Mar 17 00:50:52 kernel: em0: link state changed to DOWN
    Mar 17 00:50:52 kernel: em0: TX(0) desc avail = 31,Next TX to Clean = 968
    Mar 17 00:50:52 kernel: em0: Queue(0) tdh = 968, hw tdt = 937
    Mar 17 00:50:52 kernel: em0: Watchdog timeout -- resetting
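
    Reading the two Queue(0) lines: tdh is the transmit ring position the hardware has reached, and tdt is the position the driver last wrote to. Below is a minimal sketch of the ring arithmetic. It assumes the em driver's default ring of 1024 transmit descriptors (an assumption; the ring size is not shown anywhere in this thread) and uses a hypothetical helper name, tx_ring_gap.

    ```shell
    # tx_ring_gap RING TDH TDT
    # Prints how many descriptors sit between the hardware head and the
    # software tail (queued but not yet sent) and how many slots remain free.
    # RING is an assumed ring size, not something the log reports.
    tx_ring_gap() {
        ring=$1 tdh=$2 tdt=$3
        pending=$(( (tdt - tdh + ring) % ring ))
        echo "pending=$pending free=$(( ring - pending ))"
    }

    tx_ring_gap 1024 325 294   # values from the 00:21:04 reset
    tx_ring_gap 1024 968 937   # values from the 00:50:52 reset
    ```

    Under that assumption both resets leave 31 free slots, which agrees with the "desc avail = 31" the driver printed. That would suggest the ring was almost full of frames the NIC had stopped sending.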

    If this is like the issue in the other post, could it be a driver problem, or are there tweaks I need to make? Could it be the cables? The hardware is brand new, so I don't think it is faulty.
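
    To see how often the resets happen, the kernel lines can be counted per interface. This is a rough sketch with a hypothetical helper name, count_watchdog. It reads syslog-style text on stdin and relies only on the line layout shown in the logs above.

    ```shell
    # Count "Watchdog timeout" kernel messages per interface.
    # Expects lines such as:
    #   Mar 17 00:21:04 kernel: em0: Watchdog timeout -- resetting
    count_watchdog() {
        awk '$4 == "kernel:" && $6 == "Watchdog" && $7 == "timeout" {
                 nic = $5
                 sub(/:$/, "", nic)      # "em0:" -> "em0"
                 n[nic]++
             }
             END { for (nic in n) print nic, n[nic] }'
    }
    ```

    On 2.2 the system log is a circular file, so something like clog /var/log/system.log | count_watchdog should feed it, assuming the log is in its default location.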


  • Netgate Administrator

    What hardware are you running?

    Check the various em sysctl error counters. Try disabling MSI/MSI-X.

    Steve



  • Excuse my newb-atood. I have never created a loader.conf.local file, but I am sure I can figure it out. As for the sysctl error counters, where are they?

    I have a small form factor desktop running it: 4GB RAM, Intel(R) Core(TM)2 Duo CPU E6550 @ 2.33GHz.

    I just did an update to 2.2.1 and noticed it is running the i386 package. So perhaps a move over to the x64 version is in order.



  • I was able to add the two lines to my loader.conf file. Then restarted.

    Is there any way to check if MSI is disabled correctly?

    I just added:
    hw.pci.enable_msix=0
    hw.pci.enable_msi=0

    to the loader.conf file in Diagnostics->Edit File, then restarted.
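
    One way to check is to read the two tunables back after the reboot with sysctl hw.pci.enable_msi hw.pci.enable_msix; each should report 0. Here is a small sketch that wraps the check, using a hypothetical helper name, msi_disabled. It reads sysctl's "name: value" output on stdin.

    ```shell
    # Succeeds (exit 0) only if both MSI tunables appear in the input
    # and both are set to 0; fails otherwise.
    msi_disabled() {
        awk -F': ' '
            $1 == "hw.pci.enable_msi" || $1 == "hw.pci.enable_msix" {
                seen++
                if ($2 != 0) bad = 1
            }
            END { exit !(seen == 2 && !bad) }
        '
    }

    # Example (on the firewall):
    #   sysctl hw.pci.enable_msi hw.pci.enable_msix | msi_disabled \
    #       && echo "MSI and MSI-X are disabled"
    ```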



  • Did a fresh install of the x64 version, upgraded to 2.2.1, and disabled MSI/MSI-X in loader.conf. So far so good, though it's only been a couple of hours.


  • Netgate Administrator

    Yes, run 64bit if your system is capable of it.
    The file loader.conf is overwritten on a firmware update and is changed by various settings in the GUI, so you should use loader.conf.local instead. It doesn't exist by default, so you need to create it. You could, for example, do:

    echo 'hw.pci.enable_msix=0' > /boot/loader.conf.local
    

    Run that either at the command line or in the Diagnostics > Command Prompt box. Then edit the file to add the other line(s).
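
    For the other line(s), appending with >> rather than > keeps what is already in the file. This is a minimal sketch with a hypothetical helper, add_tunable, that appends a line only when it is not already present:

    ```shell
    # add_tunable FILE LINE
    # Appends LINE to FILE unless that exact line is already there.
    # Creates FILE if it does not exist yet.
    add_tunable() {
        file=$1 line=$2
        grep -qxF "$line" "$file" 2>/dev/null || echo "$line" >> "$file"
    }

    # Example (on the firewall):
    #   add_tunable /boot/loader.conf.local 'hw.pci.enable_msix=0'
    #   add_tunable /boot/loader.conf.local 'hw.pci.enable_msi=0'
    ```

    Running it twice with the same line leaves the file unchanged, so it is safe to repeat.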

    The em counters are accessed using sysctl from the command line. For example:

    [2.2-RELEASE][root@xtm5.localdomain]/root: sysctl dev.em.0
    dev.em.0.%desc: Intel(R) PRO/1000 Network Connection 7.4.2
    dev.em.0.%driver: em
    dev.em.0.%location: slot=0 function=0
    dev.em.0.%pnpinfo: vendor=0x8086 device=0x10d3 subvendor=0x8086 subdevice=0x0000 class=0x020000
    dev.em.0.%parent: pci2
    dev.em.0.nvm: -1
    dev.em.0.debug: -1
    dev.em.0.fc: 3
    dev.em.0.rx_int_delay: 0
    dev.em.0.tx_int_delay: 66
    dev.em.0.rx_abs_int_delay: 66
    dev.em.0.tx_abs_int_delay: 66
    dev.em.0.itr: 488
    dev.em.0.rx_processing_limit: 100
    dev.em.0.eee_control: 1
    dev.em.0.link_irq: 0
    dev.em.0.mbuf_alloc_fail: 0
    dev.em.0.cluster_alloc_fail: 0
    dev.em.0.dropped: 0
    dev.em.0.tx_dma_fail: 0
    dev.em.0.rx_overruns: 0
    dev.em.0.watchdog_timeouts: 0
    dev.em.0.device_control: 1049160
    dev.em.0.rx_control: 0
    dev.em.0.fc_high_water: 18432
    dev.em.0.fc_low_water: 16932
    dev.em.0.queue0.txd_head: 0
    dev.em.0.queue0.txd_tail: 0
    dev.em.0.queue0.tx_irq: 0
    dev.em.0.queue0.no_desc_avail: 0
    dev.em.0.queue0.rxd_head: 0
    dev.em.0.queue0.rxd_tail: 0
    dev.em.0.queue0.rx_irq: 0
    dev.em.0.mac_stats.excess_coll: 0
    dev.em.0.mac_stats.single_coll: 0
    dev.em.0.mac_stats.multiple_coll: 0
    dev.em.0.mac_stats.late_coll: 0
    dev.em.0.mac_stats.collision_count: 0
    dev.em.0.mac_stats.symbol_errors: 0
    dev.em.0.mac_stats.sequence_errors: 0
    dev.em.0.mac_stats.defer_count: 0
    dev.em.0.mac_stats.missed_packets: 0
    dev.em.0.mac_stats.recv_no_buff: 0
    dev.em.0.mac_stats.recv_undersize: 0
    dev.em.0.mac_stats.recv_fragmented: 0
    dev.em.0.mac_stats.recv_oversize: 0
    dev.em.0.mac_stats.recv_jabber: 0
    dev.em.0.mac_stats.recv_errs: 0
    dev.em.0.mac_stats.crc_errs: 0
    dev.em.0.mac_stats.alignment_errs: 0
    dev.em.0.mac_stats.coll_ext_errs: 0
    dev.em.0.mac_stats.xon_recvd: 0
    dev.em.0.mac_stats.xon_txd: 0
    dev.em.0.mac_stats.xoff_recvd: 0
    dev.em.0.mac_stats.xoff_txd: 0
    dev.em.0.mac_stats.total_pkts_recvd: 0
    dev.em.0.mac_stats.good_pkts_recvd: 0
    dev.em.0.mac_stats.bcast_pkts_recvd: 0
    dev.em.0.mac_stats.mcast_pkts_recvd: 0
    dev.em.0.mac_stats.rx_frames_64: 0
    dev.em.0.mac_stats.rx_frames_65_127: 0
    dev.em.0.mac_stats.rx_frames_128_255: 0
    dev.em.0.mac_stats.rx_frames_256_511: 0
    dev.em.0.mac_stats.rx_frames_512_1023: 0
    dev.em.0.mac_stats.rx_frames_1024_1522: 0
    dev.em.0.mac_stats.good_octets_recvd: 0
    dev.em.0.mac_stats.good_octets_txd: 0
    dev.em.0.mac_stats.total_pkts_txd: 0
    dev.em.0.mac_stats.good_pkts_txd: 0
    dev.em.0.mac_stats.bcast_pkts_txd: 0
    dev.em.0.mac_stats.mcast_pkts_txd: 0
    dev.em.0.mac_stats.tx_frames_64: 0
    dev.em.0.mac_stats.tx_frames_65_127: 0
    dev.em.0.mac_stats.tx_frames_128_255: 0
    dev.em.0.mac_stats.tx_frames_256_511: 0
    dev.em.0.mac_stats.tx_frames_512_1023: 0
    dev.em.0.mac_stats.tx_frames_1024_1522: 0
    dev.em.0.mac_stats.tso_txd: 0
    dev.em.0.mac_stats.tso_ctx_fail: 0
    dev.em.0.interrupts.asserts: 0
    dev.em.0.interrupts.rx_pkt_timer: 0
    dev.em.0.interrupts.rx_abs_timer: 0
    dev.em.0.interrupts.tx_pkt_timer: 0
    dev.em.0.interrupts.tx_abs_timer: 0
    dev.em.0.interrupts.tx_queue_empty: 0
    dev.em.0.interrupts.tx_queue_min_thresh: 0
    dev.em.0.interrupts.rx_desc_min_thresh: 0
    dev.em.0.interrupts.rx_overrun: 0
    
    

    Steve



  • Hello Stephen,

    Those are good instructions for someone a little green like I am.

    I created the loader.conf.local file with the command you provided, then went into "Edit File" and added the second line. I removed the lines from loader.conf, then rebooted.

    I also ran sysctl dev.em.0 from the command prompt. The packet counts are different for somewhat obvious reasons. Other than that, the only section that is radically different is pasted below.

    dev.em.0.watchdog_timeouts: 9
    dev.em.0.device_control: 1477444160
    dev.em.0.rx_control: 67141634
    dev.em.0.fc_high_water: 8192
    dev.em.0.fc_low_water: 6692
    dev.em.0.queue0.txd_head: 466
    dev.em.0.queue0.txd_tail: 466
    dev.em.0.queue0.tx_irq: 0
    dev.em.0.queue0.no_desc_avail: 0
    dev.em.0.queue0.rxd_head: 601
    dev.em.0.queue0.rxd_tail: 600
    

    You have all zeros there. I don't know if this is just different usage in our configurations. I assume the counter clears when I clear my logs, since I know I have had more than 9 watchdog timeouts.
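
    When comparing two machines it can help to print only the error-type counters that are not zero. This is a rough sketch with a hypothetical helper, em_errors. The name patterns are taken from the counter names in the listing above and are not an official list. These counters are kept by the driver rather than by the log, so they would normally reflect events since the last boot.

    ```shell
    # Reads "name: value" sysctl output on stdin and prints only the
    # counters whose names look like error counters and whose value is
    # not zero.
    em_errors() {
        awk -F': ' '
            $1 ~ /watchdog_timeouts|_fail$|dropped|overrun|no_desc_avail|missed_packets|recv_no_buff|_errs$|_errors$|_coll$/ && $2 != 0
        '
    }

    # Example (on the firewall):
    #   sysctl dev.em.0 | em_errors
    ```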



  • I was able to get this from pciconf -lvbc:

    em0@pci0:0:25:0:	class=0x020000 card=0xb049144d chip=0x10bd8086 rev=0x02 hdr=0x00
        class      = network
        subclass   = ethernet
        bar   [10] = type Memory, range 32, base 0xfc480000, size 131072, enabled
        bar   [14] = type Memory, range 32, base 0xfc4a5000, size 4096, enabled
        bar   [18] = type I/O Port, range 32, base 0x1820, size 32, enabled
        cap 01[c8] = powerspec 2  supports D0 D3  current D0
        cap 05[d0] = MSI supports 1 message, 64 bit 
        cap 09[e0] = vendor (length 6) Intel cap 2 version 0
    

    It looks like the card supports MSI but not MSI-X.

    From my LAN side adapter:

    em1@pci0:5:0:0:	class=0x020000 card=0xa01f8086 chip=0x10d38086 rev=0x00 hdr=0x00
        class      = network
        subclass   = ethernet
        bar   [10] = type Memory, range 32, base 0xfc120000, size 131072, enabled
        bar   [14] = type Memory, range 32, base 0xfc180000, size 524288, enabled
        bar   [18] = type I/O Port, range 32, base 0x2000, size 32, enabled
        bar   [1c] = type Memory, range 32, base 0xfc100000, size 16384, enabled
        cap 01[c8] = powerspec 2  supports D0 D3  current D0
        cap 05[d0] = MSI supports 1 message, 64 bit 
        cap 10[e0] = PCI-Express 1 endpoint max data 128(256) link x1(x1)
                     speed 2.5(2.5) ASPM disabled(L0s/L1)
        cap 11[a0] = MSI-X supports 5 messages
                     Table in map 0x1c[0x0], PBA in map 0x1c[0x2000]
        ecap 0001[100] = AER 1 0 fatal 1 non-fatal 3 corrected
        ecap 0003[140] = Serial 1 6805caffff2cea4b
    

    This one shows support for both MSI and MSI-X.

    If the problem is MSI-X it would explain why I only get all of my issues (watchdog, hotplug, flap) on the WAN side.

    Perhaps I should enable MSI and disable MSI-X? Maybe I will try switching the cables and then reassigning the interfaces. That would put the card that supports MSI-X on the WAN side, and I could see if the errors jump over to the LAN side.

    I currently have both MSI and MSI-X disabled in loader.conf.local. I had them disabled in loader.conf, but moved them and an MBUF reassignment over to the local file and restarted. I checked by running sysctl hw.pci, and they are both disabled.
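
    The same comparison can be pulled out of the pciconf output directly. Here is a small sketch with a hypothetical helper, msi_caps, that reads pciconf -lc style text on stdin and prints which of the two capabilities each device advertises. Capability ID 05 is MSI and 11 is MSI-X, as in the listings above.

    ```shell
    # Prints "<device> MSI" and/or "<device> MSI-X" for each device
    # found in pciconf -lc style input.
    msi_caps() {
        awk '/^[a-z]+[0-9]+@pci/ { dev = $1; sub(/@.*/, "", dev) }
             /cap 05\[/          { print dev, "MSI" }
             /cap 11\[/          { print dev, "MSI-X" }'
    }

    # Example (on the firewall):
    #   pciconf -lc | msi_caps
    ```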


  • Netgate Administrator

    Try swapping the NIC assignments so the msi-x capable card is WAN.

    Steve



  • I have ordered a dual PRO/1000 PCIe card to eliminate the onboard controller, which must be older.

    In the meantime, while I wait for it to arrive, I will switch the interfaces when I get to work in the morning. It is still throwing watchdog errors right now, with both MSI and MSI-X disabled, but only on the WAN side. We will see what happens when I put the newer adapter on the WAN side. I suspect something is wrong with either the onboard NIC or its driver.



  • Swapping the interfaces didn't help; it actually got a little worse. I now know it is specifically the onboard adapter. When the timeouts happen and that interface is assigned to LAN, you get locked out of the web admin until you either restart the web admin via the firewall's command menu or restart the whole box.

    I flipped them back and will have to wait for my dual-NIC PCIe card to come in.

    I think the onboard adapter is a little older than the PCIe card I put in. They both want to use the em driver.

    Any other ideas for future searchers? I read somewhere perhaps connections to Gigabit devices can overwhelm certain adapters, I don't know if there is any truth to that though.


  • Netgate Administrator

    Not offhand. I would search the FreeBSD mailing lists and forums using the details pciconf gives for that specific adapter.

    Steve
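
    For searching, the chip= field from pciconf packs both IDs into one number: the first four hex digits are the device ID and the last four are the vendor ID. Compare chip=0x10d38086 above with the vendor=0x8086 device=0x10d3 pnpinfo line in the earlier sysctl listing. This is a small sketch with a hypothetical helper, pci_ids, that splits the field:

    ```shell
    # Reads pciconf -l style lines on stdin and prints the vendor and
    # device IDs taken from each chip=0xDDDDVVVV field.
    pci_ids() {
        sed -n 's/.*chip=0x\([0-9a-fA-F]\{4\}\)\([0-9a-fA-F]\{4\}\).*/vendor=0x\2 device=0x\1/p'
    }

    # Example (on the firewall):
    #   pciconf -l | grep '^em0@' | pci_ids
    ```

    For the onboard em0 in this thread that gives vendor=0x8086 device=0x10bd, which is the pair to search for.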

