Netgate Discussion Forum

    Em0 goes down, then I get Watchdog timeout with filter reset

    • stephenw10, Netgate Administrator

      What hardware are you running?

      Check the various em sysctl error counters. Try disabling MSI/MSI-X.

      Steve

      • shanis42

        Excuse my newbie attitude. I have never created a loader.conf.local file, but I am sure I can figure it out. Regarding the sysctl error counters, where are they?

        I have a small form factor desktop running it, 4GB RAM, Intel(R) Core(TM)2 Duo CPU E6550 @ 2.33GHz.

        I just did an update to 2.2.1 and now I am noticing it is running the i386 package. So perhaps a move over to the x64 version is in order.

        • shanis42

          I was able to add the two lines to my loader.conf file, then restarted.

          Is there any way to check that MSI is disabled correctly?

          I just added:
          hw.pci.enable_msix=0
          hw.pci.enable_msi=0

          to the loader.conf file in Diagnostics->Edit File, then restarted.

          • shanis42

            Did a fresh install of the x64 version, upgraded to 2.2.1 and disabled MSI/MSI-X in loader.conf. So far so good, but it's only been a couple of hours.

            • stephenw10, Netgate Administrator

              Yes, run 64-bit if your system is capable of it.
              The file loader.conf is overwritten on a firmware update and is changed by various settings in the GUI, so you should use loader.conf.local instead. That file doesn't exist by default, so you need to create it. You could for example do:

              echo 'hw.pci.enable_msix=0' > /boot/loader.conf.local
              

              Either at the command line or in the Diagnostics > Command prompt box. Then edit it to add the other line(s).
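
As an alternative to editing the file afterwards, here is a minimal sketch that writes both tunables in one step while leaving any other lines in the file alone. The function name `write_msi_tunables` and its optional path argument are hypothetical, added only so the snippet can be tried against a scratch file first; the two tunable lines are the ones quoted earlier in this thread.

```shell
# Hypothetical helper: make sure both MSI tunables are set to 0 in a
# loader.conf.local-style file without discarding other settings.
# The optional path argument is an assumption for illustration; the real
# file on the firewall is /boot/loader.conf.local.
write_msi_tunables() {
    target="${1:-/boot/loader.conf.local}"
    tmp="${target}.tmp.$$"
    if [ -f "$target" ]; then
        # Keep existing lines, dropping any older copies of the two tunables.
        grep -v -e '^hw\.pci\.enable_msix=' -e '^hw\.pci\.enable_msi=' "$target" > "$tmp"
    else
        : > "$tmp"
    fi
    # Append the two tunables exactly once.
    printf '%s\n' 'hw.pci.enable_msix=0' 'hw.pci.enable_msi=0' >> "$tmp"
    mv "$tmp" "$target"
}
```

On the firewall this would be called as `write_msi_tunables` with no argument, followed by a reboot, since loader tunables are only read at boot.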

              The em counters are accessed using sysctl from the command line. For example:

              [2.2-RELEASE][root@xtm5.localdomain]/root: sysctl dev.em.0
              dev.em.0.%desc: Intel(R) PRO/1000 Network Connection 7.4.2
              dev.em.0.%driver: em
              dev.em.0.%location: slot=0 function=0
              dev.em.0.%pnpinfo: vendor=0x8086 device=0x10d3 subvendor=0x8086 subdevice=0x0000 class=0x020000
              dev.em.0.%parent: pci2
              dev.em.0.nvm: -1
              dev.em.0.debug: -1
              dev.em.0.fc: 3
              dev.em.0.rx_int_delay: 0
              dev.em.0.tx_int_delay: 66
              dev.em.0.rx_abs_int_delay: 66
              dev.em.0.tx_abs_int_delay: 66
              dev.em.0.itr: 488
              dev.em.0.rx_processing_limit: 100
              dev.em.0.eee_control: 1
              dev.em.0.link_irq: 0
              dev.em.0.mbuf_alloc_fail: 0
              dev.em.0.cluster_alloc_fail: 0
              dev.em.0.dropped: 0
              dev.em.0.tx_dma_fail: 0
              dev.em.0.rx_overruns: 0
              dev.em.0.watchdog_timeouts: 0
              dev.em.0.device_control: 1049160
              dev.em.0.rx_control: 0
              dev.em.0.fc_high_water: 18432
              dev.em.0.fc_low_water: 16932
              dev.em.0.queue0.txd_head: 0
              dev.em.0.queue0.txd_tail: 0
              dev.em.0.queue0.tx_irq: 0
              dev.em.0.queue0.no_desc_avail: 0
              dev.em.0.queue0.rxd_head: 0
              dev.em.0.queue0.rxd_tail: 0
              dev.em.0.queue0.rx_irq: 0
              dev.em.0.mac_stats.excess_coll: 0
              dev.em.0.mac_stats.single_coll: 0
              dev.em.0.mac_stats.multiple_coll: 0
              dev.em.0.mac_stats.late_coll: 0
              dev.em.0.mac_stats.collision_count: 0
              dev.em.0.mac_stats.symbol_errors: 0
              dev.em.0.mac_stats.sequence_errors: 0
              dev.em.0.mac_stats.defer_count: 0
              dev.em.0.mac_stats.missed_packets: 0
              dev.em.0.mac_stats.recv_no_buff: 0
              dev.em.0.mac_stats.recv_undersize: 0
              dev.em.0.mac_stats.recv_fragmented: 0
              dev.em.0.mac_stats.recv_oversize: 0
              dev.em.0.mac_stats.recv_jabber: 0
              dev.em.0.mac_stats.recv_errs: 0
              dev.em.0.mac_stats.crc_errs: 0
              dev.em.0.mac_stats.alignment_errs: 0
              dev.em.0.mac_stats.coll_ext_errs: 0
              dev.em.0.mac_stats.xon_recvd: 0
              dev.em.0.mac_stats.xon_txd: 0
              dev.em.0.mac_stats.xoff_recvd: 0
              dev.em.0.mac_stats.xoff_txd: 0
              dev.em.0.mac_stats.total_pkts_recvd: 0
              dev.em.0.mac_stats.good_pkts_recvd: 0
              dev.em.0.mac_stats.bcast_pkts_recvd: 0
              dev.em.0.mac_stats.mcast_pkts_recvd: 0
              dev.em.0.mac_stats.rx_frames_64: 0
              dev.em.0.mac_stats.rx_frames_65_127: 0
              dev.em.0.mac_stats.rx_frames_128_255: 0
              dev.em.0.mac_stats.rx_frames_256_511: 0
              dev.em.0.mac_stats.rx_frames_512_1023: 0
              dev.em.0.mac_stats.rx_frames_1024_1522: 0
              dev.em.0.mac_stats.good_octets_recvd: 0
              dev.em.0.mac_stats.good_octets_txd: 0
              dev.em.0.mac_stats.total_pkts_txd: 0
              dev.em.0.mac_stats.good_pkts_txd: 0
              dev.em.0.mac_stats.bcast_pkts_txd: 0
              dev.em.0.mac_stats.mcast_pkts_txd: 0
              dev.em.0.mac_stats.tx_frames_64: 0
              dev.em.0.mac_stats.tx_frames_65_127: 0
              dev.em.0.mac_stats.tx_frames_128_255: 0
              dev.em.0.mac_stats.tx_frames_256_511: 0
              dev.em.0.mac_stats.tx_frames_512_1023: 0
              dev.em.0.mac_stats.tx_frames_1024_1522: 0
              dev.em.0.mac_stats.tso_txd: 0
              dev.em.0.mac_stats.tso_ctx_fail: 0
              dev.em.0.interrupts.asserts: 0
              dev.em.0.interrupts.rx_pkt_timer: 0
              dev.em.0.interrupts.rx_abs_timer: 0
              dev.em.0.interrupts.tx_pkt_timer: 0
              dev.em.0.interrupts.tx_abs_timer: 0
              dev.em.0.interrupts.tx_queue_empty: 0
              dev.em.0.interrupts.tx_queue_min_thresh: 0
              dev.em.0.interrupts.rx_desc_min_thresh: 0
              dev.em.0.interrupts.rx_overrun: 0
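
With this much output it can help to print only the error-type counters that are non-zero. The following is a rough sketch, assuming the `name: value` format shown above. The helper name `nonzero_error_counters` is hypothetical, and the list of name fragments it treats as "error counters" is a judgment call, not something defined by the driver.

```shell
# Hypothetical helper: read `sysctl dev.em.N` output on stdin and print only
# error-style counters with a value greater than zero.
# The name fragments below are an assumed selection; adjust to taste.
nonzero_error_counters() {
    awk -F': ' '
        $1 ~ /(watchdog|fail|dropped|overrun|no_desc_avail|errs|errors|missed|no_buff)/ &&
        $2 ~ /^[0-9]+$/ && $2 + 0 > 0 { print }
    '
}
```

Typical use would be `sysctl dev.em.0 | nonzero_error_counters`. Settings such as `dev.em.0.fc` or `dev.em.0.itr` are skipped because their names do not match the list.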
              
              

              Steve

              • shanis42

                Hello Stephen,

                Those are good instructions for someone a little green like I am.

                I created the loader.conf.local file with the command you provided, then went into "Edit File" and added the second line. I removed the lines from loader.conf and rebooted.

                I also ran sysctl dev.em.0 from the command prompt. The packet information is different for somewhat obvious reasons. Other than that, the only section that is radically different is pasted below.

                dev.em.0.watchdog_timeouts: 9
                dev.em.0.device_control: 1477444160
                dev.em.0.rx_control: 67141634
                dev.em.0.fc_high_water: 8192
                dev.em.0.fc_low_water: 6692
                dev.em.0.queue0.txd_head: 466
                dev.em.0.queue0.txd_tail: 466
                dev.em.0.queue0.tx_irq: 0
                dev.em.0.queue0.no_desc_avail: 0
                dev.em.0.queue0.rxd_head: 601
                dev.em.0.queue0.rxd_tail: 600
                

                You have all zeros there. I don't know if this is just different usage in our configurations. The counter must clear out when I clear my logs, because I know I have had more than 9 watchdog timeouts.
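
Since a single reading only shows totals, one way to tell whether timeouts are still occurring is to save the output twice and compare the two. A small sketch under that assumption follows; `changed_counters` is a hypothetical helper name, and the snapshot file names in the usage note are placeholders.

```shell
# Hypothetical helper: given two saved `sysctl dev.em.N` outputs, print every
# entry whose value changed between the first file and the second.
# Note: the first file is assumed to be non-empty.
changed_counters() {
    awk -F': ' '
        NR == FNR { before[$1] = $2; next }      # first file: remember values
        ($1 in before) && before[$1] != $2 {     # second file: report changes
            printf "%s: %s -> %s\n", $1, before[$1], $2
        }
    ' "$1" "$2"
}
```

For example: run `sysctl dev.em.0 > /tmp/em0.before`, wait a while, run `sysctl dev.em.0 > /tmp/em0.after`, then `changed_counters /tmp/em0.before /tmp/em0.after`. If `watchdog_timeouts` appears in the result, the timeouts are ongoing.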

                • shanis42

                  I was able to get this from pciconf -lvbc:

                  em0@pci0:0:25:0:	class=0x020000 card=0xb049144d chip=0x10bd8086 rev=0x02 hdr=0x00
                      class      = network
                      subclass   = ethernet
                      bar   [10] = type Memory, range 32, base 0xfc480000, size 131072, enabled
                      bar   [14] = type Memory, range 32, base 0xfc4a5000, size 4096, enabled
                      bar   [18] = type I/O Port, range 32, base 0x1820, size 32, enabled
                      cap 01[c8] = powerspec 2  supports D0 D3  current D0
                      cap 05[d0] = MSI supports 1 message, 64 bit 
                      cap 09[e0] = vendor (length 6) Intel cap 2 version 0
                  

                  Looks like this card supports MSI but not MSI-X.

                  From my LAN side adapter:

                  em1@pci0:5:0:0:	class=0x020000 card=0xa01f8086 chip=0x10d38086 rev=0x00 hdr=0x00
                      class      = network
                      subclass   = ethernet
                      bar   [10] = type Memory, range 32, base 0xfc120000, size 131072, enabled
                      bar   [14] = type Memory, range 32, base 0xfc180000, size 524288, enabled
                      bar   [18] = type I/O Port, range 32, base 0x2000, size 32, enabled
                      bar   [1c] = type Memory, range 32, base 0xfc100000, size 16384, enabled
                      cap 01[c8] = powerspec 2  supports D0 D3  current D0
                      cap 05[d0] = MSI supports 1 message, 64 bit 
                      cap 10[e0] = PCI-Express 1 endpoint max data 128(256) link x1(x1)
                                   speed 2.5(2.5) ASPM disabled(L0s/L1)
                      cap 11[a0] = MSI-X supports 5 messages
                                   Table in map 0x1c[0x0], PBA in map 0x1c[0x2000]
                      ecap 0001[100] = AER 1 0 fatal 1 non-fatal 3 corrected
                      ecap 0003[140] = Serial 1 6805caffff2cea4b
                  

                  Showing it supports MSI and MSI-X.
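
The same comparison can be pulled out of the `pciconf -lvbc` output automatically. This is only a sketch, assuming the capability lines look like the ones pasted above; `msi_caps` is a hypothetical helper name.

```shell
# Hypothetical helper: read `pciconf -lvbc` output on stdin and print, per
# device, whether an MSI and/or MSI-X capability line is present.
msi_caps() {
    awk '
        $1 ~ /^[^@]+@pci/  { split($1, id, "@"); dev = id[1] }  # device header line
        /= MSI supports/   { print dev ": MSI" }
        /= MSI-X supports/ { print dev ": MSI-X" }
    '
}
```

Run as `pciconf -lvbc | msi_caps`. For the two adapters above this would list MSI for em0, and both MSI and MSI-X for em1.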

                  If the problem is related to MSI-X, that would explain why all of my issues (watchdog, hotplug, flap) occur only on the WAN side.

                  Perhaps I should enable MSI and disable MSI-X? Maybe I will try switching the cables and then reassigning the interfaces. I would then have the card that supports MSI-X on the WAN side, and could see if the errors jump over to the LAN side.

                  I currently have both MSI and MSI-X disabled in loader.conf.local. I had them disabled in loader.conf, but moved them and an MBUF reassignment over to the local file and restarted. I did check by running sysctl hw.pci and they are both disabled.
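
For anyone wanting to script that check, here is a minimal sketch that reads the `sysctl hw.pci` output and succeeds only when both values are 0. The helper name `msi_disabled` is hypothetical; the two OID names are the same ones set in loader.conf.local.

```shell
# Hypothetical helper: read `sysctl hw.pci` output on stdin and return
# success only if hw.pci.enable_msi and hw.pci.enable_msix are both 0.
msi_disabled() {
    awk -F': ' '
        $1 == "hw.pci.enable_msi"  && $2 == 0 { msi = 1 }
        $1 == "hw.pci.enable_msix" && $2 == 0 { msix = 1 }
        END { exit !(msi && msix) }
    '
}
```

Usage: `sysctl hw.pci | msi_disabled && echo "MSI and MSI-X are disabled"`.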

                  • stephenw10, Netgate Administrator

                    Try swapping the NIC assignments so the MSI-X-capable card is WAN.

                    Steve

                    • shanis42

                      I have ordered a dual PRO/1000 PCIe card to eliminate the onboard controller, which must be older.

                      In the meantime, while I wait for it to arrive, I will switch the interfaces when I get to work in the morning. It is still throwing watchdog errors right now, with both MSI and MSI-X disabled, but only on the WAN side. We will see what happens when I put the newer adapter on the WAN side. I suspect something is wrong with either the onboard NIC or the driver for the NIC.

                      • shanis42

                        Swapping the interfaces didn't work; it actually got a little worse. I now know it is specifically that onboard adapter. When you get the timeouts and the interface is assigned to LAN, you get locked out of the web admin until you either restart web admin via the firewall's command menu, or restart the whole box.

                        I flipped them back and will have to wait for my dual-NIC PCIe card to come in.

                        I think the onboard adapter is a little older than the PCIe card I put in. They both want to use the em driver.

                        Any other ideas for future searchers? I read somewhere that connections to Gigabit devices can overwhelm certain adapters, but I don't know if there is any truth to that.

                        • stephenw10, Netgate Administrator

                          Not offhand. I would be searching the FreeBSD mailing list and forum using the details from the specific adapter given by pciconf.
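
To get search terms out of the pciconf line, the `chip=` value can be split into its device and vendor IDs: the first four hex digits are the device, the last four the vendor. That is the same split that appears in the `%pnpinfo` line of the sysctl output earlier in this thread (chip 0x10d38086 corresponds to vendor=0x8086 device=0x10d3). A small sketch, with `chip_ids` as a hypothetical helper name:

```shell
# Hypothetical helper: split a pciconf chip value (0xDDDDVVVV) into
# separate vendor and device IDs for use as search terms.
chip_ids() {
    chip="${1#chip=}"     # accept "chip=0x10bd8086" or just "0x10bd8086"
    hex="${chip#0x}"
    device="$(printf '%s' "$hex" | cut -c1-4)"
    vendor="$(printf '%s' "$hex" | cut -c5-8)"
    printf 'vendor=0x%s device=0x%s\n' "$vendor" "$device"
}
```

For the onboard adapter in this thread, `chip_ids chip=0x10bd8086` prints `vendor=0x8086 device=0x10bd`.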

                          Steve

                          Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.