Netgate Discussion Forum

    VLAN Interrupt storm solutions? pf 2.03 / X7SPA-HF / Intel 82574L

    10 Posts 3 Posters 4.7k Views
      Guest

      I have 2x Intel 82574L NICs on a Supermicro D510.

      em0 WAN1 (2 VLANs)
      em1 WAN2 + LAN (3 VLANs)

      em1 gets interrupt storms and freezes the entire box.
      This happens at low throughput but with a high number of connections.

      No IRQs are shared, BIOS manages IRQs and unused devices are off.

      Checking or unchecking the following makes no difference:
      Device polling
      Hardware Checksum Offloading
      Hardware TCP Segmentation Offloading
      Hardware Large Receive Offloading

      Is there a software change that would fix this?

        wallabybob

        @themisa:

        em1 gets interrupt storms and freezes the entire box.
        . . .
        No IRQs are shared, BIOS manages IRQs and unused devices are off.

        If em1 truly is the sole user of its irq then the interrupt storm report suggests the em driver isn't correctly clearing an interrupt condition on em1. PERHAPS the newer device driver in pfSense 2.1 snapshot builds will work better.

          Guest

          i use a big chunk of pf features, chances are 2.1 will just introduce new issues
          this is an older server grade motherboard with an intel driver, if it's not compatible i don't know what is

          are there any useful commands i can run to narrow down the issue to something more specific?

            wallabybob

            @themisa:

            this is an older server grade motherboard with an intel driver, if it's not compatible i don't know what is

            New devices are not always "compatible" with old device drivers. pfSense 2.0.x is based on FreeBSD 8.1 which is about two years older than FreeBSD 8.3 used in the pfSense 2.1 snapshot builds.

            @themisa:

            are there any useful commands i can run to narrow down the issue to something more specific?

            Some of the em devices have a feature called "interrupt moderation": interrupts are delayed for a programmable period to reduce interrupt overhead by giving a single interrupt more work to do. For example, on a busy interface, delaying an interrupt request by (say) 100 microseconds might leave 10 received packets available for processing rather than one. Please post the output of the pfSense shell command:
            ```
            sysctl -a | grep em1
            ```

            PERHAPS tweaking the interrupt moderation will reduce the interrupt storm reports.

            It could also be useful to post the output of the pfSense shell commands:
            ```
            ifconfig
            vmstat -i
            ```
            
              Guest

              interrupt                          total       rate
              irq18: ehci0 uhci5                     2          0
              irq19: uhci2 uhci4+                   19          0
              irq23: uhci3 ehci1                 97485          2
              cpu0: timer                     17324399        399
              irq256: em0:rx 0                 5371292        124
              irq257: em0:tx 0                 5694835        131
              irq258: em0:link                   15281          0
              irq259: em1:rx 0                 2296709         53
              irq260: em1:tx 0                 2130004         49
              irq261: em1:link                   19916          0
              cpu1: timer                     17320395        399
              cpu2: timer                     17320395        399
              cpu3: timer                     17320395        399
              Total                           84911127       1960
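
The rate column of `vmstat -i` is an average since boot, so a storm that lasts a few seconds barely moves it. One rough way to see the current rate instead is to take two snapshots and divide the difference by the interval. The sketch below assumes the `vmstat -i` layout shown above (name, then total, then rate); the function name `intr_delta` and the file paths in the usage comment are made up for illustration.

```shell
#!/bin/sh
# intr_delta OLD NEW SECONDS
# Print "source interrupts-per-second" for the interval between two
# saved `vmstat -i` snapshots. Hypothetical helper, not a pfSense tool.
intr_delta() {
    awk -v secs="$3" '
        # The last two fields are total and rate; everything before is the name.
        function src(   i, s) {
            s = $1
            for (i = 2; i <= NF - 2; i++) s = s " " $i
            return s
        }
        # Skip the header line and anything without a numeric total.
        NF < 3 || $(NF-1) !~ /^[0-9]+$/ { next }
        # First file: remember each total.
        FNR == NR { old[src()] = $(NF-1); next }
        # Second file: print the per-second difference.
        (src() in old) { printf "%s %d\n", src(), ($(NF-1) - old[src()]) / secs }
    ' "$1" "$2"
}

# Usage on the firewall (paths are placeholders):
#   vmstat -i > /tmp/intr.1; sleep 10; vmstat -i > /tmp/intr.2
#   intr_delta /tmp/intr.1 /tmp/intr.2 10
```

Running this in a loop while the problem occurs should show which vector (for example `irq256: em0:rx 0`) actually spikes.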

              "sysctl -a | grep em1" doesn't find anything
              thanks for trying wallabybob
              time to start thinking new hardware

                stephenw10 Netgate Administrator

                Try em.1:

                [2.0.3-RELEASE][root@pfsense.fire.box]/root(1): sysctl -a|grep em1
                [2.0.3-RELEASE][root@pfsense.fire.box]/root(2): sysctl -a | grep em.1
                dev.em.1.%desc: Intel(R) PRO/1000 Legacy Network Connection 1.0.4
                dev.em.1.%driver: em
                dev.em.1.%location: slot=14 function=0
                dev.em.1.%pnpinfo: vendor=0x8086 device=0x1079 subvendor=0x8086 subdevice=0x1179 class=0x020000
                dev.em.1.%parent: pci3
                dev.em.1.nvm: -1
                dev.em.1.rx_int_delay: 0
                dev.em.1.tx_int_delay: 66
                dev.em.1.rx_abs_int_delay: 66
                dev.em.1.tx_abs_int_delay: 66
                dev.em.1.rx_processing_limit: 100
                dev.em.1.flow_control: 3
                dev.em.1.mbuf_alloc_fail: 0
                dev.em.1.cluster_alloc_fail: 0
                dev.em.1.dropped: 0
                dev.em.1.tx_dma_fail: 0
                dev.em.1.tx_desc_fail1: 0
                dev.em.1.tx_desc_fail2: 0
                dev.em.1.rx_overruns: 0
                dev.em.1.watchdog_timeouts: 0
                dev.em.1.device_control: 1492124233
                dev.em.1.rx_control: 32770
                dev.em.1.fc_high_water: 47104
                dev.em.1.fc_low_water: 45604
                dev.em.1.fifo_workaround: 0
                dev.em.1.fifo_reset: 0
                dev.em.1.txd_head: 83
                dev.em.1.txd_tail: 84
                dev.em.1.rxd_head: 191
                dev.em.1.rxd_tail: 190
                dev.em.1.mac_stats.excess_coll: 0
                dev.em.1.mac_stats.single_coll: 0
                dev.em.1.mac_stats.multiple_coll: 0
                dev.em.1.mac_stats.late_coll: 0
                dev.em.1.mac_stats.collision_count: 0
                dev.em.1.mac_stats.symbol_errors: 0
                dev.em.1.mac_stats.sequence_errors: 0
                dev.em.1.mac_stats.defer_count: 0
                dev.em.1.mac_stats.missed_packets: 0
                dev.em.1.mac_stats.recv_no_buff: 0
                dev.em.1.mac_stats.recv_undersize: 0
                dev.em.1.mac_stats.recv_fragmented: 0
                dev.em.1.mac_stats.recv_oversize: 0
                dev.em.1.mac_stats.recv_jabber: 0
                dev.em.1.mac_stats.recv_errs: 0
                dev.em.1.mac_stats.crc_errs: 0
                dev.em.1.mac_stats.alignment_errs: 0
                dev.em.1.mac_stats.coll_ext_errs: 0
                dev.em.1.mac_stats.xon_recvd: 0
                dev.em.1.mac_stats.xon_txd: 0
                dev.em.1.mac_stats.xoff_recvd: 0
                dev.em.1.mac_stats.xoff_txd: 0
                dev.em.1.mac_stats.total_pkts_recvd: 34413999
                dev.em.1.mac_stats.good_pkts_recvd: 34413999
                dev.em.1.mac_stats.bcast_pkts_recvd: 32180
                dev.em.1.mac_stats.mcast_pkts_recvd: 0
                dev.em.1.mac_stats.rx_frames_64: 6363096
                dev.em.1.mac_stats.rx_frames_65_127: 17326141
                dev.em.1.mac_stats.rx_frames_128_255: 6554914
                dev.em.1.mac_stats.rx_frames_256_511: 1041309
                dev.em.1.mac_stats.rx_frames_512_1023: 1386613
                dev.em.1.mac_stats.rx_frames_1024_1522: 1741926
                dev.em.1.mac_stats.good_octets_recvd: 6621995344
                dev.em.1.mac_stats.good_octets_txd: 28106065051
                dev.em.1.mac_stats.total_pkts_txd: 40379356
                dev.em.1.mac_stats.good_pkts_txd: 40379356
                dev.em.1.mac_stats.bcast_pkts_txd: 26373
                dev.em.1.mac_stats.mcast_pkts_txd: 5
                dev.em.1.mac_stats.tx_frames_64: 1438817
                dev.em.1.mac_stats.tx_frames_65_127: 12731412
                dev.em.1.mac_stats.tx_frames_128_255: 6641964
                dev.em.1.mac_stats.tx_frames_256_511: 1181055
                dev.em.1.mac_stats.tx_frames_512_1023: 1178482
                dev.em.1.mac_stats.tx_frames_1024_1522: 17207626
                dev.em.1.mac_stats.tso_txd: 0
                dev.em.1.mac_stats.tso_ctx_fail: 0
                
                

                Steve
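
For anyone wondering why the first grep came back empty: sysctl names the device `dev.em.1`, with a dot between driver and unit, so the literal string `em1` never appears. `grep em.1` works because `.` in a regular expression matches any single character, though it would also match unrelated names such as `dev.em.10`. A stricter filter might look like the sketch below; `em_sysctls` is a made-up helper name.

```shell
#!/bin/sh
# em_sysctls UNIT
# Keep only the sysctl lines for one em unit, read from stdin.
# The dots are escaped so they match literally.
em_sysctls() {
    grep "^dev\.em\.$1\."
}

# Usage on the firewall:
#   sysctl -a | em_sysctls 1
# On FreeBSD, asking for the subtree directly gives the same lines:
#   sysctl dev.em.1
```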

                  Guest

                  Good catch.

                  em0 gets the interrupt storms, not em1. Sorry, I had them reversed. It's like this:

                  em0 WAN2 + LAN (3 VLANs)
                  em1 WAN1 (2 VLANs)
                  So em0 has more traffic due to LAN.

                  
                  dev.em.0.%desc: Intel(R) PRO/1000 Network Connection 7.3.2
                  dev.em.0.%driver: em
                  dev.em.0.%location: slot=0 function=0
                  dev.em.0.%pnpinfo: vendor=0x8086 device=0x10d3 subvendor=0x15d9 subdevice=0x060a class=0x020000
                  dev.em.0.%parent: pci2
                  dev.em.0.nvm: -1
                  dev.em.0.debug: -1
                  dev.em.0.fc: 3
                  dev.em.0.rx_int_delay: 0
                  dev.em.0.tx_int_delay: 66
                  dev.em.0.rx_abs_int_delay: 66
                  dev.em.0.tx_abs_int_delay: 66
                  dev.em.0.rx_processing_limit: 100
                  dev.em.0.eee_control: 0
                  dev.em.0.link_irq: 76
                  dev.em.0.mbuf_alloc_fail: 0
                  dev.em.0.cluster_alloc_fail: 0
                  dev.em.0.dropped: 0
                  dev.em.0.tx_dma_fail: 0
                  dev.em.0.rx_overruns: 0
                  dev.em.0.watchdog_timeouts: 0
                  dev.em.0.device_control: 1074790984
                  dev.em.0.rx_control: 67403778
                  dev.em.0.fc_high_water: 18432
                  dev.em.0.fc_low_water: 16932
                  dev.em.0.queue0.txd_head: 14
                  dev.em.0.queue0.txd_tail: 14
                  dev.em.0.queue0.tx_irq: 106328859
                  dev.em.0.queue0.no_desc_avail: 0
                  dev.em.0.queue0.rxd_head: 541
                  dev.em.0.queue0.rxd_tail: 540
                  dev.em.0.queue0.rx_irq: 120560838
                  dev.em.0.mac_stats.excess_coll: 0
                  dev.em.0.mac_stats.single_coll: 0
                  dev.em.0.mac_stats.multiple_coll: 0
                  dev.em.0.mac_stats.late_coll: 0
                  dev.em.0.mac_stats.collision_count: 0
                  dev.em.0.mac_stats.symbol_errors: 0
                  dev.em.0.mac_stats.sequence_errors: 0
                  dev.em.0.mac_stats.defer_count: 0
                  dev.em.0.mac_stats.missed_packets: 309717
                  dev.em.0.mac_stats.recv_no_buff: 119595
                  dev.em.0.mac_stats.recv_undersize: 0
                  dev.em.0.mac_stats.recv_fragmented: 0
                  dev.em.0.mac_stats.recv_oversize: 0
                  dev.em.0.mac_stats.recv_jabber: 0
                  dev.em.0.mac_stats.recv_errs: 0
                  dev.em.0.mac_stats.crc_errs: 0
                  dev.em.0.mac_stats.alignment_errs: 0
                  dev.em.0.mac_stats.coll_ext_errs: 0
                  dev.em.0.mac_stats.xon_recvd: 0
                  dev.em.0.mac_stats.xon_txd: 0
                  dev.em.0.mac_stats.xoff_recvd: 0
                  dev.em.0.mac_stats.xoff_txd: 0
                  dev.em.0.mac_stats.total_pkts_recvd: 193440710
                  dev.em.0.mac_stats.good_pkts_recvd: 193130993
                  dev.em.0.mac_stats.bcast_pkts_recvd: 32757
                  dev.em.0.mac_stats.mcast_pkts_recvd: 467191
                  dev.em.0.mac_stats.rx_frames_64: 16348866
                  dev.em.0.mac_stats.rx_frames_65_127: 96904487
                  dev.em.0.mac_stats.rx_frames_128_255: 47080072
                  dev.em.0.mac_stats.rx_frames_256_511: 12219416
                  dev.em.0.mac_stats.rx_frames_512_1023: 3441920
                  dev.em.0.mac_stats.rx_frames_1024_1522: 17136232
                  dev.em.0.mac_stats.good_octets_recvd: 48932482221
                  dev.em.0.mac_stats.good_octets_txd: 86853271375
                  dev.em.0.mac_stats.total_pkts_txd: 159394546
                  dev.em.0.mac_stats.good_pkts_txd: 159394546
                  dev.em.0.mac_stats.bcast_pkts_txd: 11344
                  dev.em.0.mac_stats.mcast_pkts_txd: 114483
                  dev.em.0.mac_stats.tx_frames_64: 2459808
                  dev.em.0.mac_stats.tx_frames_65_127: 59291145
                  dev.em.0.mac_stats.tx_frames_128_255: 25944268
                  dev.em.0.mac_stats.tx_frames_256_511: 22065346
                  dev.em.0.mac_stats.tx_frames_512_1023: 5965944
                  dev.em.0.mac_stats.tx_frames_1024_1522: 43668035
                  dev.em.0.mac_stats.tso_txd: 0
                  dev.em.0.mac_stats.tso_ctx_fail: 0
                  dev.em.0.interrupts.asserts: 77
                  dev.em.0.interrupts.rx_pkt_timer: 0
                  dev.em.0.interrupts.rx_abs_timer: 0
                  dev.em.0.interrupts.tx_pkt_timer: 0
                  dev.em.0.interrupts.tx_abs_timer: 0
                  dev.em.0.interrupts.tx_queue_empty: 0
                  dev.em.0.interrupts.tx_queue_min_thresh: 0
                  dev.em.0.interrupts.rx_desc_min_thresh: 0
                  dev.em.0.interrupts.rx_overrun: 0
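
Two counters in the em0 dump stand out: `missed_packets` (309717) and `recv_no_buff` (119595) are both non-zero, and `total_pkts_recvd` minus `good_pkts_recvd` is exactly 309717. Non-zero values here generally suggest the host was not draining the receive ring fast enough at some point. As a rough sketch, the share of missed frames can be computed from the posted text; `rx_miss_pct` is a made-up helper name and the input is just the two lines copied from the output above.

```shell
#!/bin/sh
# rx_miss_pct
# Read `sysctl dev.em.N` output on stdin and print missed_packets
# as a percentage of total_pkts_recvd.
rx_miss_pct() {
    awk -F': *' '
        /mac_stats\.missed_packets:/   { missed = $2 }
        /mac_stats\.total_pkts_recvd:/ { total = $2 }
        END { if (total > 0) printf "%.2f\n", 100 * missed / total }
    '
}

# Values copied from the em0 output above; prints 0.16
rx_miss_pct <<'EOF'
dev.em.0.mac_stats.missed_packets: 309717
dev.em.0.mac_stats.total_pkts_recvd: 193440710
EOF
```

The overall percentage is small, but the counters are cumulative since boot, so the drops may be concentrated in the short periods when the box freezes.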
                  
                  
                    wallabybob

                    The vmstat output suggests the interrupt storm was short lived. It also shows em1 has three distinct interrupt vectors. Please post the exact text of the interrupt storm message.

                    Thanks Steve for correcting the grep parameter. It appears that em1 supports interrupt moderation with capability of delaying receive interrupts and transmit interrupts by up to 66 microseconds.
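
To put the moderation numbers in perspective, here is the arithmetic behind the earlier "100 microseconds, 10 packets" example as a small sketch. It uses an idealized model (packets arrive evenly and every interrupt waits the full delay), and the 100,000 packets per second figure is only what that example implies, not something measured on this box. `pkts_per_intr` is a made-up helper name.

```shell
#!/bin/sh
# pkts_per_intr PPS DELAY_USECS
# Idealized number of packets that accumulate while one interrupt
# is held back for DELAY_USECS at a steady PPS arrival rate.
pkts_per_intr() {
    awk -v pps="$1" -v usecs="$2" 'BEGIN { printf "%.1f\n", pps * usecs / 1000000 }'
}

pkts_per_intr 100000 100   # the 100 microsecond example: prints 10.0
pkts_per_intr 100000 66    # the 66 microsecond value from the sysctl output: prints 6.6
```

With `rx_int_delay` at 0, as in both posted outputs, this per-packet receive delay is not being applied at all.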

                      Guest

                      Edited the previous post; em0 is the culprit.
                      'interrupt storm detected on irq256' is what it says.

                      More often there is no message, but the CPU gets maxed out handling IRQs: not for long enough for me to even observe it in top, but long enough to break both gateways, since pf can't ping them when this happens.

                      It looks like "dev.em.0.rx_int_delay: 0" is the setting that would affect the problematic "irq256: em0:rx",
                      but according to Intel, changing "RxIntDelay" has the potential to hang the adapter:
                      http://www.intel.com/support/network/adapter/pro100/sb/cs-032516.htm

                      anyone have experience with this setting?
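
If someone does decide to experiment with this setting despite Intel's warning, one cautious approach is to change it at runtime, note the old value, and revert if the link misbehaves. The sketch below assumes the `dev.em.N.rx_int_delay` node is writable at runtime in the installed driver version and that it is run as root; the value 32 in the usage comment is purely illustrative, not a recommendation. `try_rx_delay` is a made-up helper name.

```shell
#!/bin/sh
# try_rx_delay UNIT USECS
# Set dev.em.UNIT.rx_int_delay and print how to revert.
# Assumption: the sysctl is read-write on this driver version.
try_rx_delay() {
    node="dev.em.$1.rx_int_delay"
    # Read the current value first; give up if the node does not exist.
    old=$(sysctl -n "$node" 2>/dev/null) || {
        echo "cannot read $node" >&2
        return 1
    }
    # Apply the new value; this does not survive a reboot.
    sysctl "$node=$2" || return 1
    echo "was $old; revert with: sysctl $node=$old"
}

# Usage (as root on the firewall, value is only an example):
#   try_rx_delay 0 32
```

Because a runtime sysctl change is lost on reboot, a hung adapter can at least be recovered by restarting the box.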

                        wallabybob

                        Perhaps a suitable workaround would be to go to System -> Routing, click on the Gateways tab, and edit your gateways to increase the Frequency Probe (more correctly called the Probe Interval) and the Down time, so that gateway monitoring is a bit more robust over busy periods.

                        Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.