Netgate Discussion Forum

    High traffic irq problem (no storm)

    General pfSense Questions
    • K
      kejianshi
      last edited by

      I suppose you have already looked at the NIC optimization page about MBUFs and queues, and changed your BIOS so that it is not set to be plug-and-play aware?

      • W
        wallabybob
        last edited by

        @bsd3000:

        em0 is a mng interface

        mng = management?

        @bsd3000:

        Yesterday we had a peak of 70Mbit and pfSense hung/froze!

        70Mbit (presumably 70Mbps) between WAN and servers? Reported by pfSense RRD graph, Traffic -> WAN? How long did the peak appear to last?

        What did you observe that you now describe as a hang/freeze? GUI doesn't respond? SSH session stalls? Console keyboard doesn't respond to the Enter key? Console keyboard Caps Lock indicator doesn't change with presses of the Caps Lock key? Etc.

        @bsd3000:

        in particular, the percentage of irq went up to around 50%

        As reported by top? If so, what was identified as the major CPU user? (And if the system was truly "frozen", what would be doing the reporting?)

        @bsd3000:

        vmstat -i
        interrupt              total   rate
        irq1: atkbd0              18      0
        irq14: ata0               68      0
        irq16: uhci0              17      0
        irq23: ehci0               2      0
        irq24: ciss0        58807455      4
        irq25: bge0       1667747890    123
        irq26: bge1       1694685519    125
        irq48: em0         777403204     57
        cpu0: timer      27034549131   2000
        Total            31233193304   2310

        These interrupt rates are not significant, but because they are averaged since boot time, spikes won't show up here.
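
To see spikes that the since-boot averages hide, one option is to take two vmstat -i snapshots a known number of seconds apart and divide the difference in totals by the interval. For scale, the timer line above (27034549131 interrupts at 2000 per second) implies roughly 156 days of uptime, so a 30-minute burst barely moves the averages. The sketch below is only an illustration: the parser assumes the three-column name/total/rate layout quoted above, the function names are made up for this example, and the "after" snapshot is invented sample data, not output from the poster's system.

```python
def parse_vmstat_i(text):
    """Map interrupt source name -> cumulative count from `vmstat -i` text."""
    counts = {}
    for line in text.splitlines():
        parts = line.split()
        # Data rows look like: "irq25: bge0 1667747890 123"
        if len(parts) < 3 or parts[0] in ("interrupt", "Total"):
            continue
        try:
            total = int(parts[-2])  # second-to-last column is the total
        except ValueError:
            continue  # not a data row
        counts[" ".join(parts[:-2])] = total
    return counts


def interval_rates(before, after, seconds):
    """Interrupts per second for each source over the sampling interval."""
    a = parse_vmstat_i(before)
    b = parse_vmstat_i(after)
    return {name: (b[name] - a[name]) / seconds for name in b if name in a}


# "Before" uses totals quoted in the thread; "after" is HYPOTHETICAL data
# standing in for a second snapshot taken 10 seconds later.
SAMPLE_BEFORE = """interrupt total rate
irq25: bge0 1667747890 123
irq26: bge1 1694685519 125
"""
SAMPLE_AFTER = """interrupt total rate
irq25: bge0 1668247890 123
irq26: bge1 1694935519 125
"""

for name, rate in sorted(interval_rates(SAMPLE_BEFORE, SAMPLE_AFTER, 10).items()):
    print(f"{name}: {rate:.0f} interrupts/s over the interval")
```

Sampling like this during a traffic peak would show whether the per-interval rate is far above the since-boot average.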

        @bsd3000:

        polling/tco and some other advanced tuning options do not provide improvements
        Improvements as in doesn't hang/freeze?

        @bsd3000:

        What can I do to handle 100Mbit of real internet traffic? :)

        The CPU should be able to forward 100Mbps without much effort at all. I suspect the problem might be a resource exhaustion problem. Perhaps you don't have enough firewall states for the UDP traffic. You can view state use history at Status -> RRD Graphs, System tab, States graphs.

        Have you read http://doc.pfsense.org/index.php/Tuning_and_Troubleshooting_Network_Cards ?

        • B
          bsd3000
          last edited by

          @wallabybob:

          mng = management?

          Yes

          @wallabybob:

          70Mbit (presumably 70Mbps) between WAN and servers? Reported by pfSense RRD graph, Traffic -> WAN? How long did the peak appear to last?

          About 30 minutes at 70Mbit (normally I have 50Mbit)

          @wallabybob:

          What did you observe that you now describe as a hang/freeze? GUI doesn't respond? SSH session stalls? Console keyboard doesn't respond to the Enter key? Console keyboard Caps Lock indicator doesn't change with presses of the Caps Lock key? Etc.

          More than 40% packet loss from WAN to VLANs and from VLANs to WAN.

          The console keyboard doesn't respond to the Enter key, or responds after some seconds.

          The GUI doesn't respond, or responds after some seconds.

          @wallabybob:

          As reported by top? If so, what was identified as the major CPU user? (And if the system was truly "frozen", what would be doing the reporting?)

          Something like:
          50.0% system (I use device polling)
          50.0% interrupt

          50.58% idlepoll
            20.00% {irq26: bge1}
            40.00% {irq25: bge0}

          @wallabybob:

          The CPU should be able to forward 100Mbps without much effort at all. I suspect the problem might be a resource exhaustion problem. Perhaps you don't have enough firewall states for the UDP traffic. You can view state use history at Status -> RRD Graphs, System tab, States graphs.

          I see a peak of 100K states; normally I have:
          Show states  15123/385000
          MBUF Usage 7714/25600
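
As a quick check of the figures above, the utilization of each table can be worked out as a percentage of its limit. This is only arithmetic on the numbers quoted in the post; the "100K" peak is taken to mean exactly 100,000 states, which is an assumption, and the helper name is made up for this example.

```python
def utilization(used, limit):
    """Return usage as a percentage of the configured limit."""
    return 100.0 * used / limit


STATE_LIMIT = 385000   # from "Show states 15123/385000"
MBUF_LIMIT = 25600     # from "MBUF Usage 7714/25600"

states_normal = utilization(15123, STATE_LIMIT)
states_peak = utilization(100_000, STATE_LIMIT)  # ASSUMPTION: "100K" == 100,000
mbufs_normal = utilization(7714, MBUF_LIMIT)

print(f"states (normal): {states_normal:.2f}%")
print(f"states (peak):   {states_peak:.2f}%")
print(f"mbufs (normal):  {mbufs_normal:.2f}%")
```

On these numbers the state table peaks at roughly a quarter of its limit, so state exhaustion looks unlikely. The MBUF figure is a normal-load reading, so it says nothing about usage during the peak.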

          @wallabybob:

          Have you read http://doc.pfsense.org/index.php/Tuning_and_Troubleshooting_Network_Cards ?

          Yes.

          Thanks in advance. I will try to tune it better :/

          • I
            ilaurens
            last edited by

            The most obvious answer would be that the server is too slow to handle that amount of traffic. Take a look at CPU and memory usage when the problem is happening.

            I assume this is a single-core system? It's always recommended to have at least a dual core.

            • K
              kejianshi
              last edited by

              How can it be an HP DL 360 on one core? Unless it's running in a VM?
              In which case I'm going to ask: why do people keep forgetting to mention virtualization layers in their freaking specs?

              • S
                stephenw10 Netgate Administrator
                last edited by

                No way that machine should be struggling with <100Mbps.
                Do not use device polling. Make sure polling is not active, I've found it can be a bit 'sticky' when I've tried it.

                Steve

                • K
                  kejianshi
                  last edited by

                  2 integrated Broadcom gigabit (bge) and 1 PCI Intel (em)

                  At the risk of sounding like a simpleton… Can you fit a couple of dual-port Intel PCIe NICs in there?

                  Go all Intel?

                  • S
                    stephenw10 Netgate Administrator
                    last edited by

                    Re-reading this: if you've tried all the tuning options for your NICs, I'd next check the CARP interface.
                    If you fail over to the other box, does the situation change?

                    Steve

                    • I
                      ilaurens
                      last edited by

                      @stephenw10:

                      No way that machine should be struggling with <100Mbps.
                      Do not use device polling. Make sure polling is not active, I've found it can be a bit 'sticky' when I've tried it.

                      Steve

                      Well, enable some plugins and the 100Mbit speed is gone. Also, a single core is not really good for tasks like this, since there are always other things to do and making them wait does not help either. Disabling polling won't help much unless the card does not support it.

                      • S
                        stephenw10 Netgate Administrator
                        last edited by

                        Device polling is not enabled by default and there is very little advantage to enabling it in almost every case. In most cases it makes things worse, sometimes a lot worse!
                        A bit old now but see: http://blog.pfsense.org/?p=115
                        And more recently: http://blog.pfsense.org/?p=115#comment-21378

                        I just noticed your traffic is almost all DNS. In that case the total bandwidth is probably less significant than the packets per second. A very high number of small packets will cause a high interrupt load.

                        Steve
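
Steve's point about packets per second can be made concrete with some rough arithmetic. The sketch below converts a bit rate into a packet rate for two packet sizes. Both sizes are assumptions chosen for illustration (about 100 bytes for a small DNS packet, 1500 bytes for a full-size packet), and the calculation ignores Ethernet framing overhead; the thread does not give the real packet size distribution.

```python
def packets_per_second(bits_per_second, packet_bytes):
    """Approximate packet rate for a given bit rate and average packet size."""
    return bits_per_second / (packet_bytes * 8)


PEAK_BPS = 70e6  # the 70Mbit peak reported in the thread

# ASSUMED packet sizes, for illustration only.
for label, size in (("small DNS-sized packets", 100), ("full-size packets", 1500)):
    pps = packets_per_second(PEAK_BPS, size)
    print(f"{label} ({size} bytes): about {pps:,.0f} packets/s")
```

Under these assumptions the same 70Mbit corresponds to about 15 times as many packets when they are small, which is why a DNS-heavy load can drive a much higher interrupt rate than the bandwidth figure alone suggests.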

                        • B
                          bsd3000
                          last edited by

                          Hi & thanks!

                          Unfortunately I can't reproduce the high-load situation (high DNS queries/sec).

                          But I probably need to upgrade my hardware (I have read all the documents about tuning).

                          So, instead of my HP DL360 server with 2 embedded Broadcom NICs, what hardware do you recommend?

                          Integrated Intel or a PCI-E add-on card?

                          What is the best NIC? (model/chipset)

                          AMD 16-core processor or Intel quad-core Xeon?

                          Kind regards !!!

                          • W
                            wallabybob
                            last edited by

                            @bsd3000:

                            I probably need to upgrade my hardware (I have read all the documents about tuning).

                            So, instead of my HP DL360 server with 2 embedded Broadcom NICs, what hardware do you recommend?

                            Integrated Intel or a PCI-E add-on card?

                            What is the best NIC? (model/chipset)

                            AMD 16-core processor or Intel quad-core Xeon?

                            You can throw some more hardware at the problem in the hope it might make a difference but you really need to get more information on what was going on in order to correctly determine the solution. For example, if you have a rogue system (or systems) issuing floods of DNS requests it is unlikely that adding more cores or "server quality" NICs or more RAM will allow you to give "good" DNS response to other systems.
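
One way to gather the kind of information described above is to count DNS queries per source address and see whether a single host dominates. The sketch below assumes the source addresses have already been extracted somehow (for example from a packet capture) into a plain list. The function name and the sample addresses are hypothetical and are not taken from the poster's network.

```python
from collections import Counter


def top_sources(source_ips, n=3):
    """Return the n most frequent source addresses with their query counts."""
    return Counter(source_ips).most_common(n)


# HYPOTHETICAL sample: one host issuing most of the queries.
sample = ["10.0.0.5"] * 8 + ["10.0.0.7"] * 2 + ["10.0.0.9"]

for ip, count in top_sources(sample, n=2):
    share = 100.0 * count / len(sample)
    print(f"{ip}: {count} queries ({share:.0f}% of sample)")
```

If one address accounts for most of the queries during a peak, that points to a rogue client and not to a hardware limit.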
