Netgate Discussion Forum

    Is managing the state table taking up all of my CPU?

    General pfSense Questions
    9 Posts 4 Posters 3.5k Views
    • stvboyle

      One of my firewalls, running 2.1-RC1, is under high CPU load today.  I can see that the number of states in the state table is maxed out.  I'm seeing the following:

      top -SH
      last pid: 15560;  load averages:  8.02,  7.74,  6.92                                                      up 0+07:59:57  15:34:36
      204 processes: 11 running, 129 sleeping, 54 waiting, 10 lock
      CPU:  0.0% user,  1.9% nice,  3.3% system, 78.8% interrupt, 16.0% idle
      Mem: 718M Active, 52M Inact, 1297M Wired, 68K Cache, 134M Buf, 3807M Free
      Swap: 16G Total, 16G Free

        PID USERNAME PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
         12 root     -44    -     0K  1008K *pf ta  1 294:17 80.86% intr{swi1: netisr 1}
         12 root     -44    -     0K  1008K *pf ta  7 293:34 80.66% intr{swi1: netisr 7}
         12 root     -44    -     0K  1008K *pf ta  5 293:02 79.79% intr{swi1: netisr 5}
         12 root     -44    -     0K  1008K *pf ta  2 294:46 79.05% intr{swi1: netisr 2}
         12 root     -44    -     0K  1008K *pf ta  3 289:17 79.05% intr{swi1: netisr 3}
         12 root     -44    -     0K  1008K *pf ta  6 294:15 78.76% intr{swi1: netisr 6}
         12 root     -44    -     0K  1008K *pf ta  0 265:07 78.08% intr{swi1: netisr 0}
         12 root     -44    -     0K  1008K *pf ta  4 263:36 76.95% intr{swi1: netisr 4}

      pfctl -si
      State Table                        Total             Rate
        current entries                1400004
        searches                    3972651339       137657.3/s
        inserts                      742211929        25718.6/s
        removals                     740811925        25670.0/s
      Counters
        match                        773239769        26793.7/s
        bad-offset                           0            0.0/s
        fragment                             0            0.0/s
        short                                0            0.0/s
        normalize                            0            0.0/s
        memory                        28656396          993.0/s
        bad-timestamp                        0            0.0/s
        congestion                           0            0.0/s
        ip-option                            0            0.0/s
        proto-cksum                          1            0.0/s
        state-mismatch                 5102495          176.8/s
        state-insert                         0            0.0/s
        state-limit                          0            0.0/s
        src-limit                            0            0.0/s
        synproxy                             0            0.0/s
        divert                               0            0.0/s

      I've configured a maximum of 1400000 firewall states.  It looks like the system is just busy dealing with the state table; is that right?  Any suggestions as to how pfSense can better handle this situation?

      Thanks!
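
The figures quoted above can be sanity-checked with a few lines of Python. This is a minimal sketch, not something from the original post: it parses an excerpt of the quoted `pfctl -si` output and derives how full the table is and how fast states are created and removed. The function name `parse_pfctl_si` and the regular expression are assumptions based only on the layout shown here. The Rate column appears to be the running total divided by the time since the counters were reset (742211929 inserts at 25718.6/s works out to roughly 8 hours, which matches the uptime), so these are averages rather than instantaneous rates.

```python
import re

# Excerpt of the `pfctl -si` output quoted in the post above.
PFCTL_SI = """\
  current entries                  1400004
  searches                      3972651339      137657.3/s
  inserts                        742211929        25718.6/s
  removals                      740811925        25670.0/s
  memory                          28656396          993.0/s
  state-mismatch                  5102495          176.8/s
"""

STATE_LIMIT = 1_400_000  # configured maximum, as stated in the post


def parse_pfctl_si(text):
    """Return {counter name: (total, rate per second or None)}."""
    pattern = re.compile(r"\s*([a-z][a-z\- ]*?)\s+(\d+)(?:\s+([\d.]+)/s)?\s*$")
    stats = {}
    for line in text.splitlines():
        m = pattern.match(line)
        if m:
            name, total, rate = m.groups()
            stats[name] = (int(total), float(rate) if rate else None)
    return stats


stats = parse_pfctl_si(PFCTL_SI)
entries = stats["current entries"][0]
insert_rate = stats["inserts"][1]
removal_rate = stats["removals"][1]
search_rate = stats["searches"][1]

print(f"table fill: {entries / STATE_LIMIT:.1%}")
print(f"churn: {insert_rate + removal_rate:,.0f} inserts+removals per second")
print(f"state lookups per insert: {search_rate / insert_rate:.1f}")
```

Inserts and removals are nearly equal, which is what you would expect from a table pinned at its limit: a new state can only go in about as fast as an old one leaves.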

      • jasonlitka

        How much traffic runs through this box?

        I can break anything.

        • heper

          Is there a reason you are still running RC1? If not, start by upgrading to 2.1 stable. Chances are the problem will be gone; if not, it will be a thousand times easier to debug.

          • stvboyle

            @Jason

            In terms of data, this firewall peaks at about 425 Mbps in+out.  In terms of packets, it typically peaks at about 70 kpps in+out; yesterday it was hitting close to 100 kpps in+out.

            • stvboyle

              @heper

              I've got several instances of pfSense ranging from 2.0.1-release up through 2.1-release.  My experience has been that I don't see much difference between 2.1-RC1 and 2.1-release.  I just need to find a time to upgrade; our business runs high volume 24x7.

              I tend to see the problem most when there is high state churn, no matter the version of pfSense.

              • stephenw10 (Netgate Administrator)

                1400000 is a lot of states.  ;)
                Do you still have the firewall optimisation set to 'normal'? You could try setting it to 'aggressive' so that it times out firewall states more quickly. However, as the warning says, you may end up dropping some legitimate states.
                You could also try the adaptive timeout settings. I have no experience of using those at all, but they seem relevant here.

                Steve

                • stvboyle

                  Thanks, Steve.  I've always had to use the 'aggressive' setting.  Pre-2.1 versions had the adaptive settings on by default; they used to kick in at 60%.  In 2.1, the adaptive settings are off by default.  I always found that the adaptive settings kicked in too soon, so I kind of like not having them: it puts off the pain a little longer in my case.

                  Yes, 1.4M is a lot of states.  Having that many is unusual and undesirable in my case.  In addition to the firewall, I use the load balancer in pfSense.  A single connection from the WAN, through the load balancer, to a web server sets up a bunch of states.

                  We have an application with an interface that gets tons of HTML queries, over and over.  We encourage connection reuse, but sometimes we get hit with thousands of queries per second, each a new and short-lived connection.  Setting up and tearing down thousands of connections a second seems to drive CPU use; I'm guessing that is related to managing the state table.

                  I guess I've never tried setting a ridiculous number of maximum states and just letting the table grow.  It seems that if I set it too high I get 100% CPU, and I've had to power cycle the box in the past (no remote KVM access).
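
A rough way to reason about "just letting the table grow" is Little's law: in steady state, the average number of states equals the rate of new states multiplied by how long a state lives on average. The sketch below applies that to the figures from the first post. The helper names are my own, and the 30/60/120 second lifetimes are purely hypothetical values chosen to show the scaling, not measurements from this firewall. The insert rate is also an average over several hours, so bursts would need more headroom than this suggests.

```python
def implied_lifetime_sec(states, new_states_per_sec):
    """Little's law rearranged: mean lifetime = population / arrival rate."""
    return states / new_states_per_sec


def steady_state_size(new_states_per_sec, mean_lifetime_sec):
    """Little's law: average population = arrival rate * mean lifetime."""
    return new_states_per_sec * mean_lifetime_sec


# Figures from the pfctl -si output in the first post.
ENTRIES = 1_400_004
INSERTS_PER_SEC = 25_718.6  # average since the counters were last reset

lifetime = implied_lifetime_sec(ENTRIES, INSERTS_PER_SEC)
print(f"implied mean state lifetime at the cap: {lifetime:.1f} s")

# Hypothetical mean lifetimes, to show how the required table size scales.
for assumed_lifetime in (30, 60, 120):
    size = steady_state_size(INSERTS_PER_SEC, assumed_lifetime)
    print(f"{assumed_lifetime:>4} s lifetime -> about {size:,.0f} states")
```

On these numbers a state lives a little under a minute on average, so the table size needed to stop hitting the limit depends almost entirely on how long the timeouts let short-lived connections linger.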

                  • stephenw10 (Netgate Administrator)

                    So is this actually causing a problem?
                    You seem to have plenty of RAM in that box, so you could increase the state table size. I wouldn't have thought the size of the table would increase CPU usage as much as the rate at which states are added and removed, which would stay the same. There would come a point where the table could no longer be maxed out, because state decay would match the rate of new states. That point might be ridiculously large though! I would have thought you could achieve a balance using the adaptive timeout settings. If you set the initial number quite low (perhaps half the table size, total guess) and the maximum number to some value larger than the table, you could get a very gradual roll-off as the table filled.

                    This is way outside my experience though.  ;)

                    Steve
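
For reference, pf.conf(5) describes adaptive timeouts as a linear scaling: once the number of states exceeds `adaptive.start`, every timeout is multiplied by (adaptive.end - states) / (adaptive.end - adaptive.start), reaching zero at `adaptive.end`. The sketch below works through the suggestion above with the start at half the table size. The end value of twice the table size and the 90 second base timeout are assumptions picked for illustration only; they are not settings from this thread.

```python
def adaptive_factor(states, start, end):
    """Timeout scale factor as described in pf.conf(5) for adaptive timeouts."""
    if states <= start:
        return 1.0  # below adaptive.start timeouts are unchanged
    if states >= end:
        return 0.0  # at adaptive.end timeouts drop to zero
    return (end - states) / (end - start)


STATE_LIMIT = 1_400_000          # table size from the first post
ADAPTIVE_START = STATE_LIMIT // 2  # "perhaps half the table size"
ADAPTIVE_END = 2 * STATE_LIMIT     # assumption: some value larger than the table
BASE_TIMEOUT_SEC = 90              # hypothetical timeout, for illustration only

for states in (500_000, 700_000, 1_050_000, 1_400_000):
    factor = adaptive_factor(states, ADAPTIVE_START, ADAPTIVE_END)
    print(f"{states:>9,} states: factor {factor:.2f}, "
          f"timeout {BASE_TIMEOUT_SEC * factor:.0f} s")
```

With these values the timeouts are still at two thirds of their configured length when the table is completely full, which is the gradual roll-off being described. As far as I know, the pf defaults of 60% and 120% of the state limit are much steeper, which would fit the earlier observation that the adaptive settings kicked in too soon.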

                    • stvboyle

                      I've never been clear whether I'm dealing with a pure packets-per-second problem (incoming packets driving a lot of interrupts), a state table problem (too much state churn), or a combination of both.  The key part for me in this post is what I see under STATE in the output from top: it shows "*pf ta".  As I understand it, this means that the CPU is waiting on the pf process for something.  I'm guessing the "ta" part relates to the state table.
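
On the STATE column: as far as I know, FreeBSD's top shows `*` followed by a truncated lock name when a thread is blocked waiting on a kernel lock, and the "10 lock" in the summary line counts threads in that condition. On that reading, "*pf ta" would mean the netisr threads are all contending for the same pf lock, with "pf ta" being the cut-off start of the lock's name; the full name is not visible in the output, so what it refers to remains an assumption. The sketch below, with a regular expression written only for the row layout quoted in the first post, picks out the lock-blocked threads from three of those rows.

```python
import re

# Three of the thread rows from the `top -SH` output in the first post.
TOP_ROWS = """\
   12 root    -44    -    0K  1008K *pf ta  1 294:17 80.86% intr{swi1: netisr 1}
   12 root    -44    -    0K  1008K *pf ta  7 293:34 80.66% intr{swi1: netisr 7}
   12 root    -44    -    0K  1008K *pf ta  5 293:02 79.79% intr{swi1: netisr 5}
"""

# STATE may contain a space, so it is matched lazily up to the CPU and TIME columns.
ROW = re.compile(
    r"^\s*(?P<pid>\d+)\s+(?P<user>\S+)\s+(?P<pri>\S+)\s+(?P<nice>\S+)"
    r"\s+(?P<size>\S+)\s+(?P<res>\S+)\s+(?P<state>.+?)\s+(?P<cpu>\d+)"
    r"\s+(?P<time>\d+:\d+)\s+(?P<wcpu>[\d.]+)%\s+(?P<command>.+)$"
)


def lock_waiters(text):
    """Return (command, lock name, WCPU %) for rows whose STATE starts with '*'."""
    waiters = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m and m.group("state").startswith("*"):
            waiters.append(
                (m.group("command"), m.group("state")[1:], float(m.group("wcpu")))
            )
    return waiters


waiters = lock_waiters(TOP_ROWS)
for command, lock, wcpu in waiters:
    print(f"{command}: blocked on lock '{lock}' at {wcpu}% WCPU")
```

If every busy netisr thread reports the same lock, that points more towards contention on a shared pf lock under high state churn than towards raw packet rate alone, although top output by itself cannot prove it.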

                      Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.