Netgate Discussion Forum

    Connections dropping under heavy load

    General pfSense Questions
    18 Posts 3 Posters 1.4k Views
    • P
      pedreter
      last edited by pedreter

      We have a CARP cluster running pfSense 2.4.4-p3 with Intel i350-P2 gigabit cards (Supermicro, 12-core CPU, 16 GB RAM).

      When the network load goes over approximately 30 Mb/s, connections crossing the firewalls are dropped at random: ssh sessions disconnect, etc. If we reconnect, they work without problems for some seconds or minutes and then drop again.

      We have tried every combination of checksum offloading, TSO, and LRO (enabled and disabled), with no improvement.

      We have followed the pfSense guide tuning-and-troubleshooting-network-cards.html, also with no improvement.

      By the way, this problem was not present with the old pfSense 2.2.x on this very hardware.

      Any ideas, please?

      Thanks!

      P.

      • DerelictD
        Derelict LAYER 8 Netgate
        last edited by

        That probably has nothing to do with the amount of traffic (30Mb/sec is pretty much nothing) but the number of states.

        What do the state levels look like?

        Chattanooga, Tennessee, USA
        A comprehensive network diagram is worth 10,000 words and 15 conference calls.
        DO NOT set a source address/port in a port forward or firewall rule unless you KNOW you need it!
        Do Not Chat For Help! NO_WAN_EGRESS(TM)

        • P
          pedreter @Derelict
          last edited by

          Thx @Derelict..

          On average, the values are:

          State table size: 2% (30513/1630000)
          and
          MBUF Usage: 1% (28616/2000000)

          i agree 309 Mbps is nothing... :-(

          Thanks!

          • DerelictD
            Derelict LAYER 8 Netgate
            last edited by

            Do you have State Killing on Gateway Failure checked in System > Advanced, Miscellaneous on either node?

            i agree 309 Mbps is nothing... :-(

            Your OP said 30Mb/sec.

            • P
              pedreter @Derelict
              last edited by

              @Derelict

              Sorry, a typo... 30Mbps... not 309

              Thanks..

              • DerelictD
                Derelict LAYER 8 Netgate
                last edited by

                Do you have State Killing on Gateway Failure checked in System > Advanced, Miscellaneous on either node?

                • P
                  pedreter @Derelict
                  last edited by

                  @Derelict

                  State Killing on Gateway Failure is unchecked

                  Thanks for your kindness and help, Derelict!

                  • DerelictD
                    Derelict LAYER 8 Netgate
                    last edited by

                    On both nodes?

                    Well, something is killing the states. The default expiration of an ESTABLISHED:ESTABLISHED TCP connection is 24 hours of zero traffic.

                    People sometimes see this when adaptive pruning kicks in but at those state table levels that certainly should not be the case.

                    Again, this would have nothing to do with traffic load but something killing the state.
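
To put numbers on the adaptive pruning remark, here is a minimal Python sketch of how pf scales timeouts as the state table fills. It assumes the behaviour described in pf.conf(5): scaling starts at 60% of the state limit, ends at 120%, and is linear in between. Those percentages are assumptions here, since pfSense may be configured with different adaptive.start and adaptive.end values. The function name `adaptive_timeout` is hypothetical and used only for illustration.

```python
def adaptive_timeout(base_timeout_s, states, state_limit,
                     start_pct=60, end_pct=120):
    """Effective timeout in seconds after pf-style adaptive scaling.

    start_pct and end_pct are assumed defaults (percent of the state limit);
    check the adaptive.start / adaptive.end values on your own firewall.
    """
    start = state_limit * start_pct // 100
    end = state_limit * end_pct // 100
    if states <= start:
        # Below adaptive.start nothing is scaled.
        return float(base_timeout_s)
    if states >= end:
        # At or above adaptive.end every timeout becomes zero.
        return 0.0
    # Linear scaling between adaptive.start and adaptive.end.
    return base_timeout_s * (end - states) / (end - start)


STATE_LIMIT = 1_630_000               # "Firewall Maximum States" from this thread
ESTABLISHED_TIMEOUT_S = 24 * 60 * 60  # 24 hours, as stated above

for states in (30_513, 978_000, 1_467_000):
    t = adaptive_timeout(ESTABLISHED_TIMEOUT_S, states, STATE_LIMIT)
    print(f"{states:>9} states -> {t:.0f} s")
```

Under these assumptions, scaling would not begin until roughly 978,000 states, so a table holding about 30,000 entries is nowhere near the point where adaptive pruning shortens timeouts.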

                    • P
                      pedreter @Derelict
                      last edited by

                      @Derelict

                      Your words make sense to me... I will dig in that direction...

                      Thanks again!

                      • P
                        pedreter @Derelict
                        last edited by

                        @Derelict said in Connections dropping under heavy load:

                        adaptive pruning kicks in but at those state table levels that certainly

                        Derelict,

                        Currently I have these values:

                        Firewall Maximum States: 1630000

                        but

                        net.pf.source_nodes_hashsize: 8192
                        net.pf.states_hashsize: 32768

                        Are they correct? Shouldn't they be bigger?

                        Thanks!
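
For a rough sense of scale on the hash size question, the Python sketch below estimates the average number of entries per hash bucket. It assumes one entry per state and an even spread across buckets, which is a simplification: pf keeps more than one hash table for states, so treat the result as an order-of-magnitude figure only. The helper name `avg_per_bucket` is hypothetical.

```python
STATES_HASHSIZE = 32_768   # net.pf.states_hashsize from the post above
CURRENT_STATES = 30_513    # state table size reported earlier in the thread
MAX_STATES = 1_630_000     # Firewall Maximum States from the post above


def avg_per_bucket(entries, buckets):
    """Average chain length, assuming entries are spread evenly over buckets."""
    return entries / buckets


for label, entries in (("current", CURRENT_STATES), ("maximum", MAX_STATES)):
    load = avg_per_bucket(entries, STATES_HASHSIZE)
    print(f"{label:>8}: {entries} states / {STATES_HASHSIZE} buckets = {load:.2f} per bucket")
```

At the state counts reported here the average comes out below one entry per bucket, so the hash size would only start to matter if the table were running close to its configured maximum.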

                        • stephenw10S
                          stephenw10 Netgate Administrator
                          last edited by

                          That's the default size and they are never normally an issue.

                          One thing you might try here is to disable pfSync on the secondary. It's possible you have an interface mismatch and the secondary is syncing states back onto the wrong interface, breaking them.
                          If you no longer lose connections with that disabled, check that the configs of both firewalls match exactly.
                          Though that would not normally be load related.

                          Steve

                          • P
                            pedreter
                            last edited by

                            @stephenw10 said in Connections dropping under heavy load:

                            That's the default size and they are never normally an issue.

                            Thanks Stephen..

                            When I do what you suggest, the state table grows hugely and very quickly. Is that normal? It goes back to normal if I re-enable pfsync on the secondary.

                            I am trying to find out whether it makes any difference...

                            Thanks!

                            • stephenw10S
                              stephenw10 Netgate Administrator
                              last edited by

                              How huge? It might be the secondary was killing most states and now it is not...

                              How many clients are behind it?

                              Steve

                              • P
                                pedreter @stephenw10
                                last edited by pedreter

                                @stephenw10

                                Wow... stephenw10, your remark is very interesting. Huge means (on average) a sudden growth from 25.0000 to 150.0000 entries... yes, that huge! And it goes back to 25.000 when pfsync on the secondary is enabled again.

                                There are 15 clients behind the pfSense cluster.

                                Why would the secondary want to kill states?

                                Thanks!

                                • DerelictD
                                  Derelict LAYER 8 Netgate
                                  last edited by

                                  Besides looking at the numbers of states (what is 150.0000 anyway? Is that one hundred fifty thousand or one million five hundred thousand?) does the issue with your states being killed (ssh sessions dying, etc) go away with pfsync disabled?

                                  As Steve mentioned the first thing to do is verify all of your interfaces match up.

                                  I use Diagnostics > Interfaces for this. The internal interface name (wan, lan, opt1, opt2, etc), the physical interface name (igb0, ix1, re2, vxnet4) all need to match exactly between primary and secondary. The description should not need to match but for consistency I would make them match.

                                  What you are seeing is not normal. There is obviously something wrong with your configuration. What that is remains unknown. I don't think either of us has ever seen this exact behavior before.
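
The interface check described above can also be scripted against a config backup from each node. The Python sketch below is only an illustration: it assumes each config.xml has an `<interfaces>` section whose children (`wan`, `lan`, `opt1`, ...) each carry an `<if>` element with the physical device name. The sample XML, the device names in it, and the helper names `interface_map` and `mismatches` are all hypothetical.

```python
import xml.etree.ElementTree as ET


def interface_map(config_xml):
    """Map internal interface names (wan, lan, opt1, ...) to physical devices."""
    root = ET.fromstring(config_xml)
    section = root.find("interfaces")
    if section is None:
        return {}
    return {iface.tag: iface.findtext("if", default="") for iface in section}


def mismatches(primary, secondary):
    """List (name, primary_device, secondary_device) where the two nodes differ."""
    names = sorted(set(primary) | set(secondary))
    return [(n, primary.get(n), secondary.get(n))
            for n in names if primary.get(n) != secondary.get(n)]


# Hypothetical sample data: lan and opt1 are swapped on the secondary.
PRIMARY = """<pfsense><interfaces>
  <wan><if>igb0</if><descr>WAN</descr></wan>
  <lan><if>igb1</if><descr>LAN</descr></lan>
  <opt1><if>igb2</if><descr>SYNC</descr></opt1>
</interfaces></pfsense>"""

SECONDARY = """<pfsense><interfaces>
  <wan><if>igb0</if><descr>WAN</descr></wan>
  <lan><if>igb2</if><descr>LAN</descr></lan>
  <opt1><if>igb1</if><descr>SYNC</descr></opt1>
</interfaces></pfsense>"""

for name, p_dev, s_dev in mismatches(interface_map(PRIMARY), interface_map(SECONDARY)):
    print(f"{name}: primary={p_dev} secondary={s_dev}")
```

An empty result would mean the internal-to-physical assignments agree on both nodes. It does not replace looking at Diagnostics > Interfaces, since it only compares what is written in the two config files.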

                                  • P
                                    pedreter
                                    last edited by

                                    Thanks Derelict...

                                    Sorry again for my typo: correct figure is 150.000

                                    I agree this does not look normal.

                                    I migrated from 2.1.5 (which worked so well!) to 2.4.4-p3 by installing 2.4.4 from ISO and then importing the config from an XML file.

                                    There was no error importing the old config (there were no packages installed), and the interface names, descriptions, and physical devices match exactly. In fact, CARP is working.

                                    Could the XML import have done something in 2.4.4 to cause this problem? Maybe something has been corrupted?

                                    Thanks again!

                                    • DerelictD
                                      Derelict LAYER 8 Netgate
                                      last edited by Derelict

                                      Doubtful.

                                      You still have not answered the question: does the issue with your states being killed (ssh sessions dying, etc) go away with pfsync disabled?

                                      Perhaps you should post your settings instead of just saying they match. Cannot count the times a poster has said things are one way when they, in fact, are not.

                                      You did update both nodes to 2.4.4-p3 correct?

                                      • stephenw10S
                                        stephenw10 Netgate Administrator
                                        last edited by

                                        I mean 10k states per client does seem..... high! But it depends what those clients are doing. If those are all legitimate states then you could be hitting something else more quickly than we would otherwise expect.

                                        But, yeah, did disabling pfSync on the secondary correct the connection drops you were seeing?

                                        Steve
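
The per-client figure above follows directly from the numbers posted earlier in the thread. A minimal Python sketch of the arithmetic, using the corrected state counts (25.000 and 150.000) and the 15 clients reported; the helper name `states_per_client` is hypothetical:

```python
CLIENTS = 15  # clients behind the cluster, as reported in the thread


def states_per_client(states, clients):
    """Average number of firewall states per client (integer division)."""
    return states // clients


for label, states in (("pfsync enabled on secondary", 25_000),
                      ("pfsync disabled on secondary", 150_000)):
    print(f"{label}: about {states_per_client(states, CLIENTS)} states per client")
```

This is only an average. It says nothing about whether a few clients hold most of the states, which is worth checking before deciding whether the states are legitimate.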

                                        Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.