Netgate Discussion Forum

    Growing states table, some leak?

    Plus 23.09 Development Snapshots (Retired)
    41 Posts 7 Posters 5.9k Views
    • w0wW
      w0w
      last edited by w0w

      9a9e16d6-a9c4-430c-bbbf-cb0b6c0d7e79-image.png
      at the same time
      f43da337-7f14-48d3-987e-58ba398795e2-image.png

      pfctl -ss > /root/dump.txt produces a file with 3441 lines, not 1000213 as it theoretically should. So what are those “ghost” states, @jimp?
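The comparison described in this post can be sketched as a small script. This is only an illustration, not something posted in the thread: the function names are hypothetical, and it assumes that `pfctl -si` prints a "current entries" line in its State Table section and that `pfctl -ss` (without `-v`) prints one state per line.

```shell
#!/bin/sh
# Hypothetical helper: extract the "current entries" counter from
# `pfctl -si` output read on stdin.
si_current_entries() {
    awk '/current entries/ { print $3; exit }'
}

# Hypothetical helper: count non-empty lines of `pfctl -ss` output read
# on stdin (assumes one state per line, i.e. no -v flag).
ss_state_lines() {
    awk 'NF { n++ } END { print n + 0 }'
}

# Difference between the counter ($1) and the listed states ($2).
ghost_states() {
    echo $(( ${1:-0} - ${2:-0} ))
}

# Only query pf when pfctl is available (on the firewall itself, as root).
if command -v pfctl >/dev/null 2>&1; then
    counted=$(pfctl -si | si_current_entries)
    listed=$(pfctl -ss | ss_state_lines)
    echo "counter=${counted:-0} listed=${listed:-0} ghost=$(ghost_states "$counted" "$listed")"
fi
```

With the numbers reported above (a counter of 1000213 and 3441 listed states), `ghost_states 1000213 3441` would report 996772 states that are counted but not listed.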

      • jimpJ
        jimp Rebel Alliance Developer Netgate
        last edited by

        Interesting that it doesn't line up. I checked two HA pairs in my lab (one Plus, one CE) and neither of them show that kind of behavior on either node.

        What does pfctl -si show in its data for states? What about other info there?

        That's the kind of weirdness I might expect to see if somehow the world and kernel don't match up, like either the kernel isn't being updated and it's running an older kernel, or the kernel updated but the base system didn't.

        Check the output of pkg -x pfSense and uname -a and see what the output shows on there.

        Remember: Upvote with the 👍 button for any user/post you find to be helpful, informative, or deserving of recognition!

        Need help fast? Netgate Global Support!

        Do not Chat/PM for help!

        • w0wW
          w0w @jimp
          last edited by w0w

          @jimp said in Growing states table, some leak?:

          pfctl -si

          7db4f057-59da-499c-b3b6-9acfe7ce40e3-image.png

          @jimp said in Growing states table, some leak?:

          pkg -x pfSense

          pkg info? -x is not recognized as a command

          af08d0c9-9174-4725-98a5-4d8819c25981-image.png

          @jimp said in Growing states table, some leak?:

          uname -a

          14.0-ALPHA2 FreeBSD 14.0-ALPHA2 amd64 1400094 #1 plus-devel-main-n256131-b9588f9fb62: Sat Aug 26 18:08:03 UTC 2023     root@freebsd:/var/jenkins/workspace/pfSense-Plus-snapshots-master-main/obj/amd64/iYn5SZ87/var/jenkins/workspace/pfSense-Plus-snapshots-master-main/sources/FreeBSD-src-plus-devel-main/amd64.amd64/sys/pfSense amd64
          

          @jimp said in Growing states table, some leak?:

          Interesting that it doesn't line up.

          I switched from primary to secondary by putting the primary into maintenance mode. Maybe I did it several times. I also get fatal traps sometimes; that's covered in the other topic. There is PPPoE on WAN1, and WAN2 is DHCP. A multi-WAN gateway is used.

            • jimpJ
              jimp Rebel Alliance Developer Netgate
              last edited by

              Hmm, a lot of the info in pfctl -si looks suspiciously large.

              Maintenance mode shouldn't have had any effect on pfsync / state data.

              The base and kernel seem to match up, though.

              All that said, PPPoE isn't supported with HA, nor is DHCP, so we don't do any testing in that regard. It's designed for, and only suitable for, static WANs, so who knows what kind of unpredictable results you might have.

              Doing state sync is worthless in your case without a proper static WAN setup so you may as well disable and see if things stabilize. Without static WANs and proper CARP VIPs on all WANs there is no hope of seamless connection failover so pfsync is just complicating matters.

              • w0wW
                w0w @jimp
                last edited by

                @jimp said in Growing states table, some leak?:

                All that said, PPPoE isn't supported with HA, nor is DHCP, so we don't do any testing in that regard. It's designed for, and only suitable for, static WANs, so who knows what kind of unpredictable results you might have.

                Yes, I know that PPPoE is not supported. I don't use it in CARP anyway. It is controlled by a script: when the firewall is not primary, the script brings PPPoE down, and vice versa. My ISP also bans me if I keep more than one session open for a long time.
                WAN2 is used in CARP. Actually, DHCP there means a static lease on the upstream router, so the address is always the same. So yes, I use a WAN2 CARP IP, and this configuration has worked fine since 2.6 alpha… So what next? Try to disable WAN2 CARP?

                • jimpJ
                  jimp Rebel Alliance Developer Netgate
                  last edited by

                  Not sure what to suggest since even if they are static in DHCP that's still not a supported configuration. No dynamic WANs are supported.

                  If it worked, consider yourself lucky: it was pure luck.

                  The top suspect would be any custom scripts and PPPoE since those are very far from standard.

                  Ideally if anyone else could reproduce it you could find something in common and track down how to reproduce it from a bare minimum configuration. Without more leads, it's difficult to speculate about what might be happening.

                  • w0wW
                    w0w @jimp
                    last edited by

                    @jimp
                    Ok. Already changed to static. Will try to stop the script also.

                    • w0wW
                      w0w @Raul Ramos
                      last edited by w0w

                      @Raul-Ramos
                      What HA configuration are you using? Do you have some custom scripts enabled?

                      • Raul RamosR
                        Raul Ramos @w0w
                        last edited by Raul Ramos

                        @w0w I do not have any custom scripts. Some packages: HAProxy, FreeRADIUS, acme, Zabbix, WireGuard, and two or three more, nothing fancy.

                        My systems are not standard: two virtualized instances on Proxmox, on different boxes with similar hardware; each one has dedicated raw network devices (one em(0/1), the other igb(0/1)) with four or five VLANs.
                        HA is configured to use multicast on a VLAN created for SYNC. I tested using an IP, but states continue to grow until nothing is usable.

                        At this moment the main instance is on 23.05.1 and the backup is on 23.09. All good, but I don't know if the states are synced correctly; the backup instance is 7000 lower in state count.

                        I changed the WAN setting to static. I'll test it later.

                        • w0wW
                          w0w
                          last edited by w0w

                          CARP is configured to use static IPs and the custom script is disabled. States keep growing and are NOT cleaned by
                          c9f283ee-600c-4766-a5eb-1bb9695ecba6-1693246957966-330904f8-1185-4587-88af-ef984b77799b-image.png
                          and it does not log anything either.
                          If I do

                          pfctl -F states
                          

                          Then it clears a few thousand states (e.g. 4300), but the GUI shows that the overall state count is currently 48000. This is happening on both firewalls, even if it is lower than the critical limit.

                          • M
                            marcosm Netgate
                            last edited by marcosm

                            Have you tried to reproduce this with a clean install / minimal config? pfctl not clearing all of the states is certainly unexpected.

                              • w0wW
                                w0w @marcosm
                                last edited by w0w

                                @marcosm
                                What do you mean by minimal? It only affects CARP configurations. The growth stops immediately when CARP is temporarily disabled. I tried a clean installation with config restoration and a lot of other things, with no luck. I'm trying to replicate this with VBox machines, but currently have some stupid NAT problems…

                                Another fun one:
                                f1905687-6222-484d-9d8f-98516704d7f5-image.png
                                WTF? It has definitely never been enabled. Is this OK? It happens when I uncheck "enable" and press the save button.

                                OK, I found some old opt1 (SYNC) config for radvd in the config file that does not show up in the GUI. I deleted it, and now I can cleanly disable the SYNC interface.

                                When I disable the interface and reboot the machine, the growth is gone; when I enable the interface, it starts growing immediately by thousands.
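One way to put numbers on the growth described here is to sample the state counter at fixed intervals and print the change between samples. The sketch below is an assumption-laden illustration, not something used in the thread: the function names and the "epoch-seconds state-count" sample format are hypothetical, and it assumes `pfctl -si` prints a "current entries" line.

```shell
#!/bin/sh
# Print the change between consecutive samples.
# Input on stdin: lines of "<epoch-seconds> <state-count>".
# Output: "<elapsed-seconds> <signed change in states>" per sample pair.
state_deltas() {
    awk 'NR > 1 { printf "%d %+d\n", $1 - t, $2 - c } { t = $1; c = $2 }'
}

# Take one sample: epoch seconds plus the pf "current entries" counter.
# Requires pfctl, so it is only meaningful on the firewall itself, as root.
sample_states() {
    printf '%s %s\n' "$(date +%s)" \
        "$(pfctl -si | awk '/current entries/ { print $3; exit }')"
}

# Collect $1 samples, $2 seconds apart, one per line on stdout.
collect_samples() {
    i=0
    while [ "$i" -lt "$1" ]; do
        sample_states
        i=$((i + 1))
        if [ "$i" -lt "$1" ]; then
            sleep "$2"
        fi
    done
}
```

On the firewall, something like `collect_samples 5 60 | state_deltas` could be run once with the SYNC interface enabled and once with it disabled, to compare the per-minute change in each case.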

                                • w0wW
                                  w0w
                                  last edited by

                                  Subtotals.

                                  Primary firewall    Secondary firewall    Ghost states
                                  23.09               23.05                 NO
                                  23.09               23.09                 YES
                                  23.05               23.09                 YES
                                  23.05               23.05                 NO
                                  
                                  

                                  Additional tests:
                                  I connected a virtual machine as a second firewall through a real interface (SYNC), with everything configured minimally; the number of states immediately starts to increase in the virtual machine too.
                                  Two virtual machines on 23.09 syncing with each other show no problem, but the test cannot be considered complete, since there is very little real traffic there; perhaps some kind of trigger is missing. Later, I plan to replace the main firewall with a virtual machine.
                                  I also tried replacing the SYNC interface on both machines with a USB NIC. This changed nothing, so at least the NIC driver is not a suspect, because the hardware is from completely different vendors.

                                  • w0wW
                                    w0w @w0w
                                    last edited by

                                    Virtual machine configured as primary showed this in the CARP maintenance
                                    94eb4e22-28f2-434c-af0f-40e4d2c4824e-image.png
                                    Minimal config. Just one WAN, LAN and SYNC.
                                    When it becomes “master” rather than “backup”, the states look normal on both and are not growing. 😑

                                    • jimpJ
                                      jimp Rebel Alliance Developer Netgate
                                      last edited by

                                      I ran this by Kristof and he has a couple of different theories about what might be happening, but it's not clear exactly which applies based only on the information in the thread, especially since we can't seem to reproduce it in a lab setup.

                                      There are a few ways to help gather info:

                                      • Install a regular kernel from https://www.codepro.be/files/pfSense-kernel-pfSense-23.09.a.20230905.1950.pkg
                                      • Install a debug kernel from https://www.codepro.be/files/pfSense-kernel-debug-pfSense-23.09.a.20230905.1950.pkg

                                      There are potentially some issues with error handling in pf_create_state() that could cause states to be allocated and then lost before they’re connected in the state table. That matches the problem description, although it's not clear how or why this would suddenly start manifesting, and doing so on an inconsistent basis.

                                      • w0wW
                                        w0w @jimp
                                        last edited by

                                        @jimp
                                        Thank you for your time and attention.
                                        I installed the debug kernel. What should I do next?

                                        • jimpJ
                                          jimp Rebel Alliance Developer Netgate
                                          last edited by

                                          Once you are booted into the debug kernel, see if you can reproduce the problem.

                                          If you can, see if anything additional shows up in the system log.

                                          If you don't see the problem on the debug kernel, then that may also confirm that what Kristof attempted to fix there was the actual problem.

                                          • w0wW
                                            w0w @jimp
                                            last edited by w0w

                                            @jimp
                                            I see it's growing, but it has not reached its limit yet. When should I expect anything additional in the logs? Can you provide some keywords?

                                            EDIT: Ahh wait... forgot to select the right kernel :)

                                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.