Netgate Discussion Forum

Growing states table, some leak?

Plus 23.09 Development Snapshots (Retired)
    w0w @marcosm
    last edited by w0w Sep 2, 2023, 9:25 AM (posted Sep 2, 2023, 9:06 AM)

    @marcosm
    What do you mean by minimal? It only affects CARP configurations. The growth stops immediately when CARP is temporarily disabled. I tried a clean installation with config restoration and a lot of other things, with no luck. I'm trying to replicate this with VirtualBox machines, but currently have some stupid NAT problems…

    Another fun one:
    [screenshot]
    WTF? It has definitely never been enabled. Is this OK? It happens when I try to uncheck "Enable" and press the Save button.

    OK, I found an old opt1 (SYNC) radvd config in the config file that does not show up in the GUI. I deleted it, and now I can properly disable the SYNC interface.

    When I disable the interface and reboot the machine, the growth is gone; when I enable the interface, it immediately starts growing by thousands.

      w0w
      last edited by Sep 3, 2023, 10:59 AM

      Summary:

      Primary firewall    Secondary firewall    Ghost states
      23.09               23.05                 No
      23.09               23.09                 Yes
      23.05               23.09                 Yes
      23.05               23.05                 No

      Additional tests:
      I connected a virtual machine with a minimal configuration as the second firewall through a real interface (SYNC), and the number of states immediately started to increase on the virtual machine too.
      Two virtual machines on 23.09 paired with each other: no problem, but the test cannot be considered complete, since there is very little real traffic there; perhaps some kind of trigger is missing. Later, I plan to replace the main firewall with a virtual machine.
      I also tried replacing the SYNC interface on both machines with a USB NIC. This changed nothing, so at least the NIC driver is not a suspect, since the vendors are completely different all the way.

        w0w @w0w
        last edited by Sep 3, 2023, 3:52 PM

        The virtual machine configured as primary showed this in CARP maintenance:
        [screenshot]
        Minimal config: just one WAN, LAN and SYNC.
        When it becomes “master” rather than “backup”, the states look normal on both nodes and stop growing. 😑

          jimp Rebel Alliance Developer Netgate
          last edited by Sep 5, 2023, 7:12 PM

          I ran this by Kristof and he has a couple of different theories about what might be happening, but it's not clear exactly what is going on based on the information in the thread alone, especially since we can't seem to reproduce it in a lab setup.

          There are a few ways to help gather info:

          • Install a regular kernel from https://www.codepro.be/files/pfSense-kernel-pfSense-23.09.a.20230905.1950.pkg
          • Install a debug kernel from https://www.codepro.be/files/pfSense-kernel-debug-pfSense-23.09.a.20230905.1950.pkg

          There are potentially some issues with error handling in pf_create_state() that could cause states to be allocated and then lost before they’re connected in the state table. That matches the problem description, although it's not clear how or why this would suddenly start manifesting, and doing so on an inconsistent basis.
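To make the failure mode described above concrete, here is a toy Python model of how an error path that skips cleanup produces "ghost" states. It is a hypothetical illustration, not pf's code: `StateTable`, `create_state_leaky` and `create_state_fixed` are invented names, and the model assumes the reported state count tracks allocations rather than the entries actually linked into the table. Insertion is made to fail when a state names an interface the node does not have, mirroring the theory raised later in the thread.

```python
"""Toy model of a state-allocation leak on an error path.

Hypothetical illustration only, NOT pf's code. The names below are
invented to show how an allocation counter can drift away from the
number of states actually reachable through the table.
"""


class StateTable:
    def __init__(self) -> None:
        self.entries: dict[int, str] = {}  # states actually linked into the table
        self.allocated = 0                 # counter bumped at allocation time

    def insert(self, key: int, ifname: str, known_ifs: set[str]) -> bool:
        # Assumed failure mode: the state refers to an interface this node lacks.
        if ifname not in known_ifs:
            return False
        self.entries[key] = ifname
        return True


def create_state_leaky(table: StateTable, key: int, ifname: str, known_ifs: set[str]) -> bool:
    table.allocated += 1  # "allocate" the state
    if not table.insert(key, ifname, known_ifs):
        # Bug: return without undoing the allocation, so the counter keeps
        # a state that is never reachable through the table.
        return False
    return True


def create_state_fixed(table: StateTable, key: int, ifname: str, known_ifs: set[str]) -> bool:
    table.allocated += 1
    if not table.insert(key, ifname, known_ifs):
        table.allocated -= 1  # release the allocation on the error path
        return False
    return True


if __name__ == "__main__":
    known = {"igb1", "igb2"}
    leaky, fixed = StateTable(), StateTable()
    for i in range(1000):
        # Every odd-numbered state names an interface this node does not have.
        ifname = "igb0" if i % 2 else "igb1"
        create_state_leaky(leaky, i, ifname, known)
        create_state_fixed(fixed, i, ifname, known)
    print(leaky.allocated, len(leaky.entries))  # 1000 500
    print(fixed.allocated, len(fixed.entries))  # 500 500
```

In the leaky variant the counter climbs with every failed insert while the table itself stays the same size, which is the kind of steady growth toward the state limit reported in this thread.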

          Remember: Upvote with the 👍 button for any user/post you find to be helpful, informative, or deserving of recognition!

          Need help fast? Netgate Global Support!

          Do not Chat/PM for help!

            w0w @jimp
            last edited by Sep 6, 2023, 2:59 PM

            @jimp
            Thank you for your time and attention.
            Installed debug kernel. What should I do next?

              jimp Rebel Alliance Developer Netgate
              last edited by Sep 6, 2023, 3:03 PM

              Once you are booted into the debug kernel, see if you can reproduce the problem.

              If you can, see if anything additional shows up in the system log.

              If you don't see the problem on the debug kernel, then that may also confirm that what Kristof attempted to fix there was the actual problem.

                w0w @jimp
                last edited by w0w Sep 6, 2023, 6:43 PM (posted Sep 6, 2023, 3:29 PM)

                @jimp
                I see it growing, but it hasn't reached its limit yet. When should I expect anything additional in the logs? Can you provide some keywords?

                EDIT: Ahh wait... forgot to select the right kernel :)

                  w0w
                  last edited by Sep 7, 2023, 3:39 PM

                  I regret to inform you that the firewall became unavailable because of this:
                  [screenshot]

                  I don't see anything useful in the log files, nothing new, nothing related to the problem.

                    jimp Rebel Alliance Developer Netgate
                    last edited by Sep 7, 2023, 6:56 PM

                    Just to confirm, in each of your tests these systems have state synchronization (pfsync) configured and enabled, right?

                    Is it enabled with an IP address for the peer filled in or just enabled and left blank?

                    Is the sync interface private between the two HA nodes alone? Or could there be something else on the sync segment also doing pfsync?

                      w0w @jimp
                      last edited by Sep 8, 2023, 3:21 AM

                      @jimp said in Growing states table, some leak?:

                      Just to confirm, in each of your tests these systems have state synchronization (pfsync) configured and enabled, right?

                      Yes, exactly

                      @jimp said in Growing states table, some leak?:

                      Is it enabled with an IP address for the peer filled in or just enabled and left blank?

                      'pfsync Synchronize Peer IP', 'Synchronize Config to IP' are filled on the primary node with peer IP address, on the secondary 'pfsync Synchronize Peer IP' only filled.
                      'Set a custom Filter Host ID' is left blank, but I see those generated IDs on both nodes.

                      @jimp said in Growing states table, some leak?:

                      Is the sync interface private between the two HA nodes alone? Or could there be something else on the sync segment also doing pfsync?

                      Direct connection between firewalls, 10.0.88.0 network, not used anywhere else.

                        kprovost @w0w
                        last edited by Sep 8, 2023, 8:18 AM

                        @w0w Can you try turning up pf's debugging? (pfctl -x loud)

                        On a system that's not yet run out of states I suspect you're going to see "pfsync_state_import: unknown route interface: <if name>". That'd confirm my current theory.
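If the debug output is large, a small helper can tally which interfaces trigger the message. This is a minimal sketch, assuming the exact message text quoted in this thread (`pfsync_state_import: unknown route interface: <if name>`) and a saved plain-text copy of the system log; the function name and the idea of passing a log file path are illustrative, not part of pfSense.

```python
"""Tally pfsync import failures per interface from saved log text.

Minimal sketch: assumes the message format quoted in the thread,
"pfsync_state_import: unknown route interface: <if name>".
"""
import re
import sys
from collections import Counter

# Pattern built from the message text quoted above; \S+ captures the interface name.
PATTERN = re.compile(r"pfsync_state_import: unknown route interface: (\S+)")


def tally_unknown_route_ifs(lines) -> Counter:
    """Count occurrences of the message per interface name."""
    counts: Counter = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1]) as log:
        for ifname, n in tally_unknown_route_ifs(log).most_common():
            print(f"{ifname}\t{n}")
```

For example, `python3 tally.py saved-system.log` (both names are placeholders) prints one line per interface with its count. Any interface that shows up here is one the receiving node does not have, which is what the theory predicts.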

                          jimp Rebel Alliance Developer Netgate @w0w
                          last edited by Sep 8, 2023, 12:22 PM

                          @w0w said in Growing states table, some leak?:

                          'pfsync Synchronize Peer IP', 'Synchronize Config to IP' are filled on the primary node with peer IP address, on the secondary 'pfsync Synchronize Peer IP' only filled.
                          'Set a custom Filter Host ID' is left blank, but I see those generated IDs on both nodes.

                          Maybe I'm reading this wrong but this should be matching on both systems for State Synchronization (pfsync). You should have it enabled on both and have both set with the address of the peer (or both blank) -- it's not like XMLRPC, state sync wants to work in both directions.

                          And related to what Kristof asked above, also check the output of ifconfig -l on both systems, it should (ideally) match so they all have the same interfaces in the OS. What he's noting is that it may be tossing an error if there is a state for a certain type of rule on an interface that does not exist on the peer node. (e.g. PPPoE WAN, maybe a VPN interface in certain cases, that sort of thing)
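Comparing the two `ifconfig -l` outputs by eye is error-prone when the lists are long. Here is a minimal sketch, assuming you have saved the one-line output of `ifconfig -l` from each node to a file; the function names and the example interface names are hypothetical. It reports interfaces that exist on only one node, and any route-to target in that set is a candidate for the failed state import discussed above.

```python
"""Compare `ifconfig -l` output captured on two HA nodes.

Minimal sketch: `ifconfig -l` prints all interface names on one line,
separated by whitespace. Reports names present on only one node.
"""
import sys


def parse_ifconfig_l(text: str) -> set[str]:
    # Split on any whitespace so trailing newlines or spaces are harmless.
    return set(text.split())


def interface_diff(primary: str, secondary: str) -> tuple[set[str], set[str]]:
    a, b = parse_ifconfig_l(primary), parse_ifconfig_l(secondary)
    return a - b, b - a  # (only on primary, only on secondary)


if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as f1, open(sys.argv[2]) as f2:
        only_a, only_b = interface_diff(f1.read(), f2.read())
    print("only on primary:  ", " ".join(sorted(only_a)) or "-")
    print("only on secondary:", " ".join(sorted(only_b)) or "-")
```

For example, `python3 ifdiff.py primary.txt secondary.txt` (placeholder names). Ideally both lines print `-`; anything else is an interface that a synced state could reference but the peer cannot resolve.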

                            w0w @jimp
                            last edited by Sep 8, 2023, 1:59 PM

                            @jimp
                            No, it's not your reading; it was me writing early in the morning 😞.
                            State Synchronization is enabled on both, with the peer IP filled in.

                            @jimp said in Growing states table, some leak?:

                            And related to what Kristof asked above, also check the output of ifconfig -l on both systems, it should (ideally) match so they all have the same interfaces in the OS. What he's noting is that it may be tossing an error if there is a state for a certain type of rule on an interface that does not exist on the peer node. (e.g. PPPoE WAN, maybe a VPN interface in certain cases, that sort of thing)

                            Yes, this could be the cause of the problem. WAN2 uses different interfaces on the two nodes.

                            @kprovost said in Growing states table, some leak?:

                            Can you try turning up pf's debugging? (pfctl -x loud)

                            Will do that ASAP. Thanks.

                              w0w
                              last edited by Sep 8, 2023, 2:43 PM

                              @jimp, @kprovost said in Growing states table, some leak?:

                              pfctl -x loud

                              pfsync_state_import: unknown route interface: igb0

                              Hmm… Yes. That's it.

                                kprovost @w0w
                                last edited by Sep 8, 2023, 3:23 PM

                                @w0w Cool. The fix is under review in https://reviews.freebsd.org/D41779 and should easily make 23.09.

                                The only way I see to work around this is to make sure that every interface used as a route-to target exists on all members of the HA setup. You could create dummy interfaces with ifconfig epair name <ifname> if that's difficult for some reason.

                                  w0w @kprovost
                                  last edited by w0w Sep 8, 2023, 5:44 PM (posted Sep 8, 2023, 5:42 PM)

                                  @kprovost
                                  Thanks a lot!
                                  Not sure if ifconfig epair name <ifname> can be used on pfSense; I think I'll go with VLANs instead...
                                  EDIT: No, VLANs can't be used, for the same reason :)

                                    jimp Rebel Alliance Developer Netgate @w0w
                                    last edited by Sep 8, 2023, 6:35 PM

                                    @w0w In the past, users have used a single interface lagg to work around similar problems. Then the interfaces end up as laggX.<vlan> on both systems.

                                      w0w @jimp
                                      last edited by Sep 8, 2023, 7:02 PM

                                      @jimp said in Growing states table, some leak?:

                                      @w0w In the past, users have used a single interface lagg to work around similar problems. Then the interfaces end up as laggX.<vlan> on both systems.

                                      Thank you, sir! 😳
                                      This is definitely a good idea. I've reconfigured both firewalls to use LAGGs; we'll see…

                                      What about adding this workaround to https://docs.netgate.com/pfsense/en/latest/recipes/high-availability.html ?

                                        jimp Rebel Alliance Developer Netgate
                                        last edited by Sep 8, 2023, 7:07 PM

                                        Officially, we only support systems with matching hardware and so on, so while it may work, it's not something we want to document in detail and imply it should be relied upon regularly.

                                        That said, I think it may already be mentioned somewhere in there, since interfaces not lining up in the OS used to cause problems with state matching and pfsync. Different problem, but similar workarounds.

                                          w0w @jimp
                                          last edited by Sep 8, 2023, 7:36 PM

                                          @jimp
                                          Ok, great.
                                          It's funny that this leak didn't make itself known earlier.

                                          Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.