Netgate Discussion Forum

    Pfsync kernel panic after 2.1.5 to 2.2 upgrade - pfsync_undefer_state

    Problems Installing or Upgrading pfSense Software
      tdale

      @stephenw10 Is there something I can provide to help out? We are using two physical machines, Dell CS24s with dual CPUs and 16GB RAM each, and these are our front line. This is a production environment, so I won't be able to make a ton of changes, but I can give you information. Just let me know what you need me to post.

        stephenw10 Netgate Administrator

        If you see any crash reports then we want to see those.
        Other than that, it's odd that not everyone is seeing the same thing. If there's something unusual in your config then maybe we can try to see a pattern.

        Steve

          flofogl

          I repeated the upgrade process today after having disabled my floating limiter rules and it worked.

          However, as soon as I enabled any of them, the console didn’t stop printing "pfsync_undefer_state: unable to find deferred state". The first time I wasn’t quick enough in disabling the rules and the machine became unresponsive with no CPU load (no crash report).

          After a reboot I was able to disable the rules and the machine stayed responsive. I then deactivated “Synchronize States” and enabled the floating limiter rules. Apart from the known “Bump sched buckets to 256 (was 0)” the console remained unchanged. As soon as I activated state synchronization "pfsync_undefer_state: unable to find deferred state" was back again.

          In contrast to Mathiew I can choose whether to have either HA or limiters.

          This was done on a backup node, the master still runs 2.1.5.
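To put a number on the console spam described above, the repeated message can be counted in saved log text. This is a minimal Python sketch, not a pfSense tool: the message string is quoted from the report, while the function name and the sample line layout (timestamp, host, `kernel:` prefix) are assumptions made up for illustration.

```python
# Message text as quoted in the report above.
PFSYNC_MSG = "pfsync_undefer_state: unable to find deferred state"


def count_pfsync_errors(log_text: str) -> int:
    """Return how many lines of log_text contain the pfsync message."""
    return sum(1 for line in log_text.splitlines() if PFSYNC_MSG in line)


if __name__ == "__main__":
    # Hypothetical sample lines; a real system log may be laid out differently.
    sample = (
        "Mar 20 10:00:01 fw kernel: " + PFSYNC_MSG + "\n"
        "Mar 20 10:00:01 fw kernel: Bump sched buckets to 256 (was 0)\n"
        "Mar 20 10:00:02 fw kernel: " + PFSYNC_MSG + "\n"
    )
    print(count_pfsync_errors(sample))
```

Comparing the count with state synchronization enabled and disabled would give a simple before/after figure to attach to a report.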

            stephenw10 Netgate Administrator

            Thanks for that report flofogl. All data is helpful.

            Steve

              Mathiew

              I removed my limiter rules from the config and it's working, no more pfsync error…

                stephenw10 Netgate Administrator

                Mathiew,
                I have seen one other instance of this in a single box (not part of an HA setup). In that case the box previously had a CARP config of some sort and had stray tags in the config file that had not been translated correctly across an update.
                In that instance it was fixed by enabling HA sync, saving, and then disabling HA sync again. Limiters could then be used.

                Steve
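Before trying the enable/save/disable workaround on a standalone box, it can help to see whether the config file contains any HA-related sections at all. The Python sketch below only reports presence. The element names `hasync` and `installedpackages/carpsettings` are assumptions about where HA sync settings may live and are not confirmed in this thread, so verify them against your own config.xml; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# ASSUMPTION: these element paths are guesses at where HA sync settings are
# stored in a pfSense config.xml. Check your own file before relying on them.
HA_PATHS = ("hasync", "installedpackages/carpsettings")


def find_ha_sections(config_xml: str) -> dict:
    """Map each assumed HA path to True if it exists under the root element."""
    root = ET.fromstring(config_xml)
    return {path: root.find(path) is not None for path in HA_PATHS}


if __name__ == "__main__":
    # Hypothetical, heavily trimmed config used only to exercise the function.
    sample = "<pfsense><installedpackages><carpsettings/></installedpackages></pfsense>"
    for path, present in find_ha_sections(sample).items():
        print(path, "present" if present else "absent")
```

Finding a section does not show that it is stray or mistranslated. It only indicates that the box carries HA settings worth a closer look.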

                  flofogl

                  Steve,

                  when you say "fixed", you mean that limiters could afterwards be used without HA, not together with HA. So it is a solution to Mathiew's issue only. Correct?

                  Cheers,

                  Florian

                    Mathiew

                    @stephenw10:

                    Mathiew,
                    I have seen one other instance of this in a single box (not part of an HA setup). In that case the box previously had a CARP config of some sort and had stray tags in the config file that had not been translated correctly across an update.
                    In that instance it was fixed by enabling HA sync, saving, and then disabling HA sync again. Limiters could then be used.

                    Steve

                    I can try, but I have never touched any HA/CARP services on this machine.

                    Thanks for your work.

                    EDIT: I reactivated limiters after doing that and no problem so far.

                      stephenw10 Netgate Administrator

                      Yes, still not fixed (though a problem hasn't yet been found) for HA+Limiters. But we had one other case where a stray HA tag in the config was causing this on a standalone box. Which may be a useful clue in itself because the pfsync interface was not actually configured on that box.

                      Steve

                        flofogl

                        Steve,

                        I would also like to thank you for inspecting the issue and I hope Mathiew's efforts will prove valuable. However, I don't really understand what you mean by "a problem hasn't yet been found". You wrote you were able to reproduce the behavior but the machine stayed responsive. The question is for how long? Once I was on 2.2.1 (upgraded from 2.1.5 without limiters enabled), it stayed responsive in my case too after I re-enabled the limiters, but only for a couple of minutes. After that, there was nothing left to do other than "physically" shutting down the machine (no web UI, no SSH, no console). I would consider this a problem…

                        The upgrade process with HA and limiters never worked for me, the box didn't come back up again. As mentioned before, I can test things if needed.

                        Is this something specific to my setup or are limiters in combination with HA not as common as I thought they would be?

                        Thanks,

                        Florian

                          stephenw10 Netgate Administrator

                          I mean we are not, yet, able to replicate the crashes that you are seeing. We tested for hours with a variety of limiter setups and just saw continuous log spamming. Which itself is not great.  ;)
                          If you have any ability to run this and deliberately cause it to crash and get us the crash report then we have something solid to go on. Right now it looks like the crashes may be secondary to the log spamming in some way.
                          I appreciate all the testing that you guys are doing.

                          Steve

                            Marlenio

                            Hi Steve,
                            I ran a new test, installing 2.2.1 on my dual CARP front firewall. It's a simple configuration, with an IPsec VPN (with four phase 2 entries) and only watchdog as an installed package. It seems to be OK. But in this config I don't use any type of limiter, as I do in my back firewall CARP config. Could the limiters be the problem in 2.2.1?

                            Marlenio

                              stephenw10 Netgate Administrator

                              This is definitely a conflict between Limiters and pfsync; removing either of those will solve it. That's not really a solution though.

                              Steve

                                Marlenio

                                @stephenw10:

                                This is definitely a conflict between Limiters and pfsync; removing either of those will solve it. That's not really a solution though.

                                Steve

                                Yes, I think so. Today I upgraded my back pfSense CARP configuration, the one with the sync problem. First I uninstalled all limiters and all packages, then installed 2.2.1. It runs without problems.

                                Marlenio

                                  Fira

                                  Same problem. Upgrade from 2.1.5 to 2.2.1. Heavily using limiters.

                                  However, there are a few differences here:

                                  • No HA/CARP configuration, yet we get the pfsync errors
                                  • These messages occur on traffic from one VLAN whose configuration was changed after the upgrade
                                  • No errors on the VLAN whose configuration has not been changed since the upgrade

                                  I'm not sure I understand why pfsync would trigger at all on one internal VLAN but not another.
                                  Good luck troubleshooting this.

                                    stephenw10 Netgate Administrator

                                    Hi Fira,
                                    As I advised Mathiew, try enabling HA sync, saving, and then disabling it again.

                                    Steve

                                      Fira

                                      Sorry, I must have completely missed that!
                                      Anyway, yeah, I tried changing the HA interface without enabling it (since people suggested enabling HA caused panics), and this solved the problem.

                                      Thanks :)

                                        stephenw10 Netgate Administrator

                                        Good to hear. Some consistency there at least.  :)

                                        Steve

                                          stephenw10 Netgate Administrator

                                          So some patches have gone in to resolve this. They are in the pfsync source, which is compiled at build time, so you can't easily apply them separately.
                                          They should be in recent 2.2.2 snapshots though if anyone is able to test that: http://snapshots.pfsense.org/

                                          Steve

                                            Marlenio

                                            Thanks Steve, I'll try it soon on my back firewall CARP config. The 2.2.2 version is "pfSense-Full-Update-2.2.2-DEVELOPMENT-amd64-20150406-0824".

                                            Marlenio
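When several people test different snapshots, it helps to record exactly which build each report refers to. The Python sketch below splits a snapshot file name like the one quoted above into its parts. The pattern is inferred from that single example, and reading the trailing digits as a build date and time (YYYYMMDD-HHMM) is an assumption. The function name is hypothetical.

```python
import re
from datetime import datetime

# Pattern inferred from one file name in this thread; other snapshot names
# may not follow it.
SNAPSHOT_RE = re.compile(
    r"^pfSense-Full-Update-(?P<version>\d+(?:\.\d+)+)-(?P<branch>[A-Z]+)"
    r"-(?P<arch>\w+)-(?P<stamp>\d{8}-\d{4})$"
)


def parse_snapshot_name(name: str) -> dict:
    """Split a snapshot file name into version, branch, arch and build time."""
    match = SNAPSHOT_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognised snapshot name: {name!r}")
    return {
        "version": match.group("version"),
        "branch": match.group("branch"),
        "arch": match.group("arch"),
        # ASSUMPTION: the trailing digits encode the build date and time.
        "built": datetime.strptime(match.group("stamp"), "%Y%m%d-%H%M"),
    }


if __name__ == "__main__":
    info = parse_snapshot_name(
        "pfSense-Full-Update-2.2.2-DEVELOPMENT-amd64-20150406-0824"
    )
    print(info["version"], info["arch"], info["built"].isoformat())
```

Quoting the parsed version and build time alongside a test result makes it easier to tell whether a given snapshot already contains the pfsync patches.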
