Netgate Discussion Forum

    Pfsync kernel panic after 2.1.5 to 2.2 Upgrade - pfsync_undefer_state

      fragged

      @flofogl:

      Hi fragged,

      thank you for the information. That means that in theory I could do an upgrade on both machines with state synchronization and limiter rules disabled and turn them back on afterwards, correct?

      Florian

      From the blog post / bug report it looks like limiters will cause a kernel panic when CARP HA is used with 2.2. I have tested CARP without limiters.

        lowprofile

        @fragged:

        @flofogl:

        Hi fragged,

        thank you for the information. That means that in theory I could do an upgrade on both machines with state synchronization and limiter rules disabled and turn them back on afterwards, correct?

        Florian

        From the blog post / bug report it looks like limiters will cause a kernel panic when CARP HA is used with 2.2. I have tested CARP without limiters.

        This looks pretty much like what I experienced 2 days ago. I have another thread discussing it.
        Can you link to the bug report?

        EDIT: bug report found: https://redmine.pfsense.org/issues/4310

          flofogl

          From the blog post / bug report it looks like limiters will cause a kernel panic when CARP HA is used with 2.2. I have tested CARP without limiters.

          Sorry, I didn't get it at first. I thought it was only if one node was on version 2.2 and the other one still on version 2.1.5. According to the bug report it now seems that synchronizing limiter rules is simply broken in version 2.2 and will cause a panic regardless of the version of the other nodes.

            bernardo

            Hi All,

            I do have limiters enabled. Thank you very much, @fragged; I will try disabling the limiters and see how it goes. Let's hope the great pfSense team is able to fix this soon.

            @flofogl: At first I upgraded one host only, but then the problem persisted after both hosts were upgraded to 2.2. One host (at 2.2) would crash even if the other was turned off.

            Disabling "Synchronize States" is only a workaround, though, as connections won't be maintained when master and backup change roles.

            Best, Bernardo

              bernardo

              Following up: I disabled my limiters (I didn't delete them, just disabled them) and then re-enabled "Synchronize States", and the kernel panics stopped. Right after I re-enabled "Synchronize States" I got one more panic reboot on my master, which worried me. But since it came back, both machines have been stable for a couple of hours now, in production (routing about 50 employees over 3 WANs totalling 280 Mbit/s of bandwidth).

              Besides HA + CARP, I use Multi-WAN with failover (3 WAN links), policy-based routing in my firewall rules, the traffic shaper (with HFSC), IPsec VPN, and the DNS Forwarder. Everything appears to work so far. But I miss my limiters… :(

              I will report if I find anything else.

              Best, Bernardo

                stephenw10 Netgate Administrator

                Have any of you tried a 2.2.1 snapshot to confirm this is fixed?
                The bug report listed above is marked resolved, but it doesn't exactly match the symptoms described here.

                Steve

                  flofogl

                  Hi Steve,

                  it might be a little late now, and I don't know whether it is related to the original issue, but there still seem to be issues related to CARP. I tried an upgrade from 2.1.5 to 2.2.1 (RELEASE), following the same steps I described for 2.2 (RELEASE) in my post here: https://forum.pfsense.org/index.php?topic=87485.msg480549#msg480549

                  I get "pfsync_undefer_state: unable to find deferred state" printed in the console, and then it just hangs after the upgrade process (after the reboot). Since it is a virtual machine (a backup node), I can easily revert and try again if you want me to test something. I even tried restoring the configuration on a fresh install of 2.2.1 and got the same error message printed all over the screen.
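
For anyone who wants to quantify how often this message appears before submitting a report, here is a minimal Python sketch that counts occurrences of the error string quoted above in captured console or log output. The sample lines are made up for illustration (only the pfsync message itself comes from this thread), and how you capture the console output is left open.

```python
# Count how many captured console/log lines contain the pfsync error
# reported in this thread. Only the error string is taken from the thread;
# the other sample line below is a made-up placeholder.
PFSYNC_ERROR = "pfsync_undefer_state: unable to find deferred state"


def count_pfsync_errors(lines):
    """Return the number of lines that contain the pfsync error message."""
    return sum(1 for line in lines if PFSYNC_ERROR in line)


# Hypothetical captured output: two error lines and one unrelated line.
sample = [
    "pfsync_undefer_state: unable to find deferred state",
    "some unrelated console line (placeholder)",
    "pfsync_undefer_state: unable to find deferred state",
]

print(count_pfsync_errors(sample))
```

To run it against a saved capture, pass an open file object instead of the list, since iterating a file yields its lines.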

                  Florian

                    Marlenio

                    Same error on my CARP installation upgrade from 2.1.5 to 2.2.1. Back on 2.1.5  :(

                    Marlenio

                      lowprofile

                      @Marlenio:

                      Same error on my CARP installation upgrade from 2.1.5 to 2.2.1. Back on 2.1.5  :(

                      Oh, that is very sad news. I was really looking forward to having this CARP kernel issue fixed  :(

                        stephenw10 Netgate Administrator

                        If any of you have a chance to test this in 2.2.1-rel and submit a crash report we'd love to see it.

                        Steve

                          Marlenio

                          @stephenw10:

                          If any of you have a chance to test this in 2.2.1-rel and submit a crash report we'd love to see it.

                          Steve

                          Hi Steve, yesterday I sent 3 crash logs about this error.

                          Marlenio

                            stephenw10 Netgate Administrator

                            Awesome, can you send me the IP they came from? Use a PM if you want.

                            Steve

                              Marlenio

                              I'm sorry, I had already switched back to 2.1.5. :(

                              Marlenio

                                stephenw10 Netgate Administrator

                                Ok, so you don't know what IP they were sent from?

                                  Marlenio

                                  @stephenw10:

                                  Ok, so you don't know what IP they were sent from?

                                  Sure. :-) 213.215.138.68

                                  Marlenio

                                    stephenw10 Netgate Administrator

                                    Great. We are trying to replicate this but are just seeing continuous error messages without the crash.
                                    Do any of you have any special Limiter setup? Can you give any details?

                                    Steve

                                      stephenw10 Netgate Administrator

                                      Marlenio,
                                      Looks like the most recent crash report we have from that IP is Mar 3rd. Could they have come from a different IP?

                                      Steve

                                        Marlenio

                                        @stephenw10:

                                        Marlenio,
                                        Looks like the most recent crash report we have from that IP is Mar 3rd. Could they have come from a different IP?

                                        Steve

                                        Hi steve,
                                        213.215.138 is the VIP of our first pfSense output array (2 units in HA mode). The master IP is 213.215.138.67 and the backup is 213.215.138.71. Let me know if you find it.

                                        Thanks in advance,

                                        –
                                        Mario (Marlenio)

                                        Marlenio

                                          stephenw10 Netgate Administrator

                                          Nothing from anything in that /24 subnet since Mar 3rd.  :-\
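
As a quick sanity check of the subnet statement, the following Python sketch uses the standard `ipaddress` module to confirm that the addresses mentioned in this thread all fall inside 213.215.138.0/24. The dictionary labels are just illustrative names, and the network address is an assumption derived from "that /24 subnet".

```python
import ipaddress

# The /24 assumed from the addresses quoted in this thread.
SUBNET = ipaddress.ip_network("213.215.138.0/24")

# Addresses mentioned by Marlenio; the labels are illustrative only.
reported = {
    "report source": "213.215.138.68",
    "master": "213.215.138.67",
    "backup": "213.215.138.71",
}


def in_subnet(addr, net=SUBNET):
    """Return True if the dotted-quad address belongs to the network."""
    return ipaddress.ip_address(addr) in net


for label, addr in reported.items():
    print(label, addr, in_subnet(addr))
```

All three addresses are in the same /24, so a search over that subnet would cover reports sent from any of them.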

                                          Steve

                                            Marlenio

                                            @stephenw10:

                                            Nothing from anything in that /24 subnet since Mar 3rd.  :-\

                                            Steve

                                            It's very strange. I'm sure they were sent at least three times.  :(

                                            Marlenio
