Netgate Discussion Forum
    • Categories
    • Recent
    • Tags
    • Popular
    • Users
    • Search
    • Register
    • Login

    Pfsync kernel panic after 2.1.5 to 2.2 to Upgrade - pfsync_undefer_state

    Scheduled Pinned Locked Moved Problems Installing or Upgrading pfSense Software
    73 Posts 13 Posters 23.3k Views
    Loading More Posts
    • Oldest to Newest
    • Newest to Oldest
    • Most Votes
    Reply
    • Reply as topic
    Log in to reply
    This topic has been deleted. Only users with topic management privileges can see it.
    • M Offline
      Mathiew
      last edited by

      @stephenw10:

      Mathiew,
      I have seen one other incidence of this in a single box (not part of a HA setup). IN that case the box previously had a CARP config of some sort and had stray tags in the config file that had not been translated correctly across an update.
      In that instance it was fixed by enabling HA sync, saving, and the disabling HA sync again. Limiters could then be used.

      Steve

      I can try, but I never touch any HA/CARP services on this machine.

      Thanks for your work.

      EDIT : I reactivated limiters after doing that and no problem so far.

      1 Reply Last reply Reply Quote 0
      • stephenw10S Offline
        stephenw10 Netgate Administrator
        last edited by

        Yes, still not fixed (though a problem hasn't yet been found) for HA+Limiters. But we had one other case where a stray HA tag in the config was causing this on a standalone box. Which may be a useful clue in itself because the pfsync interface was not actually configured on that box.

        Steve

        1 Reply Last reply Reply Quote 0
        • F Offline
          flofogl
          last edited by

          Steve,

          I would also like to thank you for inspecting the issue and I hope Mathiew's efforts will prove valuable. However I don't really unserstand what you mean by "a problem hasn't yet been found"? You wrote you were able to reproduce the behavior but the machine stayed responsive. The question is for how long? Once I was on 2.2.1(upgrade from 2.1.5 wihtout limiters enabled), it stayed responsive in my case too after having re-enabled the limiters, but only for a couple of minutes. After that, there was nothing left to do other than "physically" shutting down the machine (no web UI, no SSH, no console). I would consider this a problem…

          The upgrade process with HA and limiters never worked for me, the box didn't come back up again. As mentioned before, I can test things if needed.

          Is this something specific to my setup or are limiters in combination with HA not as common as I thought they would be?

          Thanks,

          Florian

          1 Reply Last reply Reply Quote 0
          • stephenw10S Offline
            stephenw10 Netgate Administrator
            last edited by

            I mean we are not, yet, able to replicate the crashes that you are seeing. We tested for hours with a variety of limiter setups and just saw continuous log spamming. Which itself is not great.  ;)
            If you have any ability to run this and deliberately cause it to crash and get us the crash report then we have something solid to go on. Right now it looks like the crashes may be secondary to the log spamming in some way.
            I appreciate all the testing that you guys are doing.

            Steve

            1 Reply Last reply Reply Quote 0
            • M Offline
              Marlenio
              last edited by

              Ho Steve,
              i made a new test, installing 2.2.1 on my double CARP front firewall. It's a simple configuration, with a IPSec VPN (with four phase 2) and only watchdog as installed package. It seems to be ok. But in this config i don'y use any type of limiter as i do in my back firewall CARP config. Could it be limiter the problem in 2.2.1?

              Marlenio

              1 Reply Last reply Reply Quote 0
              • stephenw10S Offline
                stephenw10 Netgate Administrator
                last edited by

                This is definitely a conflict between Limiters and pfsync removing either of those will solve it. That's not really a solution though.

                Steve

                1 Reply Last reply Reply Quote 0
                • M Offline
                  Marlenio
                  last edited by

                  @stephenw10:

                  This is definitely a conflict between Limiters and pfsync removing either of those will solve it. That's not really a solution though.

                  Steve

                  Yes, i think so. Today i installed my back pfsense CARP configuration, the one with the sync problem. First i uninstalled all limiters and all package, then install 2.2.1. It run vithout problem.

                  Marlenio

                  1 Reply Last reply Reply Quote 0
                  • F Offline
                    Fira
                    last edited by

                    Same problem. Upgrade from 2.1.5 to 2.2.1. Heavily using limiters.

                    However there's a few differences here:

                    • No HA/CARP configuration, yet we get the pfsync errors
                    • These messages occur on traffic from one VLAN whose configuration was changed after upgrade,
                    • No errors on VLAN whose configuration was not changed since upgrade

                    I'm not sure to understand why pfsync would trigger at all on one internal VLAN but not another ?
                    Good luck troubleshooting this..

                    1 Reply Last reply Reply Quote 0
                    • stephenw10S Offline
                      stephenw10 Netgate Administrator
                      last edited by

                      Hi Fira,
                      As I advised Mathiew try enabling HA sync, saving and disabling again.

                      Steve

                      1 Reply Last reply Reply Quote 0
                      • F Offline
                        Fira
                        last edited by

                        Sorry, i must completely have missed that !
                        Anyway, yeah, i tried to change the HA interface without enabling it (since people suggested enabling HA caused panics), and this solved the problem.

                        Thanks :)

                        1 Reply Last reply Reply Quote 0
                        • stephenw10S Offline
                          stephenw10 Netgate Administrator
                          last edited by

                          Good to hear. Some consistency there at least.  :)

                          Steve

                          1 Reply Last reply Reply Quote 0
                          • stephenw10S Offline
                            stephenw10 Netgate Administrator
                            last edited by

                            So some patches have gone in to resolve this. They are into the pfsync source which is compiled at build so you can't easily apply them separately.
                            They should be in recent 2.2.2 snapshots though if anyone is able to test that: http://snapshots.pfsense.org/

                            Steve

                            1 Reply Last reply Reply Quote 0
                            • M Offline
                              Marlenio
                              last edited by

                              Thanks Steve, i try it soon on my back firewall carp config. The 2.2.2 version is "pfSense-Full-Update-2.2.2-DEVELOPMENT-amd64-20150406-0824".

                              Marlenio

                              1 Reply Last reply Reply Quote 0
                              • M Offline
                                Marlenio
                                last edited by

                                Crash on backup server, no webgui (503 - Service Not Available), no SSL shell, server unresponsive. I need a total reinstall.  :'(

                                Marlenio

                                1 Reply Last reply Reply Quote 0
                                • M Offline
                                  Marlenio
                                  last edited by

                                  Try new 2.2.2 20150412 snaphot. Carp configuration, limiter on HTTP and HTTPS (4 rules on two internal LAN). 2.2.1 on MASTER, 2.2.2 on SLAVE. Try to sync master to slave: crash on SLAVE, no shell, no Webgui, new installation needed.  :(

                                  Marlenio

                                  1 Reply Last reply Reply Quote 0
                                  • M Offline
                                    Marlenio
                                    last edited by

                                    Hi Steve,
                                    the only way i found to have both Carp AND limiters working is uncheck on my four limited rules (HTTP/HTTPS on two internal lan) the flag "State Type" to "NO pfsync".
                                    This is the only way i found to have both active limiters AND Carp. I lost the sync on rules state, but i can sync rule in case of
                                    change. On HTTP/HTTPS, states are not really important for me.

                                    The only problem now is on "Status: System logs: General" -> System, where i found a lot of error like that:

                                    kernel: er_state: unable to find deferred statepfsync_undefer_state: unable to find deferred statepfsync_undefer_state: unable to find deferred statepfsync_undefer_state: unable to find deferred statepfsync_undefer_state: unable to find deferred statepfsync_undefer_state: unable to find deferred statepfsync_undefer_state: unable to find deferred statepfsync_undefer_state: unable to find deferred statepfsync_undefer_state: unable to find deferred statepfsync_undefer_state: unable to find deferred statepfsync_undefer_state: unable to find deferred statepfsync_undefer_state: unable to find deferred statepfsync_undefer_state: unable to find deferred statepfsync_undefer_state: unable to find deferred statepfsync_undefer_state: unable to find deferred statepfsync_undefer_state: unable to find deferred statepfsync_undefer_state: unable to find deferred statepfsync_undefer_state: unable to find deferred state

                                    Please, let me know if you have found a solution.

                                    Marlenio

                                    1 Reply Last reply Reply Quote 0
                                    • stephenw10S Offline
                                      stephenw10 Netgate Administrator
                                      last edited by

                                      Hmm, interesting. The fixes that went in should at the very least have removed the log spamming. Others have reported that it did remove the log spamming and that it ran initially and then crashed after some time with both limiters and pfsync enabled. We still haven't seen a crash report to confirm if this is some new bug though. Are you sure you were running a 20150412 snaphot? Are you running 64bit?

                                      Steve

                                      1 Reply Last reply Reply Quote 0
                                      • M Offline
                                        Marlenio
                                        last edited by

                                        Hi Steve,
                                        actually i'm on 2.2.1 64 bit. My CARPs ar in a production environment, so i cannot make too much test with snapshot. After the problem on last beta (Reply #52), i reinstalled official 2.2.1.

                                        Marlenio

                                        1 Reply Last reply Reply Quote 0
                                        • M Offline
                                          Marlenio
                                          last edited by

                                          This is the error log.

                                          Crash report begins.  Anonymous machine information:

                                          amd64
                                          10.1-RELEASE-p6
                                          FreeBSD 10.1-RELEASE-p6 #0 b69ba8f(releng/10.1)-dirty: Fri Mar 13 08:37:46 CDT 2015    root@pfs22-amd64-builder:/usr/obj.amd64/usr/pfSensesrc/src/sys/pfSense_SMP.10

                                          Crash report details:

                                          Filename: /var/crash/bounds
                                          1

                                          Filename: /var/crash/info.0
                                          Dump header from device /dev/da0s1b
                                            Architecture: amd64
                                            Architecture Version: 1
                                            Dump Length: 156160B (0 MB)
                                            Blocksize: 512
                                            Dumptime: Wed Apr 15 14:08:59 2015
                                            Hostname: fw-ld01a.soft.it
                                            Magic: FreeBSD Text Dump
                                            Version String: FreeBSD 10.1-RELEASE-p6 #0 b69ba8f(releng/10.1)-dirty: Fri Mar 13 08:37:46 CDT 2015
                                              root@pfs22-amd64-builder:/usr/obj.amd64/usr/pfSensesrc/src/sys/pfSense_SMP.10
                                            Panic String:
                                            Dump Parity: 3106681438
                                            Bounds: 0
                                            Dump Status: good

                                          Filename: /var/crash/info.last
                                          Dump header from device /dev/da0s1b
                                            Architecture: amd64
                                            Architecture Version: 1
                                            Dump Length: 156160B (0 MB)
                                            Blocksize: 512
                                            Dumptime: Wed Apr 15 14:08:59 2015
                                            Hostname: fw-ld01a.soft.it
                                            Magic: FreeBSD Text Dump
                                            Version String: FreeBSD 10.1-RELEASE-p6 #0 b69ba8f(releng/10.1)-dirty: Fri Mar 13 08:37:46 CDT 2015
                                              root@pfs22-amd64-builder:/usr/obj.amd64/usr/pfSensesrc/src/sys/pfSense_SMP.10
                                            Panic String:
                                            Dump Parity: 3106681438
                                            Bounds: 0
                                            Dump Status: good

                                          Filename: /var/crash/textdump.tar.0
                                          ddb.txt06000014000012513452333  7073 ustarrootwheeldb:0:kdb.enter.default>  run lockinfo
                                          db:1:lockinfo> show locks
                                          No such command
                                          db:1:locks>  show alllocks
                                          No such command
                                          db:1:alllocks>  show lockedvnods
                                          Locked vnodes
                                          db:0:kdb.enter.default>  show pcpu
                                          cpuid        = 3
                                          dynamic pcpu = 0xfffffe00994fd780
                                          curthread    = 0xfffff800034c9490: pid 12 "irq261: bce0"
                                          curpcb      = 0xfffffe0000373cc0
                                          fpcurthread  = none
                                          idlethread  = 0xfffff80003291000: tid 100006 "idle: cpu3"
                                          curpmap      = 0xffffffff8217a048
                                          tssp        = 0xffffffff821950c8
                                          commontssp  = 0xffffffff821950c8
                                          rsp0        = 0xfffffe0000373cc0
                                          gs32p        = 0xffffffff82196b20
                                          ldt          = 0xffffffff82196b60
                                          tss          = 0xffffffff82196b50
                                          db:0:kdb.enter.default>  bt
                                          Tracing pid 12 tid 100044 td 0xfffff800034c9490
                                          bce_intr() at bce_intr+0x405/frame 0xfffffe0000373b20
                                          intr_event_execute_handlers() at intr_event_execute_handlers+0xab/frame 0xfffffe0000373b60
                                          ithread_loop() at ithread_loop+0x96/frame 0xfffffe0000373bb0
                                          fork_exit() at fork_exit+0x9a/frame 0xfffffe0000373bf0
                                          fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0000373bf0
                                          –- trap 0, rip = 0, rsp = 0xfffffe0000373cb0, rbp = 0 ---
                                          db:0:kdb.enter.default>  ps
                                            pid  ppid  pgrp  uid  state  wmesg        wchan        cmd
                                            59    0    0    0  DL      mdwait  0xfffff80003d7b800 [md0]
                                            23    0    0    0  DL      kpsusp  0xfffff80003da35e8 [syncer]
                                            22    0    0    0  DL      kpsusp  0xfffff80003da3aa8 [vnlru]
                                            21    0    0    0  DL      kpsusp  0xfffff80003da4128 [bufdaemon]
                                            20    0    0    0  DL      pgzero  0xffffffff82108e0c [pagezero]
                                            19    0    0    0  DL      pollid  0xffffffff81f64870 [idlepoll]
                                              9    0    0    0  DL      psleep  0xffffffff82108540 [vmdaemon]
                                              8    0    0    0  DL      psleep  0xffffffff82194304 [pagedaemon]
                                              7    0    0    0  DL      waiting_ 0xffffffff82184d70 [sctp_iterator]
                                              6    0    0    0  DL      pftm    0xffffffff80d055e0 [pf purge]
                                            18    0    0    0  DL      cooling  0xfffff8000350c558 [acpi_cooling0]
                                            17    0    0    0  DL      tzpoll  0xffffffff81ea6aa0 [acpi_thermal]
                                            16    0    0    0  DL      (threaded)                  [usb]
                                          100074                  D      -        0xfffffe0000b31ef0 [usbus5]
                                          100073                  D      -        0xfffffe0000b31e98 [usbus5]
                                          100072                  D      -        0xfffffe0000b31e40 [usbus5]
                                          100071                  D      -        0xfffffe0000b31de8 [usbus5]
                                          100069                  D      -        0xfffffe0000b21e18 [usbus4]
                                          100068                  D      -        0xfffffe0000b21dc0 [usbus4]
                                          100067                  D      -        0xfffffe0000b21d68 [usbus4]
                                          100066                  D      -        0xfffffe0000b21d10 [usbus4]
                                          100065                  D      -        0xfffffe0000b11ef0 [usbus3]
                                          100064                  D      -        0xfffffe0000b11e98 [usbus3]
                                          100063                  D      -        0xfffffe0000b11e40 [usbus3]
                                          100062                  D      -        0xfffffe0000b11de8 [usbus3]
                                          100060                  D      -        0xfffffe0000b01ef0 [usbus2]
                                          100059                  D      -        0xfffffe0000b01e98 [usbus2]
                                          100058                  D      -        0xfffffe0000b01e40 [usbus2]
                                          100057                  D      -        0xfffffe0000b01de8 [usbus2]
                                          100055                  D      -        0xfffffe0000af1ef0 [usbus1]
                                          100054                  D      -        0xfffffe0000af1e98 [usbus1]
                                          100053                  D      -        0xfffffe0000af1e40 [usbus1]
                                          100052                  D      -        0xfffffe0000af1de8 [usbus1]
                                          100050                  D      -        0xfffffe0000a64ef0 [usbus0]
                                          100049                  D      -        0xfffffe0000a64e98 [usbus0]
                                          100048                  D      -        0xfffffe0000a64e40 [usbus0]
                                          100047                  D      -        0xfffffe0000a64de8 [usbus0]
                                              5    0    0    0  DL      idle    0xfffffe0000a860f0 [ciss_notify0]
                                              4    0    0    0  DL      (threaded)                  [cam]
                                          100086                  D      -        0xffffffff81e9ea40 [scanner]
                                          100033                  D      -        0xffffffff81e9ec00 [doneq0]
                                              3    0    0    0  DL      crypto_r 0xffffffff82106a10 [crypto returns]
                                              2    0    0    0  DL      crypto_w 0xffffffff821068b8 [crypto]
                                            15    0    0    0  DL      -        0xffffffff81ebc100 [rand_harvestq]
                                            14    0    0    0  DL      (threaded)                  [geom]
                                          100019                  D      -        0xffffffff821794e0 [g_down]
                                          100018                  D      -        0xffffffff821794d8 [g_up]
                                          100017                  D      -        0xffffffff821794d0 [g_event]
                                            13    0    0    0  DL      (threaded)                  [ng_queue]
                                          100016                  D      sleep    0xffffffff81e5cf48 [ng_queue3]
                                          100015                  D      sleep    0xffffffff81e5cf48 [ng_queue2]
                                          100014                  D      sleep    0xffffffff81e5cf48 [ng_queue1]
                                          100013                  D      sleep    0xffffffff81e5cf48 [ng_queue0]
                                            12    0    0    0  RL      (threaded)                  [intr]
                                          100096                  I                                  [swi1: netisr 3]
                                          100095                  I                                  [swi1: netisr 2]
                                          100094                  I                                  [swi1: netisr 1]
                                          100084                  I                                  [swi1: pfsync]
                                          100082                  I                                  [swi1: pf send]
                                          100079                  I                                  [swi0: uart]
                                          100078                  I                                  [irq1: atkbd0]
                                          100075                  I                                  [irq14: ata0]
                                          100070                  I                                  [irq22: uhci4]
                                          100061                  I                                  [irq19: uhci3]
                                          100056                  I                                  [irq18: uhci2]
                                          100051                  I                                  [irq17: uhci1]
                                          100046                  I                                  [irq16: uhci0 ehci0]
                                          100045                  I                                  [irq262: bce1]
                                          100044                  Run    CPU 3                      [irq261: bce0]
                                          100042                  I                                  [irq260: ciss0]
                                          100031                  I                                  [swi5: fast taskq]
                                          100029                  I                                  [swi6: Giant taskq]
                                          100027                  I                                  [swi6: task queue]
                                          100012                  I                                  [swi3: vm]
                                          100011                  I                                  [swi4: clock]
                                          100010                  I                                  [swi4: clock]
                                          100009                  I                                  [swi4: clock]
                                          100008                  L      *Giant    0xfffff80003eb8240 [swi4: clock]
                                          100007                  I                                  [swi1: netisr 0]
                                            11    0    0    0  RL      (threaded)                  [idle]
                                          100006                  CanRun                              [idle: cpu3]
                                          100005                  Run    CPU 2                      [idle: cpu2]
                                          100004                  Run    CPU 1                      [idle: cpu1]
                                          100003                  CanRun                              [idle: cpu0]
                                              1    0    1    0  RLs    CPU 0                      [init]
                                            10    0    0    0  DL      audit_wo 0xffffffff8218b8f0 [audit]
                                              0    0    0    0  DLs    (threaded)                  [kernel]
                                          100138                  D      -        0xfffff80003d50d00 [dummynet]
                                          100085                  D      -        0xfffff800032b2300 [CAM taskq]
                                          100080                  D      -        0xfffff800035da700 [mca taskq]
                                          100041                  D      -        0xfffff800034bd100 [em3 txq]
                                          100040                  D      -        0xfffff800034bd300 [em3 que]
                                          100039                  D      -        0xfffff800034bda00 [em2 txq]
                                          100038                  D      -        0xfffff800034bdc00 [em2 que]
                                          100037                  D      -        0xfffff800034be700 [em1 txq]
                                          100036                  D      -        0xfffff800034be900 [em1 que]
                                          100035                  D      -        0xfffff800034bf100 [em0 txq]
                                          100034                  D      -        0xfffff800034bf300 [em0 que]
                                          100032                  D      -        0xfffff800032b2500 [kqueue taskq]
                                          100030                  D      -        0xfffff800032b2a00

                                          Marlenio

                                          1 Reply Last reply Reply Quote 0
                                          • M Offline
                                            Marlenio
                                            last edited by

                                            Another strange error: i disabled carp sync on MASTER (uncheck "Synchronize States"), but NOT deleting my Configuration Synchronization Settings. Now MASTER sync with SLAVE without any kind of error log.

                                            Apr 15 16:25:13 php-fpm[65408]: /rc.filter_synchronize: Filter sync successfully completed with https://192.168.183.2:443.
                                            Apr 15 16:25:11 php-fpm[65408]: /rc.filter_synchronize: XMLRPC sync successfully completed with https://192.168.183.2:443.
                                            Apr 15 16:25:09 check_reload_status: Reloading filter
                                            Apr 15 16:25:08 php-fpm[65408]: /rc.filter_synchronize: Beginning XMLRPC sync to https://192.168.183.2:443.
                                            Apr 15 16:25:07 check_reload_status: Syncing firewall
                                            Apr 15 16:24:47 php-fpm[18627]: /rc.filter_synchronize: Filter sync successfully completed with https://192.168.183.2:443.
                                            Apr 15 16:24:45 php-fpm[18627]: /rc.filter_synchronize: XMLRPC sync successfully completed with https://192.168.183.2:443.
                                            Apr 15 16:24:43 check_reload_status: Reloading filter
                                            Apr 15 16:24:42 php-fpm[18627]: /rc.filter_synchronize: Beginning XMLRPC sync to https://192.168.183.2:443.
                                            Apr 15 16:24:41 check_reload_status: Syncing firewall
                                            Apr 15 16:23:00 kernel: carp: demoted by 0 to 0 (pfsync bulk fail)
                                            Apr 15 16:23:00 kernel:
                                            Apr 15 16:22:38 php-fpm[70216]: /rc.filter_synchronize: Filter sync successfully completed with https://192.168.183.2:443.
                                            Apr 15 16:22:36 php-fpm[70216]: /rc.filter_synchronize: XMLRPC sync successfully completed with https://192.168.183.2:443.
                                            Apr 15 16:22:33 php-fpm[70216]: /rc.filter_synchronize: Beginning XMLRPC sync to https://192.168.183.2:443.
                                            Apr 15 16:22:32 check_reload_status: Syncing firewall
                                            Apr 15 16:22:11 php-fpm[69480]: /rc.filter_synchronize: Nothing has been configured to be synched. Skipping….

                                            Here i uncheck "Synchronize States"

                                            Apr 15 16:22:10 kernel: pfsync_undefer_state: unable to find deferred statepfsync_undefer_state: unable to find deferred statepfsync_undefer_state: unable to find deferred statepfsync_undefer_state: unable to find deferred statepfsync_undefer_state: unable to find deferred statepfsync_undefer_state: unable to find deferred statepfsync_undefer_state: unable to find deferred statepfsync_undefer_state: unable to find deferred statepfsync_undefer_state: unable to find deferred statepfsync_undefer_state: unable to find deferred statepfsync_undefer_state: unable to find deferred statepfsync_undefer_state: unable to find deferred statepfsync_undefer_state: unable to find deferred statepfsync_undefer_state: unable to find deferred statepfsync_undefer_state: unable to find deferred statepfsync_undefer_state: unable to find deferred statepfsync_undefer_state: unable to find deferred state

                                            Marlenio

                                            1 Reply Last reply Reply Quote 0
                                            • First post
                                              Last post
                                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.