Pfsync kernel panic after 2.1.5 to 2.2 to Upgrade - pfsync_undefer_state
-
Thanks Steve, i try it soon on my back firewall carp config. The 2.2.2 version is "pfSense-Full-Update-2.2.2-DEVELOPMENT-amd64-20150406-0824".
-
Crash on backup server, no webgui (503 - Service Not Available), no SSL shell, server unresponsive. I need a total reinstall. :'(
-
Try new 2.2.2 20150412 snaphot. Carp configuration, limiter on HTTP and HTTPS (4 rules on two internal LAN). 2.2.1 on MASTER, 2.2.2 on SLAVE. Try to sync master to slave: crash on SLAVE, no shell, no Webgui, new installation needed. :(
-
Hi Steve,
the only way i found to have both Carp AND limiters working is uncheck on my four limited rules (HTTP/HTTPS on two internal lan) the flag "State Type" to "NO pfsync".
This is the only way i found to have both active limiters AND Carp. I lost the sync on rules state, but i can sync rule in case of
change. On HTTP/HTTPS, states are not really important for me.The only problem now is on "Status: System logs: General" -> System, where i found a lot of error like that:
kernel: er_state: unable to find deferred statepfsync_undefer_state: unable to find deferred statepfsync_undefer_state: unable to find deferred statepfsync_undefer_state: unable to find deferred statepfsync_undefer_state: unable to find deferred statepfsync_undefer_state: unable to find deferred statepfsync_undefer_state: unable to find deferred statepfsync_undefer_state: unable to find deferred statepfsync_undefer_state: unable to find deferred statepfsync_undefer_state: unable to find deferred statepfsync_undefer_state: unable to find deferred statepfsync_undefer_state: unable to find deferred statepfsync_undefer_state: unable to find deferred statepfsync_undefer_state: unable to find deferred statepfsync_undefer_state: unable to find deferred statepfsync_undefer_state: unable to find deferred statepfsync_undefer_state: unable to find deferred statepfsync_undefer_state: unable to find deferred state
Please, let me know if you have found a solution.
-
Hmm, interesting. The fixes that went in should at the very least have removed the log spamming. Others have reported that it did remove the log spamming and that it ran initially and then crashed after some time with both limiters and pfsync enabled. We still haven't seen a crash report to confirm if this is some new bug though. Are you sure you were running a 20150412 snaphot? Are you running 64bit?
Steve
-
Hi Steve,
actually i'm on 2.2.1 64 bit. My CARPs ar in a production environment, so i cannot make too much test with snapshot. After the problem on last beta (Reply #52), i reinstalled official 2.2.1. -
This is the error log.
Crash report begins. Anonymous machine information:
amd64
10.1-RELEASE-p6
FreeBSD 10.1-RELEASE-p6 #0 b69ba8f(releng/10.1)-dirty: Fri Mar 13 08:37:46 CDT 2015 root@pfs22-amd64-builder:/usr/obj.amd64/usr/pfSensesrc/src/sys/pfSense_SMP.10Crash report details:
Filename: /var/crash/bounds
1Filename: /var/crash/info.0
Dump header from device /dev/da0s1b
Architecture: amd64
Architecture Version: 1
Dump Length: 156160B (0 MB)
Blocksize: 512
Dumptime: Wed Apr 15 14:08:59 2015
Hostname: fw-ld01a.soft.it
Magic: FreeBSD Text Dump
Version String: FreeBSD 10.1-RELEASE-p6 #0 b69ba8f(releng/10.1)-dirty: Fri Mar 13 08:37:46 CDT 2015
root@pfs22-amd64-builder:/usr/obj.amd64/usr/pfSensesrc/src/sys/pfSense_SMP.10
Panic String:
Dump Parity: 3106681438
Bounds: 0
Dump Status: goodFilename: /var/crash/info.last
Dump header from device /dev/da0s1b
Architecture: amd64
Architecture Version: 1
Dump Length: 156160B (0 MB)
Blocksize: 512
Dumptime: Wed Apr 15 14:08:59 2015
Hostname: fw-ld01a.soft.it
Magic: FreeBSD Text Dump
Version String: FreeBSD 10.1-RELEASE-p6 #0 b69ba8f(releng/10.1)-dirty: Fri Mar 13 08:37:46 CDT 2015
root@pfs22-amd64-builder:/usr/obj.amd64/usr/pfSensesrc/src/sys/pfSense_SMP.10
Panic String:
Dump Parity: 3106681438
Bounds: 0
Dump Status: goodFilename: /var/crash/textdump.tar.0
ddb.txt06000014000012513452333 7073 ustarrootwheeldb:0:kdb.enter.default> run lockinfo
db:1:lockinfo> show locks
No such command
db:1:locks> show alllocks
No such command
db:1:alllocks> show lockedvnods
Locked vnodes
db:0:kdb.enter.default> show pcpu
cpuid = 3
dynamic pcpu = 0xfffffe00994fd780
curthread = 0xfffff800034c9490: pid 12 "irq261: bce0"
curpcb = 0xfffffe0000373cc0
fpcurthread = none
idlethread = 0xfffff80003291000: tid 100006 "idle: cpu3"
curpmap = 0xffffffff8217a048
tssp = 0xffffffff821950c8
commontssp = 0xffffffff821950c8
rsp0 = 0xfffffe0000373cc0
gs32p = 0xffffffff82196b20
ldt = 0xffffffff82196b60
tss = 0xffffffff82196b50
db:0:kdb.enter.default> bt
Tracing pid 12 tid 100044 td 0xfffff800034c9490
bce_intr() at bce_intr+0x405/frame 0xfffffe0000373b20
intr_event_execute_handlers() at intr_event_execute_handlers+0xab/frame 0xfffffe0000373b60
ithread_loop() at ithread_loop+0x96/frame 0xfffffe0000373bb0
fork_exit() at fork_exit+0x9a/frame 0xfffffe0000373bf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0000373bf0
–- trap 0, rip = 0, rsp = 0xfffffe0000373cb0, rbp = 0 ---
db:0:kdb.enter.default> ps
pid ppid pgrp uid state wmesg wchan cmd
59 0 0 0 DL mdwait 0xfffff80003d7b800 [md0]
23 0 0 0 DL kpsusp 0xfffff80003da35e8 [syncer]
22 0 0 0 DL kpsusp 0xfffff80003da3aa8 [vnlru]
21 0 0 0 DL kpsusp 0xfffff80003da4128 [bufdaemon]
20 0 0 0 DL pgzero 0xffffffff82108e0c [pagezero]
19 0 0 0 DL pollid 0xffffffff81f64870 [idlepoll]
9 0 0 0 DL psleep 0xffffffff82108540 [vmdaemon]
8 0 0 0 DL psleep 0xffffffff82194304 [pagedaemon]
7 0 0 0 DL waiting_ 0xffffffff82184d70 [sctp_iterator]
6 0 0 0 DL pftm 0xffffffff80d055e0 [pf purge]
18 0 0 0 DL cooling 0xfffff8000350c558 [acpi_cooling0]
17 0 0 0 DL tzpoll 0xffffffff81ea6aa0 [acpi_thermal]
16 0 0 0 DL (threaded) [usb]
100074 D - 0xfffffe0000b31ef0 [usbus5]
100073 D - 0xfffffe0000b31e98 [usbus5]
100072 D - 0xfffffe0000b31e40 [usbus5]
100071 D - 0xfffffe0000b31de8 [usbus5]
100069 D - 0xfffffe0000b21e18 [usbus4]
100068 D - 0xfffffe0000b21dc0 [usbus4]
100067 D - 0xfffffe0000b21d68 [usbus4]
100066 D - 0xfffffe0000b21d10 [usbus4]
100065 D - 0xfffffe0000b11ef0 [usbus3]
100064 D - 0xfffffe0000b11e98 [usbus3]
100063 D - 0xfffffe0000b11e40 [usbus3]
100062 D - 0xfffffe0000b11de8 [usbus3]
100060 D - 0xfffffe0000b01ef0 [usbus2]
100059 D - 0xfffffe0000b01e98 [usbus2]
100058 D - 0xfffffe0000b01e40 [usbus2]
100057 D - 0xfffffe0000b01de8 [usbus2]
100055 D - 0xfffffe0000af1ef0 [usbus1]
100054 D - 0xfffffe0000af1e98 [usbus1]
100053 D - 0xfffffe0000af1e40 [usbus1]
100052 D - 0xfffffe0000af1de8 [usbus1]
100050 D - 0xfffffe0000a64ef0 [usbus0]
100049 D - 0xfffffe0000a64e98 [usbus0]
100048 D - 0xfffffe0000a64e40 [usbus0]
100047 D - 0xfffffe0000a64de8 [usbus0]
5 0 0 0 DL idle 0xfffffe0000a860f0 [ciss_notify0]
4 0 0 0 DL (threaded) [cam]
100086 D - 0xffffffff81e9ea40 [scanner]
100033 D - 0xffffffff81e9ec00 [doneq0]
3 0 0 0 DL crypto_r 0xffffffff82106a10 [crypto returns]
2 0 0 0 DL crypto_w 0xffffffff821068b8 [crypto]
15 0 0 0 DL - 0xffffffff81ebc100 [rand_harvestq]
14 0 0 0 DL (threaded) [geom]
100019 D - 0xffffffff821794e0 [g_down]
100018 D - 0xffffffff821794d8 [g_up]
100017 D - 0xffffffff821794d0 [g_event]
13 0 0 0 DL (threaded) [ng_queue]
100016 D sleep 0xffffffff81e5cf48 [ng_queue3]
100015 D sleep 0xffffffff81e5cf48 [ng_queue2]
100014 D sleep 0xffffffff81e5cf48 [ng_queue1]
100013 D sleep 0xffffffff81e5cf48 [ng_queue0]
12 0 0 0 RL (threaded) [intr]
100096 I [swi1: netisr 3]
100095 I [swi1: netisr 2]
100094 I [swi1: netisr 1]
100084 I [swi1: pfsync]
100082 I [swi1: pf send]
100079 I [swi0: uart]
100078 I [irq1: atkbd0]
100075 I [irq14: ata0]
100070 I [irq22: uhci4]
100061 I [irq19: uhci3]
100056 I [irq18: uhci2]
100051 I [irq17: uhci1]
100046 I [irq16: uhci0 ehci0]
100045 I [irq262: bce1]
100044 Run CPU 3 [irq261: bce0]
100042 I [irq260: ciss0]
100031 I [swi5: fast taskq]
100029 I [swi6: Giant taskq]
100027 I [swi6: task queue]
100012 I [swi3: vm]
100011 I [swi4: clock]
100010 I [swi4: clock]
100009 I [swi4: clock]
100008 L *Giant 0xfffff80003eb8240 [swi4: clock]
100007 I [swi1: netisr 0]
11 0 0 0 RL (threaded) [idle]
100006 CanRun [idle: cpu3]
100005 Run CPU 2 [idle: cpu2]
100004 Run CPU 1 [idle: cpu1]
100003 CanRun [idle: cpu0]
1 0 1 0 RLs CPU 0 [init]
10 0 0 0 DL audit_wo 0xffffffff8218b8f0 [audit]
0 0 0 0 DLs (threaded) [kernel]
100138 D - 0xfffff80003d50d00 [dummynet]
100085 D - 0xfffff800032b2300 [CAM taskq]
100080 D - 0xfffff800035da700 [mca taskq]
100041 D - 0xfffff800034bd100 [em3 txq]
100040 D - 0xfffff800034bd300 [em3 que]
100039 D - 0xfffff800034bda00 [em2 txq]
100038 D - 0xfffff800034bdc00 [em2 que]
100037 D - 0xfffff800034be700 [em1 txq]
100036 D - 0xfffff800034be900 [em1 que]
100035 D - 0xfffff800034bf100 [em0 txq]
100034 D - 0xfffff800034bf300 [em0 que]
100032 D - 0xfffff800032b2500 [kqueue taskq]
100030 D - 0xfffff800032b2a00 -
Another strange error: i disabled carp sync on MASTER (uncheck "Synchronize States"), but NOT deleting my Configuration Synchronization Settings. Now MASTER sync with SLAVE without any kind of error log.
Apr 15 16:25:13 php-fpm[65408]: /rc.filter_synchronize: Filter sync successfully completed with https://192.168.183.2:443.
Apr 15 16:25:11 php-fpm[65408]: /rc.filter_synchronize: XMLRPC sync successfully completed with https://192.168.183.2:443.
Apr 15 16:25:09 check_reload_status: Reloading filter
Apr 15 16:25:08 php-fpm[65408]: /rc.filter_synchronize: Beginning XMLRPC sync to https://192.168.183.2:443.
Apr 15 16:25:07 check_reload_status: Syncing firewall
Apr 15 16:24:47 php-fpm[18627]: /rc.filter_synchronize: Filter sync successfully completed with https://192.168.183.2:443.
Apr 15 16:24:45 php-fpm[18627]: /rc.filter_synchronize: XMLRPC sync successfully completed with https://192.168.183.2:443.
Apr 15 16:24:43 check_reload_status: Reloading filter
Apr 15 16:24:42 php-fpm[18627]: /rc.filter_synchronize: Beginning XMLRPC sync to https://192.168.183.2:443.
Apr 15 16:24:41 check_reload_status: Syncing firewall
Apr 15 16:23:00 kernel: carp: demoted by 0 to 0 (pfsync bulk fail)
Apr 15 16:23:00 kernel:
Apr 15 16:22:38 php-fpm[70216]: /rc.filter_synchronize: Filter sync successfully completed with https://192.168.183.2:443.
Apr 15 16:22:36 php-fpm[70216]: /rc.filter_synchronize: XMLRPC sync successfully completed with https://192.168.183.2:443.
Apr 15 16:22:33 php-fpm[70216]: /rc.filter_synchronize: Beginning XMLRPC sync to https://192.168.183.2:443.
Apr 15 16:22:32 check_reload_status: Syncing firewall
Apr 15 16:22:11 php-fpm[69480]: /rc.filter_synchronize: Nothing has been configured to be synched. Skipping….Here i uncheck "Synchronize States"
Apr 15 16:22:10 kernel: pfsync_undefer_state: unable to find deferred statepfsync_undefer_state: unable to find deferred statepfsync_undefer_state: unable to find deferred statepfsync_undefer_state: unable to find deferred statepfsync_undefer_state: unable to find deferred statepfsync_undefer_state: unable to find deferred statepfsync_undefer_state: unable to find deferred statepfsync_undefer_state: unable to find deferred statepfsync_undefer_state: unable to find deferred statepfsync_undefer_state: unable to find deferred statepfsync_undefer_state: unable to find deferred statepfsync_undefer_state: unable to find deferred statepfsync_undefer_state: unable to find deferred statepfsync_undefer_state: unable to find deferred statepfsync_undefer_state: unable to find deferred statepfsync_undefer_state: unable to find deferred statepfsync_undefer_state: unable to find deferred state
-
The log spam should be fixed in 2.2.2. It's pointless to test further on 2.2.1.
-
As i said in my post, i tried more than four snapshot, and i found a lot of problems. I made last test today with last snapshot, same problem, except log spam.
Crash report is for Steve. -
So this issue has not been fixed in 2.2.2 ? :(
-
Yes, it seems to be fixed. CARP working, limiter too, and cpu load in MASTER server is very low. :)
-
Interesting. I'm not aware of any specific fixes that have gone in for this since the earlier snapshots. Perhaps something else was causing the issue.
I'm sure we'll see many more reports in the next few days to condom or deny this.Steve
-
Two CARP upgraded (each one with two server), the first yesterday, the last one today. Works without problems: low CPU, CARP without error and log spam, and the back CARP with limiter on bandwidth. I don't know why, but 2.2.2 really works. :)
-
In 2.2.2 there is still a problem i found in 2.2.1, as i said before: on limited rules, i need to unchek the flag "State Type" to "NO pfsync". If i don't, CARP Slave randomly crash.
As i said before, in my CARP configuration limited rules are only HTTP and HTTPS: the state is not really important for me. However, 2.2.2 it is much better than 2.2.1. :) -
Fixed all my issues! I made my limiter rules, set them to 10Mbps, applied the firewall and did a test :)
[omitted@omitted ~]# wget cachefly.cachefly.net/100mb.test
–2015-04-16 14:02:53-- http://cachefly.cachefly.net/100mb.test
Resolving cachefly.cachefly.net... 205.234.175.175
Connecting to cachefly.cachefly.net|205.234.175.175|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 104857600 (100M) [application/octet-stream]
Saving to: “100mb.test.7”100%[======================================>] 104,857,600 1.18M/s in 89s
2015-04-16 14:04:21 (1.13 MB/s) - “100mb.test.7” saved [104857600/104857600]
I tested it a few times :)
THANK YOU!!
-
Please is there any progress? I am also still having problems with this bug.
Thanks -
Have you tried a 2.2.3 snapshot, if you are able to?
There have been fixes gone in to address this though it seems not all cases have been resolved. Some here have reported success.Steve
-
I sent a couple crash logs for this yesterday. This is on version 2.2.4.
-
Interesting to find this thread. We didn't have this issue until upgrading from 2.2.6 to 2.3.1_5. It does look like it is logged during a nightly copy that triggers a limiter, ending when the traffic stops.