Pfsync kernel panic after 2.1.5 to 2.2 upgrade - pfsync_undefer_state

bernardo

Hi,

I have two physical machines configured in a typical failover setup, with a dedicated pfsync connection between them. After the upgrade, both machines started having kernel panics a few minutes after boot completion. They show the following panic message: "pfsync_undefer_state: unable to find deferred state". The complete crash dump is here (can't attach it, as it is too large): https://gist.github.com/anonymous/16fce0e2fa29ea6dd53a

I figured out the kernel panics disappear if I disable the "Synchronize States" option under "High Availability Sync". Even if only one machine has the "Synchronize States" enabled it will crash (but the other won't). XMLRPC sync and CARP are still enabled, and work as expected. pfsync was working fine before the upgrade, synchronising the firewall states between the servers.

Is anyone successfully using pfsync on a 2.2 machine? I've seen others reporting the same error on similar failover setups (their setups were virtualised, mine isn't). Guess I will have to downgrade…

Best, Bernardo

Update: I tried a fresh install of 2.2, uploaded the config backup, and got the same (bad) results when "Synchronize States" was enabled.

flofogl

Hi bernardo,

I posted a similar problem in a virtualized environment yesterday but haven't got an answer yet. Thank you for the information that disabling "Synchronize States" is enough to make the panic disappear.

Have you tried upgrading both machines without synchronization, or was one machine always on version 2.1.5 when you re-enabled "Synchronize States"? Maybe synchronizing between two machines running version 2.2 works!? I am currently not on-site and I don't want to test it remotely, but maybe at the end of the week.

Cheers

Florian

fragged

I just set up a CARP HA setup in a VM (VirtualBox) and it works fine with 2.2 <> 2.2. I'm fairly sure that syncing between 2.1.X and 2.2 will cause trouble as the base operating system has changed from FreeBSD 8.3 to 10.1.

Edit:
The blog post about 2.2 has been updated with this:
@pfSense:

Limiters not working with High Availability

If you’re using limiters and high availability (CARP+pfsync+config sync), do not upgrade at this time. We have an open bug on a crash in this circumstance.

Bug: https://redmine.pfsense.org/issues/4310
Blog: https://blog.pfsense.org/?p=1546

flofogl

Hi fragged,

thank you for the information. That means that in theory I could do an upgrade on both machines with state synchronization and limiter rules disabled and turn them back on afterwards, correct?

Florian

fragged

From the blog post / bug report it looks like limiters will cause a kernel panic when CARP HA is used with 2.2. I have tested CARP without limiters.

lowprofile

This looks very much like what I experienced two days ago. I have another thread discussing it.
Can you link to the bug report?

EDIT: bug report found: https://redmine.pfsense.org/issues/4310

flofogl

@fragged:

From the blog post / bug report it looks like limiters will cause a kernel panic when CARP HA is used with 2.2. I have tested CARP without limiters.

Sorry, I didn't get it at first. I thought it was only if one node was on version 2.2 and the other one still on version 2.1.5. According to the bug report it now seems that synchronizing limiter rules is simply broken in version 2.2 and will cause a panic regardless of the version of the other nodes.

bernardo

Hi All,

I do have limiters enabled. Thank you very much, @fragged; I will try to disable the limiters and see how it goes. Let's hope the great pfSense team is able to fix this soon.

@flofogl: At first I upgraded one host only, but then the problem persisted after both hosts were upgraded to 2.2. One host (at 2.2) would crash even if the other was turned off.

Disabling "Synchronize States" is only a workaround, though, as connections won't be maintained when master and backup change roles.

Best, Bernardo

bernardo

Following up: I disabled my limiters (didn't delete them, just disabled them) and then re-enabled "Synchronize States", and the kernel panics stopped. Right after I re-enabled "Synchronize States" I got one more panic reboot on my master, which got me worried. But since it came back, both machines have been stable for a couple of hours now, in production (routing about 50 employees to 3 WANs totalling 280 Mbit/s of bandwidth).
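For anyone wanting to check a config backup for this combination before upgrading, here is a minimal sketch in Python. It assumes (not confirmed anywhere in this thread) that the backup stores the state sync setting as `<hasync><pfsyncenabled>on</pfsyncenabled></hasync>`, that firewall rules using limiters carry `<dnpipe>` / `<pdnpipe>` elements, and that disabled rules carry a `<disabled>` element. The function name `ha_limiter_risk` and the sample XML are made up for illustration; check the tag names against your own config.xml.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed config backup used only for illustration.
SAMPLE = """<pfsense>
  <hasync><pfsyncenabled>on</pfsyncenabled></hasync>
  <filter>
    <rule><descr>LAN out</descr><dnpipe>LimitUp</dnpipe><pdnpipe>LimitDown</pdnpipe></rule>
    <rule><descr>Plain rule</descr></rule>
  </filter>
</pfsense>"""


def ha_limiter_risk(config_xml: str) -> dict:
    """Report whether state sync and limiter-using rules are both present.

    All tag names below are assumptions about the config.xml layout.
    """
    root = ET.fromstring(config_xml)

    # Assumed location of the "Synchronize States" checkbox value.
    state_sync = (root.findtext("hasync/pfsyncenabled") or "").strip() == "on"

    limiter_rules = []
    for rule in root.findall("filter/rule"):
        # Assumed marker for a disabled rule; skip those.
        if rule.find("disabled") is not None:
            continue
        # Assumed In/Out limiter references on a firewall rule.
        pipes = [rule.findtext(tag) for tag in ("dnpipe", "pdnpipe")]
        if any(p and p.strip() for p in pipes):
            limiter_rules.append(rule.findtext("descr") or "(no description)")

    return {
        "state_sync": state_sync,
        "limiter_rules": limiter_rules,
        "at_risk": state_sync and bool(limiter_rules),
    }
```

Calling `ha_limiter_risk(open("config.xml").read())` on a real backup would list the rules that still reference a limiter. It only reads the file and changes nothing, and it says nothing about whether a given build actually panics.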

Besides HA + CARP, I use Multi-WAN with failover (3 WAN links), policy-based routing in my firewall rules, the traffic shaper (with HFSC), IPsec VPN, and the DNS Forwarder. Everything appears to work so far. But I miss my limiters… :(

I will report if I find anything else.

Best, Bernardo

stephenw10

Have any of you tried a 2.2.1 snapshot to confirm this is fixed?
The bug report listed above is marked resolved, but it doesn't match the symptoms described here exactly.

Steve

flofogl

Hi Steve,

it might be a little late now, and I don't know whether it is related to the original issue, but there still seem to be issues related to CARP. I tried an upgrade from 2.1.5 to 2.2.1 (RELEASE), as described in my post here for 2.2 (RELEASE): https://forum.pfsense.org/index.php?topic=87485.msg480549#msg480549

I get "pfsync_undefer_state: unable to find deferred state" printed in the console, and then it just hangs after the upgrade process (after reboot). Since it is a virtual machine (a backup node), I can easily revert and try again if you want me to test something. I even tried restoring the configuration on a fresh install of 2.2.1. I got the same error message printed all over the screen.

Florian

Marlenio

Same error on my CARP installation upgrade from 2.1.5 to 2.2.1. Back on 2.1.5 :(

lowprofile

Oh, that is very sad news. I was really looking forward to having this CARP kernel issue fixed :(

stephenw10

If any of you have a chance to test this in 2.2.1-rel and submit a crash report we'd love to see it.

Steve

Marlenio

Hi Steve, yesterday I sent 3 crash logs about this error.

stephenw10

Awesome, can you send me the IP they came from? Use a PM if you want.

Steve

Marlenio

I'm sorry, I had switched back to 2.1.5. :(

stephenw10

Ok, so you don't know what IP they were sent from?

Marlenio

Sure. :-) 213.215.138.68

stephenw10

Great. We are trying to replicate this but are just seeing continuous error messages without the crash.
Do any of you have any special Limiter setup? Can you give any details?

Steve