Pfsync kernel panic after 2.1.5 to 2.2 Upgrade - pfsync_undefer_state
-
I removed my limiter rules from the config and it's working, no more pfsync errors…
-
Mathiew,
I have seen one other instance of this on a single box (not part of an HA setup). In that case the box previously had a CARP config of some sort and had stray tags in the config file that had not been translated correctly across an update.
In that instance it was fixed by enabling HA sync, saving, and then disabling HA sync again. Limiters could then be used.
Steve
-
Steve,
when you say "fixed", do you mean that limiters could then be used without HA, not together with HA? So it is a solution to Mathiew's issue only. Correct?
Cheers,
Florian
-
I can try, but I have never touched any HA/CARP services on this machine.
Thanks for your work.
EDIT: I re-enabled the limiters after doing that and no problems so far.
-
Yes, HA+Limiters is still not fixed (though a problem hasn't yet been found). But we had one other case where a stray HA tag in the config was causing this on a standalone box, which may be a useful clue in itself because the pfsync interface was not actually configured on that box.
Steve
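
For anyone who wants to look for leftover HA settings before trying the enable/save/disable trick, here is a rough sketch that lists suspect sections in a saved copy of the config. The element names `hasync` and `carpsettings` are my assumption about where such settings might live; they are not confirmed anywhere in this thread, so adjust the list to whatever your own config actually contains.

```python
import xml.etree.ElementTree as ET

# Assumption: element names that might hold HA/pfsync settings.
# These names are illustrative, not confirmed in this thread.
SUSPECT_TAGS = ("hasync", "carpsettings")


def find_ha_tags(config_xml, suspect_tags=SUSPECT_TAGS):
    """Return (path, children) pairs for every suspect element found.

    config_xml is the config file content as a string; children maps each
    direct child tag to its stripped text so the values can be eyeballed.
    """
    root = ET.fromstring(config_xml)
    hits = []

    def walk(elem, path):
        here = f"{path}/{elem.tag}"
        if elem.tag in suspect_tags:
            children = {c.tag: (c.text or "").strip() for c in elem}
            hits.append((here, children))
        for child in elem:
            walk(child, here)

    walk(root, "")
    return hits


if __name__ == "__main__":
    # Hypothetical, hand-made sample; a real config will look different.
    sample = """
    <pfsense>
      <system><hostname>fw1</hostname></system>
      <hasync><pfsyncenabled>on</pfsyncenabled></hasync>
    </pfsense>
    """
    for path, children in find_ha_tags(sample):
        print(path, children)
```

This only reads the file and prints what it finds; it does not change anything, so the GUI steps are still what actually cleans the config up.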
-
Steve,
I would also like to thank you for inspecting the issue and I hope Mathiew's efforts will prove valuable. However, I don't really understand what you mean by "a problem hasn't yet been found". You wrote you were able to reproduce the behavior but the machine stayed responsive. The question is for how long? Once I was on 2.2.1 (upgraded from 2.1.5 without limiters enabled), it stayed responsive in my case too after I re-enabled the limiters, but only for a couple of minutes. After that, there was nothing left to do other than "physically" shutting down the machine (no web UI, no SSH, no console). I would consider this a problem…
The upgrade process with HA and limiters never worked for me, the box didn't come back up again. As mentioned before, I can test things if needed.
Is this something specific to my setup or are limiters in combination with HA not as common as I thought they would be?
Thanks,
Florian
-
I mean we are not, yet, able to replicate the crashes that you are seeing. We tested for hours with a variety of limiter setups and just saw continuous log spamming. Which itself is not great. ;)
If you have any ability to run this and deliberately cause it to crash and get us the crash report then we have something solid to go on. Right now it looks like the crashes may be secondary to the log spamming in some way.
I appreciate all the testing that you guys are doing.
Steve
-
Hi Steve,
I ran a new test, installing 2.2.1 on my two-node CARP front firewall. It's a simple configuration, with an IPsec VPN (with four phase 2 entries) and only watchdog as an installed package. It seems to be OK. But in this config I don't use any type of limiter, as I do in my back firewall CARP config. Could limiters be the problem in 2.2.1?
-
This is definitely a conflict between Limiters and pfsync; removing either of those will solve it. That's not really a solution though.
Steve
-
Yes, I think so. Today I installed my back pfSense CARP configuration, the one with the sync problem. First I uninstalled all limiters and all packages, then installed 2.2.1. It runs without problems.
-
Same problem. Upgrade from 2.1.5 to 2.2.1. Heavily using limiters.
However, there are a few differences here:
- No HA/CARP configuration, yet we get the pfsync errors
- These messages occur on traffic from one VLAN whose configuration was changed after the upgrade
- No errors on the VLAN whose configuration has not been changed since the upgrade
I don't understand why pfsync would trigger at all on one internal VLAN but not another?
Good luck troubleshooting this.
-
Hi Fira,
As I advised Mathiew, try enabling HA sync, saving, and disabling it again.
Steve
-
Sorry, I must have completely missed that!
Anyway, yeah, I tried changing the HA interface without enabling it (since people suggested enabling HA caused panics), and this solved the problem. Thanks :)
-
Good to hear. Some consistency there at least. :)
Steve
-
So some patches have gone in to resolve this. They are in the pfsync source, which is compiled at build time, so you can't easily apply them separately.
They should be in recent 2.2.2 snapshots though, if anyone is able to test that: http://snapshots.pfsense.org/
Steve
-
Thanks Steve, I'll try it soon on my back firewall CARP config. The 2.2.2 version is "pfSense-Full-Update-2.2.2-DEVELOPMENT-amd64-20150406-0824".
-
Crash on the backup server: no web GUI (503 - Service Not Available), no SSH shell, server unresponsive. I need a total reinstall. :'(
-
Tried the new 2.2.2 20150412 snapshot. CARP configuration, limiters on HTTP and HTTPS (4 rules on two internal LANs). 2.2.1 on MASTER, 2.2.2 on SLAVE. Tried to sync master to slave: crash on SLAVE, no shell, no web GUI, new installation needed. :(
-
Hi Steve,
The only way I found to have both CARP and limiters working is to set the "State Type" flag to "NO pfsync" on my four limited rules (HTTP/HTTPS on two internal LANs). I lose the sync of the rule states, but I can still sync the rules themselves when they change. For HTTP/HTTPS, states are not really important to me.
The only problem now is under "Status: System logs: General" -> System, where I find a lot of errors like this:
kernel: er_state: unable to find deferred statepfsync_undefer_state: unable to find deferred statepfsync_undefer_state: unable to find deferred statepfsync_undefer_state: unable to find deferred statepfsync_undefer_state: unable to find deferred statepfsync_undefer_state: unable to find deferred statepfsync_undefer_state: unable to find deferred statepfsync_undefer_state: unable to find deferred statepfsync_undefer_state: unable to find deferred statepfsync_undefer_state: unable to find deferred statepfsync_undefer_state: unable to find deferred statepfsync_undefer_state: unable to find deferred statepfsync_undefer_state: unable to find deferred statepfsync_undefer_state: unable to find deferred statepfsync_undefer_state: unable to find deferred statepfsync_undefer_state: unable to find deferred statepfsync_undefer_state: unable to find deferred statepfsync_undefer_state: unable to find deferred state
Please, let me know if you have found a solution.
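
Since the messages run together on one log line without line breaks, counting lines badly underestimates how much spam there is. A small sketch for counting the actual occurrences in a chunk of log text pasted into a string (the function names are just for illustration):

```python
# The exact message text as it appears in the log line above.
MESSAGE = "pfsync_undefer_state: unable to find deferred state"


def count_undefer_errors(log_text):
    """Count every occurrence of the message, even when many share one line."""
    return log_text.count(MESSAGE)


def errors_per_line(log_text):
    """Return (line_number, occurrences) for each line containing the message."""
    return [
        (number, line.count(MESSAGE))
        for number, line in enumerate(log_text.splitlines(), 1)
        if MESSAGE in line
    ]


if __name__ == "__main__":
    # Hand-made sample mimicking the run-together format; not a real log.
    sample = (
        "kernel: er_state: unable to find deferred state" + MESSAGE * 3 + "\n"
        "kernel: some unrelated message\n"
    )
    print(count_undefer_errors(sample))
    print(errors_per_line(sample))
```

Note that a truncated leading fragment such as "er_state: ..." is not counted, so the totals are a lower bound.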
-
Hmm, interesting. The fixes that went in should at the very least have removed the log spamming. Others have reported that it did remove the log spamming and that it ran initially and then crashed after some time with both limiters and pfsync enabled. We still haven't seen a crash report to confirm whether this is some new bug though. Are you sure you were running a 20150412 snapshot? Are you running 64-bit?
Steve