Strange error: There were error(s) loading the rules: pfctl: pfctl_rules
-
Sure, I've added pfctl -x loud to be executed before the main pfctl execution, so now I'm just waiting for it to happen again.
This router does have the third-party adam:ONE package, which creates pf rules in the userrules anchor, so it's possible that both are attempting to add a rule at the same time.
This wasn't an issue prior to pfSense Plus 22.05, though: previously pfSense would detect the "Device busy" error and try again. Now pfctl locks up, for lack of a more technical term, and additional attempts fail.
I've also seen someone mention they had Snort installed, which may produce the same symptom if Snort is adding a block rule while pfctl runs.
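The retry behaviour described above, re-running the load when it fails with "Device busy" (EBUSY), can be sketched in C as follows. This is a hypothetical illustration, not pfSense's actual code (which the thread does not show); `load_with_retry`, `load_fn` and the `flaky_load` stand-in are invented names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <time.h>

/* Hypothetical signature for one rule-load attempt: returns 0 on success
 * or an errno value such as EBUSY. */
typedef int (*load_fn)(void *ctx);

/* Re-run `load` while it reports EBUSY, pausing `delay_ms` between
 * attempts. Returns the result of the last attempt (0 on success). */
int load_with_retry(load_fn load, void *ctx, int max_attempts, long delay_ms)
{
    int err = EBUSY;

    assert(load != NULL && max_attempts > 0);
    for (int i = 0; i < max_attempts; i++) {
        err = load(ctx);
        if (err != EBUSY)
            break;  /* success, or an error that retrying won't fix */
        if (i + 1 < max_attempts && delay_ms > 0) {
            struct timespec ts = {
                .tv_sec = delay_ms / 1000,
                .tv_nsec = (delay_ms % 1000) * 1000000L,
            };
            nanosleep(&ts, NULL);
        }
    }
    return err;
}

/* Hypothetical stand-in for a load step that is "busy" a fixed number of times. */
struct flaky {
    int busy_left;  /* how many more calls should fail with EBUSY */
    int calls;      /* total calls made so far */
};

int flaky_load(void *ctx)
{
    struct flaky *f = ctx;

    f->calls++;
    if (f->busy_left > 0) {
        f->busy_left--;
        return EBUSY;
    }
    return 0;
}
```

A retry like this only helps if the conflicting transaction eventually completes. In the 22.05 behaviour described here pf stayed stuck, so every retry hit the same error.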
-
@artooro Yes, that's me; I have Snort installed.
-
@kprovost said in Strange error: There were error(s) loading the rules: pfctl: pfctl_rules:
EBUSY almost certainly means we've failed in pf_ioctl_addrule(), when checking ticket numbers.
Is there any reason why there is absolutely no debug output when things fail? If there were a log message like "ticket number x is causing issues" in dmesg, this would be a lot easier to debug. It's not executed often, so additional log output wouldn't really slow things down, especially if it's only emitted in failure scenarios.
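The ticket check kprovost mentions can be modelled roughly like this: opening a transaction hands out a new ticket, and adding a rule with a ticket that no longer matches the current one fails with EBUSY. This is a simplified user-space sketch with hypothetical names (`struct ruleset`, `ruleset_begin`, `ruleset_add_rule`), not the actual pf_ioctl_addrule() code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified model of a pf-style ruleset transaction;
 * not the real kernel structures. */
struct ruleset {
    uint32_t ticket;  /* ticket of the most recently opened transaction */
    int      nrules;  /* rules added under that ticket */
};

/* Open a transaction: bump the ticket and drop pending rules.
 * Anyone still holding the previous ticket is now stale. */
uint32_t ruleset_begin(struct ruleset *rs)
{
    assert(rs != NULL);
    rs->ticket++;
    rs->nrules = 0;
    return rs->ticket;
}

/* Add one rule under `ticket`. Returns 0, or EBUSY when another caller
 * has opened a newer transaction in the meantime. */
int ruleset_add_rule(struct ruleset *rs, uint32_t ticket)
{
    assert(rs != NULL);
    if (ticket != rs->ticket)
        return EBUSY;
    rs->nrules++;
    return 0;
}
```

If two writers interleave (begin A, begin B, add with A), the writer holding the older ticket gets EBUSY while the newer one proceeds, which is the kind of collision suspected in this thread.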
-
The Snort custom blocking plugin does not add rules when blocking. Instead, it simply adds the offending IP address (or addresses, if both source and destination are to be blocked) to the pre-existing snort2c pf table.
Here is the call Snort makes when blocking:
ioctl(data->fd, DIOCRADDADDRS, &io)
-
@flole We don't log such errors by default, because otherwise the kernel would spend more time logging errors than actually doing useful work.
It's not just the ticket check; there are dozens of checks in DIOCADDRULENV alone. We do have DTrace probe points for a lot of those, for when we're debugging things, and there is optional debug output at the ticket check, which is why I asked you to run pfctl -x loud.
-
@bmeeks That could potentially also be causing a ticket mismatch in DIOCADDRULENV. There's a ticket number check on the ruleset, and another one for the address list. The kernel output enabled by pfctl -x loud should let us figure out which one.
Given the presence of the DIOCRADDADDRS call from Snort, I'm 95% confident it'll be the pool ticket, but it'd be good to confirm anyway.
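To illustrate why the extra kernel output matters: one EBUSY can come from either of the two checks mentioned above, and only a debug message says which. This is a hypothetical C sketch; `check_tickets`, `struct tickets`, the `DBG` macro and the level names are invented. They mirror the idea of debug-gated messages selected with pfctl -x none|urgent|misc|loud, not pf's actual code or message text.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Levels mirroring `pfctl -x none|urgent|misc|loud`; names are illustrative. */
enum dbg_level { DBG_NONE, DBG_URGENT, DBG_MISC, DBG_LOUD };

enum dbg_level debug_level = DBG_NONE;

/* Print to stderr only when the configured level is at least `lvl`. */
#define DBG(lvl, ...) \
    do { if (debug_level >= (lvl)) fprintf(stderr, __VA_ARGS__); } while (0)

/* Hypothetical current tickets: one for the ruleset, one for the address pool. */
struct tickets {
    uint32_t ruleset;
    uint32_t pool;
};

/* Which check failed; exists only so the sketch is testable. */
enum ticket_fail { FAIL_NONE, FAIL_RULESET, FAIL_POOL };

/* Validate both tickets carried by an add-rule request. Returns 0 or EBUSY
 * and records the failing check in *which; at DBG_MISC or above it also
 * prints which ticket was stale. */
int check_tickets(const struct tickets *cur, uint32_t ruleset_ticket,
                  uint32_t pool_ticket, enum ticket_fail *which)
{
    assert(cur != NULL && which != NULL);
    if (ruleset_ticket != cur->ruleset) {
        DBG(DBG_MISC, "ruleset ticket: %u != %u\n",
            (unsigned)ruleset_ticket, (unsigned)cur->ruleset);
        *which = FAIL_RULESET;
        return EBUSY;
    }
    if (pool_ticket != cur->pool) {
        DBG(DBG_MISC, "pool ticket: %u != %u\n",
            (unsigned)pool_ticket, (unsigned)cur->pool);
        *which = FAIL_POOL;
        return EBUSY;
    }
    *which = FAIL_NONE;
    return 0;
}
```

In the real ioctl path only the error code reaches pfctl, so both failures look identical from userland. The debug message is what tells them apart.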
-
If people are feeling adventurous, there's an experimental (amd64) kernel here: http://pfsense-build-01.netgate.com/~kp/kernel.tar.bz2
It should prevent pf from getting stuck in a state where it's unable to update rules.
Extract it in /boot/. It will overwrite the existing kernel there, so ideally test this on test machines or snapshotted VMs.
(To be clear: this is an amd64 kernel. Do not try it on a 3100, 1100 or 2100, because it will not work there.)
-
@kprovost said in Strange error: There were error(s) loading the rules: pfctl: pfctl_rules:
feeling adventurous
Always.
@kprovost said in Strange error: There were error(s) loading the rules: pfctl: pfctl_rules:
http://pfsense-build-01.netgate.com/~kp/kernel.tar.bz2
http? I thought that was ditched some time ago.
pfsense-build-01.netgate.com is not accessible to us common mortals.
-
@gertjan Sorry, I thought that one was public.
Let's try https://people.freebsd.org/~kp/kernel.tar.bz2 then.
-
@kprovost I did get one today with pfctl -x loud executed first. Attached here in case it's still helpful.
I see the kernel you uploaded. I'll install it on the slave node in the HA cluster; it doesn't happen as often on the slave, but at least it's not as risky to test there.
-
I believe the "real" errors can be seen in dmesg now; you should at least save the dmesg output somewhere in case it's needed.
-
@flole In dmesg right now, all I'm seeing are "pf: wire key attach failed on all" messages. Not sure whether that's related at all.
If it's still helpful, I can write some more code to capture it at the time of the incident.
-
This is happening to me too on 22.05. Same "busy" message:
/root: pfctl -Fa
pfctl: pfctl_clear_eth_rules: Device busy
Mine popped up when trying to modify OpenVPN client settings.
Mine's as close to a virgin install as you can get on self-supplied hardware (2.6.0->22.01->22.05). It ran most of the day on 22.01 with no problem, then I upgraded to latest.
No custom packages, and I haven't touched the file system other than to load one script back in.
-
@kprovost Is it possible to get the kernel patch for armv7 (for the SG-3100)? Most of the installs I have that exhibit the issue are on that platform.
-
@artooro Here's a kernel for the 3100. https://people.freebsd.org/~kp/kernel-3100.tar.bz2
I have NOT tested this kernel as I don't have a 3100. Be careful to ensure you don't break your device.
-
@kprovost After installing this kernel patch, I was able to observe a collision of pf syscalls, and it did not end up in a locked state like it did previously.
So far I'd say this patch is doing the job.
-
@kprovost I have also been running with the kernel patch. It seems to have resolved the problem for me as well.
-
Is this intended as a "proper" fix or just a temporary workaround? Or, asked differently: will this be merged as is, or will there be a different fix? Is there a diff available somewhere so I can see what was changed?
-
Right now these are test kernels, just to prove we have found the issue. Now that that appears to be the case, we will merge it and look at what we can do for existing 22.05 installs.
Steve
-
@flole It's a real fix, not a workaround. It's gone in upstream: https://cgit.freebsd.org/src/commit/?id=6ab80e7275091c900da8d2e84a7b0bb4c34a1e41
and I'll merge it to our local branch just as soon as this test-build finishes.