Strange error: There were error(s) loading the rules: pfctl: pfctl_rules
-
@stephenw10 Maybe it would help to add additional debug output in pf's code? Is it clear where the wrong branch is taken/where the error is actually thrown? If that's unclear it's probably a good idea to figure that out first. One possible way is probably the one I described above, but is there another one? If it's clear what state the firewall ends up in then it's easier to figure out potential ways how it could end up in that state.
-
@stephenw10 Thanks. What I'll attempt tomorrow is to edit https://github.com/pfsense/pfsense/blob/60a2fa6b6f1a59f3f86933265fbb48e25f652bfc/src/etc/inc/filter.inc#L527 so that it runs under truss and writes the output to a log file, to see if we can get something helpful there.
I have a couple of systems where it's pretty easy to reproduce.
-
@artooro I'd make a copy of every rules file that's loaded there as well, just to see whether loading those files in the same order causes the issue.
-
After some reboots it started working again for me as well. I noticed that my GIF tunnels did not come up this time. Maybe it is related? Are others affected by this also using (multiple) GIF tunnels?
-
@flole said in Strange error: There were error(s) loading the rules: pfctl: pfctl_rules:
After some reboots it started working again for me as well. I noticed that my GIF tunnels did not come up this time. Maybe it is related? Are others affected by this also using (multiple) GIF tunnels?
I have a single GIF tunnel (a 6in4 for HE Tunnelbroker).
-
Hmm, that could be a clue, though I haven't seen it on any test box I have that has GIF tunnels. So maybe something else is required on top of that.
-
I've been able to capture the initial pfctl invocation that fails via truss; the output is attached here:
1-truss_pfctl_1660664796.txt
I don't have any GIF tunnels, but I do use WireGuard interfaces.
If anyone is interested in capturing this themselves, what I did was edit /etc/inc/filter.inc, comment out line 527, and add the following quick-and-dirty code:
// Run pfctl under truss and collect the combined stdout/stderr, one line per array entry.
$_grbg = exec("/usr/bin/truss /sbin/pfctl -o basic -f {$g['tmp_path']}/rules.debug 2>&1", $rules_error, $rules_loading);
$rval = 0;
// The last line of the truss output reports the exit status of pfctl.
if ($rules_error[count($rules_error)-1] == "process exit, rval = 1") {
    $rval = 1;
    file_notice("filter_load", sprintf("pfctl process exit, rval = 1"), "Filter Reload", "");
}
// Save the capture as /root/<rval>-truss_pfctl_<unix timestamp>.txt
$date = new DateTime();
$truss_filename = "/root/" . $rval . "-truss_pfctl_" . $date->getTimestamp() . ".txt";
file_put_contents($truss_filename, implode("\n", $rules_error));
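For anyone collecting a pile of these captures, here is a rough Python sketch for pulling the failed system calls out of one. It assumes failing lines look like the one quoted later in this thread (ioctl(3,DIOCADDRULENV,0xbfbfdae4) ERR#16 'Device busy') with no pid or timestamp prefix. The function names are made up for illustration and are not part of pfSense.

```python
import re

# errno value shown in the capture for 'Device busy' (EBUSY).
EBUSY_ERRNO = 16

# Assumed shape of a failing truss line, for example:
#   ioctl(3,DIOCADDRULENV,0xbfbfdae4) ERR#16 'Device busy'
FAILED_CALL = re.compile(
    r"^(?P<call>\w+)\((?P<args>.*)\)\s+ERR#(?P<errno>\d+)\s+'(?P<msg>[^']*)'"
)


def failed_syscalls(truss_text):
    """Return (line_number, syscall, args, errno, message) for each failed call."""
    failures = []
    for lineno, line in enumerate(truss_text.splitlines(), start=1):
        match = FAILED_CALL.match(line.strip())
        if match:
            failures.append(
                (lineno, match["call"], match["args"],
                 int(match["errno"]), match["msg"])
            )
    return failures


def first_busy_ioctl(truss_text):
    """Return the first ioctl that failed with EBUSY, or None if there is none."""
    for failure in failed_syscalls(truss_text):
        _lineno, call, _args, err, _msg = failure
        if call == "ioctl" and err == EBUSY_ERRNO:
            return failure
    return None
```

Reading a capture with open(path).read() and passing it to first_busy_ioctl() gives the line number of the first EBUSY, which makes it easy to see how many rules were added before the failure. With the naming scheme above, the files whose names start with 1- are the failing runs.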
-
@artooro Maybe you should post the one that ran right before it as well, just in case something there was already messed up. But it looks promising to me. I believe @kprovost is the pfctl and kernel expert; maybe he can spot a potential bug based on that truss output?
Keep in mind that if you're using the captive portal, pfctl is invoked in (at least) one other location as well.
-
Yup I pinged him. Late where he is though, I hope he's enjoying a beer by now.
-
@flole I checked and the previous run is the same as all the successful invocations.
-
@artooro Can you add a
pfctl -x loud
invocation before that truss'd pfctl?
That truss output was already more illuminating, but I'm still not able to reproduce anything like this issue.
It seems to fail with
ioctl(3,DIOCADDRULENV,0xbfbfdae4) ERR#16 'Device busy'
after it has already added a bunch of other rules without issue.
EBUSY almost certainly means we've failed in pf_ioctl_addrule(), when checking ticket numbers.
So I'm guessing we're seeing a race condition here, where something else (possibly another pfctl, possibly something else that adds rules or addresses) is running at the same time. The
pfctl -x loud
should let us work out if it's a ruleset or a pool ticket. That in turn might give us a hint.
I've seen mention of pfBlockerNG. Is everyone affected running that?
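To make the suspected race a bit more concrete, here is a deliberately simplified Python model of ticket-guarded rule loading. It is not pf's real code or data structures: the ToyPf class, its method names and the rule strings are all invented for illustration. The only things taken from this thread are that there is a ruleset ticket and a pool ticket, and that a mismatch on either shows up as EBUSY.

```python
import errno


class ToyPf:
    """Toy model of ticket checks; NOT pf's actual implementation."""

    def __init__(self):
        self.ruleset_ticket = 0
        self.pool_ticket = 0
        self.rules = []

    def begin_ruleset(self):
        # Starting a transaction hands out a fresh ticket,
        # which makes every previously issued ticket stale.
        self.ruleset_ticket += 1
        return self.ruleset_ticket

    def begin_addrs(self):
        # Same idea for the address pool.
        self.pool_ticket += 1
        return self.pool_ticket

    def add_rule(self, rule, ruleset_ticket, pool_ticket):
        """Return (errno, reason); errno is 0 on success."""
        if ruleset_ticket != self.ruleset_ticket:
            return errno.EBUSY, "ruleset ticket mismatch"
        if pool_ticket != self.pool_ticket:
            return errno.EBUSY, "pool ticket mismatch"
        self.rules.append(rule)
        return 0, "ok"


def demo_race():
    pf = ToyPf()
    ruleset_ticket = pf.begin_ruleset()
    pool_ticket = pf.begin_addrs()
    first = pf.add_rule("pass in all", ruleset_ticket, pool_ticket)
    # A second, hypothetical actor touches the address pool mid-load.
    pf.begin_addrs()
    second = pf.add_rule("block in all", ruleset_ticket, pool_ticket)
    return first, second
```

In this model both kinds of mismatch return the same errno, so a caller that only sees "Device busy" cannot tell them apart. That is the gap the extra debug output is meant to close.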
-
@kprovost said in Strange error: There were error(s) loading the rules: pfctl: pfctl_rules:
I've seen mention of pfBlockerNG, is everyone affected running that?
I'm not even sure what pfBlockerNG is. I'm not consciously running it (unless it is something that always runs as standard).
-
It's a package. You have to actively install it.
Steve
-
Sure, I've added
pfctl -x loud
to be executed before the main pfctl execution, so I'm just waiting for it to happen again.
This router does have the third-party adam:ONE package, which creates pf rules in the
userrules
anchor. So it's possible that they are both attempting to add a rule at the same time.
This wasn't an issue prior to pfSense Plus 22.05, though, as pfSense would previously detect the device busy error and try again. But now pfctl locks up, for lack of a more technical term, and additional attempts fail.
I've also seen someone mention they had Snort installed, which may cause the same symptom if Snort is adding a block rule while pfctl runs.
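The retry-on-busy behaviour described above can be sketched roughly as follows. This is not pfSense's actual reload code, only an illustration of the idea. The load_rules callable is hypothetical and is assumed to return an exit code plus the output lines of whatever loads the ruleset, and matching on the text "Device busy" is likewise an assumption.

```python
import time


def load_with_retry(load_rules, attempts=3, delay=0.5, sleep=time.sleep):
    """Call load_rules() until it succeeds, fails for another reason,
    or the attempts run out. Returns (attempts_used, exit_code, output)."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        exit_code, output = load_rules()
        if exit_code == 0:
            break
        if not any("Device busy" in line for line in output):
            # A different kind of failure; retrying is unlikely to help.
            break
        if attempt < attempts:
            # Back off a little longer after each busy failure.
            sleep(delay * attempt)
    return attempt, exit_code, output
```

A wrapper like this only helps when the busy condition is transient. It would not help with the situation reported here on 22.05, where subsequent attempts keep failing.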
-
@artooro Yes, that's me. I have Snort installed.
-
@kprovost said in Strange error: There were error(s) loading the rules: pfctl: pfctl_rules:
EBUSY almost certainly means we've failed in pf_ioctl_addrule(), when checking ticket numbers.
Is there any reason why there is absolutely no debug output when things fail? If there were a log message like "ticket number x is causing issues" in dmesg, it would make this a lot easier to debug. This code isn't executed often, so the additional log output wouldn't really slow things down, especially if it's only emitted in failure scenarios.
-
The Snort custom blocking plugin does not add rules when blocking. Instead, it simply adds the offending IP address (or addresses if both source and destination are to be blocked) to the pre-existing snort2c pf table.
Here is the call Snort makes when blocking:
ioctl(data->fd, DIOCRADDADDRS, &io)
-
@flole We don't log such errors by default, because otherwise the kernel would spend more time logging errors than actually doing useful work.
It's not just the ticket check; there are dozens of checks in DIOCADDRULENV alone. We do have DTrace probe points for a lot of those, for when we're debugging things, and there is optional debug output at the ticket check. That's why I asked for the pfctl -x loud invocation.
-
@bmeeks That could potentially also be causing a ticket mismatch in DIOCADDRULENV. There's a ticket number check on the ruleset, and another one for the address list. The kernel output enabled by
pfctl -x loud
should let us figure out which one.
Given the presence of the DIOCRADDADDRS call from Snort, I'm 95% confident it'll be the pool ticket, but it'd be good to confirm anyway.
-
If people are feeling adventurous there's an experimental (amd64) kernel here: http://pfsense-build-01.netgate.com/~kp/kernel.tar.bz2
It should prevent pf from getting stuck, where it's unable to update rules.
Extract that in /boot/. It will overwrite the existing kernel there, so ideally test this on test machines or snapshotted VMs.
(To be clear: this is an amd64 kernel. Do not try it on a 3100, 1100, or 2100, because it will not work there.)