Strange error: There were error(s) loading the rules: pfctl: pfctl_rules
-
@flole said in Strange error: There were error(s) loading the rules: pfctl: pfctl_rules:
@bmeeks said in Strange error: There were error(s) loading the rules: pfctl: pfctl_rules:
You have a corrupted installation with mismatching versions of core system libraries.
I'm not entirely sure how that could happen by simply rebooting, or why the initial loading of the rules at boot works (as well as querying the rules, tables and so on). To me that sounds like something after the initial loading is causing those errors; otherwise no rules would be loaded at all.
Others resolved the same behaviour by rebooting, which obviously wouldn't fix a mismatched version.
The Snort error you posted is coming directly from the custom blocking plug-in code. When the Snort binary is implementing a "block", it does so by making a pf call using the ioctl() primitive within FreeBSD. It passes an Op-Code along with the system call letting the OS know what it wants. In the case of your error, it is asking pf to add an IP address to the snort2c table. That is what the DIOCRADDADDRS op-code means.
The fact that call is failing tells me that the code for that operation is not present in the OS. But it should always be there.
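For anyone who wants to try the same table update by hand: pfctl exposes it on the command line as pfctl -t <table> -T add <address>. Below is a minimal Python sketch that only builds that command. The helper name table_add_command and the example address are made up for illustration, and this is not how the Snort plug-in does it (the plug-in calls ioctl() directly).

```python
import ipaddress


def table_add_command(table: str, address: str) -> list[str]:
    """Build the pfctl argv that adds one address to a pf table.

    Hypothetical helper for illustration; it does not run anything.
    """
    # Validate first so a typo never reaches pfctl (raises ValueError).
    ip = ipaddress.ip_address(address)
    return ["pfctl", "-t", table, "-T", "add", str(ip)]


if __name__ == "__main__":
    # Prints the command only; it would have to be run as root on the firewall.
    print(" ".join(table_add_command("snort2c", "203.0.113.7")))
```

If that manual add also fails with the same error while Snort is stopped, the problem is probably not specific to the Snort plug-in.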
-
The shortest road back to normal operation may be a reinstall.
-
@stephenw10 said in Strange error: There were error(s) loading the rules: pfctl: pfctl_rules:
I assume you are running Snort?
Correct, I am running Snort.
It doesn't seem completely broken though:
[22.05-RELEASE][root@XXXXXXX]/: pfctl -d
pf disabled
[22.05-RELEASE][root@XXXXXXX]/: pfctl -e
pf enabled
So it works to some extent, apparently. And even running pfctl -a shows that rules are loaded. It seems to me that somehow it's not possible to add new rules to the filter.
However, clearing rules also does not work:
[22.05-RELEASE][root@XXXXXX]/: pfctl -Fa
pfctl: pfctl_clear_eth_rules: Device busy
Is that really the behaviour if there's a mismatch of some libraries? How do the rules end up there initially and why can I query them properly?
-
Try running:
truss pfctl -g -f /tmp/rules.debug
-
Looks like it does "prepare" them to some extent and then tries to apply them(?), which fails, and then it rolls back:
....
ioctl(3,DIOCOSFPADD,0x7fffffffe450) = 0 (0x0)
ioctl(3,DIOCOSFPADD,0x7fffffffe450) = 0 (0x0)
read(4,0x8007174c0,32768) = 0 (0x0)
close(4) = 0 (0x0)
mmap(0x0,28672,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 34367270912 (0x80072f000)
mmap(0x0,20480,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 34367299584 (0x800736000)
mmap(0x0,86016,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 34367320064 (0x80073b000)
ioctl(3,DIOCXBEGIN,0x7fffffffd568) ERR#16 'Device busy'
pfctl: write(2,"pfctl: ",7) = 7 (0x7)
pfctl_ruleswrite(2,"pfctl_rules",11) = 11 (0xb)
write(2,"\n",1) = 1 (0x1)
ioctl(3,DIOCXROLLBACK,0x7fffffffd5a8) = 0 (0x0)
sigprocmask(SIG_BLOCK,{ SIGHUP|SIGINT|SIGQUIT|SIGKILL|SIGPIPE|SIGALRM|SIGTERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTIN|SIGTTOU|SIGIO|SIGXCPU|SIGXFSZ|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2 },{ }) = 0 (0x0)
sigprocmask(SIG_SETMASK,{ },0x0) = 0 (0x0)
sigprocmask(SIG_BLOCK,{ SIGHUP|SIGINT|SIGQUIT|SIGKILL|SIGPIPE|SIGALRM|SIGTERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTIN|SIGTTOU|SIGIO|SIGXCPU|SIGXFSZ|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2 },{ }) = 0 (0x0)
sigprocmask(SIG_SETMASK,{ },0x0) = 0 (0x0)
sigprocmask(SIG_BLOCK,{ SIGHUP|SIGINT|SIGQUIT|SIGKILL|SIGPIPE|SIGALRM|SIGTERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTIN|SIGTTOU|SIGIO|SIGXCPU|SIGXFSZ|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2 },{ }) = 0 (0x0)
sigprocmask(SIG_SETMASK,{ },0x0) = 0 (0x0)
sigprocmask(SIG_BLOCK,{ SIGHUP|SIGINT|SIGQUIT|SIGKILL|SIGPIPE|SIGALRM|SIGTERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTIN|SIGTTOU|SIGIO|SIGXCPU|SIGXFSZ|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2 },{ }) = 0 (0x0)
sigprocmask(SIG_SETMASK,{ },0x0) = 0 (0x0)
sigprocmask(SIG_BLOCK,{ SIGHUP|SIGINT|SIGQUIT|SIGKILL|SIGPIPE|SIGALRM|SIGTERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTIN|SIGTTOU|SIGIO|SIGXCPU|SIGXFSZ|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2 },{ }) = 0 (0x0)
sigprocmask(SIG_SETMASK,{ },0x0) = 0 (0x0)
sigprocmask(SIG_BLOCK,{ SIGHUP|SIGINT|SIGQUIT|SIGKILL|SIGPIPE|SIGALRM|SIGTERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTIN|SIGTTOU|SIGIO|SIGXCPU|SIGXFSZ|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2 },{ }) = 0 (0x0)
sigprocmask(SIG_SETMASK,{ },0x0) = 0 (0x0)
sigprocmask(SIG_BLOCK,{ SIGHUP|SIGINT|SIGQUIT|SIGKILL|SIGPIPE|SIGALRM|SIGTERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTIN|SIGTTOU|SIGIO|SIGXCPU|SIGXFSZ|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2 },{ }) = 0 (0x0)
sigprocmask(SIG_SETMASK,{ },0x0) = 0 (0x0)
exit(0x1)
process exit, rval = 1
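Since most of a trace like the one above is noise, it can help to filter it down to the calls that actually failed. Here is a rough Python sketch that does this, assuming the failing lines have the shape seen in the trace (name(args) ERR#n 'message'). The function name failed_syscalls is made up, and arguments that themselves contain parentheses are not handled.

```python
import re

# Matches truss lines such as:
#   ioctl(3,DIOCXBEGIN,0x7fffffffd568) ERR#16 'Device busy'
# The argument list is assumed not to contain parentheses.
FAILED_CALL = re.compile(r"(\w+)\(([^()]*)\)\s+ERR#(\d+)\s+'([^']*)'")


def failed_syscalls(truss_output: str) -> list[tuple[str, str, int, str]]:
    """Return (syscall, arguments, errno, message) for every failed call."""
    return [
        (name, args, int(err), message)
        for name, args, err, message in FAILED_CALL.findall(truss_output)
    ]


if __name__ == "__main__":
    sample = (
        "ioctl(3,DIOCOSFPADD,0x7fffffffe450) = 0 (0x0)\n"
        "ioctl(3,DIOCXBEGIN,0x7fffffffd568) ERR#16 'Device busy'\n"
        "ioctl(3,DIOCXROLLBACK,0x7fffffffd5a8) = 0 (0x0)\n"
    )
    for call in failed_syscalls(sample):
        print(call)
```

Because the pattern does not depend on line breaks, it should also work on a trace that was pasted as a single long line.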
-
Hmm, not really anything very insightful there unfortunately.
I would reinstall that if that's an option for you.
-
Isn't the "Device busy" error when it comes to applying those rules a sign that something causes some kind of lockup? The first rule load apparently works, otherwise I would end up with an empty ruleset after a reboot, but something later on causes the issue. For example, pf_begin_eth would return EBUSY if it's still waiting for NET_EPOCH_CALL(pf_rollback_eth_cb) to finish (according to the comments in there). As the Ethernet-based filtering is new in 2.6.0 afaik, it could be that it introduced a bug, especially since I am not the only one having such an issue. Such a lockup would also be fixed by a reboot (assuming it isn't simply caused again after the reboot), so that's consistent with what others are seeing as well, except that for me the issue is apparently re-triggered automatically after a reboot for some reason.
I will try a reinstall though, so I finally migrate to ZFS and have redundant disks as well; the second HDD has been in the appliance for a few years now but is completely unused so far :D
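To make the EBUSY idea concrete, here is a toy Python model of the state I suspect. It is purely an illustration of the hypothesis and not the kernel code: the class and method names are invented, and whether pf really gets stuck this way is unconfirmed.

```python
import errno


class EthRulesetModel:
    """Toy model of the suspected lock-up; NOT the kernel implementation."""

    def __init__(self) -> None:
        self.rollback_pending = False

    def rollback(self) -> None:
        # In this model the cleanup is deferred, so the ruleset stays "busy".
        self.rollback_pending = True

    def deferred_cleanup_done(self) -> None:
        # Stands in for the deferred callback finishing.
        self.rollback_pending = False

    def begin(self) -> int:
        # A new transaction is refused while a cleanup is still outstanding.
        return errno.EBUSY if self.rollback_pending else 0


if __name__ == "__main__":
    model = EthRulesetModel()
    model.rollback()
    # If the deferred cleanup never completes, every later begin() fails.
    print(model.begin() == errno.EBUSY)
```

In this model, every later begin fails until the deferred cleanup runs, which would look like what pfctl reports until the next reboot.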
-
Yeah, it could well be a bug. I have alerted our developers.
-
I've been seeing this problem on quite a few firewalls after upgrading to pfSense 22.05.
Normally a reboot resolves the issue for a while until something triggers it again, and I have not been able to figure out what triggers it.
On a router that had the problem, after a reboot the truss command ends with:
exit(0x0)
process exit, rval = 0
But when the issue has been triggered prior to reboot it ends with:
ioctl(3,DIOCOSFPADD,0xbfbfe5c8) = 0 (0x0)
ioctl(3,DIOCOSFPADD,0xbfbfe5c8) = 0 (0x0)
ioctl(3,DIOCOSFPADD,0xbfbfe5c8) = 0 (0x0)
ioctl(3,DIOCOSFPADD,0xbfbfe5c8) = 0 (0x0)
ioctl(3,DIOCOSFPADD,0xbfbfe5c8) = 0 (0x0)
read(4,0x20401c80,32768) = 0 (0x0)
close(4) = 0 (0x0)
mmap(0x0,12288,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x20600008) = 541171712 (0x2041a000)
mmap(0x0,12288,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x20600008) = 541184000 (0x2041d000)
mmap(0x0,20480,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x20600008) = 541196288 (0x20420000)
mmap(0x0,86016,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x20600008) = 541216768 (0x20425000)
ioctl(3,DIOCXBEGIN,0xbfbfd9d0) ERR#16 'Device busy'
pfctl: write(2,"pfctl: ",7) = 7 (0x7)
pfctl_ruleswrite(2,"pfctl_rules",11) = 11 (0xb)
write(2,"\n",1) = 1 (0x1)
ioctl(3,DIOCXROLLBACK,0xbfbfd9f0) = 0 (0x0)
sigprocmask(SIG_BLOCK,{ SIGHUP|SIGINT|SIGQUIT|SIGKILL|SIGPIPE|SIGALRM|SIGTERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTIN|SIGTTOU|SIGIO|SIGXCPU|SIGXFSZ|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2 },{ }) = 0 (0x0)
sigprocmask(SIG_SETMASK,{ },0x0) = 0 (0x0)
sigprocmask(SIG_BLOCK,{ SIGHUP|SIGINT|SIGQUIT|SIGKILL|SIGPIPE|SIGALRM|SIGTERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTIN|SIGTTOU|SIGIO|SIGXCPU|SIGXFSZ|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2 },{ }) = 0 (0x0)
sigprocmask(SIG_SETMASK,{ },0x0) = 0 (0x0)
sigprocmask(SIG_BLOCK,{ SIGHUP|SIGINT|SIGQUIT|SIGKILL|SIGPIPE|SIGALRM|SIGTERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTIN|SIGTTOU|SIGIO|SIGXCPU|SIGXFSZ|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2 },{ }) = 0 (0x0)
sigprocmask(SIG_SETMASK,{ },0x0) = 0 (0x0)
sigprocmask(SIG_BLOCK,{ SIGHUP|SIGINT|SIGQUIT|SIGKILL|SIGPIPE|SIGALRM|SIGTERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTIN|SIGTTOU|SIGIO|SIGXCPU|SIGXFSZ|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2 },{ }) = 0 (0x0)
sigprocmask(SIG_SETMASK,{ },0x0) = 0 (0x0)
sigprocmask(SIG_BLOCK,{ SIGHUP|SIGINT|SIGQUIT|SIGKILL|SIGPIPE|SIGALRM|SIGTERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTIN|SIGTTOU|SIGIO|SIGXCPU|SIGXFSZ|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2 },{ }) = 0 (0x0)
sigprocmask(SIG_SETMASK,{ },0x0) = 0 (0x0)
sigprocmask(SIG_BLOCK,{ SIGHUP|SIGINT|SIGQUIT|SIGKILL|SIGPIPE|SIGALRM|SIGTERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTIN|SIGTTOU|SIGIO|SIGXCPU|SIGXFSZ|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2 },{ }) = 0 (0x0)
sigprocmask(SIG_SETMASK,{ },0x0) = 0 (0x0)
sigprocmask(SIG_BLOCK,{ SIGHUP|SIGINT|SIGQUIT|SIGKILL|SIGPIPE|SIGALRM|SIGTERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTIN|SIGTTOU|SIGIO|SIGXCPU|SIGXFSZ|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2 },{ }) = 0 (0x0)
sigprocmask(SIG_SETMASK,{ },0x0) = 0 (0x0)
sigprocmask(SIG_BLOCK,{ SIGHUP|SIGINT|SIGQUIT|SIGKILL|SIGPIPE|SIGALRM|SIGTERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTIN|SIGTTOU|SIGIO|SIGXCPU|SIGXFSZ|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2 },{ }) = 0 (0x0)
sigprocmask(SIG_SETMASK,{ },0x0) = 0 (0x0)
exit(0x1)
process exit, rval = 1
I've verified that there are no duplicate firewall rules and NAT reflection is disabled.
The only reason it might be triggered with our installs more than others is because we use a lot of tables to filter traffic and these tables are constantly being updated.
-
@artooro Looks like exactly the same thing I'm seeing, except for me it happens instantly. I'm trying to spin up a VM now so I can test without rebooting, let's hope that the VM also shows this issue.
-
Depends on if available CPU has anything to do with it. I've not switched to pfSense Plus on any VMs and I don't think I've seen this on any v2.6.0 CE installs yet.
Even disabling and re-enabling pf does not help; a full system reboot seems to be required.
-
@artooro There was a new feature introduced in the latest Plus release for filtering based on MAC addresses. If that implementation broke something, it won't be visible on previous versions.
-
I tried to replicate the issue on a VM but had no luck there. I was obviously not able to replicate all interfaces there, so it might have something to do with that as well. I tried simultaneous rule loading by firing multiple pfctl commands at the same time, just to see if they maybe tangle with each other, but that was also unsuccessful.
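In case someone else wants to repeat the simultaneous-loading experiment, it could be scripted roughly like this in Python. This is only a sketch: run_concurrently is a made-up helper, the demo uses a harmless stand-in command, and the pfctl invocation mentioned in the comment is an assumption about what one would want to run on a test box.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_concurrently(argv: list[str], copies: int) -> list[int]:
    """Start `copies` instances of argv at about the same time; return exit codes."""

    def run_once(_: int) -> int:
        # capture_output keeps the children from interleaving on the console.
        return subprocess.run(argv, capture_output=True).returncode

    with ThreadPoolExecutor(max_workers=copies) as pool:
        return list(pool.map(run_once, range(copies)))


if __name__ == "__main__":
    # Harmless stand-in command. On a test firewall one might pass something
    # like ["pfctl", "-f", "/tmp/rules.debug"] instead (as root).
    print(run_concurrently([sys.executable, "-c", "pass"], copies=4))
```

A non-zero exit code in the returned list would point to at least one load failing under contention.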
-
@flole I have noticed that editing interfaces can trigger it as well, but not consistently.
-
Have you heard anything back from the developers yet? Just imagine the consequences if this bug hits in a critical environment, it should be fixed ASAP.
-
I also just had this error on my SG-1100; nothing seemed to trigger it. I rebooted it, it worked for about 2 minutes and then there was no network connection again. I could not ping anything, including the router. After a 2nd reboot it's still running so far.
I was running pfBlockerNG 3.1.0_4, but I've now disabled it.
These are the errors:
There were error(s) loading the rules: pfctl: pfctl_rules - The line in question reads [0]: @ 2022-08-11 22:26:17
There were error(s) loading the rules: pfctl: pfctl_rules - The line in question reads [0]: @ 2022-08-11 22:26:20
There were error(s) loading the rules: pfctl: pfctl_rules - The line in question reads [0]: @ 2022-08-11 22:26:29
There were error(s) loading the rules: pfctl: pfctl_rules - The line in question reads [0]: @ 2022-08-11 22:37:02
-
@jacko Just be glad that it failed in a "block-all" state for you. An "allow-all" state is much worse, especially if it's unnoticed....
-
Mmm, unfortunately this is unhelpful:
ioctl(3,DIOCXBEGIN,0xbfbfd9d0) ERR#16 'Device busy'
The issue has already happened and pf is no longer responding to pfctl. What we'd need there is to see the truss output from the first invocation of pfctl after boot. But that's not easy.
We are looking at it but we've not been able to replicate it locally. Yet.
I opened a bug to track it. Add any new info you have there:
https://redmine.pfsense.org/issues/13408
Steve
-
@stephenw10 Maybe it would help to add additional debug output in pf's code? Is it clear where the wrong branch is taken/where the error is actually thrown? If that's unclear it's probably a good idea to figure that out first. One possible way is probably the one I described above, but is there another one? If it's clear what state the firewall ends up in then it's easier to figure out potential ways how it could end up in that state.
-
@stephenw10 thanks. What I'll attempt tomorrow is to edit https://github.com/pfsense/pfsense/blob/60a2fa6b6f1a59f3f86933265fbb48e25f652bfc/src/etc/inc/filter.inc#L527 to use truss and output to a log file, and see if we can get something helpful there, as I have a couple of systems where it's pretty easy to reproduce.
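The wrapping itself is simple. filter.inc is PHP, so the Python below is only a sketch of the idea: prefix the existing command with truss -o <file> so every invocation gets its own trace. The helper name, the log directory and the pfctl arguments are placeholders, not what filter.inc actually runs.

```python
from datetime import datetime


def truss_wrapped(argv: list[str], log_dir: str, now: datetime) -> list[str]:
    """Prefix a command with truss so each run gets its own trace file.

    Hypothetical helper; the file naming scheme is an arbitrary choice.
    """
    log_path = f"{log_dir}/pfctl-truss-{now:%Y%m%d-%H%M%S}.log"
    # truss -o writes the trace to a file instead of standard error.
    return ["truss", "-o", log_path, *argv]


if __name__ == "__main__":
    command = truss_wrapped(
        ["pfctl", "-f", "/tmp/rules.debug"],  # placeholder pfctl arguments
        "/var/log",
        datetime.now(),
    )
    print(" ".join(command))
```

Timestamped file names keep the trace of the first failing invocation from being overwritten by later ones.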