Syslog service in pfSense v2.8.1 often stop itself

slu

@stephenw10 said in Syslog service in pfSense v2.8.1 often stop itself:

Typically it seems to be when the syslog server is cycling in some way. Archiving perhaps?

In our case we saw this multiple times after a reboot, when pfSense is ready before all the KVM guests are online, for example the syslog server...

Not sure this also happens in 2.8.0, maybe possible.

stephenw10

Oh it absolutely shouldn't. It's a bug and it's now fixed. https://reviews.freebsd.org/D51995

stephenw10

@slu said in Syslog service in pfSense v2.8.1 often stop itself:

Not sure this also happens in 2.8.0, maybe possible.

Yes, it's in 2.8.0 and 2.8.1 and 25.07. Unfortunately.

slu

@stephenw10
Hmm, I'm trying the "Service Watchdog" at the moment; maybe that's a workaround?
We'll see...

tsmalmbe

ChatGPT overview of the change, seems legit (of course caveat emptor for LLM...)

After D51995, all of the still-fatal cases are local to the pfSense box (syslogd/process/kernel socket). None of the remaining fatal errors are caused by the remote syslog host; the remote/network-state errors were reclassified as transient and no longer make the destination “dead.”

What can still make syslogd drop the destination (and why)

Local to the pfSense box (syslogd / socket / config):

EBADF – invalid/closed descriptor used for sendmsg(). Programming/state issue on the sender.
EACCES – permission denied (e.g., trying to send to a broadcast address without SO_BROADCAST, or lacking permission on a UNIX-domain socket path). Sender-side socket option or filesystem perms.
ENOTSOCK – fd is not a socket. Sender bug/misconfiguration.
EFAULT – bad user-space buffer/pointer given to sendmsg(). Sender bug.
EMSGSIZE – message too large for the socket/protocol to send atomically (e.g., oversize UDP/UNIX-dgram). Sender data/MTU limits at the local stack boundary—not the remote host.
Any other unexpected errno not on the new whitelist (e.g., EINVAL, EAFNOSUPPORT, EDESTADDRREQ, ENOTCONN)—all indicate a local misuse/state problem.

Dependent on the remote syslog host or wider network?

None of the still-fatal ones. Host/network conditions like refused connection, no route, host down/unreachable, address not available, buffer pressure, or EAGAIN were explicitly moved to the “transient, keep retrying” bucket and no longer cause F_UNUSED.
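
To make the transient/fatal split in the summary above concrete, here is a minimal Python sketch of that kind of classification. It is not the syslogd code (which is C), and the mapping of the summary's wording onto specific errno names is an assumption; the authoritative list is whatever the D51995 diff contains. `TRANSIENT_ERRNOS` and `classify_send_error` are hypothetical names used only for illustration.

```python
import errno

# Assumed mapping of the summary's "transient" conditions onto errno names.
# The real whitelist lives in the D51995 diff, not here.
TRANSIENT_ERRNOS = frozenset({
    errno.ECONNREFUSED,   # connection refused
    errno.ENETUNREACH,    # network unreachable
    errno.EHOSTUNREACH,   # no route to host
    errno.EHOSTDOWN,      # host is down
    errno.EADDRNOTAVAIL,  # address not available
    errno.ENOBUFS,        # buffer pressure
    errno.EAGAIN,         # resource temporarily unavailable
})


def classify_send_error(err: int) -> str:
    """Return "transient" (keep the destination, retry) or "fatal" (drop it)."""
    return "transient" if err in TRANSIENT_ERRNOS else "fatal"
```

Under this sketch, `classify_send_error(errno.ECONNREFUSED)` is treated as transient, while anything off the list, such as `errno.EBADF` or `errno.EMSGSIZE`, is still fatal.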

tsmalmbe

@stephenw10 said in Syslog service in pfSense v2.8.1 often stop itself:

@slu said in Syslog service in pfSense v2.8.1 often stop itself:

Not sure this also happens in 2.8.0, maybe possible.

Yes, it's in 2.8.0 and 2.8.1 and 25.07. Unfortunately.

I was going to look into this and maybe set up a cron job "just in case" every 3 hours. Let us know the results of your investigation!
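
A rough sketch of what such a cron-driven check could look like, in Python. Two things here are assumptions rather than anything confirmed in this thread: that `pgrep -x syslogd` is a reliable way to tell whether the daemon is alive, and that `pfSsh.php playback svc restart syslogd` restarts it on your version. Check both on your own box first. The function names are hypothetical.

```python
import subprocess
from typing import Callable

# Assumption: the daemon's process name is exactly "syslogd".
CHECK_CMD = ["pgrep", "-x", "syslogd"]
# Assumption: the pfSense playback script accepts this service name.
RESTART_CMD = ["pfSsh.php", "playback", "svc", "restart", "syslogd"]


def command_succeeds(cmd: list[str]) -> bool:
    """Run cmd and report whether it exited with status 0."""
    try:
        return subprocess.run(cmd, capture_output=True, check=False).returncode == 0
    except FileNotFoundError:
        # The command is not installed or not on PATH.
        return False


def ensure_running(is_running: Callable[[], bool],
                   restart: Callable[[], bool]) -> str:
    """Restart only when the check fails; return what happened."""
    if is_running():
        return "running"
    return "restarted" if restart() else "restart-failed"


def check_syslogd() -> str:
    """Check syslogd with the assumed commands and restart it if it is gone."""
    return ensure_running(
        lambda: command_succeeds(CHECK_CMD),
        lambda: command_succeeds(RESTART_CMD),
    )
```

Calling `check_syslogd()` from a small script scheduled by cron would cover the "just in case" scenario. Note that this only papers over the problem: log messages generated between the stop and the next check may still be missed.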

stephenw10

That patch is in the new 25.11-dev snapshots if you're able to test that. No CE snaps yet.

dennypage

This issue may warrant a 25.07.2 release.

jrey

@stephenw10

Depends - what else is in the snapshot?

Not really interested in alpha development builds this early in the cycle just to try the one failing item. The risk of the unknown outweighs the current workaround.

-- honestly, too bad one can not just get the updated build of the syslogd code.

KOM

@jrey Why would you need a whole new snapshot for something that could be fixed via System Patches? And, for the record, I also had this issue because I had an old remote syslog defined that no longer exists. Removing the remote fixed the crashing.

dennypage

@KOM said in Syslog service in pfSense v2.8.1 often stop itself:

Why would you need a whole new snapshot for something that could be fixed via System Patches?

Binaries cannot be updated via System Patches.

stephenw10

That's a compile time fix, it can't be applied via System Patches.

jrey

@stephenw10

wasn't suggesting a "patch" in the current "pf"sense of a patch

On the other hand it is just a binary file, that could be provided and copied into place.

For those not willing to play with "Alpha" builds, the release of 25.07.2 would be a great alternative, rather than having to wait for the beta or even final release of 25.11.

Maybe the "patch system" should have the ability to deliver a hot fix for certain binaries in the future?
It's somewhat surprising this isn't possible already: since it's BSD at the core, a patch could be created with bsdiff and applied with bspatch, both of which are available and actually installed as part of the package.

Annoying issues (and not that there are that many, this is one) could likely be fixed by providing this ability to either install a new file or run a binary patch, without having to wait for a full drop of the next version.
(The problem with patching vs. copying is that, as we have seen in the past, the files were different in the same "release" from one day to the next. Refresh my memory: when was that, 23.xx? I'd have to look it up.)

point is when there is a will there is a way..

stephenw10

Yes we are looking at options.

slu

@jrey years ago there was a p1 release:
https://docs.netgate.com/pfsense/en/latest/releases/2-3-5-p1.html

aclrgt

Hello,
I'm experiencing the same problem with a client after updating to 25.07.1
I can also confirm that the problem occurs because we have a remote syslog server under maintenance.
Pf's syslogd should continue to work in this scenario.
I hope a fix is found soon.
Thank you,

KOM

@dennypage Huh. I did not know that.

stephenw10

As a workaround you can prevent the syslogd process from seeing the connection rejection message from the server by adding firewall rules.

You need to pass the syslog traffic outbound with state set to 'none'. And block the incoming icmp rejection if it's not already blocked.

It then just keeps sending to the server.
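
For anyone wondering how a UDP sender can "see" a rejection at all, here is a small Python sketch. It sends two datagrams from a connected UDP socket to a port where nothing is listening. On a typical host the second send fails with ECONNREFUSED, because the ICMP port-unreachable reply to the first datagram is delivered to the socket as an error. That syslogd's sockets behave the same way is an assumption on my part, and the result depends on the OS and any local firewall. The function name is hypothetical.

```python
import socket
import time
from typing import Optional


def probe_closed_udp_port(host: str = "127.0.0.1") -> Optional[int]:
    """Send two datagrams to a closed UDP port; return the errno seen, if any."""
    try:
        # Find a port that is very likely closed: bind an ephemeral port,
        # note its number, then release it again.
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as placeholder:
            placeholder.bind((host, 0))
            port = placeholder.getsockname()[1]

        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
            sender.connect((host, port))       # connected sockets get ICMP errors
            sender.send(b"<14>test message")   # first send normally succeeds
            time.sleep(0.1)                    # let the ICMP reply arrive
            sender.send(b"<14>test message")   # a pending error surfaces here
    except OSError as exc:
        return exc.errno
    return None
```

If the incoming ICMP rejection is blocked before it reaches the sender, as in the workaround above, the socket never receives the error and the function would be expected to return `None`.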

vmillan69

I have the same problem but with version 25.07.1 of pfsense+ and I am in PCI non-compliance. I think it is not that the remote server is not available for me, it is a bug in the version and it is critical.

jrey

@vmillan69 said in Syslog service in pfSense v2.8.1 often stop itself:

I think it is not that the remote server is not available for me,

if it is not this specifically -- then more information is likely required to offer any suggestions --

same issue with code reference
https://forum.netgate.com/topic/198418/25.07-unbound-pfblocker-python-syslog/43?_=1758219580156