Kernel crash, could use some advice...
-
I'm posting this for @synfinatic because he was blocked by the anti-spam system or something:
First: apologies if this is the wrong section of the forums for this. It was my best guess at what might be most appropriate, since I believe this is related to the low-level networking code in the kernel.
Running: pfSense 2.4.5-p1 on a Protectli FW6B. System is ~3 months old.
Symptom: After a couple of months of stability, the system randomly hangs hard every 3-18 hours, mostly in the middle of the night while everyone is asleep and activity is minimal. Occasionally it happens in the middle of the day as well. There is nothing in the logs (including remote syslog), nothing on the VGA/serial console, and no core dumps.
At first I assumed this was a hardware bug. I've now done 8+ full passes of memtest (over 12 hours' worth) and found 0 errors. smartctl passes, although in my experience systems generally don't randomly hang like this because of a failing SSD. I also turned on /var and /tmp RAM disks, and this had no impact. At that point I reached out to Protectli and they agreed to RMA the unit (awesome service, btw). I put the system back into service (with the same RAM/SSD, since I had bought those separately). It hung that night.
So now I'm racking my brain to figure out what changed around the time the hanging started. This is my home network, so it's not like I'm following SOC2 compliance guidelines here, ya know. :) Anyways, I realized that was about the time I installed a Go program I wrote: udp-proxy-2020.
So I disabled my program yesterday and the system hasn't hung since. (Part of me hopes that by saying this publicly I'll jinx it into hanging, because that's how these things work, right?) Assuming it doesn't hang "soon", I'm going to say I've found my smoking gun.
Now, full disclosure: I totally believe this is a cursed little program. Nobody should ever have to run it, but sometimes software vendors do really stupid things like relying on UDP broadcasts for discovery, and you really want devices on different L2 broadcast domains to be able to talk to each other. And maybe in really extreme cases, you need it to work on non-broadcast-capable networks like those used for OpenVPN tunnels. Hence udp-proxy-2020 was born. But it's pretty simple:
- Listen for UDP broadcasts with libpcap
- Send UDP broadcasts on other interfaces with libpcap
- Profit.
I've been using libpcap to listen to and inject packets on networks for two decades now, although honestly I don't recall using FreeBSD much (mostly Linux and OS X).
At this point I'm open to suggestions, like how to force the system to generate a log/backtrace/core dump/anything to diagnose where and how this bug is happening in the kernel. Or hey, "you should talk to these other people over >here<". I'm also reasonably willing to try various settings to stop it from happening, but honestly I consider this a bug worth fixing and not just "working around".
-
Might be able to use this instead: https://github.com/marjohn56/udpbroadcastrelay
There is a thread for that; several users report success, so at least it appears not to crash!
Edit: Though it looks like you may have started from that or something forked from it.
Steve
-
@stephenw10 So I actually was using a version of that, but it has a problem for my use case because it uses UDP sockets. The issue is that point-to-point interfaces (like the tun devices used by OpenVPN) do not support broadcasts, so when the client sends a packet to x.x.x.255 it is dropped on the floor, since the firewall treats it like a unicast packet for another host. Why a client is sending a "broadcast" packet on a network interface which doesn't support broadcasts is anyone's guess.
Anyways, good news... the box finally crashed last night. So I'm getting new RAM.
-
@synfinatic said in Kernel crash, could use some advice...:
Why a client is sending a "broadcast" packet on a network interface which doesn't support broadcasts is anyone's guess.
Ha, indeed. That's the Roon client?
Anyway, yeah, it seems like RAM if it crashed without that running. I assume you do not get a crash report when it reboots?
Steve
-
@stephenw10 Yeah, the Roon client is, uh, "interesting". I'm assuming they're calling setsockopt(SO_BROADCAST) and not checking the return code, but it might also just be some stupid .NET thing.
Unfortunately, I get nothing: no logs, no crash report, no core dump, nothing on the console. It just hangs and I have to reboot it manually. When it comes back up, the UI doesn't show a crash report notification the way the docs say it should.
Anyways, the RAM shows up in a few days, and until then I've put my old EdgeRouter firewall back into service.