pfSense 1100 - Persistent Kernel Crashes and Connection Drops Every 30-90 Minutes

emilysinternetsucks

I have a pfSense 1100.
Network Setup: Motorola Modem -> pfSense -> Switch -> Netgear Nighthawk AX1800 (RAX10)
All devices on UPS power

My internet connection drops approximately every 30-90 minutes and takes 2-3 minutes to recover. This has been happening for an extended period across multiple pfSense+ versions, including the current 24.11. I recently identified kernel crashes occurring during these drops.
I also recently had the cable to my house replaced. The connection was stable for a few days, then degraded back to the same pattern.

Symptoms:
Regular connection drops (30-90 minute intervals)
Kernel crash observed with Signal 6 (core dumped)
Gateway alarms during drops showing 100% packet loss
CPU usage spikes to 51% during issues (normally under 5%)
All devices lose WiFi connection completely during drops
Takes 2-3 minutes for WiFi to recover

Attempts I've made so far:
Fixed pfSense time sync issues
Created static DHCP mapping for WAP
Changed WiFi channel settings from Auto to Manual
Tested different MTU settings (1500, 1492, currently 1463)
Modified gateway monitoring
Multiple software updates over time - issue persists
New cable installation to house - temporary improvement only

The issue has remained consistent across multiple software versions, suggesting it might be hardware-related. Looking for guidance on further troubleshooting steps or potential hardware issues to investigate. Any help or direction would be greatly appreciated.

stephenw10

@emilysinternetsucks said in pfSense 1100 - Persistent Kernel Crashes and Connection Drops Every 30-90 Minutes:

Kernel crash observed with Signal 6 (core dumped)

What exactly are you seeing?

If you see a core dumped log that's probably a single process crash rather than the kernel.

If it's actually a kernel panic, do you have a crash report?

emilysinternetsucks

What I'm seeing in the logs is:
pid 94037 (nate), jid 0, uid 0: exited on signal 6 (core dumped)

So it looks like a process crash rather than a kernel panic. No crash report was found in /var/crash when I checked.

When these process crashes occur, the whole network becomes unreachable for about 2-3 minutes before recovering. Would this specific process crash explain the complete network disruption I'm experiencing?

stephenw10

@emilysinternetsucks said in pfSense 1100 - Persistent Kernel Crashes and Connection Drops Every 30-90 Minutes:

pid 94037 (nate), jid 0, uid 0: exited on signal 6 (core dumped)

Is that actually copy/pasted? I'm not familiar with that, but we have seen crashes in rate.

emilysinternetsucks

This is exactly what it says in my logs: "pid 94037 (rate), jid 0, uid 0: exited on signal 6 (core dumped)"

Around the time of this crash, I also see gateway alarms and OpenVPN gateway messages in the logs.
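The log entry above is the FreeBSD kernel's standard message for a process that exited on a signal (signal 6 is SIGABRT). As a rough aid for lining these crashes up against the drops, here is a minimal Python sketch that pulls the pid, process name, and signal number out of lines in that exact format. It assumes the system log has been exported as plain text; the names `parse_signal_exit` and `SignalExit` are hypothetical, and the regex only covers the message format quoted in this thread.

```python
import re
from typing import NamedTuple, Optional

# Matches the kernel message quoted in this thread, e.g.
# "pid 94037 (rate), jid 0, uid 0: exited on signal 6 (core dumped)"
SIGEXIT_RE = re.compile(
    r"pid (?P<pid>\d+) \((?P<name>[^)]+)\), jid (?P<jid>\d+), uid (?P<uid>\d+): "
    r"exited on signal (?P<signal>\d+)(?P<core> \(core dumped\))?"
)


class SignalExit(NamedTuple):
    """Details of one process that was killed by a signal (hypothetical helper type)."""
    pid: int
    name: str
    signal: int
    core_dumped: bool


def parse_signal_exit(line: str) -> Optional[SignalExit]:
    """Return crash details if the line is a process signal exit, otherwise None."""
    m = SIGEXIT_RE.search(line)
    if m is None:
        return None
    return SignalExit(
        pid=int(m["pid"]),
        name=m["name"],
        signal=int(m["signal"]),
        # The "(core dumped)" suffix is optional in the message.
        core_dumped=m["core"] is not None,
    )


if __name__ == "__main__":
    sample = "pid 94037 (rate), jid 0, uid 0: exited on signal 6 (core dumped)"
    print(parse_signal_exit(sample))
```

Using `search` rather than `match` lets it find the entry even when a syslog timestamp and hostname precede it on the same line. A match only identifies which single process died; it says nothing about a kernel panic.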

stephenw10

Ok 'rate' makes more sense. 'nate' was confusing.

rate is a tool used to generate traffic graphs. Do you have the graphs widget on the dashboard? Do you often have the dashboard open in the GUI?

Is anything else logged at that time?

emilysinternetsucks

I have the traffic graphs on my dashboard. Around the time of the rate crash (Feb 20 14:11:03), I see a series of events starting about 20 minutes earlier:
Gateway alarms showing 100% packet loss
OpenVPN gateway showing 'none available'
Several services restarting (IPsec tunnels, OpenVPN tunnels/interfaces)
DynDNS updates and filter reloads
The dashboard is often open while I'm troubleshooting these connection drops, which happen about every 30 minutes.

stephenw10

What pkgs do you have running?

Check the RRD graphs. Do you see resource usage spike or something being exhausted?