Netgate 6100 LAN crashes
-
Define exactly how unresponsive the LAN becomes.
Can it still be pinged?
Does it stop handing out DHCP leases?
Can you ping out from pfSense to other devices on the LAN?
Is the interface still shown as linked and active?
Is anything logged when this happens?
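To pin down exactly when the LAN stops responding, a small watcher on a LAN client can help. The sketch below is only an illustration: it assumes a Unix-like client where `ping -c 1` works, and the address in the usage comment is a placeholder, not one from this thread. `ping_once`, `find_transitions` and `watch` are made-up helper names.

```python
import subprocess
import time
from datetime import datetime


def ping_once(host, timeout_s=3.0):
    """Return True if a single ping to `host` succeeds (assumes a Unix-like `ping -c`)."""
    try:
        result = subprocess.run(
            ["ping", "-c", "1", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except (subprocess.TimeoutExpired, OSError):
        return False
    return result.returncode == 0


def find_transitions(samples):
    """Given (timestamp, reachable) pairs, return the points where the state changed."""
    transitions = []
    previous = None
    for stamp, reachable in samples:
        if previous is not None and reachable != previous:
            transitions.append((stamp, "UP" if reachable else "DOWN"))
        previous = reachable
    return transitions


def watch(host, interval_s=10.0):
    """Poll `host` forever and print a timestamped line whenever reachability changes."""
    previous = None
    while True:
        reachable = ping_once(host)
        if previous is not None and reachable != previous:
            stamp = datetime.now().isoformat(timespec="seconds")
            print(f"{stamp} {host} is now {'UP' if reachable else 'DOWN'}")
        previous = reachable
        time.sleep(interval_s)


# Usage (placeholder address, replace with the router's LAN IP):
# watch("192.168.1.1")
```

The timestamps it prints can then be compared with the pfSense system log to see what was logged around the moment the LAN dropped.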
-
@Gertjan Thanks for the feedback.
I have already disabled everything and even done 3 factory resets.
Fiber (2500/1500) is the main connection and doesn't cause me any slowness issues; the 4G connection is only a backup.
If I don't find a solution, I'm seriously considering this: https://www.reddit.com/r/Netgate/comments/1in8uwm/successful_emmc_replacement_in_netgate_6100/
-
@stephenw10 Thank you for your reply.
Can it still be pinged?
From a client station on the LAN, the router is inaccessible and no longer responds to ping.
Does it stop handing out DHCP leases?
DHCP leases are distributed by the domain controller, not the router.
Can you ping out from pfSense to other devices on the LAN?
No.
Is the interface still shown as linked and active?
Yes.
Is anything logged when this happens?
The system logs are not telling me anything. The log is full of lines from syslogd that are not responding.
I have disabled syslog but I can't find the source of the problem.
When this happens, the Wi-Fi interface, whose DHCP leases are distributed by the router, works.
Wi-Fi clients can access the internet, but access to the LAN is blocked.
Locally, I have to use the console to reboot the router.
Remotely, the WAN interface responds, and I can reboot via the GUI.
-
@Nightwolf said in Netgate 6100 LAN crashes:
Can you ping out from pfSense to other devices on the LAN?
No.
Like it fails or you're unable to test? Does the serial console still respond when this happens?
I assume it does if WiFi still works.
@Nightwolf said in Netgate 6100 LAN crashes:
The log is full of lines from syslogd that are not responding.
You have an example of that?
Do you see any errors shown on the LAN interface? It seems like the LAN itself just stops passing traffic rather than the firewall hanging.
I would try reassigning LAN to a different NIC if you can.
-
@Nightwolf said in Netgate 6100 LAN crashes:
Is anything logged when this happens?
The system logs are not telling me anything. The log is full of lines from syslogd that are not responding.
I have disabled syslog but I can't find the source of the problem.
Do you have remote syslog enabled? If so, is the syslog server on the same interface that stops working?
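A quick way to answer that is to check whether the syslog server's address falls inside the LAN subnet. This is a minimal sketch using Python's standard `ipaddress` module; the function name `server_is_on_subnet` and the addresses in the usage comment are placeholders, not values from this thread.

```python
import ipaddress


def server_is_on_subnet(server_ip, subnet_cidr):
    """Return True if `server_ip` belongs to the network described by `subnet_cidr`."""
    # strict=False lets you pass the interface address itself, e.g. "192.168.1.1/24".
    network = ipaddress.ip_network(subnet_cidr, strict=False)
    return ipaddress.ip_address(server_ip) in network


# Usage (placeholder values):
# server_is_on_subnet("192.168.1.50", "192.168.1.1/24")
```

If it returns True, the remote syslog target sits behind the interface that goes down, so syslogd complaining while the LAN is dead would be a symptom rather than the cause.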
-
I had a crash yesterday that I was able to catch in time to retrieve the logs.
The local IP address and local domain are truncated for security reasons.
Everything else is unchanged; I intentionally left all packages running except ntopng.
Disabling/uninstalling the packages doesn't change anything anyway.
Hope this helps.
-
@Nightwolf said in Netgate 6100 LAN crashes:
I had a crash yesterday that I was able to catch in time to retrieve the logs.
The local IP address and local domain are truncated for security reasons.
Everything else is unchanged; I intentionally left all packages running except ntopng.
Disabling/uninstalling the packages doesn't change anything anyway.
Hope this helps.
Wow ... that's a - sorry for the word : a bit messy.
I'll start with this : I don't use HAProxy - I have no use for it, as I lost all interest in hosting things myself decades ago. I got myself a small VPS server in some data centre, and that one takes care of my web sites, mail stuff, powering, disk maintenance, cleaning the fans, paying the power bills, and all that stuff. @home and @work my pfSense is there to 'regulate' my ISP connection.
Because 99++ % is TLS these days, plain text data transfers, for mail, web sites etc etc, don't exist anymore, and stuff like Suricata (clam-av) or, more generally, IDS/IPS has become pure rocket science.
The latest example was shown last evening : see Not Nominal from Scott Manley. Rocket science is hard, hurts, and things will blow up "all the time".
Those who master "IDS/IPS/Proxy" are not the ones** posting here on this forum. They are the TLS gods ....
The good old days of 'traffic scanning' are gone now.
** well, not true, there is one person here in the forum : bmeeks.
First things first : ok that your WAN disconnects ....
But your log starts at the bottom with a LAN disconnect ! That's not good !!
Your mission, if you want to have an easy admin life : stop that from happening.
The 6100 LAN plugs, or actually, any plug : don't disconnect them. The devices connected to these (switches, ISP boxes) are small power consumers, so share the pfSense UPS with them.
That said, it's 'ok' for interfaces to go down. In theory, this shouldn't break anything.
But there is a but ....
When an interface goes down, processes like nginx, the DHCP server (or client), the resolver, the gateway scanner 'dpinger' etc etc etc will restart. And here comes the issue : all these processes restart at nearly the same moment, which opens the door for the most dreaded situation : race conditions. Your mission, as an admin, is : never ever create situations where race conditions can bite you.
The final goal is : keep the logs dull, with no error or warning messages. You'll be granted a very stable router, an admin's dream.
Next issue : you use the "servicewatchdog" : that's another admin-self-inflicting-pain tool.
Don't use it. Like never.
Locate the "error: bind: address already in use" phrase in your zipped log. That's "servicewatchdog" doing things it should not do. "servicewatchdog" wasn't needed in this case, and did make things worse.
I use a 6100 (4100) myself, with UPS, and my 'unbound' (example) never gets restarted because interfaces went down.
This means you and I have the same hardware, the same software. If you use the same default Netgate settings for the core settings, you have nearly the same settings as me : result, as it is meant to be : it works "forever" - and don't take my word for it, have a look yourself.
Also repair this :
"haproxy: startup error output!" (several)
and
"/status_logs.php: ERROR! ldap_get_groups() could not bind"
Don't be ashamed if you can not make the error go away.
You are allowed, and I even advise you, to apply the KISS rule : don't use stuff that produces errors in the logs.
And if you do, accept them along with the consequences ^^
I'm not saying you can't / shouldn't use "pfSense package X" and "pfSense package Y" **, they are all easy to install. Setting them up can go way beyond what is needed to operate a vanilla pfSense.
** exception : "servicewatchdog" which should be banned from the package list.
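To keep an eye on whether those messages really go away, something like the following could be run over an exported log. It is only a sketch: the three phrases are the ones quoted above, while `count_phrases` and the file name in the usage comment are made up for illustration.

```python
# Phrases taken from the log messages quoted in this post.
PHRASES = [
    "error: bind: address already in use",
    "haproxy: startup error output!",
    "ldap_get_groups() could not bind",
]


def count_phrases(lines, phrases=PHRASES):
    """Count how many log lines contain each phrase (plain substring match)."""
    counts = {phrase: 0 for phrase in phrases}
    for line in lines:
        for phrase in phrases:
            if phrase in line:
                counts[phrase] += 1
    return counts


# Usage (hypothetical file name for an exported system log):
# with open("system.log", encoding="utf-8", errors="replace") as handle:
#     for phrase, hits in count_phrases(handle).items():
#         print(f"{hits:5d}  {phrase}")
```

A run where every count is zero is the 'dull log' to aim for.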
Be assured : my goal is that you have a 6100 that 'never fails' on you, so you can go back to doing other stuff like spending time in the garden ^^
-
So you have replaced the LAN IP address with 10.10.10.10254? Are you using a public subnet on LAN?
Was there a significant gap in the logs before that? The first thing logged there looks like a reaction to the LAN coming back up.
Mostly what is concerning there is that igc0 is flapping repeatedly. What is it actually connected to? Did you try reassigning LAN to one of the other igc NICs?
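To put a number on the flapping, the link events in an exported log can be counted per interface. The sketch below assumes the usual FreeBSD kernel wording "link state changed to UP/DOWN"; check that against your own log before relying on it. `count_link_events` is a made-up helper name.

```python
import re
from collections import defaultdict

# Assumed message shape: "<iface>: link state changed to UP" (or DOWN).
LINK_EVENT = re.compile(r"\b(\w+): link state changed to (UP|DOWN)\b")


def count_link_events(lines):
    """Return {interface: {"UP": n, "DOWN": m}} for every link event found in `lines`."""
    events = defaultdict(lambda: {"UP": 0, "DOWN": 0})
    for line in lines:
        match = LINK_EVENT.search(line)
        if match:
            iface, state = match.groups()
            events[iface][state] += 1
    return {iface: dict(counts) for iface, counts in events.items()}


# Usage (hypothetical file name for an exported system log):
# with open("system.log", encoding="utf-8", errors="replace") as handle:
#     print(count_link_events(handle))
```

If one port shows far more DOWN events than the others, that points at the cable, the switch port, or that NIC rather than at pfSense itself.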
-
Thank you for all the valuable responses.
Are you using a public subnet on LAN?
No, it's a class C private network, but I don't share the real LAN on a forum.
Was there a significant gap in the logs before that? The first thing logged there looks like a reaction to the LAN coming back up.
I can't go back any further in time.
Mostly what is concerning there is that igc0 is flapping repeatedly.
I think it's probably due to a problem with the RJ45 cable.
I installed a new one yesterday, now I'm waiting to see if the crashes persist.
The router is behaving much better, it's faster, and above all, much less talkative.
I'll wait and see.
-
Ah, a bad/failing cable would certainly be a problem!
If it does persist I would try reassigning LAN to one of the other igc ports. See if the problem follows it.