SG3100 keeps locking up after latest update
-
@tuser11 said in SG3100 keeps locking up after latest update:
Yes, we do have 2 buildings physically connected and connected with a wire
If they have separate grounds, then what happens is, that wire carries the voltage difference between the grounds. We actually measured voltage on one once but it was long ago. IIRC it was burning out switch ports. Fiber is ideal to connect buildings.
That said, the first two sites I found say it's only an issue with shielded twisted pair and UTP is not a problem, which isn't my recollection at all, but it's been a long time since I've run into a situation not using fiber.
https://www.truecable.com/blogs/cable-academy/how-to-fix-a-ground-loop
https://networkencyclopedia.com/ground-loop/
Often an actual Ethernet cable isn't fed through a UPS and even if it is it's likely looking for a 1000v surge not 10 volts.
-
@stephenw10 The other SG-3100 is offline in a box. I'm going to see if I can connect a USB cable from the SG-3100 to one of the Dell servers and pass it through to a virtual machine that I can leave a PuTTY session running on. As long as I have a static IP, I can plug my laptop into the same switch and get access to that virtual machine even if the SG-3100 goes offline. If that doesn't work I'll set up a Pi or similar.
-
@SteveITS said in SG3100 keeps locking up after latest update:
https://www.truecable.com/blogs/cable-academy/how-to-fix-a-ground-loop
Good tip, I will look into this more as I just set up an Ethernet run to an outdoor building near the house at home. Would hate to see these issues start creeping up at home.
-
I had some locking up issues a while back on my sg-3100 that turned out to be a bad power supply. When you swapped for your spare, did you also change PSUs?
-
@netplumbers Good tip. I don't remember. I looked through my logs and i have a log of changing hard drive and later changing the SG-3100 unit. No notes about changing the power supply. With that said, I'm going to assume I didn't and change it anyway tomorrow.
-
@netplumbers Unfortunately the power supply swap was already done and just wasn't in my notes. When I went to swap the power supply today, the supply in the box from the old system had our internal asset ID written on it from the old system. So the power supply in production is the newest supply with less than 4 months of use.
-
I have a PuTTY session running, with the SG-3100's USB console passed through to a Linux virtual machine. I'm logged into the pfSense box through it, so I'll have somewhere to look the next time the system locks up. Connecting USB after a lockup only gives an unresponsive PuTTY session, but maybe a session that is already open will still show some output on the next lockup.
@stephenw10 Do I just leave the prompt as is after login, or is there a command I should run to stream something?
-
Anything shown should be pushed to the console whatever is happening at that time.
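If you want a timestamped record rather than relying on terminal scrollback, something like the following could sit between the serial device and a log file. It is only a minimal sketch in Python: `timestamp_lines` is a made-up helper name, the device path in the usage note is a guess at how the USB console shows up inside the VM, and it assumes the serial port is already configured for the console speed.

```python
from datetime import datetime, timezone


def timestamp_lines(stream, out, now=lambda: datetime.now(timezone.utc)):
    """Copy lines from a binary stream to a text file-like object.

    Each line is prefixed with a UTC timestamp and flushed immediately,
    so the last line before a hang is already on disk.
    Returns the number of lines copied.
    """
    count = 0
    # readline() returns b"" at end of stream, which ends the loop.
    for raw in iter(stream.readline, b""):
        text = raw.decode("utf-8", errors="replace").rstrip("\r\n")
        out.write(f"{now().isoformat(timespec='seconds')} {text}\n")
        out.flush()
        count += 1
    return count
```

For example: `with open("/dev/ttyUSB0", "rb", buffering=0) as console, open("console.log", "a") as log: timestamp_lines(console, log)`, where `/dev/ttyUSB0` is an assumed device name. PuTTY's own session logging would be an alternative if you prefer to stay in the terminal.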
-
@stephenw10 Hello, today there was another lockup. I had the console connected to a virtual machine for many days waiting for it to lock up again. When it did, I logged into the VM to look at the console that was connected to the SG-3100 and there was no output about the event. The last message on the screen in the console was a message that I had successfully logged in via VPN many hours before.
Can this box be easily locked up via a DDoS attack? How could that be identified when there are always lots of blocked IP addresses? I have logs up until the lockup.
-
It could exhaust the state table perhaps but that would not stop it responding at the console. Also you would see the states rising in the monitoring graphs after rebooting.
Were you able to try Ctrl+T at the console?
If it was a drive error the console would be full of errors showing that.
A hard lock like that with nothing logged at all is more likely a hardware problem IMO.
-
@stephenw10 I didn't try ctl+t at the console. I took a screenshot for proof of last output and then hard-power-cycled the box. I'll try that next time.
-
Do you have a dual WAN setup? My SG-3100 started locking up with no log messages yesterday, less than a day after setting up dual WAN. Two out of three times we had trouble on the primary WAN in the first 24 hours of dual WAN, it slowly stopped routing traffic over a couple of minutes (some traffic would pass, the SG-3100 interface and SSH were unresponsive until it stopped altogether) and reboots.
-
@netplumbers No, I don't have a dual WAN setup. I have 1 WAN and use 2 of the LAN ports (one just for management and the other for all VLAN traffic). I also have the ntopng and Snort packages installed. Snort wasn't installed when the problem started, so on 22 Aug I removed ntopng and am continuing to monitor to see whether I still get the random lockups without any logs.
-
@stephenw10 It's locked up now. Ctrl+T doesn't do anything. Still no logs in the console.
-
Hmm, so reviewing: you're seeing this in every version since 23.01? And on multiple devices? But running the same config? And nothing logged at any time?
-
@stephenw10 Yes, on 2 devices (SG3100 and power supplies) and it's only happening when employees are in the office. And logs prior to the event are all normal and no logs are output to the console during the failure.
To make it easier to isolate whether another device is causing this issue, we are going to change the network to something that might be more appropriate anyway. We are going to turn off DHCP and only allow devices on the network by pair (static IP and MAC address). Right now we don't have a concept of "trusted" devices as it's a relaxed office. The problem is that the traffic logs before the lockup don't seem useful, because there are so many devices and I don't know what abnormal traffic looks like when everyone can bring personal devices and add them to the network. After forcing all devices to be registered, we can start monitoring traffic for specific users.
I don't have any other ideas for monitoring. We eliminated the ground-loop possibility, and buying new hardware doesn't seem appropriate considering we already swapped hardware and I manage another SG3100 at a different location that doesn't have this issue.
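Once devices are registered, spotting strangers could be as simple as comparing observed IP/MAC pairs against the registered list. A rough Python sketch of that check follows. The helper names are hypothetical and the addresses are documentation-range placeholders, not anything from this network.

```python
def normalize_mac(mac):
    """Lower-case a MAC address and use ':' separators so pairs compare equal."""
    return mac.strip().lower().replace("-", ":")


def unregistered(observed, registered):
    """Return the observed (ip, mac) pairs that are not in the registered list."""
    allowed = {(ip, normalize_mac(mac)) for ip, mac in registered}
    return [(ip, mac) for ip, mac in observed
            if (ip, normalize_mac(mac)) not in allowed]


# Placeholder data only: one registered device, one unknown device.
registered = [("192.0.2.10", "00:00:5E:00:53:01")]
observed = [
    ("192.0.2.10", "00-00-5e-00-53-01"),  # same device, different notation
    ("192.0.2.55", "00:00:5e:00:53:99"),  # not registered
]
unknown = unregistered(observed, registered)
```

Where the observed pairs come from (ARP table, switch, firewall logs) is left open here.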
-
Would it be a security problem if I uploaded the pfSense logs for the 2 minutes leading up to the lockup?
-
You can upload them here if you don't want them to be public: https://nc.netgate.com/nextcloud/index.php/s/yELBD5g5qwjNban
-
@stephenw10 I uploaded them there. They're unfiltered logs (1 min 10 sec) up to the system lockup: 2023-09-06 09:16:50.000 until 2023-09-06 09:18:00.000
-
Those are all firewall logs except:
2023-09-06T09:17:00.000-04:00 pfSense.tstdomain pfSense.tstdomain /usr/sbin/cron[13050] (root) CMD (/usr/sbin/newsyslog)
2023-09-06T09:17:00.000-04:00 pfSense.tstdomain pfSense.tstdomain /usr/sbin/cron[12708] (root) CMD (/usr/local/pkg/servicewatchdog_cron.php)
What do you have enabled in the Service Watchdog? That can cause problems, it should only be used for testing.
Are those the only things in the system log?
Steve
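For anyone wanting to pull the non-firewall entries out of a similar export, here is a minimal Python sketch. It assumes firewall entries are the ones written by the `filterlog` process and that each line carries a `process[pid]` field as in the lines quoted above. The function name is made up, and the `filterlog` line in the sample is an illustrative placeholder, not a real log entry.

```python
import re

# Matches a "filterlog[1234]" process field anywhere in the line.
FILTERLOG = re.compile(r"(?:^|\s)filterlog\[\d+\]")


def non_firewall_lines(lines):
    """Return the non-empty log lines that were not written by filterlog."""
    return [ln.rstrip("\n") for ln in lines
            if ln.strip() and not FILTERLOG.search(ln)]


sample = [
    # Placeholder firewall entry (message body is illustrative only).
    "2023-09-06T09:17:00.000-04:00 pfSense.tstdomain pfSense.tstdomain filterlog[1234] (illustrative firewall entry)\n",
    "2023-09-06T09:17:00.000-04:00 pfSense.tstdomain pfSense.tstdomain /usr/sbin/cron[13050] (root) CMD (/usr/sbin/newsyslog)\n",
    "2023-09-06T09:17:00.000-04:00 pfSense.tstdomain pfSense.tstdomain /usr/sbin/cron[12708] (root) CMD (/usr/local/pkg/servicewatchdog_cron.php)\n",
]
remaining = non_firewall_lines(sample)
```

With the sample above, only the two cron lines remain.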