Issue with SG-3100 and 22.01? [Solved]
-
I've always used recovery images for firmware upgrades, and that was the case in February too. I first save the running config, reset to factory defaults, run the recovery image to load the latest firmware, then restore the saved configuration. That has been my standard procedure since I first got the device.
As I said, I shelved the 3100 until I had time to invest in the problem.
Status > Interfaces > WAN indicates 'no carrier' / 'DHCP down', and DHCP will not come up.
Activity light on the port is green/solid
-
@gherkin-d said in Issue with SG-3100 and 22.01?:
Activity light on the port is green/solid
Even with no cable attached? That would be a bad port if so.
Steve
-
@stephenw10
That's with the cable attached!
BTW, firmware upgrades via the recovery image are done with the WAN cable attached.
-
So the WAN port just shows the left LED solid green when you connect a cable to it? Yet it shows no link in the status?
What does
ifconfig -vvvm mvneta2
show?
Is it possible WAN is configured to use one of the other ports?
Steve
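To rule out WAN sitting on a different port, it can help to see the link state of all three ports side by side. A minimal sketch, assuming the usual FreeBSD ifconfig layout (interface name at the start of a line, an indented "status:" line below it); the function name link_status is made up for this example:

```sh
# Print "<interface>: <status>" for each interface found in
# ifconfig-style text read from stdin.
link_status() {
    awk '
        # A line that does not start with whitespace begins a new interface.
        /^[^ \t]/ { ifname = $1; sub(/:$/, "", ifname) }
        # The indented status line carries the link state.
        $1 == "status:" { $1 = ""; sub(/^ /, ""); print ifname ": " $0 }
    '
}

# Usage on the firewall (not run here):
#   ifconfig -a | link_status
```

If mvneta2 reports "no carrier" while another port reports "active" with the WAN cable plugged in, the cable is probably in a different port than the one assigned to WAN.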
-
-
@gherkin-d mvneta2 is the WAN port :) mvneta1 is LAN.
-
Yeah, the default config assigns mvneta0 as OPT, so that one might be expected to show as no carrier.
-
I have not reinstalled the SG-3100 from the recovery image yet; I'm curious whether anything gets logged at the console.
I expect the next crash on May 1st or 2nd.
So far the SG has crashed every 23 to 24 days.
A PC is connected to the serial interface.
But anyhow, I have two more questions. As the screenshot shows, something is slowly eating memory; any idea how to pinpoint it?
Second question: I noticed that when using this view I need to log in to pfSense again after x hours (not sure about the exact value).
In the dashboard view I stay logged in for days!?
It seems the session duration depends on the view, is that correct?
Regards
-
Check what processes are using RAM in Diag > System Activity or run
top -aSH
at the CLI.
The dashboard has a number of active items on it that update periodically, keeping the session open. On static pages the session times out after a while.
Steve
-
Other commands you can use to show the top memory-consuming processes.
Show the top 10 memory consumers (the header is printed in BEGIN so the largest process is not swallowed by it):
# ps -Am -opid,pmem,pcpu,rss,vsz,args | sort -k4 -rn | awk 'BEGIN { print " PID %MEM %CPU RSS VSZ COMMAND" } NR <= 10 { print $0 }'
If you want a continuously updating display:
# while :; do clear; ps -Am -opid,pmem,pcpu,rss,vsz,args | sort -k4 -rn | awk 'BEGIN { print " PID %MEM %CPU RSS VSZ COMMAND" } NR <= 10 { print $0 }'; sleep 1; done
or use
top
# top -o res
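For a leak that takes weeks to show, a live display is less useful than comparing two snapshots taken some days apart. A rough sketch of that idea, not a tested pfSense recipe: the function names snapshot and rss_growth and the file paths in the usage comment are made up for this example, and only the first word of the command name is kept.

```sh
# One "pid rss command" line per process, without a header line.
snapshot() {
    ps -A -o pid= -o rss= -o comm=
}

# Compare two snapshot files ($1 = older, $2 = newer) and print
# "growth_kb pid command" for every process whose RSS grew,
# largest growth first.
rss_growth() {
    awk 'NR == FNR { before[$1] = $2; next }
         ($1 in before) && $2 > before[$1] { print $2 - before[$1], $1, $3 }' "$1" "$2" |
        sort -rn
}

# Usage (hypothetical paths, not run here):
#   snapshot > /root/ps-day1.txt
#   ...a few days later...
#   snapshot > /root/ps-day5.txt
#   rss_growth /root/ps-day1.txt /root/ps-day5.txt
```

Processes that restarted in between get a new PID and drop out of the comparison, so a process that keeps growing across several comparisons is the one to look at.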
-
-
-
This morning it happened again...
From one minute to the next, I noticed that accessing a web site was no longer possible (404 - Not found).
Ping works to all addresses (e.g. 1.1.1.1, 8.8.8.8 or any other IP).
But ping to a name does not work, so the DNS service was not working.
After a few minutes the WebGUI of the SG-3100 was unreachable too.
After the last issue I had connected an old laptop to the SG-3100 via the serial connection, so I looked for any console output and ... the laptop was deadlocked as well!???
Very strange... I had to reboot the laptop, and after looking through the PuTTY log I rebooted the SG-3100 by power cycling it. There was nothing in the log, and nothing really means nothing since the last reboot! Not a single character.
An existing VPN connection into the company network was still working (until the reboot of the SG).
No idea what this is. I will switch to the replacement hardware next weekend.
Regards
-
Hmm, nothing appeared at the console at all?
At the very least you should have seen webgui logins shown there. If nothing at all was shown it sounds like it was not logging.
Your description of the issue really 'feels' like a failing drive. That's exactly how it presents. Except that eMMC failures generally don't recover at power cycle.
If you have an M.2 SATA SSD you could try that instead.
Steve
-
Hi,
no M.2 SSD yet, but I will get one in the next few days.
And no, I was surprised too. Over the last weeks I looked at the serial console from time to time and nothing was shown there. Sometimes I just pressed the ENTER key to see if I was still connected to the console; I was, but nothing had been recorded since startup.
Regards
-
Hmm, well try just logging into the webgui to check. That should be shown, for example:
*** Welcome to Netgate pfSense Plus 22.05-BETA (arm) on 3100 ***

 WAN (wan)       -> mvneta2    -> v4/DHCP4: 192.168.126.11/24
 LAN (lan)       -> mvneta1    -> v4: 192.168.18.1/24
 OPT1 (opt1)     -> mvneta0    -> v4/DHCP4: 192.168.21.10/24

 0) Logout (SSH only)                  9) pfTop
 1) Assign Interfaces                 10) Filter Logs
 2) Set interface(s) IP address       11) Restart webConfigurator
 3) Reset webConfigurator password    12) PHP shell + Netgate pfSense Plus tools
 4) Reset to factory defaults         13) Update from console
 5) Reboot system                     14) Disable Secure Shell (sshd)
 6) Halt system                       15) Restore recent configuration
 7) Ping host                         16) Restart PHP-FPM
 8) Shell

Enter an option:
Message from syslogd@3100 at Jun 3 16:43:12 ...
php-fpm[656]: /index.php: Successful login for user 'admin' from: 172.21.16.5 (Local Database)
Steve
-
I have the same problem on my SG-3100 running 22.01. Roughly every 4-5 weeks the device freezes. I can ping the device/gateway address but no traffic goes through the WAN interface. The DNS resolver, web GUI etc. do not work or are unreachable.
I have a Raspberry Pi connected to it via the USB serial console, recording all the output to a text file using GNU screen. Nothing is recorded when this happens. Nothing in particular is shown in the boot log either. My knowledge of all the log files is limited, however, so I might be missing something.
I ran S.M.A.R.T. tests on my M.2 SATA drive and no errors are shown. I see a similar memory graph to the previous poster's.
Any ideas or suggestions on what to check next would be greatly appreciated.
Should I perform a re-install and restore the config from backup, maybe?
Open a support ticket with Netgate?
-
Does the console still respond when this happens? Can pfSense connect out on any interface?
Do you see free ram drop to <10%?
Steve
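If the dashboard is not reachable when it happens, the free-RAM figure can be sampled from the shell and logged over time. A minimal sketch, assuming the FreeBSD sysctl names vm.stats.vm.v_free_count and vm.stats.vm.v_page_count; the function names and the log path in the usage comment are made up for this example. It counts only strictly free pages, so the number will usually be lower than what the dashboard shows.

```sh
# Integer percentage of free pages, given free and total page counts.
free_pct() {
    echo $(( $1 * 100 / $2 ))
}

# Print one timestamped free-memory sample (FreeBSD sysctl names).
log_free_mem() {
    free=$(sysctl -n vm.stats.vm.v_free_count)
    total=$(sysctl -n vm.stats.vm.v_page_count)
    echo "$(date '+%Y-%m-%d %H:%M:%S') free=$(free_pct "$free" "$total")%"
}

# Usage (hypothetical log path, not run here), e.g. once an hour from cron:
#   log_free_mem >> /root/freemem.log
```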
-
Hmm, and you manually power-cycled it at that point?
-
After it happened again this morning I finally replaced the device with the second SG-3100 (which I had not done in the past weeks).
We will see if it happens with this device too.
In the meantime I will start rebuilding the replaced device from scratch.
Regards
Edit: Help needed: after swapping the SG-3100 I am flooded with mails saying "can't create socket: No buffer space available", and I can't find any reason for that.
I swapped the devices back, but the other device (the one that stopped working this morning) also sends thousands(!) of these mails???
What has happened now?
There are some threads about "No buffer space available", but they all relate to ping, traceroute or some other networking command.
I did not find any post with "can't create socket", any ideas?
Edit 2: no idea what happened, but the mails have now stopped!??? The device which stopped this morning is currently on duty again. I checked routes and NICs, updated some packages, and the mails stopped. What a mystery!?
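"No buffer space available" is the ENOBUFS error, and one common place to look on FreeBSD is the mbuf statistics from netstat -m. This is only a guess at where to start, not a diagnosis. A small sketch, assuming the usual "requests for mbufs denied" line in that output; the function name mbuf_denied is made up for this example:

```sh
# Extract the denied-request counters (mbufs/clusters/mbuf+clusters)
# from "netstat -m" style text read from stdin.
mbuf_denied() {
    awk '/requests for mbufs denied/ { print $1 }'
}

# Usage on the firewall (not run here):
#   netstat -m | mbuf_denied
```

Anything other than 0/0/0 would suggest the network buffers ran out at some point since boot.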
-
@stephenw10 Correct. Can't do much else with no response from the device or WAN connection.