After upgrade to 23.05.1, pfSense on Netgate 3100 basically dead
-
Did a standard upgrade from the webui from 23.01 to 23.05, waited 10 minutes, then did another one to 23.05.1.
Prior to that, the system had been stable since the last power failure (~60 days ago). After the upgrade everything came up and was working, but about an hour later no traffic would pass to the internet.
Could ping, could SSH in, could not load the WebUI. Did a reboot of the box (pulled power cord), it came back up, and for about a minute I had a webui, then it went non-responsive again. I noticed that my WAN1 was saying link down for some reason, so I set the default gateway to WAN2.
Did another restart of the box, now traffic passes to the internet but still no WebUI. I can finally get the console menu to pull up via SSH (first time it refused) and tried restarting webconfigurator - now I get a 502 nginx error. Option 16 gave me back the webui, but WAN1 is still dead. I verified the ISP/Cable is fine by connecting to a laptop directly. The webUI dies after a few minutes.
dmesg shows a ton of this:
sonewconn: pcb 0xe0eb4d80 (local:/var/run/php-fpm.socket): Listen queue overflow: 193 already in queue awaiting acceptance (1065 occurrences)
Anyone have any idea what's going on here?
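For anyone who wants to tally these messages rather than scroll through them, here is a minimal Python sketch that parses "Listen queue overflow" lines from saved dmesg output. It assumes the output has been copied to a workstation with Python 3; the function name `summarize_overflows` is made up for illustration, and the regex is only fitted to the single line quoted above, so other variants of the message may not match.

```python
import re

# Regex fitted to the one kernel message quoted in this thread (an assumption:
# other FreeBSD versions may word the message slightly differently).
SONEWCONN_RE = re.compile(
    r"sonewconn: pcb (?P<pcb>0x[0-9a-f]+) \((?P<sock>[^)]*)\): "
    r"Listen queue overflow: (?P<queued>\d+) already in queue awaiting acceptance"
    r"(?: \((?P<count>\d+) occurrences\))?"
)


def summarize_overflows(dmesg_text):
    """Group overflow messages by listening socket.

    Returns {socket: {"max_queued": int, "occurrences": int}}.
    A line without an "(N occurrences)" suffix counts as one occurrence.
    """
    summary = {}
    for m in SONEWCONN_RE.finditer(dmesg_text):
        entry = summary.setdefault(m["sock"], {"max_queued": 0, "occurrences": 0})
        entry["max_queued"] = max(entry["max_queued"], int(m["queued"]))
        entry["occurrences"] += int(m["count"] or 1)
    return summary


# The line from the post above, used as sample input.
sample = (
    "sonewconn: pcb 0xe0eb4d80 (local:/var/run/php-fpm.socket): "
    "Listen queue overflow: 193 already in queue awaiting acceptance "
    "(1065 occurrences)"
)
print(summarize_overflows(sample))
```

If nearly all overflows land on the php-fpm socket, as in the line above, that is consistent with PHP not accepting connections from nginx, though it does not by itself say why PHP is stuck.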
-
@MordyT what does “waited 10 minutes” mean? Hopefully that’s when it naturally finished?
I would have expected it to go straight to 23.05.1.
-
@SteveITS After the upgrade finished and the webui was available again, I waited 10 minutes just in case of any background tasks before going to update and moving to the .1 release.
-
@MordyT I would try:
-
Check if the console (cable) shows any errors at the time
-
check storage write life: https://docs.netgate.com/pfsense/en/latest/troubleshooting/disk-lifetime.html
-
try a reinstall: https://docs.netgate.com/pfsense/en/latest/solutions/sg-3100/reinstall-pfsense.html
-
-
@SteveITS said in After upgrade to 23.05.1, pfSense on Netgate 3100 basically dead:
https://docs.netgate.com/pfsense/en/latest/troubleshooting/disk-lifetime.html
All good on that front, apparently under 10% used
@SteveITS said in After upgrade to 23.05.1, pfSense on Netgate 3100 basically dead:
Check if the console (cable) shows any errors at the time
That's going to be tricky, but I'll work on it. Any other way to get the output?
@SteveITS said in After upgrade to 23.05.1, pfSense on Netgate 3100 basically dead:
try a reinstall: https://docs.netgate.com/pfsense/en/latest/solutions/sg-3100/reinstall-pfsense.html
I've opened a ticket for the image.
-
Reflashing the image and restoring the backup seems to have stabilized the system.
Of note, you can't use just any flash drive - a 5-year-old 64GB Kingston USB3.0 drive would crash during the run recovery stage and I had to dig out an 8GB USB2.0 drive for it to work.
I still don't have boot environments though. I thought a reflash brought it to ZFS?
-
@MordyT ZFS is not supported on ARM32 appliances (Netgate 3100 and 1000).
-
jimp moved this topic from Problems Installing or Upgrading pfSense Software
-
@MordyT said in After upgrade to 23.05.1, pfSense on Netgate 3100 basically dead:
Reflashing the image and restoring the backup seems to have stabilized the system.
The brute force method, but it's in a known good state going forward.
Re: console, I was hoping it would show something at the time of the crash, maybe a disk error or something. Doesn't sound disk related though so that's good.
-
@bigsy said in After upgrade to 23.05.1, pfSense on Netgate 3100 basically dead:
@MordyT ZFS is not supported on ARM32 appliances (Netgate 3100 and 1000).
This is 100% accurate.
-
Unfortunately, about 30 hours post flash, the issue has returned.
- All traffic to the WAN (dedicated WAN port on the box) just dies. SSHing to the box shows that WAN1 no longer has an IP or anything assigned. I can set my gateway to WAN2 and have connectivity - WAN2 is in OPT1.
- The pfSense box responds to ping, SSH, but the WebUI is either dead or very slow to respond. If I reset option 16 on the console/SSH, I can get it back for a few minutes, then it starts to 504 on me. Option 11 does not help at all.
What logs should I be looking for? How should I troubleshoot this?
There are no errors displayed on the console.
-
@MordyT is something stuck/using CPU if you run top? Your note and the PHP error above sound like PHP is not responding.
-
-
@MordyT mvneta2 is the WAN port so that matches what you're saying originally. Sounds like you tried the same patch cable on a laptop. I'd try replacing it anyway just to see. Another thing that helps some people is to put a switch between pfSense and the ISP router so the link stays up as far as pfSense knows.
-
What's the WAN actually connected to? Is it actually losing link like that?
Not that the WAN flapping should cause php to get hung up like you're seeing. What's in the main system logs when that happens?
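One rough way to answer the "is it actually losing link" question from the logs is to count link state changes per interface. The sketch below is in Python and assumes the system log has been copied off the box as plain text; the function name `count_link_changes` and the sample lines are hypothetical, written in the usual FreeBSD "link state changed to UP/DOWN" style rather than taken from this system.

```python
import re
from collections import Counter

# Matches FreeBSD-style link messages such as
# "mvneta2: link state changed to DOWN". Simple interface names only
# (letters followed by digits); VLAN-style names are not handled.
LINK_RE = re.compile(
    r"(?P<ifname>[a-z]+\d+): link state changed to (?P<state>UP|DOWN)"
)


def count_link_changes(log_text):
    """Return a Counter keyed by (interface, state)."""
    counts = Counter()
    for m in LINK_RE.finditer(log_text):
        counts[(m["ifname"], m["state"])] += 1
    return counts


# Hypothetical sample lines for illustration, not from the affected 3100.
sample_log = """\
kernel: mvneta2: link state changed to DOWN
kernel: mvneta2: link state changed to UP
kernel: mvneta2: link state changed to DOWN
kernel: mvneta1: link state changed to UP
"""

for (ifname, state), n in sorted(count_link_changes(sample_log).items()):
    print(f"{ifname} {state}: {n}")
```

Many DOWN/UP pairs on mvneta2 would point toward the cable, the port, or the upstream device, while a single DOWN with no matching UP would fit the "WAN1 no longer has an IP" symptom described earlier.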