6100 Fails to come back up after power outage without manual reboot
-
@ITH said in 6100 Fails to come back up after power outage without manual reboot:
It has power on when the power is restored but does not function until manually rebooted again. It is "on" but unresponsive.
pfSense is like a PC or a Mac.
Ripping out the power instead of using Windows => Start menu => "Shut down" is 'prohibited' by Microsoft, as this might mess up the file system.
pfSense can behave the same way. It's rare, because very resilient file systems are used, but issues can happen if the SSD was in the middle of writing something. pfSense also isn't a device you shut down when you stop using the Internet (something you would do with your PC: you put it in sleep mode, and from then on it won't use the drive anymore, so it's safe).
This is the reason why you would equip your NAS (always on - will contain all your valuable data) and your local network with a UPS.
A UPS isn't mandatory, of course, but in that case, do shut down your stuff before the power loss happens.
[ I know, this is impossible - that's why UPSs exist ^^]
You can, however, find out what happened - as it can happen again. So let's go looking for the issue.
First, make a backup of your pfSense config.
That out of the way, hook yourself up to the most important interface of your 6100: the console.
Log in to it.
Now, rip out the 6100 power.
Wait a while - like 60 seconds.
Put it back in.
Here comes my question, and your question: what did you see? What does it say? What happens?
Btw: whatever you see, there are other scenarios to test.
What is the WAN NIC connected to? Was the other end of the WAN cable also powered down?
Who starts up faster, pfSense or the device on the other side of the WAN? Cable modems are notorious for misbehaving if you "boot stuff out of sequence".
Etc. etc.
-
@ITH said in 6100 Fails to come back up after power outage without manual reboot:
It has power on when the power is restored but does not function until manually rebooted again. It is "on" but unresponsive.
I misunderstood. My bad. As @SteveITS said, you want to look at the console and/or the system log.
[PS: it sounds like you need a UPS]
-
This unit is at a remote site and it doesn't have a UPS. While I have UPSs at the other locations and yes, I understand that they would be a probable fix, I'm having a hard time believing that cheap residential units or other manufacturers' SMB units can gracefully come online and this one can't... Spending hundreds or thousands on a Band-Aid which would eventually run out of power depending on the length of the outage isn't a fix ... all of which leads me to think something is wrong with this one. This is my first (only) Netgate so I am happy to be told I am overlooking something obvious or doing something dumb :)
I will try to take it offline tonight after office hours and console in to watch while yanking the power out. I have unplugged it numerous times and it comes back up from that though so I'm not sure it will be the same as an outage. I don't know what the difference is though. Theoretically they should produce the same result.
WAN is connected to the cable modem... which is slow to start up and connect. As are some of the switches. If power comes on to both devices at the same time and the cable modem isn't up yet, that could be the difference compared to me unplugging the router by itself. That is a good idea. Thanks for the suggestion. I will try killing both tonight.
Is there a way to add a startup delay in the netgate? A pause line maybe? If the issue is the rest of the network isn't online yet, that could make sense. I would hope that it would resolve when it detects the rest of the network but if a delay isn't possible maybe an auto-restart of the netgate after 5 mins of no connectivity?
-
@ITH said in 6100 Fails to come back up after power outage without manual reboot:
I'm having a hard time believing that cheap residential units or other manufacturers SMB units can gracefully come online and this one can't.
The units you talk about run their software out of a "ROM" (or flashrom).
Nor do they use an OS like "FreeBSD 15.0"; they run a severely stripped-down, very minimalist Linux. These devices only have drivers for the hardware that is known up front. You can't add NICs or anything like that.
pfSense can be installed on a "PC" (just add a second NIC, and you're good) ... and these do not all have the same hardware (understatement of the year) and use PC-like disk storage, which could be old-fashioned spinning drives, SSDs, or "eMMC" stuff. None of these like power losses.
-
It should come back on. There's no setting for it on the 6100, it will always try to power back up when power is re-applied.
However it sounds like the actual state it was in is unknown, just that it did not connect back to the WAN?
@ITH said in 6100 Fails to come back up after power outage without manual reboot:
Is there a way to add a startup delay in the netgate? A pause line maybe?
Yes. And that can be an issue in situations where the modem and pfSense are simultaneously powered on.
Create the file /boot/loader.conf.local
That is used to store custom loader values. Add to it the line:
autoboot_delay="30"
The default value there is 3. You probably don't need 30s but it won't hurt.
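A minimal sketch of that edit from a root shell (console or SSH). The helper name `set_autoboot_delay` is made up for illustration and is not a pfSense tool; it replaces any existing `autoboot_delay` line instead of appending a duplicate:

```sh
#!/bin/sh
# Hypothetical helper (not part of pfSense): set autoboot_delay in a loader
# config file, replacing any earlier autoboot_delay line.
# usage: set_autoboot_delay FILE SECONDS
set_autoboot_delay() {
    conf="$1"
    delay="$2"
    tmp="$conf.tmp.$$"
    # Create the file if it does not exist yet.
    touch "$conf" || return 1
    # Keep every line except an existing autoboot_delay setting.
    # grep exits non-zero when nothing is left to print; that is fine here.
    grep -v '^autoboot_delay=' "$conf" > "$tmp" || true
    printf 'autoboot_delay="%s"\n' "$delay" >> "$tmp"
    # Write back in place so the original file's permissions are kept.
    cat "$tmp" > "$conf"
    rm -f "$tmp"
}
```

On the firewall this would be called as `set_autoboot_delay /boot/loader.conf.local 30`, and the result can be checked with `cat /boot/loader.conf.local` before rebooting.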
Steve
-
@ITH said in 6100 Fails to come back up after power outage without manual reboot:
Is there a way to add a startup delay in the netgate? A pause line maybe? If the issue is the rest of the network isn't online yet, that could make sense. I would hope that it would resolve when it detects the rest of the network but if a delay isn't possible maybe an auto-restart of the netgate after 5 mins of no connectivity?
That's a known subject and issue to deal with.
Most common 'brutal' solution: you can add a boot delay at the pfSense boot prompt on the console.
Typically, it's 3 seconds or so, the time an admin using the console needs to switch to single user mode, or do something else. By default, it counts to 3 and boots pfSense.
Or: if you use DHCP as a WAN access method, play with the (on the WAN interface settings ) :
and now you have new options - see somewhat lower.
Btw : these are just some ideas, see also the pfSense Documentation about this.
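The "auto-restart after 5 minutes of no connectivity" idea from the question could be sketched as a small script run once a minute from cron. This is only an illustration, not a pfSense feature: the function name, the state file path, the probe target and the default action are all assumptions, and the default action only prints a message so the logic can be tried safely before swapping in a real reboot command.

```sh
#!/bin/sh
# Hypothetical WAN watchdog, meant to be run once a minute (e.g. from cron).
# All names and defaults below are assumptions for illustration.
TARGET="${TARGET:-8.8.8.8}"                 # host to probe
STATE="${STATE:-/tmp/wan_watchdog.fails}"   # consecutive-failure counter
LIMIT="${LIMIT:-5}"                         # failed runs before acting
PROBE="${PROBE:-ping -c 1 $TARGET}"         # probe command
ACTION="${ACTION:-echo would-reboot}"       # e.g. "shutdown -r now" once trusted

watchdog_tick() {
    # Read the number of consecutive failures so far (0 if no state yet).
    fails=$(cat "$STATE" 2>/dev/null || true)
    fails=${fails:-0}
    if $PROBE >/dev/null 2>&1; then
        fails=0                  # link is up: reset the counter
    else
        fails=$((fails + 1))     # one more failed run in a row
    fi
    if [ "$fails" -ge "$LIMIT" ]; then
        echo 0 > "$STATE"        # reset so we do not act again immediately
        $ACTION
    else
        echo "$fails" > "$STATE"
    fi
}
```

To use it, add a final line `watchdog_tick` to the script and schedule it every minute; with `LIMIT=5` the action fires after roughly five minutes without a reply. Keeping the counter in a file means each run is short and no process has to stay resident.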
-
@stephenw10 said in 6100 Fails to come back up after power outage without manual reboot:
It should come back on. There's no setting for it on the 6100, it will always try to power back up when power is re-applied.
Thank you. Saves me having to check.
-
@Gertjan Fair point. Mental shift. I'm not sure I like the tradeoff at this point. You're right though, the windows / nix servers handling similar functions are all on UPS.
This still leaves me stuck however. It doesn't recover to base operable state and it is the furthest thing out except the modem so I can't treat it like a computer and send a magic packet because there is no way to do so. I can't VPN, I can't use my rmm tool to get to anything behind it. The power could be off for a week.
What else am I overlooking? If we start with the assumptions that time offline will exceed anything from a UPS and it needs to restart to functional without human intervention from that location (remote is okay), what does that leave me with? It would seem that I need another controller capable of self-starting in front of the 6100.
-
Thanks @stephenw10 . I wasn't finding anything in the pages I was reading.
Correct. After both power outages it did the same thing. I was remote and could no longer communicate with the network from outside using vpn, or rmm / remote desktop. I later went to the location and none of the computers could connect to each other or to the internet. I restarted everything - computers, switches, modem - and still didn't have functionality. Only after pulling the power cable to manually reboot the 6100 did everything come back online.
-
Hmm, well I wouldn't expect a WAN issue to prevent internal hosts connecting to each other so it might not be that. Sounds more like DHCPd wasn't running. Did they have the expected IP addresses? Could they reach pfSense?
-
@ITH Assuming NUT or apcupsd, it should shut down gracefully and reboot cleanly when power returns.
Perhaps the problem you are experiencing is from ungracefully pulling power. Only the console will tell you for sure.
-
@stephenw10
I didn't consider that it might be only part of the functionality that was offline. DHCP being down might make sense. I don't remember if I checked that specifically on the endpoints or not. I tried to RDP to a couple of machines from the inside and when it didn't work started rebooting things. If I remember correctly, my phone didn't pick up wireless even though the access point was lit. I assumed it was just not hitting the internet but if it wasn't getting an address that could describe the result as well. I do know that I rebooted everything, tried again, and then went back to the 6100 a few minutes later and rebooted it last after everything else was back up since it was in a different part of the room. That would suggest that DHCP receive was okay but send was not.
If I set up an ovpn connection on the router itself instead of on a computer behind it, is the dhcp offer going to come from the same instance as the lan connections? It is not, right? OVPN has a built-in function that serves its own pool. Assuming it is separate and not tied to it, then I should still be able to make a connection from outside and potentially be able to reboot if the same thing occurs again and it is only the dhcp function that isn't working but the rest of the unit is working. Am I overlooking anything with that approach?
@dennypage Good point. I'll try to plug in and see what I can find. I might just have to steal a small UPS from somewhere else or order one for this.
Thank you all for the feedback. Since this is a recurring issue I'll try to gather some more info for troubleshooting instead of just bringing things back online. Hopefully I can reproduce the issue but if not this will give me a few more things to look at.
-
If it were just a DHCP issue (unlikely IMO) then, yes, you would still be able to connect to the VPN.
I would try to replicate the state then test what's actually broken from the console.