pfSense stopped routing after power failure
-
I had a power failure to the pfSense device, and ever since the power came back, I have no L3 routing or NAT at all. I'm running 2.4.4-RELEASE-p3 (amd64) on a Protectli; specs are as follows:
Intel Dual Core Celeron J1800, 64 bit, 2.4GHz, 2MB L2 Cache
2 Intel Gigabit Ethernet NIC ports (em0 WAN, em1 LAN)
8GB DDR3 PCL-12800
Kingston 120G SSDNOW UV500 MSATA
This little guy has been running like a wet dream, no more than 30% RAM usage peak and at times around 60% CPU peak. Max temp I've recorded was 41C.
What I have is em0 connected to a switch, then to my fiber modem. em1 is connected to my Cisco SG300-10 and there are 3 VLANs passing to this guy: 30, 31, 34. 30 is my server VLAN, static IPs only. 31 is my IoT VLAN with DHCP and a route to the Internet only. VLAN34 is my general DHCP VLAN, with access to 30 and 31. 30 has access to both 31 and 34, but 31 does not have access to 30 or 34.
This has worked very well for some time now, but earlier tonight I had a little power outage, and ever since then, things just have not been right. I can't get to anything that is not on my same subnet, unless I'm on the pfSense itself, then I can get everywhere.
So, my laptop is on VLAN34, and I can SSH into the pfSense and can ping it. I can ping other devices on VLAN34, but if I try to ping anything on 30, or the Internet (8.8.8.8 for instance), I get nothing. Same thing happens if I am on a server in VLAN30, I can ping anything on my subnet, but forget anything outside of the subnet.
It seems to me that this power outage has caused pfSense to lose its L3 routing capability, which is disturbing to say the least. I've rebooted several times and validated that the rules are all correct, NAT config is right, etc.
For more troubleshooting, I did a packet capture on the VLAN30 interface using pfSense packet capture in promiscuous mode, and I can see the pings coming from my server on VLAN30 but no response from the device on VLAN34. I switched that capture to the VLAN34 interface and repeated, and no packets showed up in the capture from my server, which really makes me think that it is a routing problem in pfSense. NAT and port-forwarding also do not work, which makes sense if this is what I suspect.
I don't want to reload this guy, that would not be the most fun in the world, so I'm hoping folks can offer a different solution. If it helps, after rebooting pfSense, I have to SSH into it and restart the webconfigurator, otherwise the web GUI doesn't accept connections.
Like I said, I'm at my wits' end and am really close to wiping and starting again, but hoping someone has a thought before I go that route.
-
Remove the Cisco and connect a PC directly to em1 to see if it works.
Upload some screenshots of your GUI configuration (routing table / firewall rules).
-
@kiokoman there’s a lot of reconfig that would need to be done due to the VLAN tagging, but not sure what difference that would make since between the server and the pfSense I have no issues, it’s only when needing to do L3 routing I have issues which would be all internal to pfSense.
I will work on the screen shots.
-
You need to examine your system log and perhaps the dmesg.boot log for errors. You will find several service logs in /var/log on the firewall. The power failure very well could have resulted in a corrupted filesystem and perhaps some critical component is now not starting correctly.
You should put your firewall (and perhaps also your switch) on a small UPS. $40 - $60 is cheap for the peace of mind it buys you by protecting your firewall from filesystem corruption.
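As a rough starting point for that log check, a small filter like the one below can narrow things down. It is a sketch, not an exhaustive list: the function name `suspicious` and the keyword list are my own choices. On pfSense 2.4.x the files under /var/log are circular logs that are read with `clog` rather than `cat`, while dmesg.boot is plain text.

```shell
# Keep only lines that hint at an unclean shutdown or a failed component.
suspicious() {
    grep -i -E 'error|fail|corrupt|panic|not properly dismounted'
}

# Assumed usage on the firewall:
#   suspicious < /var/log/dmesg.boot
#   clog /var/log/system.log | suspicious
```

A "was not properly dismounted" warning is what FreeBSD prints when a filesystem was not cleanly unmounted, so it is a useful marker for where the trouble started.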
-
@bmeeks I did run fsck at reboot with corruption in mind, but I don't see any logs for it.
As for the logs, nothing jumped out at me, but I will take a second look and will look at dmesg, which I didn't think of with this being FreeBSD and not Linux.
Didn’t really worry about a UPS because the power is really stable, this was self inflicted due to bad labeling of breakers but you bet I will be getting a UPS at this point.
-
The fact you have to restart the web configurator in order to gain GUI access indicates something is damaged in the pfSense configuration. Exactly what that may be, I don't know. Looking through those logs in /var/log may give you a clue.
-
@bmeeks ok so running sysctl -w net.inet.ip.forwarding=1 gets my internal connectivity restored. NAT is still not working but tells me something’s definitely corrupt on the file system.
So, not even sure where to set this on FreeBSD, and how to troubleshoot the NAT since iptables isn’t used.
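For reference, the FreeBSD counterpart to iptables here is pf, driven by `pfctl`. A minimal sketch for checking both halves of the problem, forwarding and NAT, is below; the function name `check_routing` is hypothetical and the rule count is only a quick sanity check.

```shell
# Report whether IPv4 forwarding is on and how many NAT rules pf has loaded.
check_routing() {
    if [ "$(sysctl -n net.inet.ip.forwarding)" = "1" ]; then
        echo "forwarding: on"
    else
        echo "forwarding: off"
    fi
    # "pfctl -s nat" lists the loaded NAT/rdr rules; count the lines that start with "nat ".
    nat_rules=$(pfctl -s nat | grep -c '^nat ' || true)
    echo "nat rules: $nat_rules"
}
```

On stock FreeBSD the persistent setting is `gateway_enable="YES"` in /etc/rc.conf. On pfSense, as far as I know, forwarding is normally turned on by the boot scripts themselves, and custom sysctls go under System > Advanced > System Tunables, so a value of 0 after boot suggests the boot sequence did not finish rather than a setting that needs adding. Zero NAT rules would point the same way, since the ruleset is generated from config.xml at boot.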
-
@jlw52761 said in pfSense stopped routing after power failure:
@bmeeks ok so running sysctl -w net.inet.ip.forwarding=1 gets my internal connectivity restored. NAT is still not working but tells me something’s definitely corrupt on the file system.
So, not even sure where to set this on FreeBSD, and how to troubleshoot the NAT since iptables isn’t used.
It really would be easier to reinstall and restore your config. Check your config backups. Copy the most recent one off to some other location (for example, the PC you normally use to administer pfSense). You can then reinstall pfSense and restore the saved backup configuration. That will put back all of your VLANs and other setup information.
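If the GUI is not reachable for Diagnostics > Backup & Restore, the backups can also be picked up over SSH. The sketch below assumes the default pfSense locations (/cf/conf/config.xml for the running config, /cf/conf/backup/ for the automatic history); the function name `latest_backup` and the host name in the usage comment are made up for illustration.

```shell
# Print the most recently modified backup file in the given directory.
latest_backup() {
    dir="${1:-/cf/conf/backup}"
    ls -t "$dir"/config-*.xml 2>/dev/null | head -n 1
}

# Assumed usage, run from the admin PC ("pfsense.lan" is a placeholder host name):
#   scp admin@pfsense.lan:/cf/conf/config.xml ./config-saved.xml
```

It is worth opening the copied file before reinstalling to confirm it is complete XML, since a power loss can also truncate the config itself.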
-
@bmeeks yeah, I think I’m at that point. Things seem to be really borked.
-
@bmeeks I ended up just reinstalling the OS and doing a restore. Things came right back up without a hitch.
-
@jlw52761 said in pfSense stopped routing after power failure:
@bmeeks I ended up just reinstalling the OS and doing a restore. Things came right back up without a hitch.
I know having to do that is a little painful and scary, but it's usually the best cure. Now get a UPS configured and install either the nut or apcupsd package to monitor the UPS and gracefully shut down pfSense when there is another power failure and the battery is near exhaustion. Installing a package is important as that lets the UPS notify the firewall that the AC mains are down and the battery is about to expire. The package code then shuts down pfSense gracefully. You can configure when that happens, but I think the default is when there are 5 minutes of battery life remaining.
I had an incident in my neighborhood recently where the driver of a car ran off the road and knocked down a power pole. My house was without power for nearly 6 hours while repairs were made. I have a Netgate SG-5100 and my cable modem plugged into an APC BackUPS 650 ES. The UPS kept my firewall and cable modem running the entire duration of the power outage. Of course the same power pole also carried my cable Internet connection so I was dead in the water in terms of connectivity. I also have a UPS on all of my other computers including my ESXi servers. They all stayed up until their batteries neared exhaustion, then they each shut down gracefully. Once power was restored they all came right back up just like nothing ever happened.
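Once apcupsd is installed and talking to the UPS, `apcaccess status` shows what the daemon sees. The helper below is just a sketch (the name `ups_minutes_left` is hypothetical) that pulls the estimated runtime out of that output, which is handy for confirming the USB link works before relying on it.

```shell
# Print the runtime estimate reported by apcupsd, in minutes (number only).
ups_minutes_left() {
    apcaccess status | awk -F': *' '/^TIMELEFT/ { split($2, a, " "); print a[1] }'
}
```

The shutdown thresholds themselves are the MINUTES and BATTERYLEVEL directives in apcupsd.conf, which the pfSense package exposes in its settings page; check the values there rather than trusting a remembered default.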
-
@bmeeks Got one 500VA UPS coming tomorrow for the fiber modem, pfSense, switch, and the two UniFi AP's. This will be USB cabled to the pfSense and it's an APC so if nut doesn't work then I will use apcupsd.
I have a second UPS also coming, in the range of 1800VA for my NAS, switch, and ESXi boxes.
As far as the restore, since I have backups and copy them off the appliance, it was stupid simple. I created the USB key, added the FAT32 "Recover" partition and copied the backup xml file and named it config.xml. I had to hook the firewall up to my TV as I have no VGA monitors in my house surprisingly, but it booted, installed, and on reboot applied config.xml and was up and going. Stupid simple DR in my mind and a huge bonus for pfSense in my book!
From now on, it's going to be a DR instead of hours of troubleshooting; it's just too damned easy to recover.
Going to use a SIIG USB over IP device and an FTDI cable to have remote access to the console for any future needs.