pfSense on Netgate 6100 stops passing traffic multiple times per day
-
@stephenw10 Hi Steve, the only other thing on the network is a WiFi router set to AP mode, and it gets its DHCP lease from the Netgate. Whenever this happens I check the IP information on all the devices, and they all still have their correct IPs and gateway set to the Netgate. I will check Diagnostics -> Routes next time (which should be soon).
-
How are you actually testing? How does it fail?
The description sure sounds like some rogue device could be the cause. If it was an IP conflict pfSense would be complaining loudly in the logs.
Make sure when you test that traffic from LAN side clients is actually arriving at pfSense. Check the state tables or add logging to the LAN pass rules so it appears in the firewall logs.
Steve
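To match client-side failures against the firewall logs and LAN pass-rule logging, it helps to know exactly when an outage starts. Below is a minimal Python sketch that polls from a LAN client and records only the up/down transitions. It assumes a TCP connection to 8.8.8.8 on port 53 is an acceptable stand-in for ping, since a plain ICMP ping usually needs raw-socket privileges. `tcp_probe` and `watch` are hypothetical helpers, and the target, interval and sample count are arbitrary choices.

```python
import socket
import time
from datetime import datetime
from typing import Callable, List, Optional, Tuple


def tcp_probe(host: str = "8.8.8.8", port: int = 53, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers timeouts, refused and unreachable connections
        return False


def watch(probe: Callable[[], bool], samples: int, interval: float = 5.0,
          clock: Callable[[], datetime] = datetime.now) -> List[Tuple[datetime, bool]]:
    """Call probe() `samples` times and record a timestamp only when the state changes."""
    transitions: List[Tuple[datetime, bool]] = []
    last: Optional[bool] = None
    for i in range(samples):
        ok = probe()
        if ok != last:
            stamp = clock()
            transitions.append((stamp, ok))
            print(f"{stamp.isoformat(timespec='seconds')} {'UP' if ok else 'DOWN'}")
            last = ok
        if i < samples - 1:
            time.sleep(interval)  # wait before the next probe
    return transitions


# Example (hypothetical): roughly one day at 5-second spacing.
# watch(tcp_probe, samples=17280)
```

The DOWN timestamp it prints can then be compared with the firewall logs, or with a packet capture on LAN taken at the same moment, to see whether the client's traffic is still reaching pfSense.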
-
@stephenw10 Hi Steve, first thanks for responding, I appreciate it.
You asked how I test. When the Internet suddenly goes down out of nowhere, the first test is whether I can still reach devices locally. I can reach any local device, so the next step is whether I can get into the firewall. I can, and the web GUI works just fine. There are no alerts, alarms or log entries about anything, and the WAN interface on the firewall reports that it is still up.
On the Diagnostics page of the web GUI I ping google.com, and it works. From my computer, pinging google.com fails. The only thing between me and google.com is the firewall, so it works from the firewall but not from my computer. To rule out DNS I ping 8.8.8.8 from my computer, but that fails as well (it works from the firewall's diagnostics tool). I check ipconfig to make sure my IP address and gateway are correct, and they are.
When I reboot the firewall the Internet comes back, for anywhere between 2 and 12 hours, and then it goes down again. I also did all the steps I outlined in the original post (basically checking whether a particular service was responsible, whether disabling and re-enabling the WAN interface brought it back, or whether clearing the state table fixed it). The only thing that brings the Internet connection back is rebooting the firewall. A reboot takes about 5 minutes, and every time I reboot, the Internet connection comes back.
I'm discounting a rogue DHCP server because nothing on this network has changed in at least 5 years, probably more (no devices added or removed).
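The tests described above are a process of elimination, working outward from the client one hop at a time. A minimal Python sketch of that reasoning, assuming the five yes/no checks from the post. The `Observations` fields and `likely_fault` function are hypothetical names for illustration, not part of pfSense.

```python
from dataclasses import dataclass


@dataclass
class Observations:
    lan_device_reachable: bool    # can the client reach other local devices?
    firewall_gui_reachable: bool  # can the client open the pfSense web GUI?
    firewall_ping_internet: bool  # does a ping from the pfSense Diagnostics page work?
    client_ping_ip: bool          # can the client ping a public IP such as 8.8.8.8?
    client_resolves_dns: bool     # can the client resolve a name such as google.com?


def likely_fault(o: Observations) -> str:
    """Return the first hop, working outward from the client, whose check failed."""
    if not o.lan_device_reachable:
        return "LAN switching/Wi-Fi"
    if not o.firewall_gui_reachable:
        return "client-to-firewall path on LAN"
    if not o.firewall_ping_internet:
        return "WAN link or upstream ISP"
    if not o.client_ping_ip:
        # pfSense reaches the Internet itself but does not forward client traffic
        return "forwarding on pfSense (rules, NAT, gateway, or a package)"
    if not o.client_resolves_dns:
        return "DNS"
    return "no fault detected"


# The observations reported in the post above
print(likely_fault(Observations(True, True, True, False, False)))
```

With the observations reported here, the sketch points at forwarding on pfSense itself rather than DNS or the WAN link.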
I will add logging to the LAN pass rules as you suggest.
-
@dragonfly There are some disk troubleshooting docs here https://docs.netgate.com/pfsense/en/latest/troubleshooting/index.html#hardware
5 minutes for a normal restart seems very long, though 5 minutes for a version update seems fine. I haven't managed a 6100 yet, but a 2100/3100 normally takes 10-15 minutes or more to update. eMMC vs SSD makes a difference too.
-
@steveits This one is an eMMC. Maybe that's normal for this?
-
@dragonfly I'd expect maybe a minute. I'd watch the console during boot and/or check the system and boot logs afterward. I'm not sure what could make a normal restart take that long.
-
@steveits OK, I just reviewed the boot logs from the two reboots that took place today. Nothing really stands out: there are a lot of lines of things happening, but nothing looks odd or out of place. It takes almost exactly 5 minutes from issuing the reboot command to the device being fully reloaded and functional.
-
OK, that's good troubleshooting. It's always difficult to judge what level people are operating at on the forum.
So I would still want to be sure traffic from the clients is actually arriving at pfSense when it's in the failed situation. I would just run a pcap on LAN whilst pinging. But using the state table or logging passed traffic will also show that.
If it is arriving and it is being passed then I'd have to guess it's some NAT failure or maybe a package. Snort or Suricata could present like this in blocking mode.
Since pfSense itself can ping google.com, it must still have a valid default route. But it may somehow lose the default gateway while keeping the route. Go to System > Routing > Gateways and set the WAN gateway as the default rather than 'Automatic'. That may prevent the issue.
Steve
-
@stephenw10 Hi Steve, I have added logging on the LAN pass rules as you suggested, and I also just made the change you suggested on the System > Routing > Gateways screen. Let's see how things go today. I appreciate your help. If it goes down today, I'll try to post the log entries from the LAN pass rules.
-
Go to System -> Package Manager.
Install "Netgate_Firmware_Upgrade" and "System_Patches" if they are not already installed. Then go to System -> Netgate Firmware Upgrade and install the newest version.
With the old firmware, boot takes minutes to check and select the boot device. After the update and the necessary power cycle, a reboot takes 90-120 seconds.
-
@nocling Thank you, this was helpful. I thought I was on the latest firmware, but I wasn't: both screens had updates I wasn't aware of (I am new to pfSense).
Also, as an overall update: since I changed the gateway setting above (suggested by @stephenw10), it hasn't gone down. One other thing: there was an external IP address that was mercilessly hitting the firewall, and I'm wondering if that was taking my connection down. I created a rule to block that IP completely at the same time I made the change on the gateway screen, and I haven't gone down since. If I get through another couple of days, I'm willing to say this is solved.
-
@dragonfly said in pfSense on netgate 6100 stops passing traffic multiple times per day:
there was an external IP address that was mercilessly hitting the firewall
If it was hitting the firewall, I assume it was being blocked? If so, adding a separate rule to block it wouldn't change anything, unless the new rule is non-logging and the hit rate was so high that the volume of block log entries was creating a significant load.
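To put "significant load" into rough numbers: each blocked packet that matches a logging rule produces a log entry, so log volume grows linearly with the hit rate. A back-of-the-envelope Python estimate follows. The 200 packets/s and ~150 bytes per entry are hypothetical figures, not measurements from this thread, and `log_bytes_per_day` is an illustrative helper.

```python
SECONDS_PER_DAY = 86_400


def log_bytes_per_day(hits_per_second: float, bytes_per_entry: int) -> float:
    """Approximate bytes of firewall log written per day by one logging block rule."""
    return hits_per_second * bytes_per_entry * SECONDS_PER_DAY


# Hypothetical figures: 200 blocked packets/s, ~150 bytes per log line
per_day = log_bytes_per_day(200, 150)
print(f"{per_day / 1e9:.2f} GB/day")  # 200 * 150 * 86400 = 2,592,000,000 bytes -> "2.59 GB/day"
```

Making the specific block rule non-logging drops this term to zero, which is the only way the extra rule would have changed anything.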