Random internet connection drops



  • Hello,

    I have started to have problems with my pfSense installation, which is on version 2.4.4-RELEASE-p3 (amd64). It has worked flawlessly for a couple of years, but now the internet just stops working a couple of times a day. It might happen even more often than that, but I only notice it every now and then, when no web page loads. Browsers only say "Could not load the web page, check connectivity", and this lasts around 30 seconds. According to the pfSense GUI, all services are running and WAN is online.

    Does anyone have any ideas what could cause this? Or are there any logs that could reveal what is wrong with pfSense? Reboots etc. have not helped, and temperatures are around 50 Celsius and below.


  • LAYER 8

    can be anything, really
    check Status -> system logs to see if there is something of interest
    check "cpu usage" and Diagnostic -> system activity
    check cable and network card


  • Netgate Administrator

    Yeah, if it just seems to happen and recovers on its own then either there really is some upstream connectivity problem. Check the Status > Monitoring graphs for WAN quality. You might want to change the monitoring target to something other than the gateway IP.
    Or the firewall itself stops passing traffic, possibly because it's doing something that uses all available CPU cycles. Can you access the firewall GUI whilst this is happening?

    Steve



  • What is your monitor IP set to under System/Routing/Gateways? I would suggest setting that to something like Google's DNS (8.8.8.8). If you have nothing entered there, then by default it will ping your modem/gateway and tell you your WAN is up, but that really only tells you the modem responds to pings. By setting it to something like 8.8.8.8, that could tell you if your modem is not able to reach the internet.

    When a web page does not load it may not be due to an issue with access to the web. It might just be a DNS issue.
    Next time you have an issue with a web page loading, try opening a command prompt. Then type in ping google.com

    As shown below, it should resolve to an IP address. That is the line highlighted in yellow. If it is not able to resolve to an IP shown in brackets, then that indicates a DNS issue.
    [Screenshot: output of ping google.com, with the line showing the resolved IP address highlighted in yellow]

    The next thing I would try is to ping 8.8.8.8. If that works and responds to the pings, then it confirms it is a DNS issue and you do not have a problem with access to the internet. Once you have that narrowed down, we can try to help figure out the next steps. As kiokoman says, it could be anything so you have to help us help you.
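For anyone who wants to script the two checks described above, here is a rough Python sketch. It is only an illustration: the function names are made up for this example, and it swaps the ICMP ping for a TCP connection to port 53 on 8.8.8.8, because sending real pings from Python normally needs elevated privileges. Treat the result as a hint, not a verdict.

```python
import socket


def can_resolve(hostname: str) -> bool:
    """Return True if the system resolver returns at least one address."""
    try:
        return len(socket.getaddrinfo(hostname, None)) > 0
    except socket.gaierror:
        return False


def can_connect(ip: str, port: int = 53, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to a literal IP succeeds (no DNS involved)."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False


def diagnose(dns_ok: bool, ip_ok: bool) -> str:
    """Classify the outage from the two probe results."""
    if not ip_ok:
        # A literal IP is unreachable, so name resolution is not the (only) issue.
        return "connectivity"
    if not dns_ok:
        # The IP answers but names do not resolve: points at DNS.
        return "dns"
    return "ok"


# Example use during an outage (assumed targets, taken from the posts above):
#   print(diagnose(can_resolve("google.com"), can_connect("8.8.8.8")))
```

Run it from a LAN client while pages are failing to load. A result of "dns" means the same thing as ping 8.8.8.8 working while ping google.com fails to resolve.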



  • Thank you all for replies. First thing I did was actually to adjust Suricata logging, that was filling my hard drive, had only 87 % left on it. Do not know if that could cause problems like this...?

    My monitoring IP is my ISP gateway, pfsense is the router between my LAN and ISP.

    I actually have not had any problems today or yesterday evening. I have been eagerly waiting so I could check the logs. There were some errors in the system logs, but they occurred during the night, so I have no idea whether the network was up or down at that time, and of course I did not copy the errors so I can't remember what they were. When it happens again I will check the logs immediately.

    Thanks again for your replies at this point!


  • Netgate Administrator

    If the drive ever becomes completely full it can cause problems. Resaving the Suricata log rotation settings usually corrects it if the logs are not rotating as expected.

    Steve
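A quick way to keep an eye on disk usage from a script is sketched below in Python. This is a generic illustration, not a pfSense tool: the function names, the default path "/" and the 85 % warning threshold are all assumptions chosen for the example.

```python
import shutil


def percent_used(total: int, used: int) -> float:
    """Return used space as a percentage of total space."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * used / total


def check_disk(path: str = "/", warn_at: float = 85.0):
    """Return (percent used, whether the warning threshold is reached) for path."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    pct = percent_used(usage.total, usage.used)
    return pct, pct >= warn_at


# Example (path and threshold are assumptions):
#   pct, warn = check_disk("/")
#   print(f"{pct:.1f}% used", "WARNING" if warn else "")
```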



  • I'm guessing that might be a typo on the 87% left? Did you mean the logs were taking up 87% of the space?

    Again, I would highly recommend not using the ISP gateway for the monitor IP. That does not tell you the full story of your web access. Change it to 8.8.8.8 or any other reliable IP out on the web.

    Edit:
    Oh, and I am assuming you have checked the Suricata blocks/alerts tab (depending on legacy/inline mode) to be sure the site(s) you're having trouble with didn't get blocked? Depending on your rules, there could be a lot of false positives. I have learned the hard way that sometimes addresses that are blocked do not resolve to the site you're accessing, yet they still affect the sites you want to access.



  • @Raffi_ Yes that's a typo, 87 % full of course :D Thanks for the advice, I change it to Google DNS :)



  • If you have trouble again, try clearing your blocks tab (assuming legacy mode) in Suricata and then refresh the web page. If that immediately solves it, then you have to look into your rules to figure out what is triggering the block.



  • @Raffi_ My Suricata has been on blocking mode about 24 hours so that's not the problem. Will paste logs when the outage happens again!



  • Blocking is disabled in Suricata? It is only used for monitoring?



  • @Raffi_ It was only for monitoring, I collected the logs and viewed them on a separate platform. But now it is in blocking mode, I just configured it!



  • If you are having trouble loading sites, I would advise disabling blocking mode for now and going back to monitoring. Also, when you disable blocking, you still have to clear the blocks tab in order to allow that traffic to pass again.



  • I think that it happened again. I was playing PlayStation and tried to look at someone's profile, and eventually the PlayStation informed me that there is something wrong with DNS. I immediately went to pfSense, took a glance at the logs and noticed that there are events like this:

    Aug 31 18:27:29 	unbound 	79372:0 	info: start of service (unbound 1.9.1).
    Aug 31 18:27:29 	unbound 	79372:0 	notice: init module 1: iterator
    Aug 31 18:27:29 	unbound 	79372:0 	notice: init module 0: validator
    Aug 31 18:27:07 	unbound 	79372:0 	notice: Restart of unbound 1.9.1.
    Aug 31 18:27:07 	unbound 	79372:0 	info: server stats for thread 3: requestlist max 2 avg 1 exceeded 0 jostled 0
    Aug 31 18:27:07 	unbound 	79372:0 	info: server stats for thread 3: 3 queries, 0 answers from cache, 3 recursions, 0 prefetch, 0 rejected by ip ratelimiting
    Aug 31 18:27:07 	unbound 	79372:0 	info: server stats for thread 2: requestlist max 2 avg 1 exceeded 0 jostled 0
    Aug 31 18:27:07 	unbound 	79372:0 	info: server stats for thread 2: 3 queries, 0 answers from cache, 3 recursions, 0 prefetch, 0 rejected by ip ratelimiting
    Aug 31 18:27:07 	unbound 	79372:0 	info: server stats for thread 1: requestlist max 2 avg 1 exceeded 0 jostled 0
    Aug 31 18:27:07 	unbound 	79372:0 	info: server stats for thread 1: 3 queries, 0 answers from cache, 3 recursions, 0 prefetch, 0 rejected by ip ratelimiting
    Aug 31 18:27:07 	unbound 	79372:0 	info: server stats for thread 0: requestlist max 0 avg 0 exceeded 0 jostled 0
    Aug 31 18:27:07 	unbound 	79372:0 	info: server stats for thread 0: 0 queries, 0 answers from cache, 0 recursions, 0 prefetch, 0 rejected by ip ratelimiting
    Aug 31 18:27:07 	unbound 	79372:0 	info: service stopped (unbound 1.9.1).
    Aug 31 18:27:07 	unbound 	79372:2 	info: generate keytag query _ta-4f66. NULL IN
    Aug 31 18:27:07 	unbound 	79372:0 	info: start of service (unbound 1.9.1).
    Aug 31 18:27:07 	unbound 	79372:0 	notice: init module 1: iterator
    Aug 31 18:27:07 	unbound 	79372:0 	notice: init module 0: validator
    Aug 31 18:26:45 	unbound 	79372:0 	notice: Restart of unbound 1.9.1.
    

    From DHCP logs I saw this

    Aug 31 18:26:45 	dhcpleases 		Sending HUP signal to dns daemon(79372)
    Aug 31 18:26:45 	dhcpleases 		Sending HUP signal to dns daemon(79372) 
    

    I googled it and based on that I unticked the option "Register DHCP leases in the DNS Resolver" from my router. I will see if that helps.
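To see how long each Unbound restart takes, the log lines can be paired up with a small script. The Python sketch below is only an illustration under a few assumptions: the function name is made up, the input is pasted newest-first as the log view above shows it, and because the log lines carry no year, a placeholder year is used (only the time differences matter).

```python
from datetime import datetime


def restart_durations(log_text: str, newest_first: bool = True, year: int = 2000):
    """Return seconds between each 'Restart of unbound' and the next 'start of service'."""
    lines = log_text.splitlines()
    if newest_first:
        # The excerpt is listed newest-first, so reverse it into chronological order.
        lines = lines[::-1]

    durations = []
    pending = None  # timestamp of a restart that has not finished yet
    for line in lines:
        parts = line.split()
        if len(parts) < 4 or parts[3] != "unbound":
            continue
        stamp = datetime.strptime(
            f"{year} {parts[0]} {parts[1]} {parts[2]}", "%Y %b %d %H:%M:%S"
        )
        if "Restart of unbound" in line:
            pending = stamp
        elif "start of service" in line and pending is not None:
            durations.append((stamp - pending).total_seconds())
            pending = None
    return durations


# Example: paste the resolver log into a string and call
#   restart_durations(pasted_log)
```

Each value in the returned list is a window during which the resolver was reloading and probably not answering queries.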


  • Netgate Administrator

    Hmm, 27s seems excessive for Unbound to restart. Do you have pfBlocker running with DNS-BL enabled and a lot of lists?

    Steve



  • @stephenw10 I do have quite a list on DNSBL.


  • Netgate Administrator

    Hmm, OK if it takes that long reloading then adding dhcp leases probably isn't a practical option. That is probably what you're hitting there.

    Steve



  • Good catch @JohanÅ ! Yea, I agree with Steve on this. The combination of large pfblocker lists and the register DHCP leases option in unbound has high potential for trouble. I had to disable DHCP registration in resolver also. Let us know how it goes.

