Intermittent loss of internet connectivity



  • Hi all, this is my third post on issues I am having with an XG7100-1U...
    I have it working fine, with the IPsec tunnels all up and connected. Randomly, the LAN users lose internet connectivity, although the IPsec tunnels stay active, as do the OpenVPN server and its connections. When this happens, I cannot ping from the firewall out to Google or any other site.
    I use the Unbound resolver and forward to Cloudflare (1.1.1.1) as well as Quad9 via DNS over TLS, which works fine. I have the same setup on two SG4860s with no issues.
    In Unbound, I have the following custom options:
    server:
    forward-zone:
    name: "."
    forward-ssl-upstream: yes
    forward-addr: 1.1.1.1@853
    forward-addr: 1.0.0.1@853
    forward-addr: 9.9.9.9@853
    forward-addr: 149.112.112.112@853
    which work fine anywhere else.

    The routing log shows the following entry:
    Jul 12 14:05:19 pfSense miniupnpd[74620]: PCP: External IP in request didn't match interface IP

    The Gateways log shows:
    Jul 12 15:42:02 pfSense dpinger: VIDEOTRON_DHCP xx.xxx.xxx.x: Alarm latency 33937us stddev 17827us loss 22%
    Jul 12 15:45:51 pfSense dpinger: VIDEOTRON_DHCP xx.xxx.xxx.x: Clear latency 42931us stddev 23990us loss 12%

    I have stopped and restarted services (including Unbound), reloaded the filters, and shut down and restarted the machine, all to no avail. I am currently connected to it via IPsec but cannot ping outside the firewall.
    Is it a hardware thing? This unit is expensive and I would expect it to be flawless, so I don't think it is hardware. It runs fine and then suddenly stops routing traffic to the internet.

    Heeeelllp !!

    If anyone can offer any assistance it would be greatly appreciated.
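
The dpinger entries quoted above report latency and standard deviation in microseconds and loss as a percentage. A minimal sketch of pulling those fields out of such a line, assuming the format is exactly as shown in the post (the function name `parse_dpinger_line` and the regular expression are illustrative, not part of pfSense):

```python
import re

# Pattern modelled only on the two log lines quoted in the post.
DPINGER_RE = re.compile(
    r"dpinger: (?P<gateway>\S+) (?P<address>\S+): "
    r"(?P<state>Alarm|Clear) latency (?P<latency_us>\d+)us "
    r"stddev (?P<stddev_us>\d+)us loss (?P<loss_pct>\d+)%"
)


def parse_dpinger_line(line):
    """Return a dict of the dpinger fields, or None if the line does not match."""
    match = DPINGER_RE.search(line)
    if match is None:
        return None
    return {
        "gateway": match.group("gateway"),
        "state": match.group("state"),
        # Convert microseconds to milliseconds for readability.
        "latency_ms": int(match.group("latency_us")) / 1000.0,
        "stddev_ms": int(match.group("stddev_us")) / 1000.0,
        "loss_pct": int(match.group("loss_pct")),
    }


SAMPLE = (
    "Jul 12 15:42:02 pfSense dpinger: VIDEOTRON_DHCP xx.xxx.xxx.x: "
    "Alarm latency 33937us stddev 17827us loss 22%"
)
parsed = parse_dpinger_line(SAMPLE)
```

Read this way, the alarm was raised at roughly 34 ms latency with 22% loss, so it is the packet loss rather than the latency that stands out in these entries.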


  • Netgate Administrator

    You have upnp enabled, does it need to be? Can you disable it as a test?

    Check the routing table in Diag > Routes when it's in this state. Does it have a default route? Is it the correct one?

    Steve
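
The default-route check suggested here can also be done on a saved copy of the routing table. A minimal sketch, assuming the table has been copied as text in the usual FreeBSD `netstat -rn` layout (Destination, Gateway, Flags, Netif); the sample table, its addresses and interface names, and the helper `find_default_routes` are hypothetical:

```python
def find_default_routes(routing_table_text):
    """Return (gateway, interface) pairs for every 'default' row in the table."""
    routes = []
    for line in routing_table_text.splitlines():
        fields = line.split()
        # Expect at least: destination, gateway, flags, interface.
        if len(fields) >= 4 and fields[0] == "default":
            routes.append((fields[1], fields[3]))
    return routes


# Hypothetical table for illustration only (documentation address range).
SAMPLE_TABLE = """\
Internet:
Destination        Gateway            Flags     Netif Expire
default            203.0.113.1        UGS         ix0
127.0.0.1          link#4             UH          lo0
192.168.1.0/24     link#1             U           ix1
"""

defaults = find_default_routes(SAMPLE_TABLE)
if not defaults:
    print("no default route present")
else:
    for gateway, interface in defaults:
        print(f"default via {gateway} on {interface}")
```

An empty result while the tunnels are still up would match the missing-default-route scenario. A row pointing at the wrong gateway or interface would match the "is it the correct one?" question.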



  • @stephenw10 Thanks for the heads-up... I deactivated UPnP and it seems as though this may have been the issue. The routing table seems OK. I will monitor it over the weekend. Can't believe it would be this simple...



  • @stephenw10 My only concern is that the configuration as noted is identical on two other SG4860s that work flawlessly. I was worried it might be the hardware. I have had to exchange two SG4860s in the past due to bad hardware. When you have someone relying on you to propose networking solutions and the hardware is faulty, it does not bode well....


  • Netgate Administrator

    That doesn't seem like a hardware issue.

    Can clients connect to devices across the VPN when they lose general internet connectivity?

    We need to determine exactly what is working and what isn't when this happens.

    Steve



  • @stephenw10 Yes, I can get in via the IPsec tunnel and OpenVPN, and from there I ping test sites and get NO connection. This connection is from a Montreal cable internet provider, and we recently moved to a new office where they installed a new, much smaller cable modem. However, when a laptop is connected directly to the cable modem, it runs fast and without issues. As per earlier posts, I remotely disabled UPnP this morning and I got connectivity back. On this network I have a rack-mount Synology NAS with 4 LAGGed ports to a UniFi switch on which jumbo frames are enabled. I was recently running a Dropbox downstream backup to the NAS and the router seemed to lose connectivity. Could the jumbo frames between the switch and NAS be an issue?


  • Netgate Administrator

    @claferriere said in Intermittent loss of internet connectivity:

    Could the jumbo frames between the switch and NAS be an issue ?

    Unlikely.

    Are the VPNs site-to-site? Can you access resources across them when it fails?

    That implies it's passing traffic fine and has connectivity upstream too. A bad default route would have been a good fit for that issue but you say it's fine.

    Steve



  • @stephenw10 Yes, they are site-to-site, and through the IPsec tunnel I can reach other resources on the connected network. I understand the bad route issue, but what would cause everything to suddenly just stop? It worked fine for a week or two, then it stopped. I would reboot the cable modem and/or the pfSense box and would get the connection back. Nothing in the logs, other than what I indicated in my initial post, seemed out of character.


  • Netgate Administrator

    If the default route was lost you would only be able to reach subnets you have static routes to, which would include OpenVPN, or over IPSec which is policy based.

    The firewall itself is unable to ping by IP or FQDN when this happens?

    Can it resolve anything?

    The DNS setup you have is the only unusual thing you've posted so far.

    Steve



  • @stephenw10 The firewall was unable to ping by IP or FQDN after the loss of internet access but, as mentioned, IPsec and OpenVPN were fine. DNS Lookup under Diagnostics was also not working.
    When you say "unusual" about the DNS, it was set up to ensure secure DNS lookups to Cloudflare and Quad9. It has been configured like this on the SG4860s as well, and it usually works fine. Should I just be using the pre-configured options in Unbound?
    What about flushing routes if the IP goes down?


  • Netgate Administrator

    What error do you see when you try to ping by IP? No route to host or 100% packet loss?

    Steve



  • @stephenw10 100% packet loss.
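
The distinction being asked about matters because, as explained earlier in the thread, a lost default route would leave only statically routed or policy-based destinations reachable, while 100% loss means packets are sent but nothing comes back. A minimal sketch of telling the two apart from captured `ping` output, assuming FreeBSD-style wording (the sample strings and the helper `classify_ping` are illustrative):

```python
import re


def classify_ping(output):
    """Classify ping output as no_route, total_loss, partial_loss, ok or unknown."""
    # A routing failure is reported before any statistics are printed.
    if "No route to host" in output:
        return "no_route"
    match = re.search(r"(\d+(?:\.\d+)?)% packet loss", output)
    if match is None:
        return "unknown"
    loss = float(match.group(1))
    if loss >= 100.0:
        return "total_loss"
    return "partial_loss" if loss > 0.0 else "ok"


# Illustrative outputs, not taken from the affected firewall.
NO_ROUTE_SAMPLE = "ping: sendto: No route to host\n"
TOTAL_LOSS_SAMPLE = "3 packets transmitted, 0 packets received, 100.0% packet loss\n"

print(classify_ping(NO_ROUTE_SAMPLE))    # routing table problem
print(classify_ping(TOTAL_LOSS_SAMPLE))  # route exists, replies never arrive
```

A `total_loss` result, as reported here, points away from a missing route and toward packets being dropped somewhere after the routing decision.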



  • @claferriere said in Intermittent loss of internet connectivity:

    forward-zone:
    name: "."
    forward-ssl-upstream: yes
    forward-addr: 1.1.1.1@853
    forward-addr: 1.0.0.1@853
    forward-addr: 9.9.9.9@853
    forward-addr: 149.112.112.112@853

    Not that I'm using DNS over TLS, but I really thought there was no need any more to enter these options manually: it became a simple check box.
    What pfSense version are you using?
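
Whether the upstreams in the quoted options accept a TLS connection on port 853 can be checked independently of Unbound from a machine behind the firewall. A minimal sketch, assuming Python 3.7 or later (needed for certificate checks against an IP address) and assuming the upstream certificates list those IP addresses; the helper names are hypothetical:

```python
import socket
import ssl

# The custom options exactly as quoted in the thread.
UNBOUND_OPTIONS = """\
forward-zone:
name: "."
forward-ssl-upstream: yes
forward-addr: 1.1.1.1@853
forward-addr: 1.0.0.1@853
forward-addr: 9.9.9.9@853
forward-addr: 149.112.112.112@853
"""


def parse_forward_addrs(config_text):
    """Return (host, port) pairs from forward-addr lines; port is None if absent."""
    upstreams = []
    for line in config_text.splitlines():
        key, _, value = line.strip().partition(":")
        if key != "forward-addr":
            continue
        host, _, port = value.strip().partition("@")
        # This sketch does not guess a default port when none is given.
        upstreams.append((host, int(port) if port else None))
    return upstreams


def tls_handshake_ok(host, port, timeout=3.0):
    """Return True if a verified TLS handshake with host:port succeeds."""
    context = ssl.create_default_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host):
                return True
    except OSError:  # covers timeouts, refusals and ssl.SSLError
        return False


upstreams = parse_forward_addrs(UNBOUND_OPTIONS)
# Example use (needs network access):
#   for host, port in upstreams: print(host, port, tls_handshake_ok(host, port))
```

If every handshake fails while the tunnels stay up, the problem is broader than DNS, which would be consistent with the firewall also being unable to ping by IP.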



  • @Gertjan It was recommended when I set it up. I believe the traffic didn't show up on port 853 without this.



  • See https://www.netgate.com/blog/pfsense-2-4-4-release-now-available.html (pfSense 2.4.4, from September last year): it was included.

    The next logical question: what is your pfSense version?



  • @Gertjan 2.4.4 P3 on all machines



  • Ok, great.

    When you drop VPN usage and step back to a normal "WAN" connection, is your packet loss issue gone?



  • @Gertjan No, the packet loss was generalized for anything on the network. However, I could still connect via IPsec or OpenVPN. Once on the pfSense box, I could not ping or do a DNS lookup from the Diagnostics menu... But since I turned off UPnP/NAT-PMP it seems to have resolved the issue... keeping my fingers crossed!


  • Netgate Administrator

    Mmm, that implies something was opening ports using UPnP in a way that somehow broke the creation of new states, perhaps. Hard to see how it could do that, though. Was it open to requests from WAN, maybe?
    Something local to the site triggering it would explain why the same setup appears fine on other hardware in other locations.

    Steve

