Intermittent Internet Access after Update



  • I have just come back home from some extended travel and was doing a bit of housekeeping including updating my pfSense install.

    After updating from 2.2.6 to 2.3.1 and then 2.3.1_1, I am experiencing intermittent internet access: after a reboot, internet access works correctly but dies roughly 30 minutes to 1 hour later.

    The WAN interface appears to be up at all times, but when the connection drops, none of my devices can reach the internet.

    Attached is a screenshot of my system log. The problems seem to coincide with the processes at 23:25 in the screen shot. But I am way out of my depth as to what is actually happening or how to solve it.

    Potentially worth noting: the many php-fpm entries in the log appear to be timestamped in the future…

    In summary, all seems to be fine for a while after a reboot, until a similar set of processes occurs as depicted from 23:25 in the screenshot. I can still reach the LAN when this happens, just not the internet.

    Is anyone able to point me in the right direction for further troubleshooting? Happy to provide more info, I just wasn't sure what would be useful.

    Regards,

    rancid




  • I seem to have a similar problem after the upgrade to 2.3 and then 2.3.1.  In my case, file downloads choke somewhere between 10 and 20 MB in.  Streaming is no problem - it's as if the system times out.  Documents, software blobs - even the 2.3.1 pfSense download - choked and timed out.  It's definitely related to 2.3, and I believe it's related to dnsmasq.  When I disabled all services, the problem went away long enough to get the 2.3.1 upgrade.

    I'm still working through it and should have more information later that may be of help.  I'm betting your problem is related.



  • Just to update this and not leave it open. This issue has now been resolved.

    To be honest I don't think I got the root cause. I run pfSense in a VM on ESXi.

    What I did do was update the ESXi host to v6; the problem was still present after this. I then went away for a week, which was bliss, not having to reboot pfSense every half hour!

    On my return I didn't have the problem. Not sure why.

    I did have an IP address conflict message pop up on my computer before I went. All I can think is that it was an IP conflict issue, the DHCP lease expired whilst I was away, and when I came back things had resolved themselves. This is a ballpark guess though.

    Sorry this isn't more useful!

    rancid



  • Right, so it would seem that this problem hasn't been solved at all. I could use some diagnostic advice if anyone is able to spare the time.

    So I think everything in my original post still stands. My problem exhibits exactly the same symptoms.

    I have learnt a few new things:

    • The problem appears after a reboot, and seems to resolve itself if I leave it for a while. For example, I rebooted the server, got the half-hourly internet cut-outs, went away for the weekend, and when I came back the system was up and running smoothly without me doing anything. If I were to reboot again at that point, it would go back into the 'cycle of severe annoyance', as I am now calling it.
      N.B. If I go to work for the day, this does not appear to be long enough and the problem is still active when I return.

    • I have an external IP on the WAN interface at all times, it is up on the dashboard.

    • Gateway goes 'offline' when internet breaks. I suspect this should be a big clue but I just don't know what steps to use to troubleshoot/diagnose and make use of this knowledge.

    I still think it has something to do with the processes described in my first post and included in the attachment. I suspect they are somehow taking the gateway offline. If that is right, I just need to find out how and why!

    Any help appreciated.

    rancid.



  • Apparently if I leave it with the problem for 12+ hours, when I get back the Internet is up and working again.

    Still no further towards identifying the cause. I have internet now, but I am assuming that restarting the server/VM will kick off the cycle of intermittent internet again.

    Help appreciated.

    rancid



  • Okay, so I thought I would update this post. There hasn't been much help offered here, but I'm posting in case this helps anyone.

    I did some searching regarding the gateway offline issue and came across this:
    https://forum.pfsense.org/index.php?topic=110043.0

    This sounds very much aligned with my problem, so I tried the steps mentioned in there. I was hopeful, but it did not help my situation. I am leaving the link here as it does seem to have worked for some people, and somebody might come across this thread and find it useful.

    Having tried that, I noticed that I was getting a lot of entries in the log from dpinger. In somewhat reckless (desperate) fashion I stopped the service from the dashboard. I now have consistent internet, with no dropouts to speak of yet - I am only a few hours in at the moment. I don't know what this means really, other than that I probably won't have any gateway monitoring…

    Not sure if this is going to be permanent or something I will have to do after every restart either.

    If anyone has any more info on what might be going on / how my actions will affect pfSense going forwards, I'm all ears!

    Cheers,

    rancid



  • Not sure this has been successful. I miss the days when my pfSense box was rock solid. This is sooo frustrating…



  • Right, so I gave up on my install completely and went for a complete reinstall. I'm still not seeing consistent internet access though.

    What I have done: tried two different versions of pfSense as clean installs on the ESXi host, with no restore from a config XML backup; I set them both up from new.

    Initial version that exhibited problems 2.3.1 i386 - problems described in this thread.

    New versions tried:
    2.2.6 x64
    2.3.2 x64

    Both new (clean) versions exhibit the same gateway issues.

    Known differences from my initial setup: both the 2.2.6 and 2.3.2 x64 versions seem to have the DNS Resolver enabled by default, whereas I was originally using the DNS Forwarder. I don't recall if this was something I changed originally; it would have been years ago if I did. Not sure if/how this affects my problems now.

    I don't know if I should go back to an even older version to check if that would work, but to the best of my memory 2.2.6 was working for me in my initial setup. So this is currently leading me to think this is a setup issue of my own making. No surprises there…! :o

    So, in the new versions I have installed, the WAN interface gets an IP with no problems. The WAN_DHCP gateway fluctuates between online and offline frequently and regularly, and internet access is affected in line with this.

    My system log seems to point to something weird going on with WAN, namely it repeats this error over and over again.

    Aug 7 19:44:50 	php-fpm 	29545 	/rc.newwanipv6: rc.newwanipv6: Info: starting on em1.
    Aug 7 19:44:50 	php-fpm 	29545 	/rc.newwanipv6: rc.newwanipv6: No IPv6 address found for interface WAN [wan].
    Aug 7 19:44:50 	php-fpm 	29545 	/rc.newwanipv6: rc.newwanipv6: Info: starting on em1.
    Aug 7 19:44:50 	php-fpm 	29545 	/rc.newwanipv6: rc.newwanipv6: No IPv6 address found for interface WAN [wan].
    Aug 7 19:44:50 	php-fpm 	29545 	/rc.newwanipv6: rc.newwanipv6: Info: starting on em1.
    Aug 7 19:44:50 	php-fpm 	29545 	/rc.newwanipv6: rc.newwanipv6: No IPv6 address found for interface WAN [wan].
    Aug 7 19:44:50 	php-fpm 	29545 	/rc.newwanipv6: rc.newwanipv6: Info: starting on em1.
    Aug 7 19:44:50 	php-fpm 	29545 	/rc.newwanipv6: rc.newwanipv6: No IPv6 address found for interface WAN [wan].
    Aug 7 19:44:50 	php-fpm 	29545 	/rc.newwanipv6: rc.newwanipv6: Info: starting on em1.
    Aug 7 19:44:50 	php-fpm 	29545 	/rc.newwanipv6: rc.newwanipv6: No IPv6 address found for interface WAN [wan].
    Aug 7 19:44:50 	php-fpm 	29545 	/rc.newwanipv6: rc.newwanipv6: Info: starting on em1.
    Aug 7 19:44:50 	php-fpm 	29545 	/rc.newwanipv6: rc.newwanipv6: No IPv6 address found for interface WAN [wan].
    Aug 7 19:44:50 	php-fpm 	29545 	/rc.newwanipv6: rc.newwanipv6: Info: starting on em1.
    Aug 7 19:44:50 	php-fpm 	29545 	/rc.newwanipv6: rc.newwanipv6: No IPv6 address found for interface WAN [wan].
    Aug 7 19:44:50 	php-fpm 	29545 	/rc.newwanipv6: rc.newwanipv6: Info: starting on em1.
    Aug 7 19:44:50 	php-fpm 	29545 	/rc.newwanipv6: rc.newwanipv6: No IPv6 address found for interface WAN [wan]. 
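
    For anyone looking at a similar wall of repeated entries, a small script can condense it into one line per distinct message. This is only a rough sketch: the regular expression is my guess at the layout of the lines pasted above (not an official pfSense log format), and the helper name summarize is made up for illustration.

```python
import re
from collections import Counter

# Assumed layout, based only on the pasted lines:
# "<timestamp> php-fpm <pid> <message>"
MSG_RE = re.compile(r"php-fpm\s+(?P<pid>\d+)\s+(?P<msg>.+?)\s*$")

def summarize(lines):
    """Count how often each distinct php-fpm message appears."""
    counts = Counter()
    for line in lines:
        match = MSG_RE.search(line)
        if match:
            counts[match["msg"]] += 1
    return counts

# Four of the lines from the log excerpt above.
SAMPLE = """\
Aug 7 19:44:50 php-fpm 29545 /rc.newwanipv6: rc.newwanipv6: Info: starting on em1.
Aug 7 19:44:50 php-fpm 29545 /rc.newwanipv6: rc.newwanipv6: No IPv6 address found for interface WAN [wan].
Aug 7 19:44:50 php-fpm 29545 /rc.newwanipv6: rc.newwanipv6: Info: starting on em1.
Aug 7 19:44:50 php-fpm 29545 /rc.newwanipv6: rc.newwanipv6: No IPv6 address found for interface WAN [wan].
"""

counts = summarize(SAMPLE.splitlines())
for message, n in counts.most_common():
    print(f"{n:4d} x {message}")
```

    Run against a full exported log, this makes it easier to see whether the same two messages account for nearly all of the noise.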
    

    Can anyone explain what is going on here?? Is this a problem that I should fix??

    My gateway logs are taken up with dpinger alerts that show the fluctuating internet access. See below.

    Aug 7 18:57:38 	dpinger 		WAN_DHCP 176.26.X.X: Alarm latency 8826us stddev 9725us loss 21%
    Aug 7 18:58:08 	dpinger 		WAN_DHCP 176.26.X.X: Alarm latency 1178329us stddev 6722886us loss 70%
    Aug 7 18:58:39 	dpinger 		WAN_DHCP 176.26.X.X: Alarm latency 7930us stddev 2873us loss 53%
    Aug 7 18:59:08 	dpinger 		WAN_DHCP 176.26.X.X: Clear latency 7733us stddev 2029us loss 5%
    Aug 7 19:00:50 	dpinger 		WAN_DHCP 176.26.X.X: Alarm latency 7837us stddev 2265us loss 21%
    Aug 7 19:01:25 	dpinger 		WAN_DHCP 176.26.X.X: Alarm latency 1800145us stddev 8209140us loss 81%
    Aug 7 19:01:57 	dpinger 		WAN_DHCP 176.26.X.X: Alarm latency 7688us stddev 640us loss 50%
    Aug 7 19:02:27 	dpinger 		WAN_DHCP 176.26.X.X: Clear latency 20475us stddev 44552us loss 7%
    Aug 7 19:13:26 	dpinger 		WAN_DHCP 176.26.X.X: Alarm latency 9490us stddev 16525us loss 21%
    Aug 7 19:15:04 	dpinger 		WAN_DHCP 176.26.X.X: Clear latency 83613us stddev 641992us loss 14%
    Aug 7 19:15:16 	dpinger 		WAN_DHCP 176.26.X.X: Alarm latency 79299us stddev 671765us loss 21%
    Aug 7 19:15:55 	dpinger 		WAN_DHCP 176.26.X.X: Alarm latency 1556632us stddev 8193464us loss 75%
    Aug 7 19:16:20 	dpinger 		WAN_DHCP 176.26.X.X: Alarm latency 8150us stddev 2703us loss 60%
    Aug 7 19:16:54 	dpinger 		WAN_DHCP 176.26.X.X: Clear latency 7864us stddev 1783us loss 5%
    Aug 7 19:18:58 	dpinger 		WAN_DHCP 176.26.X.X: Alarm latency 7711us stddev 403us loss 22%
    Aug 7 19:19:30 	dpinger 		WAN_DHCP 176.26.X.X: Alarm latency 2216698us stddev 8411963us loss 73%
    Aug 7 19:20:06 	dpinger 		WAN_DHCP 176.26.X.X: Alarm latency 0us stddev 0us loss 100%
    Aug 7 19:20:09 	dpinger 		WAN_DHCP 176.26.X.X: Alarm latency 35555248us stddev 1018363us loss 94%
    Aug 7 19:20:46 	dpinger 		WAN_DHCP 176.26.X.X: Alarm latency 7738us stddev 641us loss 68%
    Aug 7 19:21:08 	dpinger 		WAN_DHCP 176.26.X.X: Alarm latency 1045933us stddev 6229170us loss 68%
    Aug 7 19:21:40 	dpinger 		WAN_DHCP 176.26.X.X: Alarm latency 11149us stddev 7093us loss 92%
    Aug 7 19:22:00 	dpinger 		WAN_DHCP 176.26.X.X: Alarm latency 11225886us stddev 19425589us loss 89%
    Aug 7 19:22:26 	dpinger 		WAN_DHCP 176.26.X.X: Alarm latency 7700us stddev 460us loss 59%
    Aug 7 19:22:59 	dpinger 		WAN_DHCP 176.26.X.X: Clear latency 7673us stddev 507us loss 6%
    Aug 7 19:25:38 	dpinger 		WAN_DHCP 176.26.X.X: Alarm latency 8322us stddev 2945us loss 22%
    Aug 7 19:26:09 	dpinger 		WAN_DHCP 176.26.X.X: Alarm latency 1275590us stddev 6943279us loss 73%
    Aug 7 19:26:39 	dpinger 		WAN_DHCP 176.26.X.X: Alarm latency 8116us stddev 2273us loss 51%
    Aug 7 19:27:08 	dpinger 		WAN_DHCP 176.26.X.X: Clear latency 8017us stddev 2098us loss 5%
    Aug 7 19:33:26 	dpinger 		WAN_DHCP 176.26.X.X: Alarm latency 8084us stddev 2627us loss 22%
    Aug 7 19:34:49 	dpinger 		WAN_DHCP 176.26.X.X: Clear latency 7715us stddev 681us loss 5% 
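
    To make the pattern in these alerts easier to read, the Alarm/Clear entries can be grouped into outage episodes (first Alarm through to the next Clear). The sketch below assumes the line layout shown in the excerpt above; the names Episode and find_episodes are hypothetical, and because the log carries no year, an arbitrary fixed year is used purely so the timestamps can be subtracted.

```python
import re
from dataclasses import dataclass
from datetime import datetime

# Assumed layout, based only on the pasted gateway log lines.
LINE_RE = re.compile(
    r"^\s*(?P<ts>\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+dpinger\s+"
    r"(?P<gw>\S+)\s+\S+:\s+(?P<state>Alarm|Clear)\s+"
    r"latency\s+\d+us\s+stddev\s+\d+us\s+loss\s+(?P<loss>\d+)%"
)

ASSUMED_YEAR = 2000  # arbitrary: the log has no year, only differences matter

@dataclass
class Episode:
    start: datetime
    end: datetime
    peak_loss: int  # highest loss percentage seen during the episode

    @property
    def duration_s(self) -> int:
        return int((self.end - self.start).total_seconds())

def parse_ts(text):
    normalized = " ".join(text.split())
    return datetime.strptime(f"{ASSUMED_YEAR} {normalized}", "%Y %b %d %H:%M:%S")

def find_episodes(lines):
    """Group consecutive Alarm lines, closed by the next Clear, into episodes."""
    episodes, start, peak = [], None, 0
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue
        ts, loss = parse_ts(match["ts"]), int(match["loss"])
        if match["state"] == "Alarm":
            if start is None:
                start, peak = ts, loss
            else:
                peak = max(peak, loss)
        elif start is not None:  # a Clear that closes an open episode
            episodes.append(Episode(start, ts, peak))
            start = None
    return episodes

# First eight lines of the gateway log excerpt above.
SAMPLE = """\
Aug 7 18:57:38 dpinger WAN_DHCP 176.26.X.X: Alarm latency 8826us stddev 9725us loss 21%
Aug 7 18:58:08 dpinger WAN_DHCP 176.26.X.X: Alarm latency 1178329us stddev 6722886us loss 70%
Aug 7 18:58:39 dpinger WAN_DHCP 176.26.X.X: Alarm latency 7930us stddev 2873us loss 53%
Aug 7 18:59:08 dpinger WAN_DHCP 176.26.X.X: Clear latency 7733us stddev 2029us loss 5%
Aug 7 19:00:50 dpinger WAN_DHCP 176.26.X.X: Alarm latency 7837us stddev 2265us loss 21%
Aug 7 19:01:25 dpinger WAN_DHCP 176.26.X.X: Alarm latency 1800145us stddev 8209140us loss 81%
Aug 7 19:01:57 dpinger WAN_DHCP 176.26.X.X: Alarm latency 7688us stddev 640us loss 50%
Aug 7 19:02:27 dpinger WAN_DHCP 176.26.X.X: Clear latency 20475us stddev 44552us loss 7%
"""

episodes = find_episodes(SAMPLE.splitlines())
for ep in episodes:
    print(f"{ep.start:%H:%M:%S} -> {ep.end:%H:%M:%S}  "
          f"{ep.duration_s}s  peak loss {ep.peak_loss}%")
```

    Listing the episodes this way shows how long each dropout lasts and how far apart they are, which may help when comparing against other events in the system log.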
    

    That's all for this update I think, I'm happy to provide anything that would be of use to help diagnose this.

    Note the above log snippets are from the 2.3.2 x64 version. I figure I might as well use the latest version that I would prefer to continue with going forwards - even if it currently isn't working!

    Any suggestions or discussion welcomed!  :)

    rancid

