WAN connection randomly drops?



  • Hi all,

    I set up my pfSense router (as a VM) a week ago and everything is working as it should, except for one thing: the WAN connection seems to drop at random (em0: link state changed to DOWN). When the connection drops it appears to receive a new WAN IP, but the IP stays the same. It also isn't out for long, 1-2 minutes each time. I haven't noticed any pattern either; it just seems to happen randomly. It has happened 4 times in the past couple of days.

    My pfSense router is behind a bridged router from my ISP.

    These are the first two events that happen when the WAN goes down:

    Feb 22 12:12:45	check_reload_status		Linkup starting em0
    Feb 22 12:12:45	kernel		                em0: link state changed to DOWN
    

    The check_reload_status event fires first, then the link state changes to down. After that it starts reloading the services and after about half a minute the WAN comes back up again.

    Feb 22 12:13:23	kernel		                em0: link state changed to UP
    Feb 22 12:13:23	check_reload_status		Linkup starting em0
    

    I'm pretty much a newbie and I'm not sure where to look anymore as I couldn't find a similar topic. Hopefully you guys can help me out here. Let me know if there's any more info I need to post!


  • Netgate Administrator

    It will show the 'IP address changed' log line if it changes from the real IP to 0.0.0.0 and back for example.

    That log shows it actually loses link though. Are you passing through the NIC to that VM directly? What's it connected to?

    Steve



  • It doesn't show an 'IP address changed' log line, but it definitely changes to 0.0.0.0 when down. I'm passing through an HP NC364T quad-port NIC to the VM. The em0 WAN port is directly connected to the bridged ISP router.



  • I'm a newbie at this as well, so take this with a grain of salt.

    I was having the same issues as you, WAN would drop randomly 1-2 times a day (no good when gaming). I connect through my modem with PPPoE and not DHCP, which I think is what the issue was. I tried everything to try and fix it, but nothing worked (new computer, new NIC, new cables, fresh install of pfSense, etc.). As a last resort I tried 2.5 and it has been rock solid for over a week with no disconnects.

    As I said, I'm new at this, so weigh my advice accordingly, but connecting to my ISP through PPPoE just wasn't working on 2.4, and 2.5 seems to have solved the issue for me.



  • @flyboy320 My ISP doesn't support PPPoE so that's not going to work


  • Netgate Administrator

    Well, 2.5 has fixes for numerous other things, so it might work for you too, but it would be better to narrow down what is actually failing when that happens.



  • It just happened again (newest entries first):

    Feb 23 19:53:44	php-fpm	                85374	/rc.start_packages: Restarting/Starting all packages.
    Feb 23 19:53:43	check_reload_status		Starting packages
    Feb 23 19:53:43	php-fpm	                72364	/rc.newwanip: pfSense package system has detected an IP change or dynamic WAN reconnection - 185.47.x.x -> 185.47.x.x - Restarting packages.
    Feb 23 19:53:41	php-fpm	                72364	/rc.newwanip: Creating rrd update script
    Feb 23 19:53:41	php-fpm	                72364	/rc.newwanip: Resyncing OpenVPN instances for interface WAN.
    Feb 23 19:53:39	check_reload_status		Reloading filter
    Feb 23 19:53:39	check_reload_status		updating dyndns wan
    Feb 23 19:53:38	php-fpm	                72364	/rc.newwanip: The command '/usr/local/sbin/unbound -c /var/unbound/unbound.conf' returned exit code '1', the output was '[1582484018] unbound[83768:0] error: bind: address already in use [1582484018] unbound[83768:0] fatal error: could not open ports'
    Feb 23 19:53:36	php-fpm	                72364	/rc.newwanip: Gateway, none 'available' for inet6, use the first one configured. ''
    Feb 23 19:53:36	php-fpm	                72364	/rc.newwanip: rc.newwanip: on (IP address: 185.47.x.x) (interface: WAN[wan]) (real interface: em0).
    Feb 23 19:53:36	php-fpm	                72364	/rc.newwanip: rc.newwanip: Info: starting on em0.
    Feb 23 19:53:35	check_reload_status		Restarting ipsec tunnels
    Feb 23 19:53:35	php-fpm	                43417	/rc.linkup: Gateway, none 'available' for inet6, use the first one configured. ''
    Feb 23 19:53:35	php-fpm	                43417	/rc.linkup: Gateway, none 'available' for inet, use the first one configured. 'WAN_DHCP'
    Feb 23 19:53:35	check_reload_status		rc.newwanip starting em0
    Feb 23 19:52:55	php-fpm	                43417	/rc.linkup: HOTPLUG: Configuring interface wan
    Feb 23 19:52:55	php-fpm	                43417	/rc.linkup: DEVD Ethernet attached event for wan
    Feb 23 19:52:54	kernel		                em0: link state changed to UP
    Feb 23 19:52:54	check_reload_status		Linkup starting em0
    Feb 23 19:52:33	php-fpm	                43417	/rc.openvpn: Gateway, none 'available' for inet6, use the first one configured. ''
    Feb 23 19:52:33	php-fpm	                43417	/rc.openvpn: Gateway, none 'available' for inet, use the first one configured. 'WAN_DHCP'
    Feb 23 19:52:32	check_reload_status		Reloading filter
    Feb 23 19:52:32	check_reload_status		Restarting OpenVPN tunnels/interfaces
    Feb 23 19:52:32	check_reload_status		Restarting ipsec tunnels
    Feb 23 19:52:32	check_reload_status		updating dyndns WAN_DHCP
    Feb 23 19:52:32	rc.gateway_alarm	14180	>>> Gateway alarm: WAN_DHCP (Addr:185.47.x.x Alarm:1 RTT:1.386ms RTTsd:.291ms Loss:22%)
    Feb 23 19:52:18	check_reload_status		Reloading filter
    Feb 23 19:52:17	php-fpm	                43417	/rc.linkup: DEVD Ethernet detached event for wan
    Feb 23 19:52:16	kernel		                em0: link state changed to DOWN
    Feb 23 19:52:16	check_reload_status		Linkup starting em0
    

    I'm really not sure what's going on and why this is happening


  • Netgate Administrator

    Feb 23 19:52:54	kernel		                em0: link state changed to UP
    Feb 23 19:52:54	check_reload_status		Linkup starting em0
    ...
    Feb 23 19:52:17	php-fpm	                43417	/rc.linkup: DEVD Ethernet detached event for wan
    Feb 23 19:52:16	kernel		                em0: link state changed to DOWN
    Feb 23 19:52:16	check_reload_status		Linkup starting em0
    

    This implies the NIC lost link. Since it's a VM, that's unlikely unless it is a physical NIC passed through. Is it?

    What is em0 connected to?

    Steve



  • It is a physical NIC passed through, and em0 is the WAN connection from the bridged router.


  • Netgate Administrator

    And it's connected directly to the router? You might try putting a switch in between.

    If it really is losing link, a switch will probably prevent the link-down event on em0, but you will still lose connectivity.

    It loses link for 38s. Is the upstream router rebooting?

    Steve



  • Not sure what good a switch would do when the connection just randomly drops?

    I don't think the bridged router is restarting, because this never happened before I switched to pfSense, but at this point I'm not sure whether it's pfSense's fault or the ISP router config. Would there be any way to debug this problem?



  • I don't want to be 'that guy', but do you happen to have a cable tester? Or are you using a known good cable?


  • Netgate Administrator

    Putting a switch in between should mean the link would not drop between em0 and the switch. You would see that change in the logs. If it does not drop then it looks like a problem with the router, or at least with the connection between that and the NIC. If it still drops then it's some issue with pfSense on the VM setup.

    Steve



    @sparkyMcpenguin I've tested the cables with a cable tester and ran a speed test; both came out fine. That shouldn't be the reason for a random drop.

    @stephenw10 Hmm, I could give that a try and see if it works. If it's not pfSense's fault then I'll have to contact my ISP. Thanks for the tip!



  • Another update: Even with the switch attached it still dropped packets, but not the link. I'm pretty sure this is an ISP issue so I'll get in touch with them. I'll post another update once that's done.



  • It is most definitely something wrong with pfSense. The same thing is happening to a VM somewhere across the country.

    Here's the log of that VM:

    Feb 23 00:14:51 	php-fpm 	        60158 	/rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed its IP. Reloading endpoints that may use WAN_DHCP.
    Feb 23 00:14:51 	php-fpm 	        60158 	/rc.openvpn: Gateway, none 'available' for inet6, use the first one configured. 'WAN_DHCP6'
    Feb 23 00:14:51 	php-fpm 	        60158 	/rc.openvpn: Gateway, none 'available' for inet, use the first one configured. 'WAN_DHCP'
    Feb 23 00:14:50 	check_reload_status 		Reloading filter
    Feb 23 00:14:50 	check_reload_status 		Restarting OpenVPN tunnels/interfaces
    Feb 23 00:14:50 	check_reload_status 		Restarting ipsec tunnels
    Feb 23 00:14:50 	check_reload_status 		updating dyndns WAN_DHCP
    Feb 23 00:14:50 	rc.gateway_alarm 	96155 	>>> Gateway alarm: WAN_DHCP (Addr:145.44.x.1 Alarm:0 RTT:5.100ms RTTsd:3.103ms Loss:17%)
    Feb 23 00:11:48 	php-fpm 	        27479 	/rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed its IP. Reloading endpoints that may use WAN_DHCP.
    Feb 23 00:11:48 	php-fpm 	        27479 	/rc.openvpn: Gateway, none 'available' for inet6, use the first one configured. 'WAN_DHCP6'
    Feb 23 00:11:48 	php-fpm 	        27479 	/rc.openvpn: Gateway, none 'available' for inet, use the first one configured. 'WAN_DHCP'
    Feb 23 00:11:47 	check_reload_status 		Reloading filter
    Feb 23 00:11:47 	check_reload_status 		Restarting OpenVPN tunnels/interfaces
    Feb 23 00:11:47 	check_reload_status 		Restarting ipsec tunnels
    Feb 23 00:11:47 	check_reload_status 		updating dyndns WAN_DHCP
    Feb 23 00:11:47 	rc.gateway_alarm 	8754 	>>> Gateway alarm: WAN_DHCP (Addr:145.44.x.1 Alarm:1 RTT:5.367ms RTTsd:3.278ms Loss:21%) 
    
    

    And here's my log (after putting a switch between the bridged router and pfSense):

    Feb 25 03:46:01	check_reload_status		Reloading filter
    Feb 25 03:46:01	check_reload_status		Restarting OpenVPN tunnels/interfaces
    Feb 25 03:46:01	check_reload_status		Restarting ipsec tunnels
    Feb 25 03:46:01	check_reload_status		updating dyndns WAN_DHCP
    Feb 25 03:46:00	rc.gateway_alarm	43954	>>> Gateway alarm: WAN_DHCP (Addr:185.x.x.1 Alarm:0 RTT:1.397ms RTTsd:.280ms Loss:5%)
    Feb 25 03:43:59	php-fpm	                43417	/rc.openvpn: Gateway, none 'available' for inet6, use the first one configured. ''
    Feb 25 03:43:59	php-fpm	                43417	/rc.openvpn: Gateway, none 'available' for inet, use the first one configured. 'WAN_DHCP'
    Feb 25 03:43:58	check_reload_status		Reloading filter
    Feb 25 03:43:58	check_reload_status		Restarting OpenVPN tunnels/interfaces
    Feb 25 03:43:58	check_reload_status		Restarting ipsec tunnels
    Feb 25 03:43:58	check_reload_status		updating dyndns WAN_DHCP
    Feb 25 03:43:58	rc.gateway_alarm	71623	>>> Gateway alarm: WAN_DHCP (Addr:185.x.x.1 Alarm:1 RTT:1.477ms RTTsd:.290ms Loss:22%)
    

    The interval differs between the two VMs, though. Mine is settling at somewhere between 1.5 and 2 days; on the other VM there were 5 days in between. It's starting to look like some sort of DHCP lease setting is configured wrong. Maybe you can give some more insight, @stephenw10?


  • Netgate Administrator

    Neither of those logs indicates any issue other than more than 20% packet loss on the WAN to the IP being monitored. Everything else there is exactly what I would expect to happen when there is that much packet loss.

    Make sure you are monitoring an IP that actually responds to ping reliably. The ISP gateway may not always do that.

    If that's the only gateway you can disable 'gateway monitoring action' on it whilst still monitoring. That will avoid most of what you see in the logs there but it shouldn't be causing a problem.

    Make sure you have a default IPv4 gateway set in System > Routing > Gateways rather than automatic, to avoid switching to a bad gateway.

    Steve
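
For anyone reading along, the gateway alarm lines quoted in this thread can be pulled apart with a short script to see the loss figure at each event. This is only a sketch: the regular expression is written against the lines exactly as they appear in this thread, and reading Alarm:1 as "raised" and Alarm:0 as "cleared" is an assumption based on how the loss figures line up, not something confirmed here. The function and field names are made up for illustration.

```python
import re

# Pattern written against the rc.gateway_alarm lines quoted in this thread.
ALARM_RE = re.compile(
    r"Gateway alarm: (?P<gateway>\S+) \(Addr:(?P<addr>\S+) Alarm:(?P<alarm>[01]) "
    r"RTT:(?P<rtt>[\d.]+)ms RTTsd:(?P<rttsd>[\d.]+)ms Loss:(?P<loss>\d+)%\)"
)


def parse_alarm(line):
    """Return the fields of a gateway alarm line, or None for any other line."""
    match = ALARM_RE.search(line)
    if match is None:
        return None
    return {
        "gateway": match["gateway"],
        "addr": match["addr"],
        # Assumption: Alarm:1 means the alarm was raised, Alarm:0 that it cleared.
        "raised": match["alarm"] == "1",
        "rtt_ms": float(match["rtt"]),
        "loss_pct": int(match["loss"]),
    }


# Two lines copied from the logs in this thread (tabs replaced by spaces).
SAMPLE = [
    "Feb 25 03:43:58 rc.gateway_alarm 71623 >>> Gateway alarm: WAN_DHCP "
    "(Addr:185.x.x.1 Alarm:1 RTT:1.477ms RTTsd:.290ms Loss:22%)",
    "Feb 25 03:46:00 rc.gateway_alarm 43954 >>> Gateway alarm: WAN_DHCP "
    "(Addr:185.x.x.1 Alarm:0 RTT:1.397ms RTTsd:.280ms Loss:5%)",
]

for entry in SAMPLE:
    info = parse_alarm(entry)
    state = "raised" if info["raised"] else "cleared"
    print(f"{info['gateway']}: alarm {state} at {info['loss_pct']}% loss")
```

Lines that are not gateway alarms (filter reloads, OpenVPN restarts and so on) return None, so a whole system log can be fed through the function and filtered.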



  • @stephenw10 There's only one gateway configured in the routing section, which is my IPv4 DHCP gateway. I've already tried setting the monitoring IP to Google DNS, but the same alarms still appear. At the time those alarms appear the internet is in fact down. Is there a setting I'm overlooking here?


  • Netgate Administrator

    No that's fine then. If the internet is actually down the alarms should trigger.

    What exactly is the problem here?



  • @stephenw10 It's the fact that the internet is down every 1.5 days when that literally never happened before I switched to pfSense. It's nice that the pfSense alarms trigger, but that's not the problem. It seems like pfSense disconnects the internet every 1.5 days for no apparent reason. It looks like the same thing is happening to the other VM, which makes it seem like a pfSense bug or a setting that's configured incorrectly.


  • Netgate Administrator

    If you disable the gateway monitoring action it will not actually do anything other than set an alarm.

    Check the quality monitoring graphs in Status > Monitoring. Are you seeing packet loss consistently or just spikes before it goes down?
    How were you monitoring the connection before you had pfSense?

    Steve



  • @stephenw10 This is the graph showing the packet loss at the moment it goes down: (image: f9a952f7aff57918b2f50fd3935cd9ab.png)

    There are no other spikes (traffic, packets etc.) to be found. It's just plain packet loss at a random time. I wasn't really monitoring the connection before pfSense, but I never experienced internet outages this often. The only outages I did have lasted an hour or longer, and those were rare.



  • @ExecutableFix said in WAN connection randomly drops?:

    That shouldn't be the reason for a random drop.

    Poor connections are often the cause of intermittent failure. I've seen those RJ45s fail many times.



  • @JKnott

    @JKnott said in WAN connection randomly drops?:

    @ExecutableFix said in WAN connection randomly drops?:

    That shouldn't be the reason for a random drop.

    Poor connections are often the cause of intermittent failure. I've seen those RJ45s fail many times.

    This is different, though. Plus, with a bad connector there definitely wouldn't be a pattern.


  • Netgate Administrator

    Ok, good news. The binary part of the fix for this is now in 2.4.5 snapshots:
    https://github.com/pfsense/FreeBSD-src/commits/RELENG_2_4_5/sbin/dhclient/dhclient.c

    The next available snapshot should have it. The full fix also requires changes to the dhclient script which can be applied via the system patches package. I have briefly tested that and it didn't seem to break anything.

    That patch is here: https://redmine.pfsense.org/attachments/download/2682/pfsense-dhclient-script-patch.txt

    If you're able to test it we may be able to include it in 2.4.5.

    Steve



  • @stephenw10 Oh that's awesome news. I'll try to give it a shot tomorrow. Hopefully this is indeed the fix for the random drops



  • I've just installed 2.4.5_RC and applied the patch, let's see if this works



  • Update:

    Feb 26 11:25:15	check_reload_status		Reloading filter
    Feb 26 11:25:15	check_reload_status		Restarting OpenVPN tunnels/interfaces
    Feb 26 11:25:15	check_reload_status		Restarting ipsec tunnels
    Feb 26 11:25:15	check_reload_status		updating dyndns WAN_DHCP
    Feb 26 11:25:15	rc.gateway_alarm	44749	>>> Gateway alarm: WAN_DHCP (Addr:185.47.x.1 Alarm:0 RTT:1.455ms RTTsd:.288ms Loss:5%)
    Feb 26 11:23:19	check_reload_status		Reloading filter
    Feb 26 11:23:19	check_reload_status		Restarting OpenVPN tunnels/interfaces
    Feb 26 11:23:19	check_reload_status		Restarting ipsec tunnels
    Feb 26 11:23:19	check_reload_status		updating dyndns WAN_DHCP
    Feb 26 11:23:19	rc.gateway_alarm	87807	>>> Gateway alarm: WAN_DHCP (Addr:185.47.x.1 Alarm:1 RTT:1.437ms RTTsd:.274ms Loss:21%)
    

    I don't know what to do anymore. This happened after the patch had been applied. The loss is always about 20% and the interval matches again: 1.5~1.6 days. That can't be a coincidence anymore



  • Under Interfaces > WAN > Advanced there is a 'reject DHCP lease' option.

    Put the WAN DHCP IP there and apply to see if it helps.

    This was the last thing I tried before getting a static IP from my provider, which ultimately fixed my issue.



  • @bcruze Could you provide a screenshot? I can see these settings: (image: 10b5d0fcb586771e99187c9b1187cfe3.png)

    I also calculated the time between the outages, and it's 1 day, 7 hours and about 40 to 50 minutes every time. It's definitely some sort of lease time issue.
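
For anyone wanting to reproduce that calculation, here is a minimal Python sketch. It uses the timestamps of the three Alarm:1 gateway alarm lines quoted in this thread. The syslog format carries no year, so 2020 is assumed, and the helper names are just for illustration.

```python
from datetime import datetime

# Timestamps of the "Alarm:1" rc.gateway_alarm lines quoted in this thread.
ALARMS = ["Feb 23 19:52:32", "Feb 25 03:43:58", "Feb 26 11:23:19"]


def parse(stamp, year=2020):
    """Parse a syslog-style timestamp; the year is an assumption."""
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


def gaps(stamps):
    """Return the time between each pair of consecutive timestamps."""
    times = [parse(s) for s in stamps]
    return [later - earlier for earlier, later in zip(times, times[1:])]


for gap in gaps(ALARMS):
    print(gap)
```

This prints `1 day, 7:51:26` and `1 day, 7:39:21`, so both gaps fall in the range quoted above.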



  • I worded it wrong, apologies.

    'Reject leases from' is the field I meant.

    You can also try the FreeBSD default as a preset. I tried it but it did not fix my issue.

    It's kind of confusing: if you tweak it, you have to select the saved cfg to see what it's actually set to. At first I didn't think it was saving my settings...



  • @bcruze So setting that option to the x.x.x.1 IP should reject new leases and thus not disconnect my internet? Was your issue exactly the same? Is it something that is caused by the way pfSense handles these things?



  • According to what I've read online, pfSense should ignore the lease change.

    I changed ISP from Charter Spectrum to a fiber-to-the-home company that uses carrier-grade NAT. Previously my IP never changed with Spectrum, so I never saw this issue. With CGNAT, my Nokia modem's internal IP changed roughly every 24-29 hours. My internet would go down, and unplugging the cable between the Nokia and pfSense and plugging it back in would fix it. Alternatively, if I went into Status > Interfaces, hit Release, waited a few seconds and then hit Renew, my connection would come back up.

    My workaround is paying my provider 10 dollars a month for a static IP address to stay online, as they have no way of changing the programming of the Nokia modem. They did not suggest any of the other modems, and I refuse to replace my pfSense routers, UBNT PoE switches and Ubiquiti Nano and LR access points with their equipment...



  • @bcruze I unfortunately don't have the option to get a static IP, but my IP has in fact never changed before, so I'm not too worried about that happening. The only thing I'm concerned about is the fact that the internet automatically comes back up again (which could indicate an issue with my ISP rather than pfSense). I've not had to manually do anything for it to be online again. I'll definitely give this a shot and see if it resolves anything.



  • I know you are trying to troubleshoot and find the actual issue, but as a last resort I would still give 2.5 a try.



  • @flyboy320 I'll give @bcruze's method a try first, if that doesn't work then I might upgrade to 2.5. I still think it's an ISP issue


  • Netgate Administrator

    Check the DHCP log though at that time. Look for dhclient trying and failing.

    I just realised I got confused between threads. I meant to have @bcruze test this because what he's seeing does look like dhcp. 🙄

    Steve



  • @stephenw10 Interesting thing is, the dhcp logs don't indicate any issue



  • @ExecutableFix

    The DHCP logs won't show anything unless it fails to get an address. That will only happen when it tries to renew the address, which is typically around 1/2 or 2/3 of the lease time. If the connection fails before then, it won't show in the log. When I had a connection problem several years ago, I wrote a script to ping my ISP's gateway every minute and log when it failed. This gave me a list of when it failed and when it was restored.
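
A monitor along those lines could look roughly like the Python sketch below. This is not the script described above, just one way to do the same thing under some assumptions: the system `ping` accepts `-c 1` (true on FreeBSD and Linux), the address `192.0.2.1` is a placeholder to be replaced with the ISP gateway, and all function names are made up for illustration. It only writes a line when the state changes, so the output is a list of when the link went down and when it came back.

```python
import subprocess
import time
from datetime import datetime


def ping_once(host, timeout_s=5):
    """Return True if a single ping to host succeeds within timeout_s seconds."""
    try:
        result = subprocess.run(
            ["ping", "-c", "1", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except (subprocess.TimeoutExpired, OSError):
        return False
    return result.returncode == 0


def log_line(line):
    """Default sink: print immediately so redirected output is not buffered."""
    print(line, flush=True)


def monitor(host, interval_s=60, probe=ping_once, log=log_line, max_checks=None):
    """Probe host every interval_s seconds and log only UP/DOWN transitions.

    probe, log and max_checks are hooks for testing; by default the loop
    runs until interrupted.
    """
    last_state = None
    checks = 0
    while max_checks is None or checks < max_checks:
        is_up = probe(host)
        if is_up != last_state:
            stamp = datetime.now().isoformat(timespec="seconds")
            log(f"{stamp} {host} {'UP' if is_up else 'DOWN'}")
            last_state = is_up
        checks += 1
        if max_checks is None or checks < max_checks:
            time.sleep(interval_s)


# Example (placeholder address, runs until interrupted):
# monitor("192.0.2.1")
```

Redirecting the output to a file gives a record that can be compared against the gateway alarm times in the pfSense system log.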

