PfSense not recovering from WAN failure
-
Hi all
I'm running the latetest pfSense on a PC Engines APU which is connected to a cable modem from UPC Cablecom in bridged mode (pfSense directly get's a dynamic public IP). Once in a while (way too often) the WAN fails. Already two times the WAN connection did not recover and a manual interaction was required. Manually renewing the DHCP Lease on the WAN interface is sufficient to bring WAN back to live. During the failure apinger considers the interface as down (which is logical).
I suspect that the problem comes from a combination of the pfSense and the cable modem but I'm having difficulties to troubleshoot it or to find a work around. Has anyone else experienced similar problems? Is there a way to tell pfSense to "reset" the WAN interface if it goes down? Any ideas on how I can troubleshoot this? I'll gladly provide further information.
Thanks in advance!
-
I had a similar issue. I inadvertently posted the solution in another thread, so pardon the double posts. Under Interfaces > WAN try these advanced options:
Before this I would have to call the ISP and request a reboot of the modem after an outage if I was not on site renew the DHCP lease.
-
What IP, if any, does it have on WAN at the time? joako's guess is probably pretty likely, usually that's attributable to your cable modem handing out a private IP lease when it's lost upstream connectivity. If it has no IP at the time, your dhclient logs from Status>System logs, DHCP tab would be necessary to tell anything.
-
Thanks joako and cmb! :D This provides an possible explanation of the issue.
I haven't noticed that the Firewall would get a private IP from the cable modem. The issue hasn't appeared again for a while now (and I hope it will stay like that). If it happens again I will pay specific attention to the WAN IP the Firewall gets. -
Hi all,
It's happening again. Now I have more detailed information and logs. So hopefully someone here can help me troubleshoot.
Here what happens
-
WAN connection shows problems and eventually goes down completely (confirmed from provider)
-
WAN comes back up after some time
-
pfSense still thinks WAN is "down" (monitoring shows 100% loss)
-
Manually releasing / renewing DHCP lease on the WAN interface brings the pfSense back online.
Observations
-
pfSense has same public IP all the time (cable modem is configured in bridge mode). So the solution stated by joako seems not to apply unfortunately.
-
dpinger does log package loss but not the interface as being down (see logs below), but
-
Monitoring (Quality) shows 100% loss and,
-
the Gateway widget on the Dashboard shows the WAN as offline
Questions
Might there be something wrong with my configuration? Is there a way to have the pfSense recovering from such failures more gracefully and without the need of a manual interaction? A script which is executed and brings the interface up again?Logs
Status > System Logs > System > GatewaysTime Process PID Message Jul 4 11:09:11 dpinger send_interval 500ms loss_interval 2000ms time_period 60000ms report_interval 0ms data_len 0 alert_interval 1000ms latency_alarm 500ms loss_alarm 20% dest_addr 77.xx.xx.1 bind_addr 77.xx.xy.129 identifier "WAN_DHCP " #THIS IS AFTER DHCP RENEW (2\. time) Jul 4 11:09:08 dpinger WAN_DHCP 77.xx.xx.1: sendto error: 65 Jul 4 11:09:08 dpinger WAN_DHCP 77.xx.xx.1: sendto error: 65 Jul 4 11:09:07 dpinger WAN_DHCP 77.xx.xx.1: sendto error: 65 Jul 4 11:09:07 dpinger WAN_DHCP 77.xx.xx.1: sendto error: 65 Jul 4 11:09:06 dpinger WAN_DHCP 77.xx.xx.1: sendto error: 65 Jul 4 11:09:06 dpinger WAN_DHCP 77.xx.xx.1: sendto error: 65 Jul 4 11:09:05 dpinger WAN_DHCP 77.xx.xx.1: sendto error: 65 Jul 4 11:09:05 dpinger WAN_DHCP 77.xx.xx.1: sendto error: 65 Jul 4 11:09:04 dpinger WAN_DHCP 77.xx.xx.1: sendto error: 65 Jul 4 11:09:04 dpinger WAN_DHCP 77.xx.xx.1: sendto error: 65 #THIS IS WHEN DHCP RELEASE WAS MANUALLY TRIGGERED (2.time) Jul 4 10:20:53 dpinger WAN_DHCP 77.xx.xx.1: Alarm latency 8220us stddev 2679us loss 22% #THIS IS WHEN THE ISSUE BEGAN (2\. time) Jul 4 08:58:01 dpinger send_interval 500ms loss_interval 2000ms time_period 60000ms report_interval 0ms data_len 0 alert_interval 1000ms latency_alarm 500ms loss_alarm 20% dest_addr 77.xx.xx.1 bind_addr 77.xx.xy.129 identifier "WAN_DHCP " #THIS IS AFTER DHCP RENEW Jul 4 08:57:58 dpinger WAN_DHCP 77.xx.xx.1: sendto error: 65 Jul 4 08:57:57 dpinger WAN_DHCP 77.xx.xx.1: sendto error: 65 Jul 4 08:57:57 dpinger WAN_DHCP 77.xx.xx.1: sendto error: 65 Jul 4 08:57:56 dpinger WAN_DHCP 77.xx.xx.1: sendto error: 65 Jul 4 08:57:56 dpinger WAN_DHCP 77.xx.xx.1: sendto error: 65 Jul 4 08:57:55 dpinger WAN_DHCP 77.xx.xx.1: sendto error: 65 Jul 4 08:57:55 dpinger WAN_DHCP 77.xx.xx.1: sendto error: 65 Jul 4 08:57:54 dpinger WAN_DHCP 77.xx.xx.1: sendto error: 65 #THIS IS WHEN DHCP RELEASE WAS MANUALLY TRIGGERED Jul 4 08:22:44 dpinger WAN_DHCP 77.xx.xx.1: Alarm latency 8248us stddev 2020us loss 22% #THIS IS WHEN THE ISSUE BEGAN Jul 2 23:13:48 dpinger WAN_DHCP 77.xx.xx.1: Clear latency 79384us stddev 269716us loss 11%
Status > System Logs > DHCP (dhclient only)
Time Process PID Message Jul 4 11:09:09 dhclient 95303 bound to 77.xx.xy.129 -- renewal in 33465 seconds. Jul 4 11:09:09 dhclient Creating resolv.conf Jul 4 11:09:09 dhclient /sbin/route add default 77.xx.xx.1 Jul 4 11:09:09 dhclient Adding new routes to interface: re0 Jul 4 11:09:09 dhclient New Routers (re0): 77.xx.xx.1 Jul 4 11:09:09 dhclient New Broadcast Address (re0): 255.255.255.255 Jul 4 11:09:09 dhclient New Subnet Mask (re0): 255.255.240.0 Jul 4 11:09:09 dhclient New IP Address (re0): 77.xx.xy.129 Jul 4 11:09:09 dhclient ifconfig re0 inet 77.xx.xy.129 netmask 255.255.240.0 broadcast 255.255.255.255 Jul 4 11:09:09 dhclient Starting add_new_address() Jul 4 11:09:09 dhclient REBOOT Jul 4 11:09:09 dhclient 95303 DHCPACK from 10.yz.yz.1 Jul 4 11:09:09 dhclient 95303 DHCPREQUEST on re0 to 255.255.255.255 port 67 Jul 4 11:09:09 dhclient PREINIT Jul 4 11:09:02 dhclient 52332 exiting. Jul 4 11:09:02 dhclient 52332 connection closed Jul 4 08:57:58 dhclient 51480 bound to 77.xx.xy.129 -- renewal in 37400 seconds. Jul 4 08:57:58 dhclient Creating resolv.conf Jul 4 08:57:58 dhclient /sbin/route add default 77.xx.xx.1 Jul 4 08:57:58 dhclient Adding new routes to interface: re0 Jul 4 08:57:58 dhclient New Routers (re0): 77.xx.xx.1 Jul 4 08:57:58 dhclient New Broadcast Address (re0): 255.255.255.255 Jul 4 08:57:58 dhclient New Subnet Mask (re0): 255.255.240.0 Jul 4 08:57:58 dhclient New IP Address (re0): 77.xx.xy.129 Jul 4 08:57:58 dhclient ifconfig re0 inet 77.xx.xy.129 netmask 255.255.240.0 broadcast 255.255.255.255 Jul 4 08:57:58 dhclient Starting add_new_address() Jul 4 08:57:58 dhclient REBOOT Jul 4 08:57:58 dhclient 51480 DHCPACK from 10.yz.yz.1 Jul 4 08:57:58 dhclient 51480 DHCPREQUEST on re0 to 255.255.255.255 port 67 Jul 4 08:57:58 dhclient PREINIT Jul 4 08:57:53 dhclient 71337 exiting. Jul 4 08:57:53 dhclient 71337 connection closed Jul 4 05:44:39 dhclient Creating resolv.conf Jul 4 05:44:39 dhclient RENEW
Status > System Logs > System > General (additional ARP messages removed)
Jul 4 09:05:14 kernel arp: 192.168.150.47 moved from f4:f9:51:d9:43:c9 to 70:73:cb:b4:c7:1f on re1_vlan2 Jul 4 08:58:11 radiusd 35760 Ready to process requests. Jul 4 08:58:11 radiusd 35254 Loaded virtual server <default>Jul 4 08:58:10 radiusd 27655 Exiting normally. Jul 4 08:58:10 radiusd 27655 Signalled to terminate Jul 4 08:58:09 radiusd 27655 Ready to process requests. Jul 4 08:58:09 radiusd 27452 Loaded virtual server <default>Jul 4 08:58:09 php-fpm 59374 /rc.start_packages: Skipping STARTing packages process because previous/another instance is already running Jul 4 08:58:08 radiusd 35007 Exiting normally. Jul 4 08:58:08 radiusd 35007 Signalled to terminate Jul 4 08:58:08 php-fpm 41273 /rc.start_packages: Restarting/Starting all packages. Jul 4 08:58:08 check_reload_status Starting packages Jul 4 08:58:08 php-fpm 59374 /rc.newwanip: pfSense package system has detected an IP change or dynamic WAN reconnection - 77.xx.xy.129 -> 77.xx.xy.129 - Restarting packages. Jul 4 08:58:07 check_reload_status Starting packages Jul 4 08:58:07 php-fpm 41273 /rc.newwanip: pfSense package system has detected an IP change or dynamic WAN reconnection - -> 10.0.8.1 - Restarting packages. Jul 4 08:58:07 check_reload_status Reloading filter Jul 4 08:58:07 php-fpm 41273 /rc.newwanip: rc.newwanip called with empty interface. Jul 4 08:58:07 php-fpm 41273 /rc.newwanip: rc.newwanip: on (IP address: 10.0.8.1) (interface: []) (real interface: ovpns1). Jul 4 08:58:07 php-fpm 41273 /rc.newwanip: rc.newwanip: Info: starting on ovpns1. Jul 4 08:58:06 check_reload_status rc.newwanip starting ovpns1 Jul 4 08:58:05 kernel ovpns1: link state changed to UP Jul 4 08:58:05 php-fpm 59374 /rc.newwanip: Creating rrd update script Jul 4 08:58:05 check_reload_status Reloading filter Jul 4 08:58:05 check_reload_status Reloading filter Jul 4 08:58:05 kernel ovpns1: link state changed to DOWN Jul 4 08:58:05 php-fpm 59374 /rc.newwanip: Resyncing OpenVPN instances for interface WAN. Jul 4 08:58:04 php-fpm 59374 /rc.newwanip: phpDynDNS (): No change in my IP address and/or 25 days has not passed. Not updating dynamic DNS entry. Jul 4 08:58:01 php-fpm 59374 /rc.newwanip: ROUTING: setting default route to 77.xx.xx.1 Jul 4 08:57:59 php-fpm 59374 /rc.newwanip: rc.newwanip: on (IP address: 77.xx.xy.129) (interface: WAN[wan]) (real interface: re0). Jul 4 08:57:59 php-fpm 59374 /rc.newwanip: rc.newwanip: Info: starting on re0. Jul 4 08:57:58 check_reload_status rc.newwanip starting re0 Jul 4 08:57:58 kernel arpresolve: can't allocate llinfo for 77.xx.xx.1 on re0 Jul 4 08:57:57 kernel arpresolve: can't allocate llinfo for 77.xx.xx.1 on re0 Jul 4 08:57:56 kernel arpresolve: can't allocate llinfo for 77.xx.xx.1 on re0 Jul 4 08:57:56 kernel arpresolve: can't allocate llinfo for 77.xx.xx.1 on re0 Jul 4 08:57:56 kernel arpresolve: can't allocate llinfo for 77.xx.xx.1 on re0 Jul 4 08:57:54 kernel arpresolve: can't allocate llinfo for 77.xx.xx.1 on re0 Jul 4 08:57:54 kernel arpresolve: can't allocate llinfo for 77.xx.xx.1 on re0 Jul 4 08:57:54 kernel arpresolve: can't allocate llinfo for 77.xx.xx.1 on re0 Jul 4 08:57:54 kernel arpresolve: can't allocate llinfo for 77.xx.xx.1 on re0 Jul 4 08:56:19 kernel arp: 192.168.150.47 moved from 70:73:cb:b4:c7:1f to f4:f9:51:d9:43:c9 on re1_vlan2 Jul 4 08:55:38 check_reload_status Syncing firewall Jul 4 08:55:22 check_reload_status Syncing firewall .... Jul 4 08:54:34 php-fpm 83885 /index.php: Successful login for user 'admin' from: 192.168.148.160 Jul 4 08:53:19 kernel arp: 192.168.150.47 moved from 70:73:cb:b4:c7:1f to f4:f9:51:d9:43:c9 on re1_vlan2 ..... Jul 4 08:23:19 kernel arp: 192.168.150.47 moved from 70:73:cb:b4:c7:1f to f4:f9:51:d9:43:c9 on re1_vlan2 Jul 4 08:22:46 php-fpm 83885 /rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed its IP. Reloading endpoints that may use WAN_DHCP. Jul 4 08:22:45 php-fpm 86536 /rc.dyndns.update: phpDynDNS (): No change in my IP address and/or 25 days has not passed. Not updating dynamic DNS entry. Jul 4 08:22:44 check_reload_status Reloading filter Jul 4 08:22:44 check_reload_status Restarting OpenVPN tunnels/interfaces Jul 4 08:22:44 check_reload_status Restarting ipsec tunnels Jul 4 08:22:44 check_reload_status updating dyndns WAN_DHCP Jul 4 08:21:58 kernel arp: 192.168.150.47 moved from f4:f9:51:d9:43:c9 to 70:73:cb:b4:c7:1f on re1_vlan2</default></default>
-
-
Bump for this. I have exact the same issue that was happened right after ISP remotely upgraded the modem firmware and rebooted.
-
I'm seeing this as well and would really like to fix it. dhclient seems to give up and stay with its current lease for over a day. That's not going to work for me. If I have to, I'll write a cron script to restart dhclient every 5 minutes when I can't ping google or similar. Or just all the time. I did try the timing options joako mentioned. Perhaps that will help, I added the modem IP to reject leases as well, but it's not getting one from there, it just seems like a good idea in general.
-
Still no solution? I have SG-2440 attached to cable modem.
I have similar problem reported by OP.Most of the time my connection comes back after 2-3 minutes.
I have the 'honor' of being the last connection on the cable line. -
I just got bit by this one myself. I've got it set to reject DHCP leases from the cable modem and when I came home I had no IPv4 address listed in the interface status. A quick renew fixed it but I would love to have something that retry DHCP lease every so often.
-
I may be having the same problem as FreeMinded
Manually Saving Interfaces/WAN brings the connection back.
My modem is part of a satellite service which in Australia means the connection (upstream of modem) is sometimes intermittent. (ie all too frequent need for manual intervention to restore service).
Has anyone succeeded in working around this problem?I am on latest release:
2.3.4-RELEASE (amd64) -
I also have the same issue with UPC.
But in my case the UPC DHCP server does not answer and pfSense then uses the recorded lease which does not work and just goes offline without trying to get a new lease. Is there a way to keep trying to get a new lease without a timeout?Even though the logs says that it is using the recorded IP, pfSense actually does not. In Status => Interfaces WAN Interface (wan, em0) the Status is up, but the interface has no IP.
Jul 21 02:41:18 pfs dhclient[38369]: em0 link state up -> down Jul 21 02:41:19 pfs dhclient[32644]: connection closed Jul 21 02:41:19 pfs dhclient[32644]: exiting. Jul 21 02:41:57 pfs dhclient: PREINIT Jul 21 02:41:57 pfs dhclient[54351]: DHCPREQUEST on em0 to 255.255.255.255 port 67 Jul 21 02:41:59 pfs dhclient[54351]: DHCPREQUEST on em0 to 255.255.255.255 port 67 Jul 21 02:42:03 pfs dhclient[54351]: DHCPREQUEST on em0 to 255.255.255.255 port 67 Jul 21 02:42:08 pfs dhclient[54351]: DHCPDISCOVER on em0 to 255.255.255.255 port 67 interval 2 Jul 21 02:42:10 pfs dhclient[54351]: DHCPDISCOVER on em0 to 255.255.255.255 port 67 interval 3 Jul 21 02:42:13 pfs dhclient[54351]: DHCPDISCOVER on em0 to 255.255.255.255 port 67 interval 8 Jul 21 02:42:21 pfs dhclient[54351]: DHCPDISCOVER on em0 to 255.255.255.255 port 67 interval 8 Jul 21 02:42:29 pfs dhclient[54351]: DHCPDISCOVER on em0 to 255.255.255.255 port 67 interval 15 Jul 21 02:42:44 pfs dhclient[54351]: DHCPDISCOVER on em0 to 255.255.255.255 port 67 interval 9 Jul 21 02:42:53 pfs dhclient[54351]: DHCPDISCOVER on em0 to 255.255.255.255 port 67 interval 13 Jul 21 02:43:06 pfs dhclient[54351]: DHCPDISCOVER on em0 to 255.255.255.255 port 67 interval 3 Jul 21 02:43:09 pfs dhclient[54351]: No DHCPOFFERS received. Jul 21 02:43:09 pfs dhclient[54351]: Trying recorded lease 84.112.2X4.XX Jul 21 02:43:09 pfs dhclient: TIMEOUT Jul 21 02:43:09 pfs dhclient: Starting add_new_address() Jul 21 02:43:09 pfs dhclient: ifconfig em0 inet 84.112.2X4.XX netmask 255.255.255.0 broadcast 255.255.255.255 Jul 21 02:43:09 pfs dhclient: New IP Address (em0): 84.112.2X4.XX Jul 21 02:43:09 pfs dhclient: New Subnet Mask (em0): 255.255.255.0 Jul 21 02:43:09 pfs dhclient: New Broadcast Address (em0): 255.255.255.255 Jul 21 02:43:09 pfs dhclient: New Routers (em0): 84.112.2X4.1 Jul 21 02:43:10 pfs dhclient: New Routers (em0): 84.112.2X4.1 Jul 21 02:43:11 pfs dhclient: Deleting old routes
Also the protocol timing does not seem to work if the documentation is correct because it should retry every 5 minutes: https://www.freebsd.org/cgi/man.cgi?query=dhclient.conf&sektion=5#PROTOCOL_TIMING
Thanks!
-
on actual isp outages pfsense has been fine for me (when it hasnt I believed it to be an isp issue).
However yesterday I was testing failover by pulling the wan cable for about 30 seconds and then replacing it.
After the test my ipv6 traffic was working fine, the wan ipv4 gateway continued to show 100% loss tho, and no action was taken by pfsense to recover the connection, so there was nothing in the logs for it to reinitialise the wan, was very odd. Otherwise tho in normal use I havent had problems.
I might play with the settings shown in this thread, I think wont work, but never know. :)
-
Hi I have the same or similar problem. My pfsense box keep dropping the wan ip e.g it set it to 0.0.0.0 and never gets out of this state until I reboot. I have two other devices connected to the same fiber switch and thy have nerver failed for the last 3 years (Dlink router and a alarm box).
-
I also have the same issue with UPC.
But in my case the UPC DHCP server does not answer and pfSense then uses the recorded lease which does not work and just goes offline without trying to get a new lease. Is there a way to keep trying to get a new lease without a timeout?Dear all,
I have the same problem :-[
What i figured out is, if I unplug the WAN-cable from pfSense / cable-modem and plug it in again, than the gateway is „green“ and the Internet works again …Has someone solved the problem finally ?
Is it a pfSense bug ?Thanks a lot in advance !!
-
Dear all,
I have found a solution for my WAN failure problem with UPC and pfSense.
Symptoms:
WAN would randomly lose its connection and not recover.
Status > System Logs > System > Gateways Log ist full of sendto error: 65
Status > System Logs > System > DHCP Log shows:
No DHCPOFFERS received.
Trying recorded lease XXX.XXX.XXX.XXX
Deleting old routesExplanation:
The UPC router goes down for an unknown reason and might restart. (Not sure what the router is doing)
When the links to the router comes back up PFS tries to get a lease, but by default times out after 60 seconds and then tries the recorded lease.
That would in my case even work since I always have the same IP.
Unfortunately at the end PFS deletes the old routes including the default route.
This causes the sendto error: 65.
You can check this by going to: Diagnostics > Routes and check if you still have a default route when WAN is down.
In my opinion this is a bug.Solution:
In my case it always takes about 3 minutes until the UPC dhcp server starts to respond again.
So all you have to do is go to: Interfaces > WAN
DHCP Client Configuration section
check Options Advanced Configuration
Set the Protocol timing Timeout higher.
I chose 1800 which is 30 minutes, but I wanted to be on the safe side.Since then my WAN has recovered every time.
I also attached a screenshot.![Bildschirmfoto 2017-08-24 um 12.10.17.png](/public/imported_attachments/1/Bildschirmfoto 2017-08-24 um 12.10.17.png)
![Bildschirmfoto 2017-08-24 um 12.10.17.png_thumb](/public/imported_attachments/1/Bildschirmfoto 2017-08-24 um 12.10.17.png_thumb) -
@bunzilein, thank you very much for sharing your experience and solution :)
Is there any possibility to inform the pf-Team on this issue to check whether it is a bug or not ?
As newbie, may I ask you to explain or to share to me some link where I can study what each of the "Protocol timing" options mean and what the values are doing (causing), if you don't mind, please ?
Thanks again a lot !!
-
The link is actually on the bottom of the DHCP client configuration section: https://www.freebsd.org/cgi/man.cgi?query=dhclient.conf&sektion=5#PROTOCOL_TIMING
If this also solves your problem and you have the same behavior with default route being deleted then maybe it is a bug.
Test it and if you find it to be a bug as well report it like this: https://doc.pfsense.org/index.php/Bug_reporting -
thanks bunzilein will try your solution.
I had an actual isp outage a couple of nights ago, the ipv6 came back up by itself but the ipv4 didnt, the unit just sat there not even showing activity in logs trying to reconnect. When I manually disabled the WAN and reenabled it again it came back up fine.
Also my isp is dhcp also, so is possibly perhaps a default settings issue.
-
@bunzilein, belated thanks for your posted tentative solution.
I have set my porotocol timing timeout to 1800 seconds and will run with that from here on or until further indications arise. Unfortunately it can be a month or more between the observed incidents so it will take a while to draw subjective conclusions. -
Hi
I have seen several fellows having the same issue
Suddenly we loose internet connectionIn my case after almost exactly 8 hrs of use, pfsense kicks me out of the internet, and I do need to restart my pfsense
I am trying the following:
Set up my modem to use a smaller DHCP range of IPs, and on pfsense I set up WAN interface with a static IP, outside of that rangeI will see if that solves the problem, I am posting it before confirming that, if maybe it helps someone
Have a good time