Auto-renew DHCP after outage
This is it! Thanks, now it's working fine!
@e4ch I had to set pfSense to reject the DHCP info offered by my cable modem when it is not connected to the Internet. That causes pfSense to wait with its DHCP request until I'm online and getting DHCP information from my ISP instead of from the modem's internal server.
Reject leases from
To have the DHCP client reject offers from specific DHCP servers, enter their IP addresses here (separate multiple entries with a comma). This is useful for rejecting leases from cable modems that offer private IP addresses when they lose upstream sync.
I still haven't done anything yet and the problem persists. The reject setting did not solve the issue and I'm a bit reluctant to implement a script that is dependent on external sites - from the sample script, already two of the four sites no longer exist. Also, why should we check external sites, if the problem is somehow clearly detectable? When the dashboard shows "n/a" as WAN IP address, then we have a problem.
Let me show you the situation again:
(not sure how to attach images here, so let me give a description)
On the Dashboard, in Interfaces, the WAN shows as "up", but as IP shows "n/a" when the problem exists (with some network traffic on the chart). If there's no problem, it shows a correct external IP address instead.
In Interface Status, Status and DHCP both show as "up" (both in working / not working case). The IPv4 Address only shows in the working case. When it's not working it's not listed. The DNS servers there are shown as 127.0.0.1 and the four servers from my ISP in the working case. When it's not working because the modem rebooted, it only shows the 127.0.0.1. When it's not working because modem and pfSense rebooted both together, it shows all 5 DNS like in the working case.
I created a log file (I'm on latest version 2.4.4-RELEASE-p2(amd64)).
Here first the system log:
Mar 5 01:04:38 kernel igb1: link state changed to DOWN
Mar 5 01:04:38 check_reload_status Linkup starting igb1
Mar 5 01:04:39 php-fpm 37326 /rc.linkup: DEVD Ethernet detached event for wan
Mar 5 01:04:41 php-fpm 37326 /rc.linkup: Shutting down Router Advertisment daemon cleanly
Mar 5 01:04:41 check_reload_status Reloading filter
Mar 5 01:04:53 rc.gateway_alarm 8999 >>> Gateway alarm: WAN_DHCP (Addr:8X.XXX.XX.1 Alarm:1 RTT:10.948ms RTTsd:8.602ms Loss:21%)
Mar 5 01:04:53 check_reload_status updating dyndns WAN_DHCP
Mar 5 01:04:53 check_reload_status Restarting ipsec tunnels
Mar 5 01:04:53 check_reload_status Restarting OpenVPN tunnels/interfaces
Mar 5 01:04:53 check_reload_status Reloading filter
Mar 5 01:04:54 php-fpm 37326 /rc.openvpn: Gateway, none 'available' for inet, use the first one configured. 'WAN_DHCP'
Mar 5 01:04:54 php-fpm 37326 /rc.openvpn: Gateway, none 'available' for inet6, use the first one configured. ''
Mar 5 01:04:54 php-fpm 37326 /rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed its IP. Reloading endpoints that may use WAN_DHCP.
Mar 5 01:04:54 php-fpm 345 /rc.dyndns.update: Dynamic DNS: updatedns() starting
Mar 5 01:04:54 php-fpm 345 /rc.dyndns.update: Dynamic DNS (): running get_failover_interface for wan. found igb1
Mar 5 01:04:54 php-fpm 345 /rc.dyndns.update: Dynamic DNS () There was an error trying to determine the public IP for interface - wan (igb1 ).
<<<Time b>>>
Mar 5 01:06:08 kernel igb1: link state changed to UP
Mar 5 01:06:08 check_reload_status Linkup starting igb1
Mar 5 01:06:09 php-fpm 20955 /rc.linkup: DEVD Ethernet attached event for wan
Mar 5 01:06:09 php-fpm 20955 /rc.linkup: HOTPLUG: Configuring interface wan
<<<Time c>>>
Mar 5 01:07:32 php-fpm 20955 /rc.linkup: calling interface_dhcpv6_configure.
Mar 5 01:07:32 php-fpm 20955 /rc.linkup: Accept router advertisements on interface igb1
Mar 5 01:07:32 php-fpm 20955 /rc.linkup: Starting rtsold process
Mar 5 01:07:34 php-fpm 20955 /rc.linkup: Gateway, none 'available' for inet, use the first one configured. 'WAN_DHCP'
Mar 5 01:07:34 php-fpm 20955 /rc.linkup: Default gateway setting Interface WAN_DHCP Gateway as default.
Mar 5 01:07:34 php-fpm 20955 /rc.linkup: Gateway, none 'available' for inet6, use the first one configured. ''
Mar 5 01:07:34 check_reload_status Restarting ipsec tunnels
Mar 5 01:07:35 rtsold 9729 <sendpacket> sendmsg on igb1: Permission denied
Mar 5 01:07:37 check_reload_status updating dyndns wan
Mar 5 01:07:37 check_reload_status Reloading filter
Mar 5 01:07:38 php-fpm 344 /rc.dyndns.update: Dynamic DNS: updatedns() starting
Mar 5 01:07:39 php-fpm 344 /rc.dyndns.update: Dynamic DNS (): running get_failover_interface for wan. found igb1
Mar 5 01:07:39 php-fpm 344 /rc.dyndns.update: Dynamic DNS () There was an error trying to determine the public IP for interface - wan (igb1 ).
Mar 5 01:07:39 rtsold 9729 <sendpacket> sendmsg on igb1: Permission denied
Mar 5 01:07:43 rtsold 9729 <sendpacket> sendmsg on igb1: Permission denied
What I've done:
Starting with fully working pfSense
Turned off the ISP modem (logs until <<<Time b>>>)
Turned on the ISP modem again (logs until <<<Time c>>>); Dashboard showing IP 0.0.0.0 for WAN
After rest of the log: Dashboard now showing "n/a" for WAN
Here the DHCP log for the same period:
Mar 5 01:04:38 dhclient 36543 igb1 link state up -> down
Mar 5 01:04:39 dhclient 32254 connection closed
Mar 5 01:04:39 dhclient 32254 exiting.
<<<Time b>>>
Mar 5 01:06:09 dhclient PREINIT
Mar 5 01:06:09 dhclient 73556 DHCPREQUEST on igb1 to 255.255.255.255 port 67
<<<Time c>>>
Mar 5 01:06:11 dhclient 73556 DHCPREQUEST on igb1 to 255.255.255.255 port 67
Mar 5 01:06:16 dhclient 73556 DHCPREQUEST on igb1 to 255.255.255.255 port 67
Mar 5 01:06:29 dhclient 73556 DHCPDISCOVER on igb1 to 255.255.255.255 port 67 interval 2
Mar 5 01:06:31 dhclient 73556 DHCPDISCOVER on igb1 to 255.255.255.255 port 67 interval 3
Mar 5 01:06:34 dhclient 73556 DHCPDISCOVER on igb1 to 255.255.255.255 port 67 interval 8
Mar 5 01:06:42 dhclient 73556 DHCPDISCOVER on igb1 to 255.255.255.255 port 67 interval 13
Mar 5 01:06:55 dhclient 73556 DHCPDISCOVER on igb1 to 255.255.255.255 port 67 interval 20
Mar 5 01:07:15 dhclient 73556 DHCPDISCOVER on igb1 to 255.255.255.255 port 67 interval 13
Mar 5 01:07:28 dhclient 73556 DHCPDISCOVER on igb1 to 255.255.255.255 port 67 interval 2
Mar 5 01:07:30 dhclient 73556 No DHCPOFFERS received.
Mar 5 01:07:30 dhclient 73556 Trying recorded lease 8X.XXX.XX.X11
Mar 5 01:07:30 dhclient TIMEOUT
Mar 5 01:07:30 dhclient Starting add_new_address()
Mar 5 01:07:30 dhclient ifconfig igb1 inet 8X.XXX.XX.X11 netmask 255.255.240.0 broadcast 255.255.255.255
Mar 5 01:07:30 dhclient New IP Address (igb1): 8X.XXX.XX.X11
Mar 5 01:07:30 dhclient New Subnet Mask (igb1): 255.255.240.0
Mar 5 01:07:30 dhclient New Broadcast Address (igb1): 255.255.255.255
Mar 5 01:07:30 dhclient New Routers (igb1): 8X.XXX.XX.1
Mar 5 01:07:31 dhclient New Routers (igb1): 8X.XXX.XX.1
Mar 5 01:07:32 dhclient Deleting old routes
Mar 5 01:07:32 dhclient 73556 bound: renewal in 27196 seconds.
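The telling part of this DHCP log is the pair "No DHCPOFFERS received." followed by "Trying recorded lease": dhclient gave up on the server and fell back to its cached lease. For anyone who wants to check their own log for the same pattern, here is a minimal sketch. The function name `used_recorded_lease` is made up for illustration, and it only assumes the log text is available on standard input.

```shell
#!/bin/sh
# Hypothetical helper (not part of pfSense): reads dhclient log lines on stdin
# and succeeds (exit status 0) if dhclient reported that it received no offers
# and afterwards tried a recorded lease.
used_recorded_lease() {
    awk '
        /No DHCPOFFERS received/                { no_offers = 1 }
        no_offers && /Trying recorded lease/    { found = 1 }
        END                                     { exit (found ? 0 : 1) }
    '
}
```

Piping the DHCP log text into `used_recorded_lease` and testing its exit status is enough to tell this situation apart from a normal renewal.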
Any other ideas than implementing a cron script? And if I have to use a cron script, then how would I need to change it in order to detect this situation without relying on external sites? I'm lacking bash/Linux knowledge here.
Ok, I've now also created a script. Instead of using external sites, I'm now just checking if the WAN adapter (igb1) has an IPv4 address. If not, it waits 2 minutes and tries again. If it still has no IPv4 address, it issues the same commands as the other scripts: ifconfig down, ifconfig up, dhclient (not sure if the last one is necessary). It does not include rebooting the firewall, because I think this is overkill. Actually I just wanted to force-renew the DHCP client lease, but I couldn't get that to work, although I looked at what the PHP source is doing.
I'm using this script now:
#!/bin/sh
# Name of the WAN interface; change this to match your system.
wan="igb1"
LOGFILE=/var/log/pingtest.log

# Read the IPv4 address currently assigned to the WAN interface (empty if none).
currip=$(ifconfig $wan | grep "inet " | cut -d " " -f 2)
if test -z "$currip"; then
    echo `date +%Y%m%d.%H%M%S` "Detected empty IP on $wan! Will try again in 120 seconds." >> $LOGFILE
    # Give the modem time to come back before deciding that a fix is needed.
    sleep 120
    currip=$(ifconfig $wan | grep "inet " | cut -d " " -f 2)
    if test -z "$currip"; then
        echo `date +%Y%m%d.%H%M%S` "2nd try: Still empty IP on $wan! Will fix now." >> $LOGFILE
        # Bounce the interface and start the DHCP client again.
        ifconfig $wan down
        sleep 10
        ifconfig $wan up
        sleep 20
        dhclient $wan
        echo `date +%Y%m%d.%H%M%S` "Fixing done!" >> $LOGFILE
    else
        echo `date +%Y%m%d.%H%M%S` "2nd try: $wan has IP $currip; ok" >> $LOGFILE
    fi
else
    echo `date +%Y%m%d.%H%M%S` "$wan has IP $currip; ok" >> $LOGFILE
fi
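If you want to try the address check on its own before trusting the whole script, here is a small sketch of that one step. It is a variant, not the exact pipeline from the script: it uses awk instead of grep/cut and returns only the first IPv4 address, which may matter if the interface carries more than one. The name `first_ipv4` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the first IPv4 address found in ifconfig-style
# output read from stdin. Prints nothing when there is no "inet" line.
# Lines starting with "inet6" are ignored because the first field must be
# exactly "inet".
first_ipv4() {
    awk '$1 == "inet" { print $2; exit }'
}

# Example use on the firewall (interface name is a placeholder):
#   ifconfig igb1 | first_ipv4
```

An empty result corresponds to the "n/a" case described earlier in the thread.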
If you want to use it, you would have to do the following:
- Diagnostics / Edit File: Enter file name /usr/local/bin/pingtest.sh and paste the file from above in there and click Save. Update igb1 with the name of your WAN interface. That name is visible in Status / Interfaces in the title. For me it says there: "WAN Interface (wan, igb1)"
- Diagnostics / Command: "chmod +x /usr/local/bin/pingtest.sh" (without quotes) and click Execute. This makes the file runnable.
- System / Package Manager / Available Packages: Install Cron. To my understanding, this is just the user interface for Cron.
- Services / Cron / Settings: Leave the existing entries there and add a new one.
- Minute: "*/10" (without quotes) or just "*" for every minute, but every 10 minutes should be fine and avoids filling up the log.
- Other values: "*", User: "root", Command: "/usr/local/bin/pingtest.sh"
From time to time you can take a look at the log file "/var/log/pingtest.log" and maybe delete it to keep it from getting too big.
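Instead of deleting the log by hand, it could also be trimmed automatically. This is only a sketch under the assumption that the log is an ordinary text file (as written by the script above); the function name `trim_log` and the default of 1000 lines are my own choices.

```shell
#!/bin/sh
# Hypothetical helper: keep only the last MAX_LINES lines of a plain-text log.
# Usage: trim_log FILE [MAX_LINES]   (MAX_LINES defaults to 1000)
trim_log() {
    file=$1
    max=${2:-1000}
    # Nothing to do if the log does not exist yet.
    [ -f "$file" ] || return 0
    tmp="$file.tmp.$$"
    # Write the tail to a temporary file first, then swap it into place.
    tail -n "$max" "$file" > "$tmp" && mv "$tmp" "$file"
}

# Example: call this at the end of the check script.
#   trim_log /var/log/pingtest.log 500
```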
For me this now works if I restart the modem. It even works if I schedule it every minute; then two instances of the script run at the same time, but thanks to the 2-minute delay and retest, it works fine too. Without the second test it also worked, but then it already issues a fix while starting up, so it tries to fix it twice. I wanted to avoid that. We'll see how this works in the coming months/years.
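If the overlapping runs bother you, one common way to avoid them is a lock directory. This is a sketch, not something the script above does: `acquire_lock` and `release_lock` are hypothetical names, and the lock path in the example is an assumption.

```shell
#!/bin/sh
# Hypothetical lock helpers. mkdir either creates the directory or fails, in
# one atomic step, so only one running instance can hold the lock.
acquire_lock() {
    mkdir "$1" 2>/dev/null
}

release_lock() {
    rmdir "$1" 2>/dev/null
}

# Example use at the top of the check script (path is an assumption):
#   LOCKDIR=/tmp/pingtest.lock
#   acquire_lock "$LOCKDIR" || exit 0        # another run is still active
#   trap 'release_lock "$LOCKDIR"' EXIT      # free the lock when we finish
```

One caveat: if the script is killed hard, the directory stays behind and has to be removed manually before the check will run again.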
For me this is a clear bug in pfSense, as this happens with many different modems and searching through the forum shows that many people have this problem. Even if the modem is the culprit, I still think pfSense should be able to recover, because with other clients it works fine.
Thanks a lot for posting a workable solution! Same problem here from same provider (UPC, Switzerland).
And yes, I agree that this should be fixed in pfSense (it should be able to automatically overcome such problems), or at least make such a check an option.
I have exactly the same problem (UPC, Poland). While searching for a solution I also saw that many people have this problem here, on reddit, etc. Thank you for that solution. I will still try to find a different and easier fix, though. Maybe it is possible to do this somehow without cron and scripts.
One question for people who have been in the pfSense community longer, because this won't fix itself. Looking at the descriptions in this topic, is it enough to create a task at https://redmine.pfsense.org/ ? If not, what should be added? If yes, should it be a bug (pfSense doesn't react in such a situation) or a feature (detection of such a situation should be added)?
At the same time, I would be grateful for pointers into the pfSense code. Could somebody point me to the code (if it exists) responsible for detecting network problems? Even without help I will look for it myself, because this problem is killing me ;), but some suggestions would make it easier. And if by chance I find even a partial or imperfect solution, I will be more than happy. (I know we have a working workaround here, but this should be addressed in the pfSense code.)
I was able to narrow down the root cause to a (confirmed: https://lists.freebsd.org/pipermail/freebsd-net/2019-February/052894.html) bug in the FreeBSD DHCP client, dhclient, as well as an apparent bug in the associated dhclient-script provided with pfSense that both involve handling DHCP protocol timeouts improperly.
I have opened a pfSense bug ticket containing the technical details of my findings as well as a working patch set which addresses this at: https://redmine.pfsense.org/issues/9267
Thanks @tomashk! Looks like you found the underlying root cause! I looked at your changes and they sound reasonable, at least the wrong byte that was returned (I don't fully understand the script). Thanks a ton for providing such fixes and posting them at the right place; that helps the maintainers a lot to integrate such fixes more easily into the main branch. Unfortunately, even though you reported this already in January 2019, it doesn't seem to be included in version 2.5 and isn't even in the open issues list, but it is still in the list "new issues". So it might take a while until we can see this in a standard update. There are several threads about this, so this will help many people and I can then finally get rid of my repair script.
@e4ch It is fairly easy to hack this fix into an existing pfSense install; the toughest and most involved part is getting dhclient rebuilt. Patching the dhclient-script so that it (correctly) returns nonzero when the default gateway in the cached lease is not pingable is trivially doable with the "system patches" package and the patch from the bug ticket.
The easiest way I have found to build the patched dhclient is to just set up a FreeBSD 11.2 VM, build the patched dhclient and copy the binary over to the pfSense host. This will persist until an update is performed. I am more than willing to share my dhclient binary if desired as well.
You might also be able to use the FreeBSD 12 stable branch dhclient verbatim, since it contains the exit status patch; however, I haven't tested this personally.
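To illustrate the dhclient-script idea mentioned above (return nonzero when the router from the cached lease does not answer a ping), here is a rough sketch. It is not the patch from the ticket and not the actual dhclient-script code; `router_reachable` and the `PING_CMD` override are hypothetical and exist only so the logic can be tried without a network.

```shell
#!/bin/sh
# Hypothetical sketch, NOT the patch from the bug ticket: return 0 when the
# given router answers a single ping, nonzero otherwise (including when no
# router address is given at all).
# PING_CMD is an assumed override hook; by default the real ping is used.
router_reachable() {
    router=$1
    [ -n "$router" ] || return 1
    ${PING_CMD:-ping} -c 1 "$router" >/dev/null 2>&1
}

# Example: only keep using a recorded lease if its router responds.
#   router_reachable "$cached_router" || exit 1   # cached_router is a placeholder
```

The point of the nonzero status is that the caller can then treat the recorded lease as unusable instead of binding to it.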
I have the same DHCP issue here. ISP is Net1, Bulgaria. Thanks for the script. It helps with mitigating the problem.
@e4ch Thanks for the script. Just what I needed. I modified it to log to system.log, which uses the clog system, so the log won't get big...
#!/bin/sh
# Name of the WAN interface; change this to match your system.
wan="em5"

# Read the IPv4 address currently assigned to the WAN interface (empty if none).
currip=$(ifconfig $wan | grep "inet " | cut -d " " -f 2)
if test -z "$currip"; then
    logger `date +%Y%m%d.%H%M%S` "pingtest - Detected empty IP on $wan! Will try again in 120 seconds."
    sleep 120
    currip=$(ifconfig $wan | grep "inet " | cut -d " " -f 2)
    if test -z "$currip"; then
        logger `date +%Y%m%d.%H%M%S` "pingtest - 2nd try: Still empty IP on $wan! Will fix now."
        # Bounce the interface and start the DHCP client again.
        ifconfig $wan down
        sleep 10
        ifconfig $wan up
        sleep 20
        dhclient $wan
        logger `date +%Y%m%d.%H%M%S` "pingtest - Fixing done!"
    else
        logger `date +%Y%m%d.%H%M%S` "pingtest - 2nd try: $wan has IP $currip; ok"
    fi
else
    logger `date +%Y%m%d.%H%M%S` "pingtest - $wan has IP $currip; ok"
fi
@ohbobva @e4ch thanks for your scripts! This just hit my parents' pfSense install - stupid cable went out (and of course they insist nothing is wrong) and pfSense did not automatically reconnect the WAN. I had my mom manually refresh the WAN interface and it came right back up. How annoying! Hopefully they can permanently fix it now that the root cause appears to be evident; in the meantime hopefully this script will take care of it the next time they flake out and swear nothing is wrong but then it magically starts working right after they complain.
@EricE : see https://forum.netgate.com/topic/148017/dhcp-client-issue
Thank You for the scripts!!
Are the scripts still needed in the latest version of pfSense or is this bug fixed?
Then I'll indeed keep using them as well. I haven't had any more issues with Ziggo.
So I tried your script, works well.
It pings and says in the log that the IP is up. However, the gateway monitor still says down. I have changed the monitor IP to Google and it still says down. I need to renew the WAN lease in order to get it back up. Keep in mind that while the gateway monitor is down, the script can still ping and says it’s OK.
So I’m confused why on status/interface it says up but in gateway monitor it says down.
Is there a script that would automatically release and renew if either gateway monitor shows down or interface?
Can someone tell me if the patch to dhclient is still needed for 2.4.5? I just upgraded to 2.4.5 and am wondering if I now need to include the patch mentioned at the link below, or if the patch is already included in the 2.4.5 I just installed.
Click on your own link, find the line that says :
Patch to pfSense-dhclient-script was applied on 2.4.5 as well
That was 8 months ago.
Yes, I was uncertain whether this meant it was included in version 2.4.5 or whether he was reporting that the patch worked when he applied it to 2.4.5. I assume from your response it's the former.