Auto-renew DHCP after outage

e4ch

OK, I've now created a script as well. Instead of relying on external sites, it just checks whether the WAN adapter (igb1) has an IPv4 address. If not, it waits 2 minutes and checks again. If there is still no IPv4 address, it issues the same commands as the other scripts: ifconfig down, ifconfig up, dhclient (I'm not sure the last one is necessary). It does not reboot the firewall, because I think that is overkill. What I actually wanted was to force-renew the DHCP client lease, but I couldn't get that to work, even after looking at what the PHP source does.

I'm using this script now:

#!/bin/sh
# Restart DHCP on the WAN interface if it has lost its IPv4 address.
wan="igb1"	# WAN interface name; adjust to your system
LOGFILE=/var/log/pingtest.log

# Take the IPv4 address from the "inet" line (empty if none is assigned).
currip=$(ifconfig "$wan" | grep "inet " | cut -d " " -f 2)
if test -z "$currip"; then
	echo "$(date +%Y%m%d.%H%M%S)" "Detected empty IP on $wan! Will try again in 120 seconds." >> "$LOGFILE"
	# Give the DHCP client a chance to recover on its own first.
	sleep 120
	currip=$(ifconfig "$wan" | grep "inet " | cut -d " " -f 2)
	if test -z "$currip"; then
		echo "$(date +%Y%m%d.%H%M%S)" "2nd try: Still empty IP on $wan! Will fix now." >> "$LOGFILE"
		# Bounce the interface, then start the DHCP client on it.
		ifconfig "$wan" down
		sleep 10
		ifconfig "$wan" up
		sleep 20
		dhclient "$wan"
		echo "$(date +%Y%m%d.%H%M%S)" "Fixing done!" >> "$LOGFILE"
	else
		echo "$(date +%Y%m%d.%H%M%S)" "2nd try: $wan has IP $currip; ok" >> "$LOGFILE"
	fi
else
	echo "$(date +%Y%m%d.%H%M%S)" "$wan has IP $currip; ok" >> "$LOGFILE"
fi

If you want to use it, you would have to do the following:

Diagnostics / Edit File: Enter file name /usr/local/bin/pingtest.sh and paste the file from above in there and click Save. Update igb1 with the name of your WAN interface. That name is visible in Status / Interfaces in the title. For me it says there: "WAN Interface (wan, igb1)"
Diagnostics / Command: "chmod +x /usr/local/bin/pingtest.sh" (without quotes) and click Execute. This makes the file runnable.
System / Package Manager / Available Packages: Install Cron. To my understanding, this is just the user interface for Cron.
Services / Cron / Settings: Leave the existing entries there and add a new one.
Minute: "*/10" (without quotes) or just "*" for every minute, but every 10 minutes should be fine and avoids filling up the log.
Other values: "*", User: "root", Command: "/usr/local/bin/pingtest.sh"
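
For reference, the settings above should correspond to a crontab entry along these lines. This is only a sketch: it assumes the Cron package writes system-style crontab lines that include a user column, and the comment header is added here for readability.

```
# minute  hour  mday  month  wday  who   command
*/10      *     *     *      *     root  /usr/local/bin/pingtest.sh
```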

From time to time you can take a look at the log file "/var/log/pingtest.log" and delete it now and then so that it doesn't grow too big.
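
The log could also be kept small automatically instead of deleting it by hand. A minimal sketch, assuming that keeping only the most recent lines is acceptable; the function name `trim_log` and the limit of 1000 lines in the usage note are made up for illustration and are not part of the script above.

```sh
#!/bin/sh
# trim_log FILE MAXLINES: keep only the last MAXLINES lines of FILE.
trim_log() {
	file=$1
	max=$2
	# Nothing to do if the log does not exist yet.
	[ -f "$file" ] || return 0
	# wc may pad its output with spaces on BSD, so strip them.
	lines=$(wc -l < "$file" | tr -d ' ')
	if [ "$lines" -gt "$max" ]; then
		# Write the tail to a temporary file, then replace the log.
		tail -n "$max" "$file" > "$file.tmp" && mv "$file.tmp" "$file"
	fi
}
```

Calling `trim_log /var/log/pingtest.log 1000` at the start of pingtest.sh would then cap the file at roughly the last 1000 entries.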

For me this now works if I restart the modem. It even works if I schedule it to run every minute; then two instances of the script will be running at the same time, but thanks to the 2-minute delay and retest, that works fine too. It also worked without the second test, but then it already issues a fix while the modem is still starting up, so it tries to fix things twice. I wanted to avoid that. We'll see how this works in the coming months/years.
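
If overlapping runs are a concern when scheduling every minute, a simple lock can make later invocations exit early. The following is a sketch, not part of the script above: it relies on `mkdir` being atomic, and the lock path `/tmp/pingtest.lock` and the function names are assumptions chosen for illustration.

```sh
#!/bin/sh
# Directory used as a lock; mkdir either creates it or fails, atomically,
# so only one caller at a time can succeed.
LOCKDIR=${LOCKDIR:-/tmp/pingtest.lock}

# acquire_lock: succeed only if no other instance holds the lock.
acquire_lock() {
	mkdir "$LOCKDIR" 2>/dev/null
}

# release_lock: remove the lock directory again.
release_lock() {
	rmdir "$LOCKDIR" 2>/dev/null
}
```

At the top of pingtest.sh this would be used as `acquire_lock || exit 0` followed by `trap release_lock EXIT`. A lock directory left behind after a crash would have to be removed by hand.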

For me this is a clear bug in pfSense, as this happens with many different modems and searching through the forum shows that many people have this problem. Even if the modem is the culprit, I still think pfSense should be able to recover, because with other clients it works fine.

bachi_ch

Thanks a lot for posting a workable solution! Same problem here from same provider (UPC, Switzerland).
And yes, I agree that this should be fixed in pfSense (it should be able to automatically overcome such problems), or at least make such a check an option.

tomashk

I have exactly the same problem (UPC, Poland). While searching for a solution I also saw that many people have this problem: here, on reddit, etc. Thank you for the solution. Still, I will try to find a different, easier fix too. Maybe it is possible to do this somehow without cron and scripts.

tomashk

One question for people who have been in the pfSense community longer, because this won't fix itself. Based on the descriptions in this topic, is it enough to create a ticket at https://redmine.pfsense.org/ ? If not, what should be added? If so, should it be filed as a bug (pfSense doesn't react in such a situation) or as a feature (detection of such a situation should be added)?

At the same time, I would be grateful for pointers into the pfSense code. Could somebody point me to the code (if it exists) responsible for detecting network problems? Even without help I will look for it myself, because this problem is killing me ;) but some suggestions would make it easier. And if by chance I find even a partial or imperfect solution, I will be more than happy. (I know we have a working workaround here, but this should be addressed in the pfSense code.)

nkaminski

@e4ch @tomashk
I have seen exactly this behavior, where specifically a DHCP-assigned IP is lost for an amount of time equal to the last cached lease time if a DHCP timeout occurs.

I was able to narrow down the root cause to a (confirmed: https://lists.freebsd.org/pipermail/freebsd-net/2019-February/052894.html) bug in the FreeBSD DHCP client, dhclient, as well as an apparent bug in the associated dhclient-script provided with pfSense that both involve handling DHCP protocol timeouts improperly.

I have opened a pfSense bug ticket containing the technical details of my findings as well as a working patch set which addresses this at: https://redmine.pfsense.org/issues/9267

e4ch

Thanks @nkaminski! Looks like you found the underlying root cause! I looked at your changes and they sound reasonable, at least the fix for the wrong byte that was returned (I don't fully understand the script). Thanks a ton for providing these fixes and posting them in the right place; that makes it much easier for the maintainers to integrate them into the main branch. Unfortunately, even though you reported this back in January 2019, it doesn't seem to be included in version 2.5 and isn't even in the open issues list; it is still listed under "new issues". So it might take a while until we see this in a standard update. There are several threads about this, so the fix will help many people, and I can then finally get rid of my repair script.

nkaminski

@e4ch It is fairly easy to hack this fix into an existing pfSense install; the most involved part is getting dhclient rebuilt. Patching the dhclient-script so that it (correctly) returns nonzero when the default gateway in the cached lease is not pingable is trivial with the "system patches" package and the patch from the bug ticket.
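
To illustrate the behavior described here (this is not the patch from the ticket, and the real dhclient-script looks different): a timeout handler should report success only if a router from the cached lease actually answers. The names `probe` and `timeout_status` are hypothetical.

```sh
#!/bin/sh
# probe HOST: succeed if HOST answers a single ping. Kept as a separate
# function so the reachability check can be swapped out.
probe() {
	ping -c 1 "$1" >/dev/null 2>&1
}

# timeout_status ROUTER...: return 0 if any router from the cached lease
# answers, nonzero otherwise, so the caller can tell the old lease is unusable.
timeout_status() {
	for router in "$@"; do
		if probe "$router"; then
			return 0
		fi
	done
	return 1
}
```

The point is the nonzero return: a caller that always sees success keeps trusting the cached lease, even though its gateway is unreachable.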

The easiest way I have found to build the patched dhclient is to set up a FreeBSD 11.2 VM, build the patched dhclient there and copy the binary over to the pfSense host. This will persist until an update is performed. I am more than willing to share my dhclient binary if desired.

You might also be able to use the dhclient from the FreeBSD 12 stable branch verbatim, since it contains the exit status patch; however, I haven't tested this personally.

mariyan

I have the same DHCP issue here. ISP is Net1, Bulgaria. Thanks for the script. It helps with mitigating the problem.

ohbobva

@e4ch Thanks for the script. Just what I needed. I modified it to log to system.log, which uses the clog system, so the log won't get big...

#!/bin/sh
# Same check as above, but logging through syslog instead of a separate file.
wan="em5"	# WAN interface name; adjust to your system

# Take the IPv4 address from the "inet" line (empty if none is assigned).
currip=$(ifconfig "$wan" | grep "inet " | cut -d " " -f 2)
if test -z "$currip"; then
	logger "$(date +%Y%m%d.%H%M%S)" "pingtest - Detected empty IP on $wan! Will try again in 120 seconds."
	sleep 120
	currip=$(ifconfig "$wan" | grep "inet " | cut -d " " -f 2)
	if test -z "$currip"; then
		logger "$(date +%Y%m%d.%H%M%S)" "pingtest - 2nd try: Still empty IP on $wan! Will fix now."
		# Bounce the interface, then start the DHCP client on it.
		ifconfig "$wan" down
		sleep 10
		ifconfig "$wan" up
		sleep 20
		dhclient "$wan"
		logger "$(date +%Y%m%d.%H%M%S)" "pingtest - Fixing done!"
	else
		logger "$(date +%Y%m%d.%H%M%S)" "pingtest - 2nd try: $wan has IP $currip; ok"
	fi
else
	logger "$(date +%Y%m%d.%H%M%S)" "pingtest - $wan has IP $currip; ok"
fi

axxxxe

Thank you @e4ch and @ohbobva! I am also on UPC in Switzerland and the script has fixed it for me.

EricE

@ohbobva @e4ch thanks for your scripts! This just hit my parents' pfSense install: the stupid cable went out (and of course they insist nothing is wrong) and pfSense did not automatically reconnect the WAN. I had my mom manually refresh the WAN interface and it came right back up. How annoying! Hopefully they can permanently fix it now that the root cause appears to be evident; in the meantime, hopefully this script will take care of it the next time they flake out and swear nothing is wrong, but then it magically starts working right after they complain.

Gertjan

@EricE : see https://forum.netgate.com/topic/148017/dhcp-client-issue

Brian Smit

Thank You for the scripts!!
Are the scripts still needed in the latest version of pfSense or is this bug fixed?

Regards

Brian

EricE

@Brian-Smit

@Brian-Smit said in Auto-renew DHCP after outage:

Thank You for the scripts!!
Are the scripts still needed in the latest version of pfSense or is this bug fixed?

I still needed them for my parents' firewall.

Brian Smit

Then I will indeed keep using them as well. I haven't had any more issues with Ziggo.

Wirepower

So I tried your script, and it works well.
It pings and says in the log that the IP is up. However, the gateway monitor still says down. I have changed the monitor IP to Google and it still says down. I have to renew the WAN lease to get it back up. Keep in mind that while the gateway monitor is down, the script can still ping and says it's ok.

So I'm confused why Status / Interfaces says up but the gateway monitor says down.
Is there a script that would automatically release and renew if either the gateway monitor or the interface shows down?
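
One possible direction, as an untested sketch: extend the check so that the fix also runs when a monitor address stops answering, not only when the interface has no IPv4 address. This does not read pfSense's own gateway monitor status; it simply pings a host. The monitor address 8.8.8.8 and all function names are assumptions, and the fix commands are the same ones the scripts earlier in the thread use.

```sh
#!/bin/sh
wan=${wan:-igb1}            # WAN interface name (assumption, adjust)
monitor=${monitor:-8.8.8.8} # host to ping as a health check (assumption)

# has_ipv4: succeed if the WAN interface has an IPv4 address.
has_ipv4() {
	ifconfig "$wan" 2>/dev/null | grep -q "inet "
}

# gateway_up: succeed if the monitor host answers pings.
gateway_up() {
	ping -c 3 "$monitor" >/dev/null 2>&1
}

# fix_wan: bounce the interface and start the DHCP client on it.
fix_wan() {
	ifconfig "$wan" down
	sleep 10
	ifconfig "$wan" up
	sleep 20
	dhclient "$wan"
}

# check_wan: run the fix if either check fails.
check_wan() {
	if has_ipv4 && gateway_up; then
		return 0
	fi
	fix_wan
}
```

Calling `check_wan` at the end of the script runs the check. Note that an outage further upstream would also trigger the fix on every run, so a retry delay like the 120-second wait in the original script is probably still worth keeping.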

axxxxe

Can someone tell me if the patch to dhclient is still needed for 2.4.5? I just upgraded to 2.4.5 and am wondering if I now need to include the patch mentioned at the link below, or if the patch is already included in the 2.4.5 I just installed.

https://redmine.pfsense.org/issues/9267

Gertjan

Click on your own link, find the line that says :

Patch to pfSense-dhclient-script was applied on 2.4.5 as well

That was 8 months ago.

axxxxe

Yes, I was uncertain whether this meant it was included in version 2.4.5 or whether he was reporting that the patch worked when he applied it to 2.4.5. I assume from your response it's the former.

Gertjan

@axxxxe said in Auto-renew DHCP after outage:

uncertain

The patch was initially targeted for 2.5.0, but was finally backported to 2.4.5(-p1).
I had to read the patch story twice, too ;)