dhclient exiting on WAN
-
@stephenw10
Yes. Now, with working internet, I have the situation shown in the picture below. In scenario 1 I don't have the first line starting with default.
I will test a bit more soon because I want to rule out the possibility that these problems are caused by my own configuration mistakes. I believe the problems started with version 2.6/22.01, but I can't be sure: I hadn't noticed them previously because I had a long uptime and there was no need to turn the device off.
-
Mmm, there are a number of threads reporting similar issues. It looks to be real IMO, but it's not something we can easily replicate. It seems to be timing related.
Steve
-
Scenario 1: from bottom to top :
Scenario 2: from bottom to top :
So, if scenario 2 is what should happen = adding the 'default' route, it is not logged and thus not happening in scenario 1.
Maybe the IP was the same so the "add route default" wasn't needed ?
As far as I know, dhclient uses shell scripts to do this. I'll have a look into that this weekend.
-
Sorry for the late response - I was away for a few days.
@gertjan said in dhclient exiting on WAN:
Maybe the IP was the same so the "add route default" wasn't needed ?
The IP was probably the same. Unfortunately I've lost those logs, but currently whenever I renew I get the same WAN IP.
PS. As a test I installed pfSense on another device and restored my configuration. Of course it works properly every time. It will probably take some time to find how its logs differ from the "wrong" ones. The main difference I see from the start is that when it works (on the new installation) dhclient is able to start properly. So there is nothing like
rc.bootup: The command '/sbin/dhclient -c /var/etc/dhclient_wan.conf igb0 > /tmp/igb0_output 2> /tmp/igb0_error_output' returned exit code '1', the output was ''
in the logs. And later I can see standard DHCPDISCOVER (or others) or dhclient's messages about link state like
igb0 link state up -> down
igb0 link state down -> up
I guess I'll try to find out what might stop dhclient from starting and throwing
igb0: no link .............. giving up
into /tmp/igb0_output. Of course I'll try to find other differences, but this one looks like the main one.
-
@tomashk said in dhclient exiting on WAN:
I guess I'll try to find out what might stop dhclient from starting and throwing
igb0: no link .............. giving up
into /tmp/igb0_output
You've answered your own question.
The dhclient aborts when there is no WAN interface any more.
It stops executing, as the WAN interface isn't listed as a system resource any more.
Your issue isn't dhclient; it does exactly what it is designed to do.
Focus on the WAN interface: make it stop going down - up - down - up - etc.
The thing is, you already know what brings the interface down: the other device in front of it.
Modems do so very often. It's their way to 'signal' that they created a "soft link" with the upstream ISP.
Or consider this scenario: the modem powers up. Its internal web GUI comes up, and the modem brings up its LAN, which is the WAN of pfSense. If dhclient starts now and sends a DHCPDISCOVER, it will obtain an RFC1918 address from the modem's DHCP server. This RFC1918 address lets you connect to the modem's GUI to change its settings.
Moments later, the modem finishes building the soft link with the ISP.
It brings down its LAN, as this is the only way to reset the current connection, and brings it up again. Now, when pfSense's dhclient sends a DHCPDISCOVER, the modem will be transparent and the DHCPDISCOVER will reach the ISP. This time, the ISP's DHCP server will reply and give you a 'real' WAN IP.
This is how most modems work, and have for decades.
Things become even more interesting if the modem initially has a hard time syncing with the ISP. Every time it loses the connection because, for example, the initial speed was too high, it aborts (brings its LAN down) and retries at a lower speed.
dhclient, on the pfSense side, loses track.
Your mission, as a modem user, is to find the right timing settings.
You know it's possible because, most often, when you hook up a PC to your modem and use that pppoe connection setting, your PC will connect.
Your PC also uses DHCP-client-like software.
-
@tomashk said in dhclient exiting on WAN:
As a test I installed pfSense on another device and restored my configuration. Of course it works properly every time.
What's the difference between those devices?
This is probably a timing issue. If the upstream device resets the link, it may only be down for a short time, and that must coincide with the dhclient script running in order to fail.
Running on hardware that is significantly faster or slower could well change the timing enough to prevent it happening.
It should not be possible to hit that, but it may also be easy to work around by adding a boot delay in pfSense.
Steve
-
@stephenw10 The failing one is based on a small ASRock motherboard with an Intel Celeron J4005, and the working one is an HP t620 Plus thin client. I believe the second one takes longer to start.
As you said, this is probably a timing issue. The boot delay idea sounds like a good quick fix once I go back to the J4005. For now I'll have to leave it because work doesn't leave me much time to experiment. For the moment I have that temporary new device, which works. But in a week or two I am going to get back to it and try to understand it better.
By understand I mean, for example, that a long time ago I looked into https://github.com/pfsense/FreeBSD-src/blob/devel-12/sbin/dhclient/dhclient.c (the file is what matters; I just took a random branch from GitHub as an example) and found the code that gives up (line 469)
Due to the value 10 in line 468 and the sleep in line 472, dhclient will wait for the WAN interface for only 10 seconds. I was thinking that it might be good to replace the value 10 with an environment variable (with a default of 10) so the user can set it, but there are two problems given my current poor understanding:
- I'm not able to predict what would break in a completely different part of the system if dhclient takes too long to start (maybe nothing, but maybe some kind of butterfly effect :) )
- there might already be some system script that should be used for such situations
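To make the idea above concrete, here is a rough sketch of a bounded link wait whose limit comes from an environment variable. It is written in Python purely for illustration (dhclient itself is C), it is a simplified model and not the actual dhclient code, and every name in it (DHCLIENT_LINK_TIMEOUT, link_timeout_from_env, wait_for_link, link_is_up) is hypothetical. As far as I know dhclient reads no such variable today.

```python
import os
import time

DEFAULT_LINK_TIMEOUT = 10  # seconds; the hard-coded value mentioned above


def link_timeout_from_env(env=os.environ, name="DHCLIENT_LINK_TIMEOUT"):
    """Return the wait limit in seconds, falling back to the default.

    The variable name is hypothetical; it only illustrates the proposal.
    """
    raw = env.get(name)
    if raw is None:
        return DEFAULT_LINK_TIMEOUT
    try:
        value = int(raw)
    except ValueError:
        # Unparseable value: keep the old behaviour rather than fail.
        return DEFAULT_LINK_TIMEOUT
    return value if value > 0 else DEFAULT_LINK_TIMEOUT


def wait_for_link(link_is_up, timeout, sleep=time.sleep):
    """Poll once per second until the link is up.

    Returns True if the link came up, or False ("giving up") once
    `timeout` seconds have been spent waiting. `link_is_up` is a
    caller-supplied function returning True or False.
    """
    waited = 0
    while not link_is_up():
        if waited >= timeout:
            return False
        sleep(1)
        waited += 1
    return True
```

With the default of 10 this model gives up on a link that stays down for 12 seconds; with the variable set to 30 it would get the link. It says nothing about the first concern above, i.e. what else might break while boot is blocked for longer.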
-
Mmm, I agree that increasing that seems like a good idea initially, but it could easily cause problems on systems with multiple disconnected WANs.
It would be nice to have that value configurable.
As a simple test you could just set the line:
autoboot_delay="30"
in /boot/loader.conf. That's normally 3s. If that works, you might put it in loader.conf.local instead to prevent it getting overwritten.
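If you would rather script that change than edit the file by hand, something like the following could work. It is only a sketch in Python: set_loader_tunable is a made-up helper name, it operates on the file's text rather than the file itself, and it assumes the simple key="value" line format used in the example above.

```python
def set_loader_tunable(text, key, value):
    """Return loader.conf-style text with key="value" set exactly once.

    An existing assignment of `key` is replaced in place, later
    duplicates are dropped, and commented-out lines are left alone.
    """
    new_line = f'{key}="{value}"'
    out = []
    replaced = False
    for line in text.splitlines():
        name = line.split("=", 1)[0].strip()
        if name == key:
            if not replaced:
                out.append(new_line)
                replaced = True
            continue  # skip any further assignments of the same key
        out.append(line)
    if not replaced:
        out.append(new_line)
    return "\n".join(out) + "\n"
```

You would read the current contents of loader.conf.local (or start from an empty string if it does not exist), pass them through set_loader_tunable(contents, "autoboot_delay", 30) and write the result back.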
Steve
-
Quick update. Because I wanted to test the release candidate for 22.05, I took the old device that had the problems described above. After updating to the RC version, the problem still existed. But I used the suggested autoboot_delay and it fixed this timing problem. For me 10 seconds were enough to make it work.
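For anyone wondering why a boot delay can help at all, here is a toy model of the timing in Python. It assumes the link outage happens at a fixed time after power-on (not relative to the pfSense boot itself), that dhclient polls once per second and gives up after a 10 second wait as described earlier in the thread, and that a client which finds the link up when it starts is fine. The function names and all numbers are made up for illustration and are not measurements from my box.

```python
def dhclient_gives_up(start, link_down_from, link_down_until, wait_limit=10):
    """Return True if the link is down for the client's whole wait window.

    All times are seconds since power-on. The link is treated as down on
    the half-open interval [link_down_from, link_down_until). The client
    starts at `start` and polls until `start + wait_limit`.
    """
    window_end = start + wait_limit
    return link_down_from <= start and window_end < link_down_until


def first_working_delay(start, link_down_from, link_down_until,
                        wait_limit=10, max_delay=120):
    """Smallest whole-second extra boot delay that avoids giving up.

    Returns None if no delay up to `max_delay` seconds helps.
    """
    for delay in range(max_delay + 1):
        if not dhclient_gives_up(start + delay, link_down_from,
                                 link_down_until, wait_limit):
            return delay
    return None
```

For example, if the link were down from 20 s to 35 s after power-on, a client starting at 22 s would give up in this model, while one starting at 32 s (the same boot plus a 10 s delay) would still be waiting when the link comes back.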
-
For anyone still hitting this who can test, there is a patch now available here:
https://redmine.pfsense.org/issues/13671#note-4
We need feedback there to confirm it.
Steve