WAN dhclient (DHCP) issues - bug in time intervals?
-
I’m using a French ISP that has become REALLY nitpicky about DHCP v4/v6 client RFCs and DHCP options. So far I’m only using v4, and I have succeeded in making it work, but I have a major issue with lease renewals, as pfSense seems to fail this process.
It’s early days yet, but the ISP uses more or less standard time settings for renewal, rebinding and expiration. The client needs to request these using DHCP options, and it needs to respect them as well.
The DHCP renewal interval is “standard”, between 22 and 26 hours. The lease expiration interval is 7 days, and the rebind interval is about 5.6 days. All pretty standard.
But here’s the problem: the ISP requires the client to respect the RFC and attempt renewal before or when the renewal time (T1) expires. Otherwise this particular ISP will stop routing traffic, and nothing will flow until my pfSense does a full lease release and rebind.
My problem is that dhclient on pfSense (as far as the log says) does not respect the received timers, and as a consequence I’m left with no internet once the renewal time has been reached. I then need to log in and release/renew my DHCP lease on WAN (or probably wait for 7 days).
I have inspected the /var/db/dhclient.leases.mvneta0 file created when I obtain my lease, and a few things strike me as odd:
1: The renewal time is received as a DHCP option and recorded in the lease. But the calculated RENEWAL timer seems to be calculated from UTC and not my timezone. If I get a 25-hour renewal timer, the renewal time will be 23 hours later, because my timezone is GMT+1 (currently two hours ahead of UTC).
2: The rebind timer calculation is all out of whack. If I get a 5.6-day rebind interval, the lease file says my rebind interval starts in about 40 hours.
3: The lease expiration timer seems to be calculated correctly. However, like the renewal timer, it’s mistakenly calculated from UTC rather than my timezone, so it expires 2 hours too early.
HOWEVER: My pfSense does not respect those timers, and no renewals are attempted, at least according to my log. There is simply nothing logged by dhclient at either the 23h or the 25h mark, where the actual T1 timer expires. A few minutes after that my internet goes down as the ISP stops routing traffic.
Any ideas why dhclient does not respect the received options?
PS: There is a setting to configure DHCP client VLAN priority tagging (802.1p), but that does not seem to work. Regardless of whether I configure it or not, there is nothing about priority tagging in the /var/etc/dhclient.conf file, and as my ISP requires priority 6, nothing works. If I set the “option modifier” field to “vlan-pcp 6”, then it is tagged correctly and works (this is also recorded in /var/etc/dhclient.conf).
-
@keyser It would help a great deal to disclose the pfSense version you're using...
-
@nollipfsense said in WAN dhclient (DHCP) issues - bug in time intervals?:
@keyser It would help a great deal to disclose the pfSense version you're using...
Yeah, sorry - 23.01 on SG-2100
I had the same issue on 22.05 after the ISP started requiring this additional RFC compliance, so I just upgraded a few days ago to see if 23.01 made a difference.
The issue remains the same, but there is one difference: 23.01 starts logging “dhclient: no route to host” after a while, once the routing is “blocked by the ISP”, but there are no messages about renew attempts or anything else. This makes me suspect dhclient is doing something that is not logged, and perhaps there is more information to gain.
So I have started a packet capture that will run through tomorrow to capture any DHCP exchanges after the connection is broken.
-
Hmm, interesting problem.
iscdhcp usually uses UTC for lease times so that's not surprising.
I expect the default renewal to be at half the lease time. So for pfSense behind pfSense with its default 2hr lease it should show as 1hr after it pulled the lease, in UTC:
```
lease {
  interface "ix3";
  fixed-address 172.21.16.246;
  next-server 172.21.16.1;
  filename "ipxe.efi";
  option subnet-mask 255.255.255.0;
  option routers 172.21.16.1;
  option domain-name-servers 172.21.16.1;
  option host-name "6100";
  option domain-name "stevew.lan";
  option dhcp-lease-time 7200;
  option dhcp-message-type 5;
  option dhcp-server-identifier 172.21.16.1;
  renew 2 2023/5/16 19:57:55;
  rebind 2 2023/5/16 20:42:55;
  expire 2 2023/5/16 20:57:55;
}
```
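As a rough check of the timestamps in that lease, here is a minimal Python sketch (not from the thread). It applies the RFC 2131 defaults, T1 at 50% and T2 at 87.5% of the lease time, and assumes the lease started at `expire` minus `dhcp-lease-time`; the function name `default_timers` is made up for illustration.

```python
from datetime import datetime, timedelta, timezone

def default_timers(lease_seconds):
    """Return (T1, T2) in seconds using the RFC 2131 default ratios."""
    t1 = lease_seconds // 2        # renew at 50% of the lease time
    t2 = lease_seconds * 7 // 8    # rebind at 87.5% of the lease time
    return t1, t2

# Values copied from the lease above; dhclient records them in UTC.
lease_seconds = 7200
expire = datetime(2023, 5, 16, 20, 57, 55, tzinfo=timezone.utc)

# Assumption: the lease started exactly lease_seconds before it expires.
start = expire - timedelta(seconds=lease_seconds)

t1, t2 = default_timers(lease_seconds)
renew = start + timedelta(seconds=t1)
rebind = start + timedelta(seconds=t2)

print(t1, t2)   # 3600 6300
print(renew)    # 2023-05-16 19:57:55+00:00
print(rebind)   # 2023-05-16 20:42:55+00:00
```

Under that assumption both computed times match the `renew` and `rebind` lines in the lease, which fits a server that sends no explicit T1/T2 options.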
-
@stephenw10 Yeah. As you can see, the contents of my dhclient.leases file are quite “interesting” with regard to timing.
(Note: edited for removal of identifying info)
lease {
  interface "mvneta0.832";
  fixed-address 90.xxx.xxx.xxx;
  next-server 80.xxx.xxx.xxx;
  option subnet-mask 255.255.248.0;
  option routers 90.xxx.xxx.xxx
  option domain-name-servers 81.xxx.xxx.xxx,80.xxx.xxx.xxx;
  option host-name "paradis";
  option broadcast-address 90.xxx.xxx.xxx;
  option dhcp-lease-time 604800;
  option dhcp-message-type 5;
  option dhcp-server-identifier 80.xxx.xxx.xxx;
  option dhcp-renewal-time 92354;
  option dhcp-rebinding-time 483840;
  option dhcp-client-identifier 1:48:29:52:xx:xx:xx;
  option option-90 xxxxxxxx:69:76:65:62:xxxxxxxxx;
  option domain-search "TLN.access.orange-multimedia.net.";
  option option-125 0:0:5:58:c:1:a:0:1:0:0:0:0:0:0:0:0;
  renew 3 2023/5/17 16:16:26;
  rebind 4 2023/5/18 11:30:48;
  expire 2 2023/5/23 14:37:12;
}
This is the first lease in the file after I started over. Tomorrow dhclient will add another lease to the file (the new one I get when releasing and renewing). After that dhclient will always perform two rebinds whenever I release/renew. That is another “quirk” I have noticed; I would have expected it to perform only one.
Both leases are added to the file even though it’s the same IP/interface and all.
-
What time (in UTC and local) was that lease handed to it?
-
@stephenw10 2023/5/16 at 16:37 local time (14:37 UTC)
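For reference, the offsets in the lease file above can be checked with a small Python sketch (not from the thread). It assumes the lease started at `expire` minus `dhcp-lease-time`, which agrees with the 14:37 UTC hand-out time, and it uses a fixed UTC+2 offset for local time; `parse_lease_time` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

def parse_lease_time(field):
    """Parse a lease timestamp such as '3 2023/5/17 16:16:26' as UTC."""
    _weekday, stamp = field.split(" ", 1)   # leading number is the day of week
    parsed = datetime.strptime(stamp, "%Y/%m/%d %H:%M:%S")
    return parsed.replace(tzinfo=timezone.utc)

# Values copied from the redacted lease file.
lease_time = 604800
renew = parse_lease_time("3 2023/5/17 16:16:26")
rebind = parse_lease_time("4 2023/5/18 11:30:48")
expire = parse_lease_time("2 2023/5/23 14:37:12")

# Assumption: lease start = expire - dhcp-lease-time.
start = expire - timedelta(seconds=lease_time)
renew_offset = int((renew - start).total_seconds())
rebind_offset = int((rebind - start).total_seconds())

# Assumption: local time is two hours ahead of UTC, as stated in the first post.
LOCAL = timezone(timedelta(hours=2))

print(start)                    # 2023-05-16 14:37:12+00:00
print(renew_offset)             # 92354
print(rebind_offset)            # 161616
print(renew.astimezone(LOCAL))  # 2023-05-17 18:16:26+02:00
```

Under these assumptions the renew timestamp is exactly dhcp-renewal-time (92354 s) after the lease start, so T1 appears to be recorded correctly and is simply written in UTC. The rebind timestamp is 161616 s (about 44.9 hours) after the start, which does not match the dhcp-rebinding-time option of 483840 s.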
-
What does it log as the renewal time in the dhcp log when it first pulls the lease?
-
@stephenw10 The same 92354 seconds as written in the lease file.
-
@stephenw10 About the double DHCP request at startup once there are two leases in the lease file:
At startup it logs this rather long error message. Perhaps that is related to it trying twice?
/status_interfaces.php: The command '/usr/local/sbin/dhclient {$ipv} -d -r -lf '/var/db/dhclient.leases.mvneta0.832' -cf '/var/etc/dhclient_wan.conf' -sf '/usr/local/sbin/pfSense-dhclient-script'' returned exit code '1', the output was 'Internet Systems Consortium DHCP Client 4.4.3-P1
Copyright 2004-2022 Internet Systems Consortium.
All rights reserved.
For info, please visit https://www.isc.org/software/dhcp/
/var/etc/dhclient_wan.conf line 7: no option named dhcp-class-identifier in space dhcp
\x09send dhcp-class-identifier "sagem"
^
/var/etc/dhclient_wan.conf line 8: semicolon expected.
\x09send
^
/var/etc/dhclient_wan.conf line 10: no option named option-90 in space dhcp
\x09send option-90 00:
^
/var/etc/dhclient_wan.conf line 11: semicolon expected.
\x09request
^
/var/etc/dhclient_wan.conf line 12: semicolon expected.
\x09vlan-pcp 6;
^
/var/etc/dhclient_wan.conf line 14: semicolon expected.
\x09script
^
/var/db/dhclient.leases.mvneta0.832 line 22: expecting lease declaration.
next-server
^
/var/db/dhclient.leases.mvneta0.832 line 23: expecting semicolon.
option
^
/var/db/dhclient.leases.mvneta0.832 line 34: no option named option-90 in space dhcp
option option-90 0:
^
/var/db/dhclient.leases.mvneta0.832 line 36: no option named option-125 in space dhcp
option option-125 0:
^
Listening on BPF/mvneta0.832/48:29:52:25:2c:50
Sending on BPF/mvneta0.832/48:29:52:25:2c:50
Can't attach interface {} to bpf device /dev/bpf0: Device not configured
If you think you have received this message due to a bug rather than a configuration issue please read the section on submitting bugs on either our web page at www.isc.org or in the README file before submitting a bug. These pages explain the proper process and the information we find helpful for debugging.
exiting.'
-
@stephenw10 Perhaps my DHCP WAN settings are relevant as well:
The full text in Send Options:
dhcp-class-identifier "sagem",dhcp-client-identifier 01:48:29:52:xx:xx:xx,user-class "+FSVDSL_livebox.Internet.softathome.Livebox4",option-90 xxxxxxxxxxx:72:77:67:66:xxxxxxxxxxxxxxxx
The full text in Request Options:
subnet-mask,broadcast-address,dhcp-lease-time,dhcp-renewal-time,dhcp-rebinding-time,domain-search,routers,domain-name-servers,option-90,option-125
Some edits have been made to obscure identities (xxxx’s)
-
Hmm. So if the client doesn't send those request options it never gets a lease?
-
@stephenw10 No, they require all the options, and they require the client to respect the RFCs around them (even though their own server does not respect the RFC).
Obviously it is all a ploy to make it increasingly harder to use any equipment but their own Livebox (which they do not allow to be placed in bridge mode). So no public IP on your own equipment unless you jump through all these hoops to get your own box to behave like theirs.
-
So what happens if you don't send all the request options?
What happens if you request a shorter lease?
-
@nollipfsense said in WAN dhclient (DHCP) issues - bug in time intervals?:
It would help a great deal to disclose the ....
ISP ? Orange ?
Fibre ? ADSL ? Box used ?
Never mind : it's encoded : "Livebox4" so Orange and probably ADSL.
Fiber needs a version 5 box. Or the "6", I've got one (but I'm not planning to remove the ISP Orange Livebox, imo too much of a hassle).
For more support, I would advise you to dig into https://lafibre.info/
@stephenw10 said in WAN dhclient (DHCP) issues - bug in time intervals?:
Hmm. So if the client doesn't send those request options it never gets a lease?
Yep. DHCP is used to get the IPv4, DHCPv6 is used to get all the IPv6 info, and options are used to identify the 'box', and send authentication/identification.
For example : https://lafibre.info/remplacer-livebox/durcissement-du-controle-de-loption-9011-et-de-la-conformite-protocolaire/ to mention just one.
-
@gertjan Hi @Gertjan
Thanks for chiming in. Yes, the ISP is Orange, and yes, I have read a million posts on lafibre.info (though that’s tough when Google Translate is your only option).
I have had it running for about a year and a half without issues, but this winter Orange started tightening up the DHCP RFC requirements on the DHCP exchange, and I stopped being able to renew my IP.
I’m on fibre, and I replaced a Livebox5 with my SG-2100 (and a 4100 for a while). I have a fs.com ONT SFP module inserted, and it works flawlessly (https://www.fs.com/de-en/products/133619.html)
-
@gertjan Maybe I should mention I never had IPv6 working because the DHCPv6 client in pfSense does not support dhcp RAW options - at least in 22.05.
But I only need IPv4 at this stage as the site is tied into a v4 IPsec infrastructure in a country that has very limited/no v6 support/availability.
-
@stephenw10 It takes a while to do fault finding because I have to wait about 24 hours for the problem to occur. Also, I’m only now on-site to do proper diagnosing.
Later today when it fails again I will have a packet capture of the DHCP exchanges at the moment the issue occurs.
From then on I can start testing workarounds and fixes. But mainly I’m interested in keeping as many of the settings they use in their own Livebox as possible. I find it likely they will continue tightening the DHCP exchange to make it more and more difficult to use equipment other than the supplied but crappy Livebox.
For now it’s a major advantage for me if I can have the public IPv4 address on my pfSense WAN instead of doing double NAT and losing the ability to contact the box directly from remote (IPsec/management).
-
@gertjan About lafibre.info:
It seems there is a huge user base in France that uses Mikrotik equipment to replace the Livebox.
There used to be quite a pfSense gathering as well, but they have all migrated to OPNsense because the DHCPv6 client in OPNsense is much more flexible and supports RAW options. There are a few that stayed on pfSense but hacked it by replacing the DHCPv6 client with the OPNsense binary, but that is not without issues...
-
@keyser said in WAN dhclient (DHCP) issues - bug in time intervals?:
Orange started tightening up the DHCP RFC requirements on the DHCP exchange, and I stopped being able to renew my IP.
What I make of it ( I'm not French at all, but I can read 'french' very well ) : people still manage to connect, but the rules are more strict.
@keyser said in WAN dhclient (DHCP) issues - bug in time intervals?:
because the DHCPv6 client in pfSense does not support dhcp RAW options - at least in 22.05
Create your own /root/att-rg-dhcpv6-pd.conf file : the dhcpd6 config file, and you have full control ;) You only need to know how to create such a file : you have to look up the options yourself.
That worked for me to obtain more than one prefix from my Livebox 6 (but still, the second was not operational, not routed).
@keyser said in WAN dhclient (DHCP) issues - bug in time intervals?:
replacing the DHCPv6 client with the OPNSense binary - but that is not without issues
As long as the ABI or FreeBSD kernel version is the same, chances are great it works.
I'm not sure if it needs a "non standard" dhcpd6 client version.
You'll lose the full GUI control .... but we never had full GUI control anyway, just look at all the IPv4 options for dhcp-client : if you need them, you have to implement them manually with options or go for the :
-
@gertjan & @stephenw10
Okay, I have learned a lot so far:
pfSense does respect the renewal timer and sends a request when the renewal interval expires.
But I accidentally created my packet capture on WAN (which is mvneta0.832) instead of the RAW mvneta0 interface, so while the renewal requests were sent, I could not see if they were 802.1p priority 6 tagged as they should be.
I cannot find a way to manually attempt to renew the DHCP lease without releasing it first (which leads to a discover instead of a request).
The closest thing I found was “/sbin/dhclient -l /var/db/dhclient.lease.mvneta0.832 mvneta0.832”
That command does trigger a DHCP request, but it is broadcast instead of unicast to the DHCP server, and critically it is NOT 802.1p priority 6 tagged.
So my main suspicion now is that while DHCP discover packets are 802.1p priority 6 tagged (because of “vlan.pcp 6” in modifiers), the lease renew packets are not. That would explain why Orange ignores them.
This then leads me to a bug I discovered in the packet capture UI dialog of pfSense. I cannot create a capture with a protocol or port filter that captures VLAN tagged packets on the RAW interface. I can only capture VLAN tagged packets if I do not filter for anything.
That makes it “impossible” for me to capture the actual renewal attempt tomorrow and see if it is VLAN priority tagged. My only option is to calculate the approximate renewal time, prevent any clients from talking in that period, manually capture everything, and hope the size is “workable”, so I get the actual renew attempt.
-
Probably better running a pcap at the CLI for something like that. In 23.01 at least. 23.05 has a whole new interface to allow that.
Have you tried setting pcp 6 on the VLAN itself? That would send all traffic priority tagged but that probably doesn't matter.
Steve
-
@stephenw10 Okay, but what would the command line version of a tcpdump on mvneta0 look like if it should capture only UDP packets using ports 67/68 (regardless of whether they have VLAN tags on them)? My initial try shows the same behavior as the UI: standard filters ignore VLAN tagged frames.
-
I would try:
tcpdump -eni ix3 -c 1000 -U '((udp) and (port 67 or port 68)) or ((vlan and (udp) and (port 67 or port 68)))'
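Some background on why a plain `udp and port 67` filter misses tagged frames: an 802.1Q tag inserts 4 bytes after the MAC addresses, so the EtherType at byte offset 12 reads 0x8100 instead of 0x0800 and every later header is shifted by 4 bytes. That is presumably why the filter above repeats the UDP/port test after the `vlan` keyword. The Python sketch below (not from the thread) illustrates the layout and how the priority (PCP) sits in the top 3 bits of the tag; `build_frame`, `inspect` and the source MAC are made up for illustration.

```python
import struct

def build_frame(pcp, vid, tagged=True):
    """Build a minimal, hypothetical Ethernet/IPv4/UDP header for a DHCP client packet."""
    eth = bytes.fromhex("ffffffffffff") + bytes.fromhex("020000000001")
    if tagged:
        tci = (pcp << 13) | vid                  # PCP: top 3 bits, VID: low 12 bits
        eth += struct.pack("!HH", 0x8100, tci)   # 802.1Q TPID + tag control info
    eth += struct.pack("!H", 0x0800)             # EtherType IPv4
    ip = (bytes([0x45, 0]) + struct.pack("!H", 28) + bytes(4)
          + bytes([64, 17]) + bytes(2)           # TTL 64, protocol 17 (UDP)
          + bytes(4) + bytes([255] * 4))         # 0.0.0.0 -> 255.255.255.255
    udp = struct.pack("!HHHH", 68, 67, 8, 0)     # source port 68, destination port 67
    return eth + ip + udp

def inspect(frame):
    """Return (pcp, vid, src_port, dst_port); pcp and vid are None if untagged."""
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    pcp = vid = None
    offset = 14                                  # IPv4 header offset, untagged
    if ethertype == 0x8100:
        tci, ethertype = struct.unpack_from("!HH", frame, 14)
        pcp, vid = tci >> 13, tci & 0x0FFF
        offset = 18                              # everything shifts by 4 bytes
    if ethertype != 0x0800 or frame[offset + 9] != 17:
        raise ValueError("not an IPv4/UDP frame")
    header_len = (frame[offset] & 0x0F) * 4
    sport, dport = struct.unpack_from("!HH", frame, offset + header_len)
    return pcp, vid, sport, dport

print(inspect(build_frame(6, 832)))                # (6, 832, 68, 67)
print(inspect(build_frame(0, 832)))                # (0, 832, 68, 67)
print(inspect(build_frame(0, 0, tagged=False)))    # (None, None, 68, 67)
```

In a real capture, tcpdump's `-e` flag prints the link-level header, which is where the VLAN ID and priority of each frame show up.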
-
@stephenw10 I’ll give it a spin. When done from the CLI, do I have to keep the SSH session open to avoid it being killed if I’m disconnected or my SSH client goes to sleep?
Or should I do it at the console to allow it to run?
How do I stop a CLI-driven tcpdump?
-
Okay - I’m still waiting for my first renew attempt after changing things around, but it seems VERY likely that missing COS6 tagging of DHCPv4 renew frames is the culprit.
I found this thread on OPNsense’s forum (I check there because OPNsense is used a lot more in France, as they are quicker and more flexible with DHCP client issues and options):
https://forum.opnsense.org/index.php?topic=33376.0
Very clearly the same issue, and clearly a floating rule with a match to change the VLAN COS tagging on renew frames is the solution. I have just implemented my rule now, and tonight at renew time we will know if this is the same bug in the DHCP Client (which is now patched on OPNsense).
-
stephenw10 (Netgate Administrator), May 18, 2023, 9:22 PM (last edited May 18, 2023, 9:23 PM):
Hmm, well it will be interesting to see if that works. It 'feels' like there might be a separate dhclient option for the renews.
I'd also be interested in knowing if just setting the PCP tag on the VLAN fixes it.
-
@stephenw10 said in WAN dhclient (DHCP) issues - bug in time intervals?:
Hmm, well it will be interesting to see if that works. It 'feels' like there might be a separate dhclient option for the renews.
I'd also be interested in knowing if just setting the PCP tag on the VLAN fixes it.
So, after following the renew attempt last night and analyzing the packet capture of the process, two things seem obvious:
1: The pfSense DHCP client's renews do not follow the "vlan-pcp 6" modifier that I have configured on WAN. Only lease releases and DHCP discovery are priority tagged properly. Renew attempts are tagged with 0 = best effort. So I'm now 99.9% sure that's why I'm unable to renew my DHCP lease. Orange clearly states it is required, and the OPNsense forum also shows lots of people with the same issue who fixed it by tagging the renew process with priority 6.
2: My attempt at having a floating rule match and set the VLAN priority 6 tag on renews did not work. Regardless of what I tried, no packets were ever matched by my floating rule. I might not fully understand how to create the rule properly, but it seems quite simple, yet it didn't work. Is there a "loophole" where packets originating from an actual daemon on pfSense itself are not passed through the firewall rules?
I first created a match IPv4 rule with source "firewall (self)" UDP 68 to destination any port 67, and direction out on both the RAW and my VLAN 832 tagged WAN interface. I set the match rule to apply VLAN priority tag 6. Didn't work.
I then opened the rule up with source any port any. Didn't work.
Lastly I enabled Quick, even though that should not be needed as I understand it. Didn't work.
Nothing was ever matched by my floating rule.
Any idea if I'm doing it wrong?
Any ideas on how to get the DHCP client to respect the configured VLAN priority tag on renews as well? This should probably be considered an actual bug, so I'll create a redmine on that later today.
-
@stephenw10 said in WAN dhclient (DHCP) issues - bug in time intervals?:
Hmm, well it will be interesting to see if that works. It 'feels' like there might be a separate dhclient option for the renews.
I'd also be interested in knowing if just setting the PCP tag on the VLAN fixes it.
Hmm, looking at /tmp/rules.debug, I suspect my VLAN priority setting is never applied because the built-in webConfigurator rules include a quick pass rule for DHCP requests out of WAN that sits higher up in the rules.debug file and probably invalidates my match rule?
-
Yes if it's a 'quick' rule it will override anything below it so the match won't happen.
Setting the tag on the VLAN should apply to all traffic leaving it though so I'd expect that to work.
-
@stephenw10 said in WAN dhclient (DHCP) issues - bug in time intervals?:
Yes if it's a 'quick' rule it will override anything below it so the match won't happen.
Setting the tag on the VLAN should apply to all traffic leaving it though so I'd expect that to work.
Yeah, but that's not really a good solution, as the ISP severely throttles the amount of COS6 traffic allowed compared to the fiber's actual throughput.
-
@keyser said in WAN dhclient (DHCP) issues - bug in time intervals?:
How do i stop a cli driven tcpdump?
Ctrl+C
-
@stephenw10 FYI: https://forum.netgate.com/topic/180212/how-to-hack-built-in-dhcp-client-pfrule
-
@keyser said in WAN dhclient (DHCP) issues - bug in time intervals?:
vlan.pcp
Ah, OK I see, it's because the renewals are unicast and don't use the bpf rule. So, yes something similar is required there. Set the tagging on the pf pass-out rule if they are enabled in the dhclient.
Let's see...