DHCP weirdness after 23.09 upgrade
-
I upgraded to 23.09 again and see the same result. I compared the dhcpd.conf between versions and they are exactly the same. By now I'm also pretty sure I messed up because
ps -axw | grep dhcp
returns
/usr/local/sbin/dhcpd -user dhcpd -group _dhcp -chroot /var/dhcpd -cf /etc/dhcpd.conf -pf /var/run/dhcpd.pid igb5 igb4 igb3
on 23.09 as well.
I'll summarize what I'm seeing again:
- most clients seem to require more DHCPREQUEST/DHCPACK cycles
- 1 client (Windows Server 2019) fills the log with over 2k log lines
- 1 client (Dyson PH04) gives up and loses the IP address
I've checked the generated firewall ruleset and it correctly contains the DHCP server rules for 67/68.
I'm really baffled. What are my other options. Complete new install and restore config? Anything else I can check?
-
@mathiasringhof Saw your Reddit post, echoing my response here.
On a 2100 after updating to 23.09 I am seeing similar. I did not save logs from before however. It does seem to vary on different devices? This one is triggering this pattern every 10 minutes:
Nov 17 22:16:40 dhcpd 38701 DHCPACK on 10.0.0.93 to 60:e8:x:x:x:1c (airthings-view) via mvneta1 Nov 17 22:16:40 dhcpd 38701 DHCPREQUEST for 10.0.0.93 (10.0.0.1) from 60:e8:x:x:x:1c (airthings-view) via mvneta1 Nov 17 22:16:40 dhcpd 38701 reuse_lease: lease age 35402 (secs) under 25% threshold, reply with unaltered, existing lease for 10.0.0.93 Nov 17 22:16:40 dhcpd 38701 DHCPOFFER on 10.0.0.93 to 60:e8:x:x:x:1c (airthings-view) via mvneta1 Nov 17 22:16:40 dhcpd 38701 DHCPDISCOVER from 60:e8:x:x:x:1c (airthings-view) via mvneta1 Nov 17 22:16:40 dhcpd 38701 reuse_lease: lease age 35402 (secs) under 25% threshold, reply with unaltered, existing lease for 10.0.0.93 Nov 17 22:16:39 dhcpd 38701 DHCPACK on 10.0.0.93 to 60:e8:x:x:x:1c (airthings-view) via mvneta1 Nov 17 22:16:39 dhcpd 38701 DHCPREQUEST for 10.0.0.93 from 60:e8:x:x:x:1c (airthings-view) via mvneta1 Nov 17 22:16:39 dhcpd 38701 reuse_lease: lease age 35401 (secs) under 25% threshold, reply with unaltered, existing lease for 10.0.0.93 Nov 17 22:16:39 dhcpd 38701 DHCPACK on 10.0.0.93 to 60:e8:x:x:x:1c (airthings-view) via mvneta1 Nov 17 22:16:39 dhcpd 38701 DHCPREQUEST for 10.0.0.93 from 60:e8:x:x:x:1c (airthings-view) via mvneta1 Nov 17 22:16:39 dhcpd 38701 reuse_lease: lease age 35401 (secs) under 25% threshold, reply with unaltered, existing lease for 10.0.0.93 Nov 17 22:16:39 dhcpd 38701 DHCPACK on 10.0.0.93 to 60:e8:x:x:x:1c (airthings-view) via mvneta1 Nov 17 22:16:39 dhcpd 38701 DHCPREQUEST for 10.0.0.93 from 60:e8:x:x:x:1c (airthings-view) via mvneta1 Nov 17 22:16:39 dhcpd 38701 reuse_lease: lease age 35401 (secs) under 25% threshold, reply with unaltered, existing lease for 10.0.0.93 Nov 17 22:16:39 dhcpd 38701 DHCPACK on 10.0.0.93 to 60:e8:x:x:x:1c (airthings-view) via mvneta1 Nov 17 22:16:39 dhcpd 38701 DHCPREQUEST for 10.0.0.93 from 60:e8:x:x:x:1c (airthings-view) via mvneta1 Nov 17 22:16:39 dhcpd 38701 reuse_lease: lease age 35401 (secs) under 25% threshold, reply with unaltered, existing lease for 10.0.0.93 Nov 17 22:16:38 dhcpd 38701 DHCPACK on 10.0.0.93 to 60:e8:x:x:x:1c (airthings-view) via mvneta1 Nov 17 22:16:38 dhcpd 38701 DHCPREQUEST for 10.0.0.93 from 60:e8:x:x:x:1c (airthings-view) via mvneta1 Nov 17 22:16:38 dhcpd 38701 reuse_lease: lease age 35400 (secs) under 25% threshold, reply with unaltered, existing lease for 10.0.0.93
Another, an old Wii, is every 5 minutes but only one attempt.
I tried undoing my custom lease time (2 days) to no avail. Kea seems to not have this repeating log spam issue, after watching the logs for 15-20 minutes or so, however, the device is still requesting the renewal every 10 minutes.
I do seem to recall reading somewhere that more detailed logging is enabled in Kea, currently. (?)
After changing to Kea, which started Kea, Server/DHCP Server/LAN showed a warning that Kea needed to be restarted (which I did). Kea log has two entries (note the :56 timestamp):
Nov 17 22:56:38 kea-dhcp4 80687 INFO [kea-dhcp4.leases.0x49c69c616600] DHCP4_LEASE_ALLOC [hwtype=1 60:e8:x:x:x:1c], cid=[no info], tid=0xc2fcc550: lease 10.0.0.93 has been allocated for 172800 seconds Nov 17 22:56:38 kea-dhcp4 80687 INFO [kea-dhcp4.dhcpsrv.0x49c69c616600] EVAL_RESULT Expression pool_lan_0 evaluated to 1 Nov 17 23:06:38 kea-dhcp4 80687 INFO [kea-dhcp4.leases.0x49c69c616600] DHCP4_LEASE_ALLOC [hwtype=1 60:e8:x:x:x:1c], cid=[no info], tid=0x8197618b: lease 10.0.0.93 has been allocated for 172800 seconds Nov 17 23:06:38 kea-dhcp4 80687 INFO [kea-dhcp4.dhcpsrv.0x49c69c616600] EVAL_RESULT Expression pool_lan_0 evaluated to 1
However that is only two entries per renewal not 20+.
-
@SteveITS Thank you!
I've created a Redmine as a couple of people on reddit are seeing similar behaviour so it might be something a bit more common and looking into (i.e. not some stupidity with my config, at least hopefully):
https://redmine.pfsense.org/issues/15011 -
@mathiasringhof I don’t know another non-business setting I can look at with pfSense and multiple IOT type devices. It’s quite possible they just ignore the lease time and do whatever they want. And Kea might log that differently. Windows Server we set as the DHCP server or a static IP so none there.
Looking at ISC logs before an upgrade to 23.09 would be informative I think, to compare. Otherwise we (I) may be assuming the logging is different.
-
@SteveITS Luckily while the IoT device is the one losing the IP address the biggest offender right now is a Windows Server 2019 so it's not just a "IoT device has stupid implementation" although that might be part of it.
I've added a comparison log to the red mine where one can see that with 23.05 the same server just issues a couple of messages in 24h vs thousands.
In addition, the screenshot in the first post should also give a good indicator that as soon as you upgrade my installation to 23.09 there is a significant jump in log volume.
-
@mathiasringhof in my head I guess I was thinking, is it a logging issue or a device issue but just logging wouldn’t explain the devices losing leases as described.
A better way to say it is the 5-10 minute lease renewal time I saw (vs 1 day) may be a red herring but the repeated requests every 10m is the behavior change, right?
We usually set up with a RAM disk anyway, but otherwise a busy network could have a lot of extra logging I/O which isn’t ideal for eMMC.
-
@SteveITS depending on the device it's more than every 10 minutes. Look at an excerpt from the Gist linked in the Redmine issue:
2023-11-17T11:55:22.733+01:00 DHCPREQUEST for 192.168.50.31 from ce:3c:e8:92:d2:fa via igb3 2023-11-17T11:55:22.734+01:00 DHCPACK on 192.168.50.31 to ce:3c:e8:92:d2:fa via igb3 2023-11-17T11:55:23.734+01:00 DHCPACK on 192.168.50.31 to ce:3c:e8:92:d2:fa via igb3 2023-11-17T11:55:23.734+01:00 DHCPREQUEST for 192.168.50.31 from ce:3c:e8:92:d2:fa via igb3 2023-11-17T11:55:25.734+01:00 DHCPACK on 192.168.50.31 to ce:3c:e8:92:d2:fa via igb3 2023-11-17T11:55:25.734+01:00 DHCPREQUEST for 192.168.50.31 from ce:3c:e8:92:d2:fa via igb3 2023-11-17T11:55:42.735+01:00 DHCPACK on 192.168.50.31 to ce:3c:e8:92:d2:fa via igb3 2023-11-17T11:55:42.735+01:00 DHCPREQUEST for 192.168.50.31 from ce:3c:e8:92:d2:fa via igb3 2023-11-17T11:55:43.735+01:00 DHCPACK on 192.168.50.31 to ce:3c:e8:92:d2:fa via igb3 2023-11-17T11:55:43.735+01:00 DHCPREQUEST for 192.168.50.31 from ce:3c:e8:92:d2:fa via igb3 2023-11-17T11:55:44.735+01:00 DHCPACK on 192.168.50.31 to ce:3c:e8:92:d2:fa via igb3
vs this is the whole day under 23.05.1 and expected behaviour with the default lease time (7200 seconds = 2h = every 1h the client asks):
Link -
-
@mathiasringhof I think we’re talking about the same thing…the device requests the renewal at some point, be it a few minutes or several hours, but there’s a cycle of several to many, many retries each time it does.
IOW I doubt it’s the device getting a 2 second lease. On your Server ipconfig /all will show you the lease expiration IIRC.
-
Looks like we found something, see the Redmine issue. The binary shipped in 23.09 causes an issue. Running assumption is that it responds on random ports to DHCPREQUEST which seems to cause issues with some clients if they reject anything not coming from port 67.
Replacing the binary with the one from 23.05.1 resolved the issue for me.
-
@mathiasringhof yeah just did a quick sniff.. and see this
15:39:55.151294 IP 192.168.4.81.68 > 192.168.4.253.67: UDP, length 304 15:39:55.151592 IP 192.168.4.253.18184 > 192.168.4.81.68: UDP, length 300
That doesn't seem right, the reply should be from 67.. not some random port 18184..
The client sure doesn't seem to like it
15:39:55.151294 IP 192.168.4.81.68 > 192.168.4.253.67: UDP, length 304 15:39:55.151592 IP 192.168.4.253.18184 > 192.168.4.81.68: UDP, length 300 15:40:58.682252 IP 192.168.4.81.68 > 192.168.4.253.67: UDP, length 304 15:40:58.682557 IP 192.168.4.253.18184 > 192.168.4.81.68: UDP, length 300 15:41:04.499847 IP 192.168.4.65.68 > 192.168.4.253.67: UDP, length 308 15:41:04.500249 IP 192.168.4.253.18184 > 192.168.4.65.68: UDP, length 300 15:42:03.221195 IP 192.168.4.81.68 > 192.168.4.253.67: UDP, length 304 15:42:03.221489 IP 192.168.4.253.18184 > 192.168.4.81.68: UDP, length 300 15:42:04.507527 IP 192.168.4.65.68 > 192.168.4.253.67: UDP, length 308 15:42:04.507775 IP 192.168.4.253.18184 > 192.168.4.65.68: UDP, length 300
-
Tagging @stephenw10.
The extra logging isn’t really a breaking change but clients not getting a lease is.
Most people, if they even notice this, can use Kea, but not everyone since it’s not feature complete/ready yet.
-
@SteveITS I just switched to binary from 23.05.1 just waiting for something to renew.. But log already looks better.. 3 clients renewed before I got the sniff going..
2 of the problem childred I was seeing from 4.81 and 4.63 have now renewed it seems
Nov 19 10:12:37 dhcpd 36201 DHCPACK on 192.168.4.63 to 68:57:2d:98:28:22 via igb2.4 Nov 19 10:12:37 dhcpd 36201 DHCPREQUEST for 192.168.4.63 from 68:57:2d:98:28:22 via igb2.4 Nov 19 10:12:03 dhcpd 36201 DHCPACK on 192.168.4.81 to a8:1b:6a:24:ec:26 via igb2.4 Nov 19 10:12:03 dhcpd 36201 DHCPREQUEST for 192.168.4.81 from a8:1b:6a:24:ec:26 via igb2.4
edit: ok couple of devices renewed while I was sniffing - this looks more normal
16:34:01.236228 IP 192.168.4.73.68 > 192.168.4.253.67: UDP, length 308 16:34:01.236487 IP 192.168.4.253.67 > 192.168.4.73.68: UDP, length 300 16:34:53.173058 IP 192.168.4.49.68 > 192.168.4.253.67: UDP, length 302 16:34:53.173367 IP 192.168.4.253.67 > 192.168.4.49.68: UDP, length 300
-
-
-
-
I assume this file is the same across most of the various hardware.
I have a SG-1100 backup that is currently on 23.05.1, If the file is the same I could pull it from it. If it is hardware specific, then I will have to jump back to 23.05.1 on my SG-5100 to pull it.
Do you happen to know?
Phizix
-
@Phizix the 1100 and 2100 and 3100 are Arm CPUs so that won’t work. If you had reinstalled at some point and have ZFS you could revert to an old boot environment temporarily.
-
@Phizix said in DHCP weirdness after 23.09 upgrade:
How does one replace the binary with the one from 23.05.1?
For me I just booted the zfs 23.05.1 on my VM, grabbed the file via sftp.. Then stopped dhcpd on my sg4860, copied the current dhcpd to dhcpd.old and then via sftp put the dhcpd from my vm on my 4860 and then restarted it..
I am going to have to do a bit of research - but I could understand why some clients might not like an answer from some random port when it sent to 67.. Have to fire up the rfc and see if client is just not up to date on what is should do.. It is odd, and if doesn't like it renew why does it take it on discover. Or is the discover coming from 67? Very interesting situation for sure.
edit:
It was actually a nice test of booting to an old zfs.. which worked out great. -
My SG-5100 is on ZFS, so I should be able to boot to the previous 23.05.1 environment and grab it.
I did not know if all the files were hardware specific or not.
I don't think it will be too hard, but if it had not been hardware specific then it would have almost been trivial to grab it.
Thanks for the response!
Phizix
-
@johnpoz said in DHCP weirdness after 23.09 upgrade:
It was actually a nice test of booting to an old zfs.. which worked out great.
The boot environment feature is so clutch. Just awesome
-
I'm seeing similar problems while applying "DHCP enforcement" through my wifi infrastructure - the controllers aren't recognizing the DHCP transactions and are kicking clients off even after they receive DHCP leases. Switching to Kea isn't an option for me yet since I'm running redundant pairs. The behavior is reliably reproducible on 23.09 and doesn't exist on 23.05.1.