Sending DHCP NACK by default to out-of-range "requested IP address"

streamholder

Hello,
I'm managing a Windows network with a pfSense SG-2440.
Since I have enough devices to make static IP addresses difficult and not enough to justify an IP address management solution, I'm using DHCP+DNS to address the computers on the network with fixed names.
We are making changes to the network to interconnect various sites via VPN, and sometimes the need to change the IP address range will arise.

Last week, during a test, I noticed that no matter what I did (I tried very, very hard, just short of reinstalling the OS, trust me!), a Windows machine that had previously received an IP address in a subnet behaves as follows: when the router sends a DHCPOFFER with an address in the same subnet, the machine sends a DHCPREQUEST for its previous IP address and keeps using that address unless the DHCP server explicitly NACKs it. Only in that case does the Windows machine accept the offered address.
Now, I initially thought this was a problem with Windows machines, but then I realized that it is fully compliant behaviour. After all, NACK messages exist for a reason.
I do not know which DHCP server pfSense uses, but I'm pretty sure that, for example, ISE has the option to send out NACKs to addresses out of range, by setting the server as authoritative.
Shouldn't such an option be added to whatever pfSense uses as well?

Note: right now, as a workaround, I'm creating additional IP address ranges spanning the rest of the subnet and setting them to deny all requests. This works, so I encourage everyone who has this problem (more people than I expected, as per Google) to solve it temporarily this way. But it cannot be the "official" solution! It is nothing more than a workaround and should not deter the developers from adding a proper option, even if it were something like automagically setting up the aforementioned workaround.
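For anyone applying the workaround above, here is a minimal sketch of how the "rest of the subnet" ranges could be computed. The function name `deny_ranges` and the example subnet and pool are illustrative assumptions, not anything pfSense provides; whether the resulting ranges may include the router's own address or static mappings is something to check in your own setup.

```python
import ipaddress

def deny_ranges(subnet: str, pool_start: str, pool_end: str):
    """Return (first, last) address pairs covering the usable hosts of
    `subnet` that lie outside the pool pool_start..pool_end.
    Hypothetical helper for illustration only."""
    net = ipaddress.ip_network(subnet)
    start = ipaddress.ip_address(pool_start)
    end = ipaddress.ip_address(pool_end)
    if start not in net or end not in net or start > end:
        raise ValueError("pool must be a non-empty range inside the subnet")
    first_host = net.network_address + 1    # skip the network address
    last_host = net.broadcast_address - 1   # skip the broadcast address
    ranges = []
    if start > first_host:
        ranges.append((first_host, start - 1))
    if end < last_host:
        ranges.append((end + 1, last_host))
    return ranges

# Example values are assumptions: a /24 with a pool of .100 to .150
for lo, hi in deny_ranges("192.168.1.0/24", "192.168.1.100", "192.168.1.150"):
    print(f"{lo} - {hi}")
```

Each printed range would then be entered as an additional pool that denies all clients, as described in the workaround.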

JKnott

Fire up Wireshark and see what's happening. If the computer had an address previously and is just trying to renew, then it requests from its DHCP address. If it's new to the network or the lease has expired, it's supposed to use 0.0.0.0 as its source address. I don't think it's ever allowed to use an address after receiving a NAK. Here's some info from https://documentation.meraki.com/zGeneral_Administration/Tools_and_Troubleshooting/DHCP_NAK

A DHCP NAK is a negative acknowledgment from the DHCP server. The server will send a DHCP NAK in the following scenarios:

Requested address from possibly the same subnet but not in the address pool of the server. This can be the failover scenario in which 2 DHCP servers are serving the same subnet so that when one goes down, the other should not NAK to clients which got an IP from the first server.
Requested address is on a different subnet.
Requested address is already in use by another client.

So, it appears Windows is not working properly. Given my experience with networking & Windows, I'm not surprised. In my work, I often have to connect my computer to a customer's network. DHCP in Windows works so poorly, I often have to use a static configuration, as DHCP will not get a valid address for that network.

streamholder

@JKnott:

Fire up Wireshark and see what's happening. If the computer had an address previously and is just trying to renew, then it requests from its DHCP address. If it's new to the network or the lease has expired, it's supposed to use 0.0.0.0 as its source address. I don't think it's ever allowed to use an address after receiving a NAK.

Yep, already done! Sorry if I didn't explain myself clearly, but what you described in the first half of the quote is what's happening: Windows is sending the request using its old DHCP address (which no longer resides in any of the pfSense pools), and it stops using that IP address after receiving a NAK (and I can force that to happen by employing the workaround I described). BUT pfSense does not send the NAK without the workaround, which is the problem!

@JKnott:

Here's some info from https://documentation.meraki.com/zGeneral_Administration/Tools_and_Troubleshooting/DHCP_NAK

A DHCP NAK is a negative acknowledgment from the DHCP server. The server will send a DHCP NAK in the following scenarios:

Requested address from possibly the same subnet but not in the address pool of the server. This can be the failover scenario in which 2 DHCP servers are serving the same subnet so that when one goes down, the other should not NAK to clients which got an IP from the first server.
Requested address is on a different subnet.
Requested address is already in use by another client.

Yeah, what I have highlighted is exactly what I said pfSense should do and does not do (point #1). That is exactly the problem, pfSense does not NAK a request from an address that is out-of-pool, it just ignores it thus allowing Windows to keep the address (and this, afaik, is perfectly valid as per protocol).

@JKnott:

So, it appears Windows is not working properly. Given my experience with networking & Windows, I'm not surprised. In my work, I often have to connect my computer to a customer's network. DHCP in Windows works so poorly, I often have to use a static configuration, as DHCP will not get a valid address for that network.

While I agree that Windows can behave strangely with DHCP, as I've pointed out, this time it really seems to be pfSense's own fault. It's just that Linux and FreeBSD's most used DHCP clients behave differently in this particular case, but it does not mean that Windows behaves incorrectly.

kpa

PfSense uses the de facto DHCP server that just about everyone else is using, the ISC DHCPD. I feel that this is a problem that should be raised at upstream because at least I can't find any way to achieve what you're after using the options from the documentation.

streamholder

@kpa:

PfSense uses the de facto DHCP server that just about everyone else is using, the ISC DHCPD. I feel that this is a problem that should be raised at upstream because at least I can't find any way to achieve what you're after using the options from the documentation.

That makes sense, and I will probably do that. Can you confirm that what I'm seeing is in fact different from the behaviour described in the Cisco document shared by JKnott? Just to have a second pair of eyes.

kpa

It certainly looks like the DHCP server should NAK out of pool requests so yes it's different.

johnpoz

A dhcp server would only nak if

"unless the address is incorrect for the network segment to which the client has been attached and the server is authoritative for that network segment, in which case the server will send a DHCPNAK even though it doesn't know about the address."

If you have a dhcp server set with say a pool of .100 to .150.. And the client requests .50 why should it send a NAK?? What if there is another dhcp server on the network handling that part of the network pool. For a failover scenario..

Where did the client get this IP from that is asking for it again? If its the same network as the dhcp server currently seeing the request?

If there is a reservation for the client, and it requests a different IP, then the dhcp server should send a NAK..

I would like to duplicate what you're seeing.. So what version of windows are you using?

So client got an ip address from dhcp server, lets call it 192.168.1.50..
How did this client then move to being on the pfsense network where the pool is only 192.168.1.100 to .150?
Did you just plug it into a different network, did it change wifi networks? Or did you just alter the pool which before included the .50 and now it doesn't? And the client never went anywhere?

You're saying this client which had a lease for .50 is asking for it again, and, not getting a NAK, just continues to ask for it?? And never switches to sending a discover?

JKnott

Perhaps a capture of what happens is in order.

johnpoz

Yup, along with understanding where the client got the address it's requesting, what the details of the current pool are, etc.

streamholder

@johnpoz:

A dhcp server would only nak if

"unless the address is incorrect for the network segment to which the client has been attached and the server is authoritative for that network segment, in which case the server will send a DHCPNAK even though it doesn't know about the address."

If you have a dhcp server set with say a pool of .100 to .150.. And the client requests .50 why should it send a NAK?? What if there is another dhcp server on the network handling that part of the network pool. For a failover scenario..

I do realize (now more than before) that what I describe should not necessarily be the default behaviour, but I think there should be a non-crafty option to just explicitly NAK all non-pool addresses in pfSense, as this is not only entirely permitted by the standard, but solves a set of real world problems.
(e.g. Cisco Meraki, as we've seen, does exactly this by default, and for a good reason: in its application it is vital that IP addresses are always centrally assigned no matter what).

@johnpoz:

Where did the client get this IP from that is asking for it again? If its the same network as the dhcp server currently seeing the request?

From a DHCP server that was on the network before (as in "until 2 months ago") and used to assign addresses in the same subnet but from a different range. The leases obviously expired, but since the new server never sent any NAK, some of the Windows machines just decided to keep using the old IP addresses. ipconfig /renew always failed: it apparently does nothing more than what Windows already does when you replug the network cable, and it does not flush cached leases, which is VERY annoying but technically not non-compliant.

@johnpoz:

I would like to duplicate what you're seeing.. So what version of windows are you using?

Windows 10. If you need, I can post the build number over the weekend.

@johnpoz:

So client got an ip address from dhcp server, lets call it 192.168.1.50..
How did this client then move to being on the pfsense network where the pool is only 192.168.1.100 to .150?
Did you just plug it into a different network, did it change wifi networks? Or did you just alter the pool which before included the .50 and now it doesn't? And the client never went anywhere?

As described above, the DHCP server changed. If it can be of any help, it was the built in server of a Cisco 800 router (a nightmare :) before. We just configured a different range on the new one as a kind of "non-disrupting" production test (and it was a good choice after all).

@johnpoz:

You're saying this client which had a lease for .50 is asking for it again, and, not getting a NAK, just continues to ask for it?? And never switches to sending a discover?

It did also send DHCPDISCOVERs, but then just ignored the corresponding OFFERs after not getting the NAK on the old IP address' request. :-\

johnpoz

It did also send DHCPDISCOVERs, but then just ignored the corresponding OFFERs after not getting the NAK on the old IP address' request

Well, that sure as hell does not sound RFC compliant to me..

Pfsense can only offer options in dhcp that are actually in the dhcp server they use.. Does it offer what you're asking about, NAKing any request for an IP that is not in the current pool? Off the top of my head, I would guess no.

streamholder

@johnpoz:

It did also send DHCPDISCOVERs, but then just ignored the corresponding OFFERs after not getting the NAK on the old IP address' request

Well, that sure as hell does not sound RFC compliant to me..

Pfsense can only offer options in dhcp that are actually in the dhcp server they use.. Does it offer what you're asking about, NAKing any request for an IP that is not in the current pool? Off the top of my head, I would guess no.

Well, should I write my own? Is there even a simple way to integrate external software with the control panel, and make it survive updates?

doktornotor

Perhaps you should just get the DHCP hotfix from MS for the shitty W10 OS.

JKnott

and ipconfig /renew always failed

What about /release?

doktornotor

Yeah that was one hack around the W10 bug. Others claimed that it required


netsh int ip reset
ipconfig /flushdns

as well… Regardless, this W10 stupidity is not an ISC DHCP fault.

streamholder

@JKnott:

and ipconfig /renew always failed

What about /release?

@doktornotor:

Yeah that was one hack around the W10 bug. Others claimed that it required
netsh int ip reset
ipconfig /flushdns

None of that worked. Nor uninstalling and reinstalling the network card, nor resetting the networking stack.
Some claim to have solved the problem by mounting the Registry in another Windows installation and deleting a specific key. Yuck.

@doktornotor:

Perhaps you should just get the DHCP hotfix from MS for the shitty W10 OS.

Hotfixes are no longer a thing. Now there are "incremental updates", and all the computers are up to date.

@doktornotor:

Regardless, this W10 stupidity is not an ISC DHCP fault.

That's a fair observation, but we all know how it works in the real world.

What I mean is, I can't go to my client and tell him to toss all his Windows machines and replace them with Linux machines, and to replace their ERP software because the present one is based on AcuCobol ( ::) ).
Nor can I tell him to buy an IP address management system, nor can I use fixed IP addresses, because they have too many machines and too dynamic a network for that.

The easiest solution (and the least costly for the client) might really be writing my own dead-simple DHCP server, however absurd that may sound, if ISC's daemon really does not have an option or a less crafty way to NAK by default.
I found an interesting topic on the ISC user mailing list: https://lists.isc.org/pipermail/dhcp-users/2012-April/015276.html
It seems to talk about this exact issue.

johnpoz

So this is a MS shop?? Sounds like you have lots of windows machines - so you're not running Active Directory?? Or any windows servers? Why would you not just run MS dhcp.. Does it NAK clients that ask for an IP that is not in its pool?

Dhcpd will normally NAK an IP that is outside the scope of the network.. So for example if your network is 192.168.0.0/24 and you run a pool on the dhcp server of .100 to .150, if a client asks for 192.168.0.50 it will not be NAKed..

But if it asks for 192.168.1.x then it would get a nak..

This is by design, so dhcpd is not sending a NAK to clients that are getting their addresses from a 2nd or 3rd dhcp server handling parts of the pool..
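The behaviour described here, together with the option being asked for in this thread, can be sketched as a small decision function. This is only an illustration of the discussion, not how dhcpd is actually implemented; in particular `nak_out_of_pool` is a hypothetical flag standing in for the requested option and does not exist in ISC dhcpd as far as this thread has established.

```python
import ipaddress

def server_reply(requested, subnet, pool, authoritative=True, nak_out_of_pool=False):
    """Sketch of how a server might answer a DHCPREQUEST for `requested`.

    pool is a (start, end) tuple of address strings.
    nak_out_of_pool is a HYPOTHETICAL switch (the option requested in
    this thread), not a real dhcpd setting.
    """
    addr = ipaddress.ip_address(requested)
    net = ipaddress.ip_network(subnet)
    lo, hi = (ipaddress.ip_address(p) for p in pool)

    if addr not in net:
        # Wrong network segment: an authoritative server NAKs.
        return "NAK" if authoritative else "silent"
    if lo <= addr <= hi:
        # In the pool: the answer depends on the lease database.
        return "check lease"
    # Right subnet but outside the pool: stay silent by default, because
    # another server might be handling that part of the subnet.
    return "NAK" if nak_out_of_pool else "silent"

pool = ("192.168.0.100", "192.168.0.150")
print(server_reply("192.168.0.50", "192.168.0.0/24", pool))
print(server_reply("192.168.1.20", "192.168.0.0/24", pool))
print(server_reply("192.168.0.50", "192.168.0.0/24", pool, nak_out_of_pool=True))
```

With the defaults, the .50 request is left unanswered and the 192.168.1.x request is NAKed, which matches the two examples above; only the hypothetical flag turns the first case into a NAK.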

Normally a client that does not get an answer to a request and then sends a discover should, if it gets an offer for that discover, use that IP.. As dok mentions, sounds like you have bad dhcp clients to me.. Which are MS clients, so I would be curious if the MS dhcpd naks how you want it to.. My guess is also no..

I can see one problem for sure when running multiple dhcp servers, each with a portion of the pool, and sending naks for IP requests that are not in their pool. Now the client would be sending a discover to get an IP every time vs a renew or request.. So wouldn't this mean that you would never actually renew a lease but be getting a new lease every time?? Never actually tried to lab that or test such a thing because it really should never come into play.

JKnott

Release followed by renew didn't work????

Yesterday, I used Wireshark to see what happened on a Windows 10 computer. When I just did a renew, I could see the computer make the dhcp request from its previously assigned address and then receive an ack. When I did a release first, it went through the full discover, offer, request, ack sequence and used 0.0.0.0 as the source address until the ack, at which point it started using the assigned address. If this isn't happening for you, then you've got a real serious problem that has nothing to do with pfSense.
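When reading such a capture, it can help to tell the different kinds of DHCPREQUEST apart. The sketch below follows my reading of the client-state table in RFC 2131 section 4.3.2; the function name and the string labels are assumptions made for illustration, and which case the problem clients in this thread fall into would have to be confirmed from an actual capture.

```python
def classify_request(ciaddr, requested_ip=None, server_id=None):
    """Guess the client state behind a DHCPREQUEST from three fields:
    the ciaddr header field, option 50 (requested IP address) and
    option 54 (server identifier). Illustrative helper only."""
    if server_id is not None:
        # Answering a specific server's DHCPOFFER.
        return "SELECTING"
    if requested_ip is not None and ciaddr == "0.0.0.0":
        # Client remembers an old address and asks to keep it.
        return "INIT-REBOOT"
    if requested_ip is None and ciaddr != "0.0.0.0":
        # Client is extending a lease it is currently using.
        return "RENEWING/REBINDING"
    return "unknown"

print(classify_request("192.168.1.50"))                    # plain renew
print(classify_request("0.0.0.0", "192.168.1.50"))         # asks for the old address again
print(classify_request("0.0.0.0", "192.168.1.120", "192.168.1.1"))  # full sequence after a release
```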

This shows why capturing packets is so extremely useful. I prefer Wireshark, but you can also use the pfSense packet capture for this. You can also run Wireshark on a Windows computer and leave it running while you try release & renew. For IPv4 DHCP, filter with "port 67 or 68". This will show the packets from both the computer and DHCP server.
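If you would rather pull the interesting fields out of captured packets with a script, the following is a rough sketch that reads a raw DHCP payload (the UDP data on port 67/68) using only the Python standard library. The function name `dhcp_summary` and the sample packet are made up for illustration; how you extract the UDP payload from your capture file is left out.

```python
import socket

DHCP_MAGIC = b"\x63\x82\x53\x63"  # magic cookie that precedes the options
MESSAGE_TYPES = {1: "DISCOVER", 2: "OFFER", 3: "REQUEST", 4: "DECLINE",
                 5: "ACK", 6: "NAK", 7: "RELEASE", 8: "INFORM"}

def dhcp_summary(payload: bytes) -> dict:
    """Extract message type, ciaddr, yiaddr, requested IP and server ID."""
    if len(payload) < 240 or payload[236:240] != DHCP_MAGIC:
        raise ValueError("not a DHCP payload")
    summary = {
        "type": None,
        "ciaddr": socket.inet_ntoa(payload[12:16]),  # client's current address
        "yiaddr": socket.inet_ntoa(payload[16:20]),  # address offered/assigned
        "requested_ip": None,
        "server_id": None,
    }
    i = 240  # options start right after the magic cookie
    while i < len(payload):
        code = payload[i]
        if code == 255:          # end option
            break
        if code == 0:            # pad option has no length byte
            i += 1
            continue
        if i + 1 >= len(payload):
            break
        length = payload[i + 1]
        data = payload[i + 2:i + 2 + length]
        if len(data) < length:   # truncated option, stop parsing
            break
        if code == 53 and length == 1:
            summary["type"] = MESSAGE_TYPES.get(data[0], f"unknown({data[0]})")
        elif code == 50 and length == 4:
            summary["requested_ip"] = socket.inet_ntoa(data)
        elif code == 54 and length == 4:
            summary["server_id"] = socket.inet_ntoa(data)
        i += 2 + length
    return summary

# Made-up sample: a DHCPREQUEST asking for 192.168.1.50 with ciaddr 0.0.0.0
sample = (bytes([1, 1, 6, 0]) + bytes(232) + DHCP_MAGIC
          + bytes([53, 1, 3, 50, 4, 192, 168, 1, 50, 255]))
print(dhcp_summary(sample))
```

Looking at the type and requested address of each packet in sequence should show whether a REQUEST for the old address is met with a NAK, an ACK, or silence.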

johnpoz

^ yup.. packet capture is like step 1 trying to figure stuff like this out on what is going on.

If you set up the pool to include the IPs that are being requested by the client, and there is a lease for that IP, then pfsense should be sending a NAK.. If you're using a subset of the range for the pool.. And a request comes in for an IP that is on the correct network but outside the pool, then no, dhcpd will not NAK it.

streamholder

@johnpoz:

So this is a MS shop?? Sounds like you have lots of windows machines - so you're not running Active Directory?? Or any windows servers? Why would you not just run MS dhcp.. Does it NAK clients that ask for an IP that is not in its pool?

Because I do not want to run any more MS stuff! I'm not gonna make them pay thousands of dollars for a domain controller just to run DHCP.

@johnpoz:

Dhcpd will normally NAK an IP that is outside the scope of the network.. So for example if your network is 192.168.0.0/24 and you run a pool on the dhcp server of .100 to .150, if a client asks for 192.168.0.50 it will not be NAKed..

But if it asks for 192.168.1.x then it would get a nak..

This is by design, so dhcpd is not sending a NAK to clients that are getting their addresses from a 2nd or 3rd dhcp server handling parts of the pool..

Normally a client that does not get an answer to a request and then sends a discover should, if it gets an offer for that discover, use that IP.. As dok mentions, sounds like you have bad dhcp clients to me.. Which are MS clients, so I would be curious if the MS dhcpd naks how you want it to.. My guess is also no..

I can see one problem for sure when running multiple dhcp servers, each with a portion of the pool, and sending naks for IP requests that are not in their pool. Now the client would be sending a discover to get an IP every time vs a renew or request.. So wouldn't this mean that you would never actually renew a lease but be getting a new lease every time?? Never actually tried to lab that or test such a thing because it really should never come into play.

Johnpoz - I really appreciate the effort you are putting into explaining this to me again and again, and your help in general - but it's not that I don't get it. I understand that this is the default behaviour of DHCP and how it is intended to be used normally.
But sometimes in the real world there are needs that are slightly different from the "intended" behaviour, and this is why pfSense already contains many granular options that are not strictly "standard" but are there because they proved to be useful in some real world situations.
This particular option would be useful in my (and other people's) real world situation as I only want to have a single DHCP server on the whole network, and I need to send the NAKs by default. :)

@JKnott:

Release followed by renew didn't work????

Nope! I was as shocked as you. At one point I was about to bang my head against the desk and surrender.

@JKnott:

Yesterday, I used Wireshark to see what happened on a Windows 10 computer. When I just did a renew, I could see the computer make the dhcp request from its previously assigned address and then receive an ack. When I did a release first, it went through the full discover, offer, request, ack sequence and used 0.0.0.0 as the source address until the ack, at which point it started using the assigned address. If this isn't happening for you, then you've got a real serious problem that has nothing to do with pfSense.

This shows why capturing packets is so extremely useful. I prefer Wireshark, but you can also use the pfSense packet capture for this. You can also run Wireshark on a Windows computer and leave it running while you try release & renew. For IPv4 DHCP, filter with "port 67 or 68". This will show the packets from both the computer and DHCP server.

I've already gone that way. As mentioned, the computer would send DHCPDISCOVER and then just ignore the DHCPOFFER, keeping its old IP address after noticing that its DHCPREQUEST for it didn't get NAKed.
And - again - after receiving NAKs for those DHCPREQUESTs (which I got pfSense to send by employing the workaround I described in the initial post), it would no longer ignore the DHCPOFFERs and actually used the new IP address.
In other words, the Windows machine DID request a new IP address, which was correctly offered, but then it tried to request its old IP address, and since it didn't get NAKed by the DHCP server it would actually decide to use that.
I honestly do not think this is non-compliant; I think it is just very weird behaviour. And this is why I think pfSense needs the option to send NAKs by default.

@johnpoz:

^ yup.. packet capture is like step 1 trying to figure stuff like this out on what is going on.

If you set up the pool to include the IPs that are being requested by the client, and there is a lease for that IP, then pfsense should be sending a NAK.. If you're using a subset of the range for the pool.. And a request comes in for an IP that is on the correct network but outside the pool, then no, dhcpd will not NAK it.

This was added while I was replying. I can only confirm what I've already said: packet capture done, I understand the behaviour and why it is that way, and I still think there should be an option to override it.