DHCP on LAN stops working
-
New user here. I installed a new cable modem and a pfSense-based router. Because my service provider misprovisioned the modem, 15 to 20 percent of traffic was being lost, and pfSense reported warnings about the DNS traffic loss rate. During this time, on 3 different occasions over a couple of days, the router got into a state where it would not serve DHCP on the LAN. My indication was that the Windows PC reverted to the 169.254.x.x address that Windows assigns itself when it can't configure a port via DHCP. As a result I could not manage the router in band, and I couldn't use the network because of the incorrect IP address. I could still get to the console via HDMI and a USB keyboard, but I'm not familiar enough with the console menu to have done any meaningful debugging. Detaching and reinserting the PC's Ethernet cable did not help. It is possible that more router functions than DHCP were not working. Each time, I rebooted the router to get the network running again.
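The "169" address is the IPv4 link-local (APIPA) block 169.254.0.0/16, which Windows self-assigns when no DHCP server answers. A minimal Python sketch, using only the standard `ipaddress` module, to tell whether an address a client reports came from that fallback rather than from the router's DHCP server; the function name `is_apipa` is hypothetical.

```python
import ipaddress

# 169.254.0.0/16 is the IPv4 link-local block (RFC 3927) that Windows
# falls back to (APIPA) when it gets no answer from a DHCP server.
APIPA_NET = ipaddress.ip_network("169.254.0.0/16")


def is_apipa(addr: str) -> bool:
    """Return True if addr is a self-assigned IPv4 link-local address."""
    ip = ipaddress.ip_address(addr)
    # IPv6 addresses (e.g. fe80::/10) are never APIPA, so check the version first.
    return ip.version == 4 and ip in APIPA_NET


if __name__ == "__main__":
    for a in ("169.254.37.12", "192.168.1.100"):
        print(a, "self-assigned (no DHCP answer)" if is_apipa(a) else "DHCP or static")
```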
Version is 2.6.0, built Jan 31 2022.
The problem with cable modem provisioning and traffic has been resolved, and the router has performed reliably for several days since then. My guess is that some protocol implementation on the router has problems with repeated loss of its protocol packets on the WAN interface. I'm mentioning it here because someone working on the code might be able to simulate random packet loss and then catch and kill another bug.
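The random-loss test suggested here can be modeled in a few lines. This is a hypothetical Python sketch, not pfSense code: each packet is dropped independently with a fixed probability (0.15 in the example, matching the 15 to 20 percent loss described above), and the RNG is seeded so a failing run can be replayed. For real traffic on FreeBSD, the dummynet `plr` (packet loss rate) pipe option is the kind of knob to look at.

```python
import random
from typing import Iterable, Iterator


def lossy(packets: Iterable[bytes], loss_rate: float, seed: int = 0) -> Iterator[bytes]:
    """Yield each packet unless a seeded coin flip says to drop it.

    loss_rate is the probability (0.0-1.0) that any single packet is dropped,
    independently of all others (a simple Bernoulli loss model).
    """
    if not 0.0 <= loss_rate <= 1.0:
        raise ValueError("loss_rate must be between 0 and 1")
    rng = random.Random(seed)  # seeded so the same drops happen on every run
    for pkt in packets:
        # rng.random() is in [0.0, 1.0): keep the packet unless it lands below loss_rate
        if rng.random() >= loss_rate:
            yield pkt


if __name__ == "__main__":
    sent = [b"pkt%d" % i for i in range(10_000)]
    received = list(lossy(sent, loss_rate=0.15, seed=42))
    print(f"delivered {len(received)} of {len(sent)} "
          f"({1 - len(received) / len(sent):.1%} lost)")
```

A Bernoulli model is the simplest choice; real access-network loss is often bursty, so a test harness might also want runs of consecutive drops.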
-
@erniee
Cable modem: are you using DHCP on your WAN? Or PPPoE?
The WAN side uses a DHCP client process that runs when the WAN connection comes up.
On the LAN(s) side, there is the DHCP server process.
DHCP client and DHCP server are not related to each other.
I can pull out my WAN cable or power down my ISP's upstream router, and that has no influence on my LAN: the DHCP server keeps on running.
Btw: true, if DHCP on LAN didn't work, you would have a hard time connecting to the GUI from LAN.
You would not be able to set up your WAN connection type.
So you would not be able to connect to your LAN (and around we go).
If the DHCP server could not start (or restart) when there is no WAN any more, this forum would have been flooded with messages since 2.6.0 came out.
@erniee said in DHCP on LAN stops working:
My indication of this was the windows PC reverted to the 169 IP address that Microsoft uses when it can't configure a port with DHCP
True.
Did you get a "169" address right after the WAN went away?
Or many hours later (the client starts trying to renew at half the DHCP LAN lease time)?
-
@gertjan Thanks for your reply. The WAN side configuration is basic; I haven't configured any protocols that aren't on by default. I believe a DHCP client is used to get the WAN interface's public IP address and gateway IP. The failure indication was on the LAN side: it seemed the DHCP server was not working any more. After some number of hours of working fine, my PC reverted to the Microsoft 169 address. At that point the PC couldn't do anything on the network because of the bad IP. I bounced the link to the PC (pulled and reinserted the cable) hoping to stimulate DHCP activity, but the port stayed misconfigured, so I assume the DHCP server was not working. I didn't think to statically configure a good IP on the PC to continue debugging. I was somewhat stressed that my new modem was dropping packets and my router was failing randomly at the same time. Each time this occurred I discovered it some time later, so I can't relate the event to a lease timeout, but I assume one must have occurred.
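To relate a "169" address to a lease timeout: with the default timers from RFC 2131, a client tries to renew with its server at T1 = 50% of the lease, broadcasts a rebind at T2 = 87.5%, and only gives the address up when the lease expires. A small Python sketch of those timers; the 2-hour lease in the example is an assumption, not a value taken from this thread.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class LeaseTimers:
    renew_at: datetime    # T1: unicast renew to the server that granted the lease
    rebind_at: datetime   # T2: broadcast rebind to any DHCP server
    expires_at: datetime  # lease lost; client starts over (Windows may self-assign 169.254.x.x)


def lease_timers(obtained: datetime, lease_seconds: int) -> LeaseTimers:
    """Default RFC 2131 timers: T1 = 0.5 * lease, T2 = 0.875 * lease."""
    lease = timedelta(seconds=lease_seconds)
    return LeaseTimers(
        renew_at=obtained + lease * 0.5,
        rebind_at=obtained + lease * 0.875,
        expires_at=obtained + lease,
    )


if __name__ == "__main__":
    t = lease_timers(datetime(2022, 3, 1, 8, 0), lease_seconds=7200)  # assumed 2 h lease
    print("renew :", t.renew_at)
    print("rebind:", t.rebind_at)
    print("expire:", t.expires_at)
```

So with a dead DHCP server, the client keeps working until expiry; the 169 address shows up only after that, which fits "after some number of hours of working fine".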
It took two calls to the cable company over multiple days to get the modem configured right. Once the modem stopped dropping packets, the problem with my PC stopped. Since then the PC has never lost its DHCP-assigned IP address and the network has performed flawlessly.
As you say, the LAN side should continue to work even with the WAN side down. I made no changes to the router; only the cable modem provisioning changed, and now the LAN-side problem I was having has gone away. My guess, and this is at best a guess, is that pfSense has a bug that was triggered by the packet loss I was experiencing on the WAN. One protocol that I know is running there is DNS, and it had been complaining that it was seeing loss.
I'm grateful that the router is now functioning reliably, but the only thing I did to make it better was get the cable modem to pass packets more reliably. Having worked in the networking industry developing enterprise-class switch/routers, I know how difficult some issues like this can be to reproduce. This is why I made this post, hoping it would contain enough clues that a developer could improve their test setup and maybe find a bug in a code path that doesn't normally get exercised much. Adding a bit of debug code to randomly drop 10-20% of the traffic coming in on the WAN might do this.
-
@erniee They should be independent, that's the thing. Though I think there was a post in the last few weeks from someone who claimed they lost connectivity to pfSense when WAN was down/disconnected. I've never seen that.
Did you try unplugging WAN to duplicate?
When this was happening did the logs (esp. DHCP server) in pfSense show anything of note?
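One thing worth looking for in the DHCP server log is clients whose DHCPDISCOVER messages never get a DHCPOFFER back, which is what a silent or stuck server looks like from the log side. A rough Python sketch, assuming the usual ISC dhcpd message text ("DHCPDISCOVER from <mac> via <if>", "DHCPOFFER on <ip> to <mac> via <if>"); the sample lines, MAC addresses, and interface name are made up for illustration.

```python
import re
from typing import Iterable, List

# Assumed ISC dhcpd log wording; adjust the patterns if your log lines differ.
MAC = r"((?:[0-9a-f]{2}:){5}[0-9a-f]{2})"
DISCOVER_RE = re.compile(r"DHCPDISCOVER from " + MAC, re.IGNORECASE)
OFFER_RE = re.compile(r"DHCPOFFER on \S+ to " + MAC, re.IGNORECASE)


def unanswered_discovers(lines: Iterable[str]) -> List[str]:
    """Return MACs with a DHCPDISCOVER that no later DHCPOFFER answered."""
    pending = set()
    for line in lines:
        d = DISCOVER_RE.search(line)
        o = OFFER_RE.search(line)
        if d:
            pending.add(d.group(1).lower())      # client is asking for an address
        elif o:
            pending.discard(o.group(1).lower())  # server answered that client
    return sorted(pending)


# Hypothetical log excerpt for illustration only.
SAMPLE = [
    "dhcpd: DHCPDISCOVER from aa:bb:cc:00:00:01 via igb1",
    "dhcpd: DHCPOFFER on 192.168.1.10 to aa:bb:cc:00:00:01 via igb1",
    "dhcpd: DHCPDISCOVER from aa:bb:cc:00:00:02 (my-pc) via igb1",
    "dhcpd: DHCPDISCOVER from aa:bb:cc:00:00:02 (my-pc) via igb1",
]

if __name__ == "__main__":
    print("no offer for:", unanswered_discovers(SAMPLE))
```

This only looks at the DISCOVER/OFFER half of the exchange; if the log goes completely quiet while clients are asking, that silence is itself the clue.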
-
Added to what @SteveITS said:
To check if the process is still running:
Use god mode (the console ;)) or, better, do what admins do: SSH into the box (now you know why every admin has PuTTY installed ;)). Once you're in, go for option 8. Now type:
top
It's a bit tricky to scroll through the list, as it refreshes all the time.
Type / (for "search"), then dhcpd, and press Enter.
    last pid: 63183;  load averages:  0.14,  0.12,  0.09   up 0+00:14:04  16:38:50
    90 processes:  1 running, 89 sleeping
    CPU:  0.0% user,  0.0% nice,  0.6% system,  0.2% interrupt, 99.2% idle
    Mem: 257M Active, 92M Inact, 479M Wired, 2979M Free
    ARC: 219M Total, 65M MFU, 146M MRU, 32K Anon, 1374K Header, 5908K Other
         90M Compressed, 247M Uncompressed, 2.75:1 Ratio

      PID USERNAME    THR PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
    87816 dhcpd         1  20    0    27M    16M select   0   0:00   0.01% dhcpd
    87122 dhcpd         1  20    0    23M    12M select   1   0:00   0.01% dhcpd
You should see the two dhcpd processes running.
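To do the same check from a script instead of by eye, the process table can be parsed. A minimal Python sketch that collects the PIDs of dhcpd processes from top-style output like the table shown above; it assumes FreeBSD top's default layout (header line starting with `PID`, COMMAND as the last column, no `-a` argument display), and the function name is hypothetical.

```python
from typing import List


def find_processes(top_output: str, name: str) -> List[int]:
    """Return PIDs whose COMMAND column (last field) equals name.

    Assumes a header line whose first field is 'PID', followed by one
    process per line with the PID first and the command name last.
    """
    pids = []
    in_table = False
    for line in top_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "PID":
            in_table = True  # rows after the header are processes
            continue
        if in_table and fields[0].isdigit() and fields[-1] == name:
            pids.append(int(fields[0]))
    return pids


SAMPLE_TOP = """\
  PID USERNAME    THR PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
87816 dhcpd         1  20    0    27M    16M select   0   0:00   0.01% dhcpd
87122 dhcpd         1  20    0    23M    12M select   1   0:00   0.01% dhcpd
"""

if __name__ == "__main__":
    pids = find_processes(SAMPLE_TOP, "dhcpd")
    print(f"{len(pids)} dhcpd process(es): {pids}")
```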
One dhcpd process is initiated by the OS and runs with root rights.
The second one runs with lower privileges and is actually doing the DHCP work.
Me neither: I have seen 'a lot', I guess, but dhcpd failing is not part of it.
The ISC DHCP daemon is much older than pfSense, and I presume half of all router/firewall devices attached to the Internet use it.
So, consider it 'stable' and 'won't crash'.
The dhcpd process can be taken down by the system (pfSense), of course, for example when the interface it listens on (any of the LAN and OPTx interfaces on which you activated it) goes down.
Normally, when the interface goes UP, the dhcpd instances are started again.
(but then the market was swamped with very cheap switches and NICs, and things became .... as stable as money can buy ^^)
As said: the logs will have the answer.
Oh, I forgot: GUI: Diagnostics > System Activity
-
@steveits Agreed, they are designed to be separate, but I've seen plenty of bugs where one thread does something bad and prevents others from running. I looked at the logs but didn't find anything I recognized as unique/bad. I'm new to these logs, so I could have missed something. At some point I cleared the logs so I could better see the next event, but it never happened again. I hated it when people would clear logs and then come to me for help ;-) Sorry that I have now done this myself.
I'll try the disconnect-the-WAN experiment tonight.
-
If you have DHCPv6 enabled on WAN, the DHCPv6 server on LAN will use the prefix it obtains from the ISP. Those two are coupled, but they should still be independent of the IPv4 service.
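To make that coupling concrete: with prefix delegation the ISP hands the WAN a larger prefix, and a LAN that tracks the WAN carves its /64 out of it, so when the delegated prefix changes or disappears, the LAN's IPv6 subnet (and the DHCPv6 range on it) follows. A Python sketch using the standard `ipaddress` module; the /56 from the 2001:db8::/32 documentation range and the prefix ID are assumptions for illustration, loosely mirroring pfSense's "Track Interface" and "IPv6 Prefix ID" settings. Nothing here touches the IPv4 DHCP server.

```python
import ipaddress


def lan_prefix(delegated: str, prefix_id: int) -> ipaddress.IPv6Network:
    """Return the prefix_id-th /64 inside a delegated IPv6 prefix."""
    net = ipaddress.IPv6Network(delegated)
    if net.prefixlen > 64:
        raise ValueError("delegated prefix is longer than /64")
    count = 2 ** (64 - net.prefixlen)  # number of /64s that fit in the delegation
    if not 0 <= prefix_id < count:
        raise ValueError(f"prefix ID must be in 0..{count - 1}")
    # The prefix ID fills the bits between the delegated length and /64.
    base = int(net.network_address) + (prefix_id << 64)
    return ipaddress.IPv6Network((base, 64))


if __name__ == "__main__":
    print(lan_prefix("2001:db8:1200::/56", 1))
```

If the WAN later receives a different delegation, the same prefix ID yields a different LAN /64, which is why the IPv6 LAN side follows the WAN while the IPv4 DHCP server does not.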