DHCP fails silently, but works on reboot of pfSense
-
There must be something farther back in the logs that triggers all of that, though. Those are all the consequence of some kind of interface event (link up/down) or similar.
-
This is WAY off topic, so I apologize, and maybe it can be a separate topic.
@obelsen - I'm very new to the use and tech, but why would there be a real-world use for so many VLANs? You state 200 in your original post. How/when would anybody need to use that many virtual networks? I could see a handful being useful, but why such a large number? I'm just curious...
Jeff
-
@jimp said in DHCP fails silently, but works on reboot of pfSense:
There are no errors at all in the system log or DHCP log?
The service shows as stopped under Status > Services?
If you restart the service there, it doesn't work? (Or if it's running, stop and then start it again)
Anything else unusual when it's down, like RAM usage?
There are no errors in either the system log or the DHCP log.
The service cannot be restarted on its own; only rebooting the machine brings it back. The service is stopped, and is not running. RAM usage is far from a problem on our XG-2758.
Are you using any other new/recent features like DNS over TLS?
No.
@jimp said in DHCP fails silently, but works on reboot of pfSense:
There must be something farther back in the logs that triggers all of that, though. Those are all the consequence of some kind of interface event (link up/down) or similar.
While this was a response to someone else, there is nothing in my machine's logs that indicates any problem at all. The service simply fails to start. While it is running it has no issues, until something reconfigures an interface, at which point the DHCP service silently crashes.
@akuma1x said in DHCP fails silently, but works on reboot of pfSense:
This is WAY off topic, so I apologize, and maybe it can be a separate topic.
@obelsen - I'm very new to the use and tech, but why would there be a real-world use for so many VLANs? You state 200 in your original post. How/when would anybody need to use that many virtual networks? I could see a handful being useful, but why such a large number? I'm just curious...
Jeff
200 VLANs is hardly what I would expect as exceptional. We use it to ensure that all communication goes through our firewall, so we can filter traffic as we want per socket. Port isolation on our switches was another alternative, but gives less control.
-
So there are no log messages at all in your case? Not even any interface events or other logs that seem to repeat like the other person seeing this?
When you have dhcpd running, look at the output of
ps uxaww | grep dhcpd
and note the full command output. Next time it fails, try to run that by hand from an ssh or console shell and see if it produces any output. If it doesn't, try adding
-d
to the parameters before anything else, which should have it print output to the terminal.
-
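The suggestion above can be sketched as a small shell helper: save the dhcpd command line while the daemon is healthy, then replay it with -d placed ahead of the other parameters. This is only a rough sketch. The function name with_debug_flag and the sample command line are made up for illustration; the real arguments have to be copied from your own ps output.

```shell
#!/bin/sh
# Hypothetical helper: print a saved command line with -d inserted
# directly after the program name, ahead of all other parameters.
with_debug_flag() {
    # $1 is the full command line copied from ps while dhcpd was running
    prog=${1%% *}          # first word: the program name or path
    rest=${1#"$prog"}      # remainder, including its leading space
    printf '%s -d%s\n' "$prog" "$rest"
}

# Illustrative input only; copy the real line from `ps uxaww | grep dhcpd`.
saved='dhcpd -cf /etc/dhcpd.conf igb0'
with_debug_flag "$saved"
```

The helper only prints the modified command so it can be reviewed before being pasted into an ssh or console shell; it does not start dhcpd itself.
-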
@jimp said in DHCP fails silently, but works on reboot of pfSense:
So there are no log messages at all in your case? Not even any interface events or other logs that seem to repeat like the other person seeing this?
None. In the DHCP log, only events related to new leases are present. When I try to restart the service, it only logs the listening/sending interfaces. After every restart of the service there is nothing else, just the same log spam of listening/sending interfaces, ending with the sending on socket/fallback.
When you have dhcpd running, look at the output of
ps uxaww | grep dhcpd
and note the full command output. Next time it fails, try to run that by hand from an ssh or console shell and see if it produces any output. If it doesn't, try adding
-d
to the parameters before anything else, which should have it print output to the terminal.
The DHCP server fails only when modifying interfaces. It does not crash when it is already running. I would expect this is because the daemon is restarted, and it is unable to start again.
I will try to replicate the conditions again in a VM and run ps, however it did not enlighten me earlier when I did my own troubleshooting.
-
@obelsen said in DHCP fails silently, but works on reboot of pfSense:
None. In the DHCP log, only events related to new leases are present.
What about in the main system log?
@obelsen said in DHCP fails silently, but works on reboot of pfSense:
The DHCP server fails only when modifying interfaces
Do you have any special settings on that interface? Maybe a spoofed MAC address, MTU, or other similar setting either on the parent interface (if assigned) or one of the VLANs?
-
@jimp said in DHCP fails silently, but works on reboot of pfSense:
@obelsen said in DHCP fails silently, but works on reboot of pfSense:
None. In the DHCP log, only events related to new leases are present.
What about in the main system log?
Negative. No logs at all. The only items present were standard login logs and other unrelated info.
@obelsen said in DHCP fails silently, but works on reboot of pfSense:
The DHCP server fails only when modifying interfaces
Do you have any special settings on that interface? Maybe a spoofed MAC address, MTU, or other similar setting either on the parent interface (if assigned) or one of the VLANs?
The interfaces all use standard settings with a static IP defined. No MAC spoofing or other non-standard configuration.
The parent interface (LAN) also has a static IP, like each VLAN interface.
-
I would like to add that this has been an issue for around a year, and I have previously written on the subreddit for pfSense for help to no avail.
-
Can you answer the other questions I asked in my previous reply?
-
@jimp said in DHCP fails silently, but works on reboot of pfSense:
Can you answer the other questions I asked in my previous reply?
I did, forgot to put a newline :)
-
The symptoms all fit with something causing a link loop, which generally only happens on certain drivers in certain situations, such as changing specific settings which cause the link to drop and come back.
That triggers the link up/down scripts, which reconfigure the interfaces, which triggers a new link event, and so on.
But that scenario would log quite a lot of info in the main system log as it happens. It wouldn't happen silently.
Are the affected NICs all
igb
interfaces?
-
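A link loop of the kind described above should show up as many link events for the same interface within a short time. As a rough sketch, the helper below counts such lines per interface in log text read from stdin. The function name count_link_events and the sample lines are made up for illustration; the sample follows the style of FreeBSD kernel link messages and is not taken from anyone's log in this thread.

```shell
#!/bin/sh
# Hypothetical helper: count "link state changed" lines per interface
# in log text supplied on stdin, printed as "<interface> <count>".
count_link_events() {
    grep 'link state changed to' |
        sed -E 's/.* ([^ ]+): link state changed to .*/\1/' |
        sort | uniq -c | awk '{ print $2, $1 }'
}

# Made-up sample input; on a real system, feed in the system log instead.
count_link_events <<'EOF'
Jan  1 00:00:01 fw kernel: igb0: link state changed to DOWN
Jan  1 00:00:02 fw kernel: igb0: link state changed to UP
Jan  1 00:00:02 fw kernel: igb0.100: link state changed to UP
Jan  1 00:00:03 fw dhcpd: Listening on BPF/igb0/00:00:00:00:00:00
EOF
```

How the system log is read depends on the pfSense version, so the input command is left out here. A count that keeps climbing after a single interface change would point toward the loop; a count of zero matches the "silent" case.
-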
@jimp said in DHCP fails silently, but works on reboot of pfSense:
The symptoms all fit with something causing a link loop, which generally only happens on certain drivers in certain situations, such as changing specific settings which cause the link to drop and come back.
That triggers the link up/down scripts, which reconfigure the interfaces, which triggers a new link event, and so on.
But that scenario would log quite a lot of info in the main system log as it happens. It wouldn't happen silently.
Are the affected NICs all
igb
interfaces?
It happens both when LAN is on the RJ45 port (igb) and when it is moved to one of the SFP+ ports (ix, I believe).
-
Additionally, I forgot to mention that even restarting an OpenVPN server caused the problem.
OpenVPN itself is not the cause, however, as an install without OpenVPN showed the same issues.
-
Restarting OpenVPN would also trigger a restart of some services, which could land in a similar scenario depending on the circumstances.
There must be something out of the ordinary on there that triggers it, however.
What other packages are on there? Any other services on the firewall?
I'd be interested in looking at a full copy of the config.xml if possible. You can redact some private info (passwords/certs/etc) but I'd like to see as much of it as possible. You can send it in a PM or send it to
<my forum username>@pfsense.org
and you can encrypt it with GPG/PGP if you like; there is a key on public key servers for that e-mail address.
-
I'll see what I can do, I'll take a look at it tomorrow and get back to you.
-
@jimp @obelsen
Here is some additional information from my end. Rebooting pfSense gets everything working fine again. I left things running for 2 days without changing anything, and everything ran smoothly for that period of time. The system log was totally clean.
The 2 days without changing anything was the time it took to get a second X1541 shipped out to me. I couldn't keep rebooting to bring the DHCP server back online every time I made an interface change.
So with the second X1541 in place (I haven't set up HA yet), I am now making the interface changes on the off-line X1541, saving the config and rebooting, then hot swapping it into production. Then I take the now off-line X1541 and restore the config I saved with the updates. This is working for now to get me thru some immediate updates I need to make, but it is certainly not a long-term solution.
To clarify, the syslog is totally clean until I make an interface change; then the log immediately starts filling up (as posted earlier).
-
@bjk said in DHCP fails silently, but works on reboot of pfSense:
@jimp @obelsen
Here is some additional information from my end. Rebooting the pfsense gets everything working fine again. Left things running for 2 days without changing anything and everything ran smoothly for that period of time. System Log was totally clean.
The 2 days without changing anything was the time period it took to get a second X1541 shipped out to me. I couldn't keep rebooting to bring the DHCP server back on line every time I made an interface change.
So with the second x1541 in place (I haven't set up HA yet), I am now making the interface changes on the off-line X1541 saving config and rebooting, then hot swapping to production. Then taking the now off-line X1541, restoring the config I saved with the updates. This is working for now to get me thru some immediate updates I need to make, but certainly not a long term solution.
To clarify, the syslog is totally clean until I make an interface change; then the log immediately starts filling up (as posted earlier).
I don't think it's the exact same issue, as my system log does not have any indication of an issue when the DHCP server is down (other than that my service watchdog keeps restarting DHCPD).
-
It does seem like there is some overlap tho, as we are both utilizing 100+ VLANs and the DHCP server stops functioning when any updates are made to interfaces. Or maybe I'm oversimplifying? In reading your posts, I want to say I'm seeing the same problem, only I do have system log activity. If our issues are somehow related, maybe the fact that I have log activity is a clue? Probably not... just throwing it out there.
-
@jimp As I am adding VLANs on the off-line X1541, I hit a snag. I was able to create the 163rd VLAN and select "Add" in the Interface Assignments, but now this latest VLAN I added isn't showing up in the Interface Assignments tab. If I go back to the VLANs tab, the VLAN I last created is there. When I try to delete the VLAN, I receive an error "This VLAN cannot be deleted because it is still being used as an interface". Yet it isn't showing up in the Interface list. I decided to reboot (after I saved the config). Upon reboot, the last several VLANs were missing. I restored from the backup and was able to get back to where I was before the reboot (can see the VLAN but not the interface).
@obelsen, you didn't run into any troubles adding all your VLANs? Have I hit some limitation here?
-
@bjk said in DHCP fails silently, but works on reboot of pfSense:
@jimp As I am adding VLANs on the off-line x1541, I hit a snag. I was able to create the 163rd VLAN and select "Add" in the Interface Assignments, but now this latest VLAN I added isn't showing up in the Interface Assignments tab. If I go back to the VLANs tab, the VLAN I last created is there. When I try to delete the VLAN, I receive an error "This VLAN cannot be deleted because it is still being used as an interface". Yet it isn't showing up in the Interface list. I decided to reboot (after I saved the Config). Upon reboot, the last several VLANs were missing. I restored from the back up and was able to get back to where I was before the reboot (can see the VLAN but not the Interface).
@obelsen, you didn't run into any troubles adding all your VLANs? Have I hit some limitation here?
I have not encountered this error.