DHCP fails silently, but works on reboot of pfSense
-
Can you answer the other questions I asked in my previous reply?
-
I did, forgot to put a newline :)
-
The symptoms all fit with something causing a link loop, which generally only happens with certain drivers in certain situations, such as changing specific settings that cause the link to drop and come back.
That triggers the link up/down scripts, which reconfigure the interfaces, which triggers a new link event, and so on.
But that scenario would log quite a lot of info in the main system log as it happens. It wouldn't happen silently.
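One rough way to test the link-loop theory is to count the kernel's link-state messages in the system log. The sketch below is not a pfSense tool; it assumes lines in FreeBSD's usual form (for example "igb0: link state changed to DOWN") and uses a made-up threshold of 10 changes to flag a likely loop.

```python
import re
from collections import Counter

# FreeBSD logs link changes as "<ifname>: link state changed to UP|DOWN".
# The syslog prefix (timestamp, host, "kernel:") is skipped because we use search().
# Assumption: VLAN interfaces appear as "<parent>.<tag>", e.g. "ix0.100".
LINK_RE = re.compile(r"\b([a-z]+\d+(?:\.\d+)?): link state changed to (UP|DOWN)\b")


def count_link_changes(lines):
    """Map each interface name to its number of logged link-state changes."""
    counts = Counter()
    for line in lines:
        match = LINK_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def flapping_interfaces(lines, threshold=10):
    """Return interfaces with at least `threshold` changes (the cutoff is a guess)."""
    counts = count_link_changes(lines)
    return sorted(name for name, n in counts.items() if n >= threshold)
```

Passing the lines of an exported system log to flapping_interfaces() would list any interface that bounced at least that many times; an empty result fits the point that a real link loop would not happen silently.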
Are the affected NICs all igb interfaces?
-
@jimp It happens both when LAN is on the RJ45 port (igb) and when it is moved to one of the SFP+ ports (ix, I believe).
-
Additionally, I forgot to mention that even restarting an OpenVPN server caused the problem.
OpenVPN itself is not the cause, however, as an install without OpenVPN showed the same issues.
-
Restarting OpenVPN would also trigger a restart of some services, which could land in a similar scenario depending on the circumstances.
There must be something out of the ordinary on there that triggers it, however.
What other packages are on there? Any other services on the firewall?
I'd be interested in looking at a full copy of the config.xml if possible. You can redact some private info (passwords/certs/etc.), but I'd like to see as much of it as possible. You can send it in a PM or send it to <my forum username>@pfsense.org, and you can encrypt it with GPG/PGP if you like; there is a key on public key servers for that e-mail address.
-
I'll see what I can do; I'll take a look at it tomorrow and get back to you.
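For the redaction step, a short script can blank out secrets before the file is shared. This is a sketch under assumptions: the element names in SENSITIVE_TAGS are a guess at where pfSense keeps passwords, certificates, and keys, and the result should still be checked by hand before sending.

```python
import xml.etree.ElementTree as ET

# Assumed names of elements that carry secrets in a pfSense config.xml.
# Illustrative only: check your own file for anything else sensitive.
SENSITIVE_TAGS = {"password", "bcrypt-hash", "crt", "prv", "shared_key", "tls"}


def redact_config(xml_data, tags=SENSITIVE_TAGS, placeholder="REDACTED"):
    """Replace the text of every element whose tag is in `tags`.

    Accepts str or bytes; returns (redacted XML as str, number of elements changed).
    The XML declaration is not preserved in the output.
    """
    root = ET.fromstring(xml_data)
    changed = 0
    for elem in root.iter():
        # Only touch elements that actually hold a non-blank value.
        if elem.tag in tags and (elem.text or "").strip():
            elem.text = placeholder
            changed += 1
    return ET.tostring(root, encoding="unicode"), changed
```

For example, redact_config(open("config.xml", "rb").read()) returns the redacted XML and the count of elements changed; passing bytes lets the parser honor the file's own XML declaration.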
-
@jimp @obelsen
Here is some additional information from my end. Rebooting pfSense gets everything working again. I left things running for 2 days without changing anything, and everything ran smoothly for that period. The system log was totally clean.
Those 2 days were how long it took to get a second X1541 shipped out to me. I couldn't keep rebooting to bring the DHCP server back online every time I made an interface change.
So, with the second X1541 in place (I haven't set up HA yet), I am now making the interface changes on the offline X1541, saving the config, and rebooting, then hot-swapping it into production. I then take the now-offline X1541 and restore the config I saved with the updates. This is working for now to get me through some immediate updates, but it is certainly not a long-term solution.
To clarify, the syslog is totally clean until I make an interface change; then the log immediately starts filling up (as posted earlier).
-
@bjk I don't think it's the exact same issue, as my system log shows no sign of a problem while the DHCP server is down (other than my service watchdog repeatedly restarting dhcpd).
-
It does seem like there is some overlap, though, as we are both using 100+ VLANs and the DHCP server stops functioning whenever any change is made to the interfaces. Or maybe I'm oversimplifying? Reading your posts, I'd say I'm seeing the same problem, except that I do have system log activity. If our issues are related, maybe the fact that I have log activity is a clue? Probably not... just throwing it out there.
-
@jimp As I was adding VLANs on the offline X1541, I hit a snag. I was able to create the 163rd VLAN and select "Add" in Interface Assignments, but this latest VLAN isn't showing up in the Interface Assignments tab. If I go back to the VLANs tab, the VLAN I last created is there. When I try to delete the VLAN, I receive the error "This VLAN cannot be deleted because it is still being used as an interface", yet it isn't showing up in the interface list. I decided to reboot (after saving the config). Upon reboot, the last several VLANs were missing. I restored from the backup and was able to get back to where I was before the reboot (I can see the VLAN but not the interface).
@obelsen, did you run into any trouble adding all your VLANs? Have I hit some limitation here?
-
@bjk I have not encountered this error.
-
There is no limit on the number of VLANs you can have in the GUI. I'm not quite sure what would have happened there, but I don't recall seeing anything like that before. Are you certain there wasn't some unintentional overlap? IIRC, input validation prevents the most obvious ways of causing duplicate entries, but maybe something else was going on there.
Go to Diagnostics > Backup & Restore and check the config history at each point in the changes you made to see what was added when.
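Checking each restore point for overlap by eye gets tedious with 160+ VLANs. A hypothetical helper like the one below could scan a downloaded config.xml for the two obvious kinds of duplicate: the same parent/tag pair defined twice, and one device assigned to more than one interface. It only catches exact duplicates and assumes the element layout named in its docstring.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def find_overlaps(xml_data):
    """Report duplicate VLANs and devices assigned to more than one interface.

    Assumes the pfSense layout <pfsense><vlans><vlan><if/><tag/></vlan></vlans>
    and <interfaces><NAME><if/></NAME></interfaces>; adjust if yours differs.
    """
    root = ET.fromstring(xml_data)

    # Count each (parent interface, tag) pair; more than one means a duplicate.
    vlan_counts = defaultdict(int)
    for vlan in root.findall("./vlans/vlan"):
        key = (vlan.findtext("if", "").strip(), vlan.findtext("tag", "").strip())
        vlan_counts[key] += 1
    duplicate_vlans = sorted(key for key, n in vlan_counts.items() if n > 1)

    # Group logical interface names (lan, opt1, ...) by the device they use.
    by_device = defaultdict(list)
    for iface in root.findall("./interfaces/*"):
        by_device[iface.findtext("if", "").strip()].append(iface.tag)
    shared_devices = {dev: names for dev, names in by_device.items() if len(names) > 1}

    return duplicate_vlans, shared_devices
```

Running it against the config saved at each history point would show at which change a duplicate first appears, if one does.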
-
@jimp I've reviewed the interfaces a few times and cannot see any conflicts or redundancies. In the XML, the VLAN and the interface are both there; they just don't show up in the UI.
I also stepped back through the config history a couple of times to re-add the VLAN, and I tried adding a completely random VLAN, with the same result: it does not show up after being added in the interfaces.
From what I can see, it appears I've hit a limit: I can't add any more interfaces and have them show up in the UI, though they do show up in the XML.
-
So exactly how many VLAN tags do you have in config.xml when they stop showing up?
And can you confirm which version of pfSense you are running on that device?
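The count can be taken straight from the file. The sketch below (an illustration, not a pfSense utility) counts the vlan entries and interface assignments in config.xml and lists VLAN devices that have no assignment, assuming the vlanif naming noted in its docstring.

```python
import xml.etree.ElementTree as ET


def vlan_summary(xml_data):
    """Return (VLAN count, assigned interface count, unassigned VLAN devices).

    Assumes each <vlan> has a <vlanif> such as "ix0.163"; when it is missing,
    the device name is rebuilt as "<if>.<tag>".
    """
    root = ET.fromstring(xml_data)

    vlan_devices = []
    for vlan in root.findall("./vlans/vlan"):
        device = vlan.findtext("vlanif", "").strip()
        if not device:
            device = f'{vlan.findtext("if", "").strip()}.{vlan.findtext("tag", "").strip()}'
        vlan_devices.append(device)

    # Every child of <interfaces> (wan, lan, opt1, ...) is one assignment.
    assignments = root.findall("./interfaces/*")
    assigned_devices = {iface.findtext("if", "").strip() for iface in assignments}
    unassigned = [dev for dev in vlan_devices if dev not in assigned_devices]
    return len(vlan_devices), len(assignments), unassigned
```

If the newest VLAN is assigned in the XML but missing from the GUI, it would not appear in the unassigned list, which would point at the display side rather than the stored config.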
-
@jimp
Hardware XG-1541 running 2.4.4-RELEASE (amd64)
built on Thu Sep 20 09:33:19 EDT 2018
FreeBSD 11.2-RELEASE-p3
The GUI stopped displaying new interfaces after the 163rd VLAN was created. Also, please recall that I rebooted afterward to see if something magic would happen, and it did: the GUI was then not even showing the 162 VLANs I had previously created. I didn't count, but the interfaces I could see in the GUI amounted to more like ~155 VLANs. I restored the config from before the reboot, and then I could see the 162 VLANs in the GUI again. At that time I was not reviewing the XML to see what it contained, but something is definitely wonky with the GUI not displaying all interfaces.
btw, thank you very much for your help with this.
-
I generated a config with 250 VLANs (assigned, enabled, with DHCP) and so far they all show up everywhere. The console menu, the interfaces widget, the interfaces menu, the VLAN tab list, the DHCP server, even firewall rules.
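A test config along those lines could be generated rather than typed in. The sketch below builds the vlans, interfaces, and dhcpd sections with xml.etree; the parent interface ix0, the tag range, the 10.X.Y.0/24 addressing, opt numbering starting at opt1, and the exact child elements are all assumptions about the pfSense schema, and the fragments would still have to be merged into a full config.xml by hand.

```python
import xml.etree.ElementTree as ET


def sub(parent, tag, text=None):
    """Append a child element, optionally with text, and return it."""
    el = ET.SubElement(parent, tag)
    if text is not None:
        el.text = text
    return el


def build_vlan_sections(count, parent_if="ix0", first_tag=10):
    """Build hypothetical <vlans>, <interfaces>, <dhcpd> fragments for `count` VLANs."""
    if first_tag < 1 or first_tag + count - 1 > 4094:
        raise ValueError("VLAN tags must stay within 1..4094")
    vlans = ET.Element("vlans")
    interfaces = ET.Element("interfaces")
    dhcpd = ET.Element("dhcpd")
    for i in range(count):
        tag = first_tag + i
        vlanif = f"{parent_if}.{tag}"
        opt = f"opt{i + 1}"  # assumes opt1..optN are not already used
        net = f"10.{tag // 256}.{tag % 256}"  # hypothetical /24 per VLAN

        v = sub(vlans, "vlan")
        sub(v, "if", parent_if)
        sub(v, "tag", str(tag))
        sub(v, "descr", f"VLAN {tag}")
        sub(v, "vlanif", vlanif)

        iface = sub(interfaces, opt)  # assigned and enabled, static IPv4
        sub(iface, "descr", f"VLAN{tag}")
        sub(iface, "if", vlanif)
        sub(iface, "enable")
        sub(iface, "ipaddr", f"{net}.1")
        sub(iface, "subnet", "24")

        d = sub(dhcpd, opt)  # DHCP server enabled on the same interface
        sub(d, "enable")
        rng = sub(d, "range")
        sub(rng, "from", f"{net}.100")
        sub(rng, "to", f"{net}.199")
    return vlans, interfaces, dhcpd
```

ET.tostring(section, encoding="unicode") on each returned element gives text that could be pasted into a test config.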
I was able to reproduce some weirdness when applying a change to an interface, though. It's like it is very slowly going through each VLAN and touching it. PHP is maxing out a whole core, but it's only doing maybe one VLAN every 2 minutes. So at this rate it would take 8h20m to process them all. Fun. :-)
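The 8h20m figure is simply 250 VLANs times roughly 2 minutes each, i.e. 500 minutes. The helper below makes that estimate reusable for other VLAN counts; the fixed per-VLAN rate is an assumption taken from this one observation and will vary with hardware.

```python
def estimated_processing_time(vlan_count, minutes_per_vlan=2.0):
    """Return (hours, minutes) to process every VLAN at a fixed per-VLAN rate."""
    total_minutes = round(vlan_count * minutes_per_vlan)
    return divmod(total_minutes, 60)


print(estimated_processing_time(250))  # (8, 20)
```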
So I can at least open an issue for that part, which may be the original cause of this thread. Not sure what is taking so long to process them all, or why it's touching every VLAN when changing only one. Might not make -p1 but we'll see.
-
@jimp Ah, I can see that being the issue. I haven't actually tried leaving it running that long, since people around here don't know how to set static IPs for that duration :-)
-
I put in https://redmine.pfsense.org/issues/9115 for this for now. If I notice any other clues about how it misbehaves I'll drop them on there.
In the meantime, you might be able to at least use the 16/11 trick (console menu option 16, Restart PHP-FPM, followed by option 11, Restart webConfigurator) after making a change to keep it running. It does seem to apply the intended change quickly and then get mired in doing something else afterward.