DHCPD does not start, gives error
-
Hello,
we have a pfSense Cluster with 2 pfSense VMs, Primary + Secondary, using them as Captive Portal. DHCP is configured as failover pool with the Secondary as Failover Peer IP. They have both pfsense software 2.4.4_p3There is the problem, that at certain times the dhcpd server on the secondary dies while it does a restart.
dhcpd log:
Oct 2 17:29:56 dhcpd exiting. Oct 2 17:29:56 dhcpd process and the information we find helpful for debugging. Oct 2 17:29:56 dhcpd before submitting a bug. These pages explain the proper Oct 2 17:29:56 dhcpd bugs on either our web page at www.isc.org or in the README file Oct 2 17:29:56 dhcpd than a configuration issue please read the section on submitting Oct 2 17:29:56 dhcpd If you think you have received this message due to a bug rather Oct 2 17:29:56 dhcpd Can't register I/O handle for vmx1.2688: out of range Oct 2 17:29:56 dhcpd Unable to register fd with library out of range Oct 2 17:29:56 dhcpd Sending on BPF/vmx1.2688/00:50:56:86:b9:8f/172.16.64.0/21 Oct 2 17:29:56 dhcpd Listening on BPF/vmx1.2688/00:50:56:86:b9:8f/172.16.64.0/21 Oct 2 17:29:56 dhcpd Sending on BPF/vmx1.2690/00:50:56:86:b9:8f/172.16.80.0/21 Oct 2 17:29:56 dhcpd Listening on BPF/vmx1.2690/00:50:56:86:b9:8f/172.16.80.0/21 Oct 2 17:29:56 dhcpd Sending on BPF/vmx1.2687/00:50:56:86:b9:8f/172.16.56.0/21 Oct 2 17:29:56 dhcpd Listening on BPF/vmx1.2687/00:50:56:86:b9:8f/172.16.56.0/21 Oct 2 17:29:56 dhcpd Sending on BPF/vmx1.2691/00:50:56:86:b9:8f/172.16.88.0/21 Oct 2 17:29:56 dhcpd Listening on BPF/vmx1.2691/00:50:56:86:b9:8f/172.16.88.0/21 Oct 2 17:29:56 dhcpd Sending on BPF/vmx1.2693/00:50:56:86:b9:8f/172.16.104.0/21 Oct 2 17:29:56 dhcpd Listening on BPF/vmx1.2693/00:50:56:86:b9:8f/172.16.104.0/21 Oct 2 17:29:56 dhcpd Sending on BPF/vmx1.2696/00:50:56:86:b9:8f/172.16.128.0/21 Oct 2 17:29:56 dhcpd Listening on BPF/vmx1.2696/00:50:56:86:b9:8f/172.16.128.0/21 Oct 2 17:29:56 dhcpd Sending on BPF/vmx1.2692/00:50:56:86:b9:8f/172.16.96.0/21 Oct 2 17:29:56 dhcpd Listening on BPF/vmx1.2692/00:50:56:86:b9:8f/172.16.96.0/21 Oct 2 17:29:56 dhcpd Sending on BPF/vmx1.2689/00:50:56:86:b9:8f/172.16.72.0/21 Oct 2 17:29:56 dhcpd Listening on BPF/vmx1.2689/00:50:56:86:b9:8f/172.16.72.0/21 Oct 2 17:29:56 dhcpd Wrote 11661 leases to leases file. Oct 2 17:29:56 dhcpd For info, please visit https://www.isc.org/software/dhcp/ Oct 2 17:29:56 dhcpd All rights reserved. Oct 2 17:29:56 dhcpd Copyright 2004-2018 Internet Systems Consortium. Oct 2 17:29:56 dhcpd Internet Systems Consortium DHCP Server 4.3.6-P1 Oct 2 17:29:56 dhcpd PID file: /var/run/dhcpd.pid Oct 2 17:29:56 dhcpd Database file: /var/db/dhcpd.leases Oct 2 17:29:56 dhcpd Config file: /etc/dhcpd.conf Oct 2 17:29:56 dhcpd For info, please visit https://www.isc.org/software/dhcp/ Oct 2 17:29:56 dhcpd All rights reserved. Oct 2 17:29:56 dhcpd Copyright 2004-2018 Internet Systems Consortium. Oct 2 17:29:56 dhcpd Internet Systems Consortium DHCP Server 4.3.6-P1 Oct 2 17:29:52 dhcpd DHCPACK on 172.16.89.60 to 54:99:63:bc:fb:10 (Marcos-iPhone) via vmx1.2691
I can then hit the start button for the dhcpd service, and after 2-10 tries with error, the service eventually starts without error
Can't register I/O handle for vmx1.2688: out of range
Oct 2 17:29:56 dhcpd Unable to register fd with library out of rangeStrange thing is, when disabling dhcp server on that interface, the issue persists, then complaining about another interface.
I don't know exactly what the dhcpd causes to restart, but the general log shows at exactly the same time:
Oct 2 17:29:57 minicron (/etc/rc.prunecaptiveportal) terminated by signal 15 (Terminated) Oct 2 17:29:57 minicron (/etc/rc.prunecaptiveportal) terminated by signal 15 (Terminated) Oct 2 17:29:57 minicron (/etc/rc.prunecaptiveportal) terminated by signal 15 (Terminated) Oct 2 17:29:57 minicron (/etc/rc.prunecaptiveportal) terminated by signal 15 (Terminated) Oct 2 17:29:56 minicron (/etc/rc.prunecaptiveportal) terminated by signal 15 (Terminated) Oct 2 17:29:56 minicron (/etc/rc.prunecaptiveportal) terminated by signal 15 (Terminated) Oct 2 17:29:56 minicron (/etc/rc.prunecaptiveportal) terminated by signal 15 (Terminated) Oct 2 17:29:56 minicron (/etc/rc.prunecaptiveportal) terminated by signal 15 (Terminated) Oct 2 17:29:56 minicron (/etc/rc.prunecaptiveportal) terminated by signal 15 (Terminated) Oct 2 17:29:53 check_reload_status Reloading filter Oct 2 17:29:52 check_reload_status Carp backup event Oct 2 17:29:52 kernel vmx1.2696: promiscuous mode disabled Oct 2 17:29:52 kernel carp: 8@vmx1.2696: BACKUP -> INIT (hardware interface up) Oct 2 17:29:52 kernel ifa_maintain_loopback_route: deletion failed for interface vmx1.2696: 3 Oct 2 17:29:52 kernel ifa_maintain_loopback_route: deletion failed for interface vmx1.2696: 3 Oct 2 17:29:52 kernel ifa_maintain_loopback_route: deletion failed for interface vmx1.2696: 3 Oct 2 17:29:52 kernel carp: 8@vmx1.2696: INIT -> BACKUP (initialization complete) Oct 2 17:29:52 kernel vmx1.2696: promiscuous mode enabled Oct 2 17:29:52 check_reload_status Carp backup event
This happens seemingly randomly AND every day at 3:00AM, but not always causing the dhcpd to crash.
The dhcpd crash ONLY happens on the secondary.
Maybe it has to do with a config sync from the primary?
I hope someone can point me to the right direction what I may can check or improve.
best regards
-
Did you ever sort this out? My router started with the same thing out of the blue today.
-
Unfortunately not. I figured out that rebooting helps for a couple of days. We think about paid support to hopefully get rid of this issue.
regards
-
I've exactly the same problem
Randomly DHCP server crash on the secondary node.The cluster runs on version 2.4.4p3.
Does the new 2.4.5 solve this issues ?Any news ?
Thanks
-
Hi Luca,
we wait a couple of weeks before deploying 2.4.5.
I figured out that it is not randomly, but is triggered by a config save on the primary.
After deactivating the automatic config backup the crash is gone.It crashes almost always when manually saving the config on the other node.
regards
-
Thanks so much for the additional information.
So you think that the problem is related to the automatic saving of the configuration (Services -> Auto Config Backup)?
In fact, I use it on both nodes.
Have you deactivated the ACB (Active Config Backup) ? Only on the slave, only on primary on both nodes ?Very thanks
Luca
-
Well, for me it was as follows:
Whenever a config save is done on one node, be it automatic or manual, on the other node the dhcpd stops working. So ACB is now deactivated to keep the dhcpd running. Luckily I had it active on the primary only. Our problems would have been far bigger had I activated the ACB on the secondary, too (without knowing it causes the problem). I do not use it anymore. Maybe this is fixed in the new version. I will test it in our lab when I find the time.
Greetings
-
Great !
Now I disable ACB on the primary (ACB was already disabled on the secondary) and I do some tests.
Thanks
Luca -
I've tried on a two node PfSense cluster on 2.4.5.
Seems works fine !!! DHCP is solid on the slave node. -
Thank You for this valuable information!