DHCPd Failover on 2nd Subnet
-
Hi All,
I'm running 1.2-RELEASE on two nodes.
DHCPd is serving addresses on two subnets, LAN (dhcp0) and DMZ (dhcp1). I have a problem with DHCPd failover on the second failover peer, dhcp1. Failover is synchronized on dhcp0 with both nodes, pfsense-a and pfsense-b, reporting "normal" operation. Failover is not in sync on dhcp1 with both nodes reporting "recover" and their peer is "unknown-state".
The time on both systems is synchronized via NTP as confirmed by running "date" on both nodes via command line.
Here is my dhcpd.conf from pfsense-a:
option domain-name "my.domain.com";
default-lease-time 7200;
max-lease-time 86400;
authoritative;
log-facility local7;
ddns-update-style none;
one-lease-per-client true;
deny duplicates;
failover peer "dhcp0" {
primary;
address 192.168.1.2;
port 519;
peer address 192.168.1.3;
peer port 520;
max-response-delay 10;
max-unacked-updates 10;
split 128;
mclt 600;
load balance max seconds 3;
}
failover peer "dhcp1" {
primary;
address 192.168.3.2;
port 519;
peer address 192.168.3.3;
peer port 520;
max-response-delay 10;
max-unacked-updates 10;
split 128;
mclt 600;
load balance max seconds 3;
}
subnet 192.168.1.0 netmask 255.255.255.0 {
pool {
option domain-name-servers 192.168.1.9,192.168.1.10;
deny dynamic bootp clients;
failover peer "dhcp0";
range 192.168.1.11 192.168.1.180;
}
option routers 192.168.1.1;
option domain-name-servers 192.168.1.9,192.168.1.10;
}
host s_lan_0 {
hardware ethernet 00:17:f2:c2:5d:24;
fixed-address 192.168.1.181;
}
subnet 192.168.3.0 netmask 255.255.255.0 {
pool {
option domain-name-servers 1.2.3.4,1.2.3.4;
deny dynamic bootp clients;
deny unknown clients;
failover peer "dhcp1";
range 192.168.3.101 192.168.3.102;
}
option routers 192.168.3.1;
option domain-name-servers 1.2.3.4,1.2.3.5;
}
host s_opt2_0 {
hardware ethernet 00:11:43:3c:02:08;
fixed-address 192.168.3.101;
}
host s_opt2_1 {
hardware ethernet 00:11:43:3d:22:fd;
fixed-address 192.168.3.102;
}
Here's the dhcpd.conf from pfsense-b:
option domain-name "my.domain.com";
default-lease-time 7200;
max-lease-time 86400;
authoritative;
log-facility local7;
ddns-update-style none;
one-lease-per-client true;
deny duplicates;
failover peer "dhcp0" {
secondary;
address 192.168.1.3;
port 520;
peer address 192.168.1.2;
peer port 519;
max-response-delay 10;
max-unacked-updates 10;
mclt 600;
load balance max seconds 3;
}
failover peer "dhcp1" {
secondary;
address 192.168.3.3;
port 520;
peer address 192.168.3.2;
peer port 519;
max-response-delay 10;
max-unacked-updates 10;
mclt 600;
load balance max seconds 3;
}
subnet 192.168.1.0 netmask 255.255.255.0 {
pool {
option domain-name-servers 192.168.1.9,192.168.1.10;
deny dynamic bootp clients;
failover peer "dhcp0";
range 192.168.1.11 192.168.1.180;
}
option routers 192.168.1.1;
option domain-name-servers 192.168.1.9,192.168.1.10;
}
host s_lan_0 {
hardware ethernet 00:17:f2:c2:5d:24;
fixed-address 192.168.1.181;
}
subnet 192.168.3.0 netmask 255.255.255.0 {
pool {
option domain-name-servers 1.2.3.4,1.2.3.5;
deny dynamic bootp clients;
deny unknown clients;
failover peer "dhcp1";
range 192.168.3.101 192.168.3.102;
}
option routers 192.168.3.1;
option domain-name-servers 1.2.3.4,1.2.3.5;
}
host s_opt2_0 {
hardware ethernet 00:11:43:3c:02:08;
fixed-address 192.168.3.101;
}
host s_opt2_1 {
hardware ethernet 00:11:43:3d:22:fd;
fixed-address 192.168.3.102;
}
When I restart DHCPd on pfsense-b, I see the following in the DHCP logs on pfsense-a:
Nov 18 12:59:19 dhcpd: failover: connect: no matching state.
and on pfsense-b:
Nov 18 12:59:05 dhcpd: failover peer dhcp1: I move from recover to startup
Nov 18 12:59:20 dhcpd: failover peer dhcp1: I move from startup to recover
Any ideas about what's going on?
Thanks!
Martín
-
I have the same problem. If you check the firewall logs, you'll most likely see something similar:
To clarify, these are blocked:
Nov 22 23:12:57 CARP2 192.168.50.1:519 192.168.50.254:51125 TCP
Nov 22 23:12:57 CARP3 192.168.7.1:519 192.168.7.254:59893 TCP
Nov 22 23:12:57 CARP4 192.168.8.1:519 192.168.8.254:64975 TCP
Nov 22 23:12:57 CARP5 192.168.9.1:519 192.168.9.254:52301 TCP
Nov 22 23:12:57 CARP6 10.0.2.1:519 10.0.2.254:63893 TCP
For me, I've created rules on each interface that allow all firewalls to connect to each other:
TCP hFirewalls * hFirewalls * * For DHCP Failovers
hFirewalls includes all firewall IPs for all interfaces, as well as the CARP interface IPs. Like you, dhcp0 is in a weird state:
Primary: "dhcp0" recover-wait 2008/11/22 22:46:33 recover-wait 2008/11/22 22:46:33
Secondary: "dhcp0" recover-wait 2008/11/22 22:43:01 recover-wait 2008/11/22 22:43:01
Unfortunately no matter what I do, I cannot get pf to not block these requests. Any takers?
-
Hi jewps,
I don't see any evidence of the firewall blocking DHCP traffic. And I'm not seeing the same states as you. I see My state "recover" and Peer state "unknown-state" on each node.
I suspect that we have different root causes, despite the fact that some symptoms appear to be similar.
Best,
Martín
-
For the other subnets, I had the "recover" and "unknown-state" issue. dhcp0 is the only one that seems to be somewhat working. Unfortunately I no longer have the screenshots I was going to post here, and I have had DHCP failover disabled on these two boxes ever since my previous posting. It just wouldn't work regardless.
We may have different root causes, but some searching (mailing list and forums) indicates that others have had similar problems with DHCP failover.
Maybe I'll send an email to the mailing list later to see if there is a better response there.
Thanks for the follow-up, Martin.
PS, I didn't mean to hijack your thread. I figure the problems are similar enough, and if we find that the root causes are different, I'll delete my posts and move them to a new thread. Let me know if you'd like me to do that anyway.
-
I have been having the same problem: dhcp0 works, but dhcp1 is constantly in recover / unknown state. There does not seem to be anything interesting in the DHCP or firewall logs either.
-
Almost 3 years later and I think I've figured out the problem.
It seems that you should specify the "Failover peer IP" only on the first interface that will run the DHCP server, not on any other interface. In my case the first interface is the LAN; I'm not sure how pfSense decides which is the "first."
In the correct setup, I have one failover group, "dhcp0", and it contains addresses from both 192.168.1.0 and 192.168.3.0.
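As a rough illustration of that layout, here is a trimmed sketch of what the primary's dhcpd.conf would look like with a single failover peer that both pools reference. It assumes the addresses and timers from the original post and leaves out the global options, routers, DNS servers, and host entries. It is not the file pfSense actually generated, so treat it as illustrative only:

```
# Sketch only: one failover peer declaration shared by both subnets.
# Values are copied from the pfsense-a config earlier in the thread.
failover peer "dhcp0" {
  primary;
  address 192.168.1.2;
  port 519;
  peer address 192.168.1.3;
  peer port 520;
  max-response-delay 10;
  max-unacked-updates 10;
  split 128;
  mclt 600;
  load balance max seconds 3;
}

subnet 192.168.1.0 netmask 255.255.255.0 {
  pool {
    deny dynamic bootp clients;
    failover peer "dhcp0";
    range 192.168.1.11 192.168.1.180;
  }
}

subnet 192.168.3.0 netmask 255.255.255.0 {
  pool {
    deny dynamic bootp clients;
    deny unknown clients;
    # Same peer as the LAN pool; there is no second "dhcp1" declaration.
    failover peer "dhcp0";
    range 192.168.3.101 192.168.3.102;
  }
}
```

The secondary would presumably mirror this, with "secondary;" in place of "primary;", the address and port pairs swapped, and no "split" line, as in the pfsense-b config above.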
I hope this helps someone else!
Best,
Martín