DHCP failover: both nodes stuck in recover state

Ap0p0

Hi all,

I'm trying to enable DHCP failover on a CARP LAN interface, but both nodes are stuck in the recover / unknown peer state.

node1: 10.6.0.1/16
node2: 10.6.0.2/16
VIP: 10.6.0.3/16

node1 (primary) has 10.6.0.2 as its failover peer, and pfsync automatically set 10.6.0.1 as the failover peer on node2.

These firewall rules are added automatically. On node1:
pass in quick on $GUESTS1006 proto { tcp udp } from 10.6.0.2 to 10.6.0.1 port = 519 tracker 1000013144 label "allow access to DHCP failover"
pass in quick on $GUESTS1006 proto { tcp udp } from 10.6.0.2 to 10.6.0.1 port = 520 tracker 1000013145 label "allow access to DHCP failover"

and on node2:

pass in quick on $GUESTS1006 proto { tcp udp } from 10.6.0.1 to 10.6.0.2 port = 519 tracker 1000013144 label "allow access to DHCP failover"
pass in quick on $GUESTS1006 proto { tcp udp } from 10.6.0.1 to 10.6.0.2 port = 520 tracker 1000013145 label "allow access to DHCP failover"
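To double-check that these rules cover the failover traffic, here is a minimal Python sketch that models them as plain (source, destination, port) matches. It assumes each dhcpd peer connects to the other's listening port (519 on node1, 520 on node2, per the dhcpd.conf files in this post) and that pf keeps state, so reply packets need no rule of their own. `PassRule`, `RULES`, `EXPECTED_FLOWS` and `allowed` are hypothetical names, and interface and protocol matching are left out.

```python
from dataclasses import dataclass
from ipaddress import ip_address


@dataclass(frozen=True)
class PassRule:
    """Simplified model of one 'pass in quick ... from SRC to DST port = N' rule."""
    src: str
    dst: str
    port: int

    def matches(self, src: str, dst: str, port: int) -> bool:
        return (ip_address(src) == ip_address(self.src)
                and ip_address(dst) == ip_address(self.dst)
                and port == self.port)


# The four auto-generated rules from the post (interface/protocol omitted).
RULES = [
    PassRule("10.6.0.2", "10.6.0.1", 519),
    PassRule("10.6.0.2", "10.6.0.1", 520),
    PassRule("10.6.0.1", "10.6.0.2", 519),
    PassRule("10.6.0.1", "10.6.0.2", 520),
]

# Assumed connection attempts: each peer dials the other's listening port.
EXPECTED_FLOWS = [
    ("10.6.0.1", "10.6.0.2", 520),  # primary -> secondary ("port 520")
    ("10.6.0.2", "10.6.0.1", 519),  # secondary -> primary ("port 519")
]


def allowed(flow, rules=RULES) -> bool:
    """Return True if at least one rule passes the (src, dst, port) flow."""
    return any(rule.matches(*flow) for rule in rules)


for flow in EXPECTED_FLOWS:
    print(flow, "allowed" if allowed(flow) else "BLOCKED")
```

If both flows come out as allowed, the generated rules themselves are not what keeps the peers apart, which fits the later finding that the nodes cannot even ping each other.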

Both nodes are NTP-synced (same NTP servers).

Both dhcpd.conf files seem to be OK too. node1 (primary):

default-lease-time 7200;
max-lease-time 86400;
log-facility local7;
one-lease-per-client true;
deny duplicates;
ping-check true;
update-conflict-detection false;
authoritative;
failover peer "dhcp_opt10" {
  primary;
  address 10.6.0.1;
  port 519;
  peer address 10.6.0.2;
  peer port 520;
  max-response-delay 10;
  max-unacked-updates 10;
  split 128;
  mclt 600;

  load balance max seconds 3;
}

subnet 10.6.0.0 netmask 255.255.0.0 {
  pool {
    option domain-name-servers 10.6.0.3;
    deny dynamic bootp clients;
    failover peer "dhcp_opt10";

    range 10.6.1.1 10.6.9.255;
  }

  option routers 10.6.0.3;
  option domain-name "office-people-doc.com";
  option domain-name-servers 10.6.0.3;
  max-lease-time 7200;

}
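The subnet and pool declarations above can be sanity-checked with Python's standard `ipaddress` module. This sketch (hypothetical names `SUBNET`, `FIXED`, `check_layout`) only confirms that the dynamic range lies inside the /16 and does not overlap the node and VIP addresses; it says nothing about the failover state itself.

```python
from ipaddress import ip_address, ip_network

# Values copied from the subnet/pool declarations in the post.
SUBNET = ip_network("10.6.0.0/255.255.0.0")  # subnet 10.6.0.0 netmask 255.255.0.0
RANGE_START, RANGE_END = ip_address("10.6.1.1"), ip_address("10.6.9.255")
FIXED = {
    "node1": ip_address("10.6.0.1"),
    "node2": ip_address("10.6.0.2"),
    "VIP": ip_address("10.6.0.3"),
}


def check_layout() -> list:
    """Return a list of problems with the address layout (empty if none)."""
    problems = []
    for label, ip in [("range start", RANGE_START), ("range end", RANGE_END), *FIXED.items()]:
        if ip not in SUBNET:
            problems.append(f"{label} {ip} is outside {SUBNET}")
    for label, ip in FIXED.items():
        if RANGE_START <= ip <= RANGE_END:
            problems.append(f"{label} {ip} falls inside the dynamic pool")
    return problems


# Number of addresses in the dynamic range (inclusive on both ends).
pool_size = int(RANGE_END) - int(RANGE_START) + 1
print(check_layout(), pool_size)  # → [] 2303
```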

node2 (secondary):

default-lease-time 7200;
max-lease-time 86400;
log-facility local7;
one-lease-per-client true;
deny duplicates;
ping-check true;
update-conflict-detection false;
authoritative;
failover peer "dhcp_opt10" {
  secondary;
  address 10.6.0.2;
  port 520;
  peer address 10.6.0.1;
  peer port 519;
  max-response-delay 10;
  max-unacked-updates 10;

  load balance max seconds 3;
}

subnet 10.6.0.0 netmask 255.255.0.0 {
  pool {
    option domain-name-servers 10.6.0.3;
    deny dynamic bootp clients;
    failover peer "dhcp_opt10";

    range 10.6.1.1 10.6.9.255;
  }

  option routers 10.6.0.3;
  option domain-name "office-people-doc.com";
  option domain-name-servers 10.6.0.3;
  max-lease-time 7200;

}
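Since the two failover blocks must mirror each other, a small script can compare them instead of eyeballing. This is a hypothetical Python sketch: `parse_failover` and `check_pair` are made-up names, and the regex parser is deliberately naive (no nested braces or comments). It checks that each peer's `address`/`port` matches the other's `peer address`/`peer port` and, per my reading of the ISC dhcpd.conf(5) man page, that `mclt` is set on the primary while neither `split` nor `mclt` appears on the secondary.

```python
import re


def parse_failover(conf: str) -> dict:
    """Pull the first 'failover peer "NAME" { ... }' block out of dhcpd.conf text.

    Naive on purpose: assumes no nested braces or comments inside the block.
    """
    m = re.search(r'failover\s+peer\s+"([^"]+)"\s*\{(.*?)\}', conf, re.S)
    if m is None:
        raise ValueError("no failover peer block found")
    block = {"name": m.group(1)}
    for stmt in m.group(2).split(";"):
        words = stmt.split()
        if not words:
            continue
        if words[0] in ("primary", "secondary"):
            block["role"] = words[0]
        elif words[:2] == ["peer", "address"]:
            block["peer address"] = words[2]
        elif words[:2] == ["peer", "port"]:
            block["peer port"] = int(words[2])
        elif words[0] == "address":
            block["address"] = words[1]
        elif words[0] == "port":
            block["port"] = int(words[1])
        elif words[0] in ("split", "mclt"):
            block[words[0]] = int(words[1])
        # Other statements (max-response-delay, load balance ...) are ignored.
    return block


def check_pair(primary: dict, secondary: dict) -> list:
    """Return a list of mismatches between the primary and secondary blocks."""
    problems = []
    if primary.get("role") != "primary" or secondary.get("role") != "secondary":
        problems.append("need exactly one primary and one secondary")
    if primary["name"] != secondary["name"]:
        problems.append("failover peer names differ")
    # Each side's own address/port must be the other side's peer address/port.
    if (primary["address"], primary["port"]) != (secondary["peer address"], secondary["peer port"]):
        problems.append("secondary's peer address/port does not point at the primary")
    if (secondary["address"], secondary["port"]) != (primary["peer address"], primary["peer port"]):
        problems.append("primary's peer address/port does not point at the secondary")
    # Assumption from dhcpd.conf(5): mclt required on primary; split/mclt primary-only.
    if "mclt" not in primary:
        problems.append("mclt missing on primary")
    for key in ("split", "mclt"):
        if key in secondary:
            problems.append(f"{key} must not be set on the secondary")
    return problems


PRIMARY = '''
failover peer "dhcp_opt10" {
  primary;
  address 10.6.0.1;
  port 519;
  peer address 10.6.0.2;
  peer port 520;
  max-response-delay 10;
  max-unacked-updates 10;
  split 128;
  mclt 600;
  load balance max seconds 3;
}
'''

SECONDARY = '''
failover peer "dhcp_opt10" {
  secondary;
  address 10.6.0.2;
  port 520;
  peer address 10.6.0.1;
  peer port 519;
  max-response-delay 10;
  max-unacked-updates 10;
  load balance max seconds 3;
}
'''

print(check_pair(parse_failover(PRIMARY), parse_failover(SECONDARY)))  # → []
```

For the two files above it returns an empty list, so the failover blocks themselves look consistent and the problem is more likely on the network path.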

I removed all DHCP leases on both nodes to start from a clean state, but no luck. Both stay in recover mode and do not serve IPs to clients. Where am I going wrong? :-)

On both nodes, I can see that nothing is sent or received on ports 519 and 520 on the LAN interface. I think that's the problem, but why?
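One way to narrow this down is a plain TCP connect test from each node to the other's failover port, since ISC dhcpd's failover protocol runs over TCP. The helper below is a hypothetical sketch using only Python's standard `socket` module. As a rough guide, "refused" usually means the packet reached the host but nothing is listening (or a rule answered with a reset), while "timeout" suggests the packet is silently dropped somewhere on the path.

```python
import socket


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Try a TCP connection to host:port and classify the outcome."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # Host answered with a reset: reachable, but nothing accepted the connection.
        return "refused"
    except socket.timeout:
        # No answer at all within the timeout: likely dropped on the way.
        return "timeout"
    except OSError as exc:
        # e.g. "No route to host" or "Network is unreachable".
        return f"error: {exc}"
```

For example, on node1 run `probe("10.6.0.2", 520)` and on node2 run `probe("10.6.0.1", 519)`.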

Ap0p0

I found something: the two nodes are unable to communicate with each other.

The SNAT on loopback translates to "interface address", so that part should be fine.

I created a firewall alias containing both real IPs, 10.6.0.1 and 10.6.0.2, and added a rule on interface "GUESTS1006" like this:
protocol any, source "alias", destination interface address

No luck! The nodes still can't ping each other.

By the way, I have another CARP interface on a different subnet, and there the nodes can ping each other. I can't see any difference… Both interfaces are VLANs, and the CARP and SNAT configurations are exactly the same. The only difference is in the firewall rules, but I tried an any-to-any rule on GUESTS1006 and it doesn't work either. No packets match the rule, and I can't explain why.