DHCPd Failover on 2nd Subnet



  • Hi All,

    I'm running pfSense 1.2-RELEASE on two nodes.

    DHCPd is serving addresses on two subnets, LAN (dhcp0) and DMZ (dhcp1).  I have a problem with failover on the second failover peer, dhcp1.  On dhcp0, failover is synchronized, with both nodes (pfsense-a and pfsense-b) reporting "normal" operation.  On dhcp1, failover is not in sync: both nodes report their own state as "recover" and their peer's state as "unknown-state".

    The time on both systems is synchronized via NTP as confirmed by running "date" on both nodes via command line.

    Here is my dhcpd.conf from pfsense-a:

    option domain-name "my.domain.com";
    default-lease-time 7200;
    max-lease-time 86400;
    authoritative;
    log-facility local7;
    ddns-update-style none;
    one-lease-per-client true;
    deny duplicates;
    failover peer "dhcp0" {
      primary;
      address 192.168.1.2;
      port 519;
      peer address 192.168.1.3;
      peer port 520;
      max-response-delay 10;
      max-unacked-updates 10;
      split 128;
      mclt 600;

      load balance max seconds 3;
    }
    failover peer "dhcp1" {
      primary;
      address 192.168.3.2;
      port 519;
      peer address 192.168.3.3;
      peer port 520;
      max-response-delay 10;
      max-unacked-updates 10;
      split 128;
      mclt 600;

      load balance max seconds 3;
    }
    subnet 192.168.1.0 netmask 255.255.255.0 {
            pool {
                    option domain-name-servers 192.168.1.9,192.168.1.10;
                    deny dynamic bootp clients;
                    failover peer "dhcp0";
                    range 192.168.1.11 192.168.1.180;
            }
            option routers 192.168.1.1;
            option domain-name-servers 192.168.1.9,192.168.1.10;
    }
    host s_lan_0 {
            hardware ethernet 00:17:f2:c2:5d:24;
            fixed-address 192.168.1.181;
    }
    subnet 192.168.3.0 netmask 255.255.255.0 {
            pool {
                    option domain-name-servers 1.2.3.4,1.2.3.4;
                    deny dynamic bootp clients;
                    deny unknown clients;
                    failover peer "dhcp1";
                    range 192.168.3.101 192.168.3.102;
            }
            option routers 192.168.3.1;
            option domain-name-servers 1.2.3.4,1.2.3.5;
    }
    host s_opt2_0 {
            hardware ethernet 00:11:43:3c:02:08;
            fixed-address 192.168.3.101;
    }
    host s_opt2_1 {
            hardware ethernet 00:11:43:3d:22:fd;
            fixed-address 192.168.3.102;
    }

    Here's the dhcpd.conf from pfsense-b:

    option domain-name "my.domain.com";
    default-lease-time 7200;
    max-lease-time 86400;
    authoritative;
    log-facility local7;
    ddns-update-style none;
    one-lease-per-client true;
    deny duplicates;
    failover peer "dhcp0" {
      secondary;
      address 192.168.1.3;
      port 520;
      peer address 192.168.1.2;
      peer port 519;
      max-response-delay 10;
      max-unacked-updates 10;
      mclt 600;

      load balance max seconds 3;
    }
    failover peer "dhcp1" {
      secondary;
      address 192.168.3.3;
      port 520;
      peer address 192.168.3.2;
      peer port 519;
      max-response-delay 10;
      max-unacked-updates 10;
      mclt 600;

      load balance max seconds 3;
    }
    subnet 192.168.1.0 netmask 255.255.255.0 {
            pool {
                    option domain-name-servers 192.168.1.9,192.168.1.10;
                    deny dynamic bootp clients;
                    failover peer "dhcp0";
                    range 192.168.1.11 192.168.1.180;
            }
            option routers 192.168.1.1;
            option domain-name-servers 192.168.1.9,192.168.1.10;
    }
    host s_lan_0 {
            hardware ethernet 00:17:f2:c2:5d:24;
            fixed-address 192.168.1.181;
    }
    subnet 192.168.3.0 netmask 255.255.255.0 {
            pool {
                    option domain-name-servers 1.2.3.4,1.2.3.5;
                    deny dynamic bootp clients;
                    deny unknown clients;
                    failover peer "dhcp1";
                    range 192.168.3.101 192.168.3.102;
            }
            option routers 192.168.3.1;
            option domain-name-servers 1.2.3.4,1.2.3.5;
    }
    host s_opt2_0 {
            hardware ethernet 00:11:43:3c:02:08;
            fixed-address 192.168.3.101;
    }
    host s_opt2_1 {
            hardware ethernet 00:11:43:3d:22:fd;
            fixed-address 192.168.3.102;
    }

    When I restart DHCPd on pfsense-b, I see the following in the DHCP logs on pfsense-a:

    Nov 18 12:59:19 dhcpd: failover: connect: no matching state.

    and on pfsense-b:

    Nov 18 12:59:05 dhcpd: failover peer dhcp1: I move from recover to startup
    Nov 18 12:59:20 dhcpd: failover peer dhcp1: I move from startup to recover

    Any ideas about what's going on?

    Thanks!
    Martín



  • I have the same problem. If you check the firewall logs, you'll most likely see something similar:

    
    - To clarify, these connections are being blocked:
    
    	Nov 22 23:12:57	CARP2	192.168.50.1:519	192.168.50.254:51125	TCP	
    	Nov 22 23:12:57	CARP3	192.168.7.1:519	192.168.7.254:59893	TCP	
    	Nov 22 23:12:57	CARP4	192.168.8.1:519	192.168.8.254:64975	TCP	
    	Nov 22 23:12:57	CARP5	192.168.9.1:519	192.168.9.254:52301	TCP	
    	Nov 22 23:12:57	CARP6	10.0.2.1:519	10.0.2.254:63893	TCP
    
    

    For me, I've created rules on each interface that allow all firewalls to connect to each other:

    
     TCP 	 hFirewalls 	 * 	 hFirewalls 	 * 	 * 	   	 For DHCP Failovers  
    
    

    hFirewalls includes all firewall IPs for all interfaces, as well as the CARP interface IPs. Like you, I have a failover peer in a weird state, in my case dhcp0:

    
    Primary:
    "dhcp0" 	recover-wait 	2008/11/22 22:46:33 	recover-wait 	2008/11/22 22:46:33 
    Secondary: 
    "dhcp0" 	recover-wait 	2008/11/22 22:43:01 	recover-wait 	2008/11/22 22:43:01 
    
    

    Unfortunately no matter what I do, I cannot get pf to not block these requests. Any takers?



  • Hi jewps,

    I don't see any evidence of the firewall blocking DHCP traffic.  And I'm not seeing the same states as you.  I see My state "recover" and Peer state "unknown-state" on each node.

    I suspect that we have different root causes, despite the fact that some symptoms appear to be similar.

    Best,
    Martín



    For the other subnets, I had the "recover" and "unknown-state" issue. dhcp0 is the only one that seems to be somewhat working. Unfortunately I no longer have the screenshots I was going to post here, and I disabled DHCP failover on these two boxes after my previous post. It just wouldn't work regardless.

    We may have different root causes, but after some searching (mailing list and forums), it looks like others have had similar problems with DHCP failover.

    Maybe I'll send an email to the mailing list later to see if there is a better response there.

    Thanks for the follow-up, Martín.

    PS: I didn't mean to hijack your thread. I figured the problems were similar enough, and if we find that the root causes are different, I'll delete my posts and move them to a new thread. Let me know if you'd like me to do that anyway.



  • I have been having the same problem: dhcp0 works, but dhcp1 is constantly in the recover / unknown-state condition. There does not seem to be anything interesting in the DHCP or firewall logs either.



  • Almost 3 years later and I think I've figured out the problem.

    It seems that you should specify the "Failover peer IP" only on the first interface that runs the DHCP server, not on any other interface.  In my case the first interface is the LAN; I'm not sure how pfSense decides which is the "first."

    In the correct setup, I have one failover group, "dhcp0", and it contains addresses from both 192.168.1.0 and 192.168.3.0.
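
    For illustration, here is a minimal sketch of what the relevant parts of dhcpd.conf on the primary might look like in that setup. It reuses the addresses, ports, and ranges from the configs posted at the top of the thread. It is an assumption about the general shape of the file, not a copy of actual pfSense output, and the DNS options and host entries are left out for brevity. The secondary would presumably mirror it with "secondary;", the address/port pairs swapped, and no "split" line.

    ```
    # Sketch only: one failover relationship shared by both subnets.
    failover peer "dhcp0" {
      primary;
      address 192.168.1.2;
      port 519;
      peer address 192.168.1.3;
      peer port 520;
      max-response-delay 10;
      max-unacked-updates 10;
      split 128;
      mclt 600;
      load balance max seconds 3;
    }
    subnet 192.168.1.0 netmask 255.255.255.0 {
      pool {
        deny dynamic bootp clients;
        failover peer "dhcp0";
        range 192.168.1.11 192.168.1.180;
      }
      option routers 192.168.1.1;
    }
    subnet 192.168.3.0 netmask 255.255.255.0 {
      pool {
        deny dynamic bootp clients;
        deny unknown clients;
        # The DMZ pool joins the same relationship; there is no "dhcp1" peer.
        failover peer "dhcp0";
        range 192.168.3.101 192.168.3.102;
      }
      option routers 192.168.3.1;
    }
    ```

    The only difference from my original config is that the second "failover peer" block is gone and the 192.168.3.0 pool points at "dhcp0" instead of "dhcp1".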

    I hope this helps someone else!

    Best,
    Martín

