CARP DHCP Failover in VLANs



  • Hi all
    I'm struggling with my CARP & DHCP failover setup. We are using two VLANs on one interface. CARP in general is running, but as soon as I enable DHCP failover for both VLANs, one or both of them are stuck in state recover-wait / partner-down (never reaching normal / normal). As soon as I disable DHCP failover for one of them, everything works (it doesn't matter which one).

    My configuration:

    PUBLIC_VLAN:
    172.17.0.0
    Carp IP 172.17.0.1
    IP FW 1: 172.17.0.253
    IP FW 2: 172.17.0.254

    OFFICE_VLAN:
    192.168.13.0
    Carp IP: 192.168.13.1
    IP FW 1: 192.168.13.253
    IP FW 2: 192.168.13.254

    Leads to:

    dhcp_opt2 (PUBLIC_VLAN)  normal  2014/11/04 21:03:22  normal  2014/11/04 21:03:17   
    dhcp_opt3 (OFFICE_VLAN)  recover-wait  2014/11/04 21:03:16  partner-down  2014/11/04 20:23:00

    My DHCP settings (only one part; the other is the same except for the IPs):

    <dhcpd>
        <opt2>
            <range>
                <from>172.17.0.10</from>
                <to>172.17.0.250</to>
            </range>
            <failover_peerip>172.17.0.254</failover_peerip>
            <dhcpleaseinlocaltime></dhcpleaseinlocaltime>
            <defaultleasetime>14400</defaultleasetime>
            <maxleasetime></maxleasetime> <netmask></netmask>
            <gateway>172.17.0.1</gateway>
            <domain></domain> <domainsearchlist></domainsearchlist> <ddnsdomain></ddnsdomain>
            <mac_allow></mac_allow> <mac_deny></mac_deny> <tftp></tftp> <ldap></ldap>
            <nextserver></nextserver> <filename></filename> <rootpath></rootpath> <numberoptions></numberoptions>
            <enable></enable>
            <dnsserver>172.17.0.1</dnsserver>
        </opt2>
        ......
    </dhcpd>
    

    I searched the forum and Google. Every configuration I've found uses one single failover peer, even for multiple subnets. In pfSense there is one peer for each DHCP pool. Is this correct and can it work this way, or am I doing something wrong? I also tried to set the same failover peer for both DHCP pools; this seems to work but is of course overwritten by pfSense on each restart of the services. I'm really confused now because I've tried so many things without success (except using failover for only one of the servers).

    Does anybody out there have a similar setup running (multiple VLANs with CARP and DHCP failover)?

    Thanks,

    Zueri

    EDIT: Added config



  • Is nobody here using CARP, DHCP and VLANs together? Or just without problems? Or did I post in the wrong forum (I'm still not sure if it belongs here or in the DHCP section)? Any help would be good, because this is the only point that is not working in my setup.

    Thanks



  • I'm using DHCP failover with CARP, and I'm also experiencing this issue. Never managed to resolve it. For me it worked sometime before I upgraded to 2.2-BETA.
    https://forum.pfsense.org/index.php?topic=81948.0

    CARP is functioning perfectly but not the DHCP failover.



  • CARP + DHCP failover + VLANs is very common. The original scenario sounds like the DHCP server can't talk between the primary and secondary on the office VLAN. Firewall rules blocking it, lack of any network connectivity between them on that VLAN (usually missing tag somewhere), or similar.



  • @cmb:

    CARP + DHCP failover + VLANs is very common. The original scenario sounds like the DHCP server can't talk between the primary and secondary on the office VLAN. Firewall rules blocking it, lack of any network connectivity between them on that VLAN (usually missing tag somewhere), or similar.

    To be short, and not hijack the thread: what firewall rules between the pfSenses are necessary for DHCPD failover to work? I already tried allowing all traffic between them on the interfaces with the DHCP server enabled, but it would be nice to know what should be enough.



  • Secondary has to be able to reach TCP 519 on the primary, primary has to reach TCP 520 on the secondary.
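
    The two port requirements above can be checked with a small TCP connect test. This is only a sketch: it assumes Python is available on (or near) each firewall, which is not a given on pfSense, and the helper name `tcp_reachable` is made up for illustration. The addresses in the usage note come from the first post; which box is primary is an assumption.

```python
import socket


def tcp_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    A refused or timed-out connection (for example, blocked by a firewall
    rule or a missing VLAN tag) returns False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

    Assuming FW 1 (192.168.13.253) is the primary, one would run `tcp_reachable("192.168.13.253", 519)` on the secondary and `tcp_reachable("192.168.13.254", 520)` on the primary, then repeat with the 172.17.0.x addresses for the other VLAN. The check only succeeds while dhcpd is actually listening on the target, so a False result can also mean the service is stopped.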



  • Small update from me, I think I found the reason why it's somewhat buggy in my setup. I have updated my topic: https://forum.pfsense.org/index.php?topic=81948.0



  • Zueri,

    are you using HA Configuration Synchronization for DHCPd? I had similar problems I can't explain before I enabled it. I only turned on DHCP failover for a VLAN initially. /var/dhcpd/etc/dhcpd.conf looked fine on both firewalls despite problems.



  • OK, I'm a bit further. Thanks to cmb, who pointed me in the right direction regarding FW rules. I had a typo in the rules.

    Now everything seems to be working. But I still face the problem that the restart of the DHCP services is not really handled correctly. I use HA Sync for the DHCP config. As soon as I change something on the master, the config is synced correctly. Afterwards the services restart. But they never come out of the partner-down / recover state.

    I can force the two servers to go to normal / normal by stopping both services, then starting the master and, after some seconds, the secondary.

    The working but manual startup looks like this:

    Nov 15 12:08:33 	dhcpd: Wrote 0 deleted host decls to leases file.
    Nov 15 12:08:33 	dhcpd: Wrote 0 new dynamic host decls to leases file.
    Nov 15 12:08:33 	dhcpd: Wrote 189 leases to leases file.
    Nov 15 12:08:33 	dhcpd: Listening on BPF/re1_vlan10/00:0d:b9:--:--:--/192.168.13.0/24
    Nov 15 12:08:33 	dhcpd: Sending on BPF/re1_vlan10/00:0d:b9:--:--:--/192.168.13.0/24
    Nov 15 12:08:33 	dhcpd: Listening on BPF/re1_vlan2/00:0d:b9:--:--:--/172.17.0.0/24
    Nov 15 12:08:33 	dhcpd: Sending on BPF/re1_vlan2/00:0d:b9:--:--:--/172.17.0.0/24
    Nov 15 12:08:33 	dhcpd: Sending on Socket/fallback/fallback-net
    Nov 15 12:08:33 	dhcpd: failover peer dhcp_opt3: I move from partner-down to startup
    Nov 15 12:08:33 	dhcpd: failover peer dhcp_opt2: I move from partner-down to startup
    Nov 15 12:08:48 	dhcpd: failover peer dhcp_opt3: I move from startup to partner-down
    Nov 15 12:08:48 	dhcpd: failover peer dhcp_opt2: I move from startup to partner-down
    Nov 15 12:09:03 	dhcpd: failover peer dhcp_opt3: peer moves from recover to recover
    Nov 15 12:09:03 	dhcpd: failover peer dhcp_opt2: peer moves from recover to recover
    Nov 15 12:09:03 	dhcpd: failover peer dhcp_opt3: peer moves from recover to recover
    Nov 15 12:09:03 	dhcpd: Update request all from dhcp_opt3: sending update
    Nov 15 12:09:03 	dhcpd: failover peer dhcp_opt2: peer moves from recover to recover
    Nov 15 12:09:03 	dhcpd: Update request all from dhcp_opt2: sending update
    Nov 15 12:09:05 	dhcpd: Sent update done message to dhcp_opt3
    Nov 15 12:09:05 	dhcpd: failover peer dhcp_opt3: peer moves from recover to recover-done
    Nov 15 12:09:05 	dhcpd: failover peer dhcp_opt3: I move from partner-down to normal
    Nov 15 12:09:05 	dhcpd: balancing pool 801297780 192.168.13.0/24 total 176 free 100 backup 69 lts 15 max-own (+/-)17
    Nov 15 12:09:05 	dhcpd: balanced pool 801297780 192.168.13.0/24 total 176 free 85 backup 84 lts 0 max-misbal 25
    Nov 15 12:09:05 	dhcpd: Sending updates to dhcp_opt3.
    Nov 15 12:09:05 	dhcpd: Sent update done message to dhcp_opt2
    Nov 15 12:09:05 	dhcpd: failover peer dhcp_opt3: peer moves from recover-done to normal
    Nov 15 12:09:05 	dhcpd: failover peer dhcp_opt2: peer moves from recover to recover-done
    Nov 15 12:09:05 	dhcpd: failover peer dhcp_opt2: I move from partner-down to normal
    Nov 15 12:09:05 	dhcpd: balancing pool 801297600 172.17.0.0/24 total 201 free 97 backup 96 lts 0 max-own (+/-)19
    Nov 15 12:09:05 	dhcpd: balanced pool 801297600 172.17.0.0/24 total 201 free 97 backup 96 lts 0 max-misbal 29
    Nov 15 12:09:05 	dhcpd: failover peer dhcp_opt2: peer moves from recover-done to normal
    

    This time I changed the settings of only one of the DHCP servers. The settings are synced, but only the OFFICE_VLAN DHCP goes back to normal; the other one stays in recover mode and never recovers.

    Nov 15 12:25:21 	dhcpd: failover peer dhcp_opt2: I move from partner-down to startup
    Nov 15 12:25:21 	dhcpd: failover peer dhcp_opt3: I move from startup to partner-down
    Nov 15 12:25:21 	dhcpd: failover peer dhcp_opt2: I move from startup to partner-down
    Nov 15 12:25:22 	dhcpd: failover peer dhcp_opt3: peer moves from recover to recover
    Nov 15 12:25:22 	dhcpd: Update request all from dhcp_opt3: sending update
    Nov 15 12:25:22 	dhcpd: failover peer dhcp_opt2: peer moves from recover to recover
    Nov 15 12:25:22 	dhcpd: Update request all from dhcp_opt2: sending update
    Nov 15 12:25:26 	dhcpd: Sent update done message to dhcp_opt3
    Nov 15 12:25:26 	dhcpd: Sent update done message to dhcp_opt2
    Nov 15 12:25:26 	dhcpd: failover peer dhcp_opt3: peer moves from recover to recover-done
    Nov 15 12:25:26 	dhcpd: failover peer dhcp_opt3: I move from partner-down to normal
    Nov 15 12:25:26 	dhcpd: balancing pool 801297780 192.168.13.0/24 total 176 free 85 backup 84 lts 0 max-own (+/-)17
    Nov 15 12:25:26 	dhcpd: balanced pool 801297780 192.168.13.0/24 total 176 free 85 backup 84 lts 0 max-misbal 25
    Nov 15 12:25:26 	dhcpd: failover peer dhcp_opt2: peer moves from recover to recover-done
    Nov 15 12:25:26 	dhcpd: failover peer dhcp_opt2: I move from partner-down to normal
    Nov 15 12:25:26 	dhcpd: balancing pool 801297600 172.17.0.0/24 total 201 free 97 backup 96 lts 0 max-own (+/-)19
    Nov 15 12:25:26 	dhcpd: balanced pool 801297600 172.17.0.0/24 total 201 free 97 backup 96 lts 0 max-misbal 29
    Nov 15 12:25:26 	dhcpd: failover peer dhcp_opt3: peer moves from recover-done to normal
    Nov 15 12:25:26 	dhcpd: failover peer dhcp_opt2: peer moves from recover-done to normal
    Nov 15 12:25:29 	dhcpd: failover peer dhcp_opt3: peer moves from normal to shutdown
    Nov 15 12:25:29 	dhcpd: failover peer dhcp_opt3: I move from normal to partner-down
    Nov 15 12:25:29 	dhcpd: failover peer dhcp_opt2: peer moves from normal to shutdown
    Nov 15 12:25:29 	dhcpd: failover peer dhcp_opt2: I move from normal to partner-down
    Nov 15 12:25:31 	dhcpd: peer dhcp_opt3: disconnected
    Nov 15 12:25:31 	dhcpd: peer dhcp_opt2: disconnected
    Nov 15 12:25:33 	dhcpd: failover peer dhcp_opt3: peer moves from shutdown to recover
    Nov 15 12:25:33 	dhcpd: failover peer dhcp_opt2: peer moves from shutdown to recover
    Nov 15 12:25:33 	dhcpd: failover peer dhcp_opt3: peer moves from recover to recover
    Nov 15 12:25:33 	dhcpd: Update request all from dhcp_opt3: sending update
    Nov 15 12:25:33 	dhcpd: failover peer dhcp_opt2: peer moves from recover to recover
    Nov 15 12:25:33 	dhcpd: Update request all from dhcp_opt2: sending update
    Nov 15 12:25:35 	dhcpd: peer dhcp_opt3: disconnected
    Nov 15 12:25:35 	dhcpd: peer dhcp_opt2: disconnected
    Nov 15 12:25:38 	dhcpd: failover peer dhcp_opt3: peer moves from recover to recover
    Nov 15 12:25:38 	dhcpd: failover peer dhcp_opt2: peer moves from recover to recover
    Nov 15 12:25:38 	dhcpd: failover peer dhcp_opt3: peer moves from recover to recover
    Nov 15 12:25:38 	dhcpd: Received update request while old update still flying! Silently discarding old request.
    Nov 15 12:25:38 	dhcpd: Update request all from dhcp_opt3: sending update
    Nov 15 12:25:38 	dhcpd: failover peer dhcp_opt2: peer moves from recover to recover
    Nov 15 12:25:38 	dhcpd: Received update request while old update still flying! Silently discarding old request.
    Nov 15 12:25:38 	dhcpd: Update request all from dhcp_opt2: sending update
    Nov 15 12:25:40 	dhcpd: Sent update done message to dhcp_opt3
    Nov 15 12:25:40 	dhcpd: failover peer dhcp_opt3: peer moves from recover to recover-done
    Nov 15 12:25:40 	dhcpd: failover peer dhcp_opt3: I move from partner-down to normal
    Nov 15 12:25:40 	dhcpd: balancing pool 801297780 192.168.13.0/24 total 176 free 85 backup 84 lts 0 max-own (+/-)17
    Nov 15 12:25:40 	dhcpd: balanced pool 801297780 192.168.13.0/24 total 176 free 85 backup 84 lts 0 max-misbal 25
    Nov 15 12:25:40 	dhcpd: Sending updates to dhcp_opt3.
    Nov 15 12:25:40 	dhcpd: failover peer dhcp_opt3: peer moves from recover-done to normal
    

    Any idea? Am I still missing something?
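
    For anyone reading long dhcpd logs like the ones above, the last known state per failover peer can be pulled out with a short script. This is a rough sketch that relies only on the two message formats visible in the logs ("I move from X to Y" and "peer moves from X to Y"); the function names `final_states` and `stuck_peers` are made up for illustration.

```python
import re

# Matches the two transition messages seen in the logs, e.g.
#   dhcpd: failover peer dhcp_opt3: I move from partner-down to normal
#   dhcpd: failover peer dhcp_opt3: peer moves from recover to recover-done
TRANSITION = re.compile(
    r"failover peer (?P<name>[^:\s]+): (?P<who>I move|peer moves) "
    r"from (?P<old>\S+) to (?P<new>\S+)"
)


def final_states(log_lines):
    """Replay the log and return {peer_name: {"local": state, "peer": state}}."""
    states = {}
    for line in log_lines:
        match = TRANSITION.search(line)
        if match is None:
            continue  # not a state-transition line
        side = "local" if match.group("who") == "I move" else "peer"
        states.setdefault(match.group("name"), {})[side] = match.group("new")
    return states


def stuck_peers(log_lines):
    """Return the sorted peer names where either side did not end in 'normal'."""
    return sorted(
        name
        for name, state in final_states(log_lines).items()
        if state.get("local") != "normal" or state.get("peer") != "normal"
    )
```

    Fed the second log above, this reports dhcp_opt2 as stuck (local side in partner-down, peer in recover), while dhcp_opt3 ends in normal / normal.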




  • The issue you describe occurs in some circumstances, in many instances. Pre-2.2, dhcpd is restarted 2-3 times on the secondary after the config is synced, which triggers a bug in ISC dhcpd that behaves exactly as you describe. That's fixed in 2.2.