Failover DHCP both primary



  • I'm seeing strange behavior from DHCP on a CARP cluster. Failover is working fine, but under the DHCP server, both say My State=recover, Peer State=unknown-state.
    Checking the /var/dhcpd/etc/dhcpd.conf shows both as primary on port 519
    Boxes were originally running RC3, but I updated both of them to RC5 to see if it helped.
    The interfaces look correct:
    primary
    fxp2: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu 1500
            options=8<VLAN_MTU>
            inet 10.20.1.2 netmask 0xffff0000 broadcast 10.20.255.255
            media: Ethernet autoselect (100baseTX <full-duplex>)
            status: active

    carp0: flags=49<UP,LOOPBACK,RUNNING> mtu 1500
            inet 10.20.1.1 netmask 0xffff0000
            carp: MASTER vhid 1 advbase 1 advskew 0

    secondary
    fxp2: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu 1500
            options=8<VLAN_MTU>
            inet 10.20.1.3 netmask 0xffff0000 broadcast 10.20.255.255
            media: Ethernet autoselect (100baseTX <full-duplex>)
            status: active

    carp0: flags=49<UP,LOOPBACK,RUNNING> mtu 1500
            inet 10.20.1.1 netmask 0xffff0000
            carp: BACKUP vhid 1 advbase 1 advskew 100

    DHCP on both nodes is configured the same, except for the Failover Peer IP
    Range of 20.1-21.254
    Gateway and DNS point to the CARP IP (.1)
    Peer on primary is .3, peer on secondary is .2
    Any thoughts? It appears something is going wrong in the services.inc check, but I can't figure out what.
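
To compare what each node actually generated, the role can be pulled out of /var/dhcpd/etc/dhcpd.conf with a few lines of script. This is only a minimal sketch in Python, not part of pfSense. The peer name "dhcp0", the `failover_roles` helper and the sample text (including peer port 520) are assumptions for illustration; only the addresses and port 519 come from the post above.

```python
import re

def failover_roles(conf_text):
    """Return (peer_name, role, port) for each failover peer declaration."""
    results = []
    # Only declarations with a { ... } body match; pool references such as
    # 'failover peer "dhcp0";' are skipped because no brace follows the name.
    for match in re.finditer(r'failover\s+peer\s+"([^"]+)"\s*\{([^}]*)\}', conf_text):
        name, body = match.group(1), match.group(2)
        role = re.search(r'^\s*(primary|secondary)\s*;', body, re.MULTILINE)
        # Anchoring at line start keeps 'peer port ...' from matching here.
        port = re.search(r'^\s*port\s+(\d+)\s*;', body, re.MULTILINE)
        results.append((
            name,
            role.group(1) if role else None,
            int(port.group(1)) if port else None,
        ))
    return results

# Hypothetical excerpt resembling what the backup node (10.20.1.3) showed.
SAMPLE = '''
failover peer "dhcp0" {
  primary;
  address 10.20.1.3;
  port 519;
  peer address 10.20.1.2;
  peer port 520;
}
'''

print(failover_roles(SAMPLE))
```

Running it against the file from each node should show one primary and one secondary. Two primaries on the same port matches the symptom described here.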



  • I have seen something strange regarding failover DHCP and CARP this week too. As long as both machines are up it works, but once the backup machine is down the master won't hand out IPs anymore, as it says "other peer holds all leases" in the logs. I haven't had time to investigate further yet, but there does seem to be a glitch.



  • On my setup the services.inc check seems to be getting the wrong skew. I hacked the services.inc on the backup machine to get it to set DHCP as secondary. An ugly hack, but it did get DHCP working again.



  • Not following you… Please show us a diff of your changes.



  • I just changed the code on the backup so it would always set the config as secondary. services.orig is the original file:

    diff -rub services.orig services.inc

    --- services.orig      Fri Feb 22 14:07:18 2008
    +++ services.inc        Fri Feb 22 14:08:44 2008
    @@ -146,7 +146,7 @@
                                            if($int == $real_dhcpif) {
                                                    /* this is the interface! */
                                                    if($vipent['advskew'] < "20")
    -                                                      $skew = 0;
    +                                                      $skew = 90;
                                            }
                                    }
                            } else {



  • @dotdash:

    I just changed the code on the backup so it would always set the config as secondary. services.orig is the original file:

    diff -rub services.orig services.inc

    --- services.orig      Fri Feb 22 14:07:18 2008
    +++ services.inc        Fri Feb 22 14:08:44 2008
    @@ -146,7 +146,7 @@
                                            if($int == $real_dhcpif) {
                                                    /* this is the interface! */
                                                    if($vipent['advskew'] < "20")
    -                                                      $skew = 0;
    +                                                      $skew = 90;
                                            }
                                    }
                            } else {

    That is handled behind the scenes by the CARP sync code. Your changes are wrong and WILL be overwritten by the next upgrade.

    I suggest you spend some time on the CARP setup document @ doc.pfsense.org and do it correctly instead of this UGLY hack.



  • I know it's an ugly hack, but it was the only way I could get DHCP working. (Other than just running it on the master and having dhcp unavailable when the master was down) I've checked the docs, but can't find where I've gone wrong. The failover actually seems to be running fine. The master shows skew of 0 and the backup shows 100, which seems to be correct.



  • Set this one up in the lab and banged on it (I was pretty sure I wasn't just doing it wrong).
    Here's what I found:
    The advskew check only goes wrong when there are type 'other' VIPs on the box.
    I have several 'other' VIPs on the WAN, and when I deleted these, the DHCP config on the backup unit correctly set itself to secondary.
    This is, however, a bad solution for the production system, as the 'other' VIPs are used to fail over IPs that are on additional subnets assigned to the customer (that's the subject of another rambling post).
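
One way the 'other' VIPs could throw off the check is sketched below. It is a Python restatement of a guess, not the actual services.inc code. The default skew of 110, the threshold of 10, the dictionary layout and both function names are assumptions, and the interface matching seen in the diff is left out. The idea is that an 'other' VIP has no advskew, so a loose comparison treats it as 0 and the `< 20` test passes.

```python
def pick_failover_role(vips):
    """Return 'primary' or 'secondary' from a list of VIP entries."""
    skew = 110  # assumed default: behave as secondary unless a low skew is seen
    for vip in vips:
        # A missing advskew is treated as 0 here, mimicking a loose comparison.
        advskew = int(vip.get("advskew") or 0)
        if advskew < 20:
            skew = 0
    # Assumed threshold: anything above 10 means this node is the secondary.
    return "primary" if skew <= 10 else "secondary"

def pick_failover_role_carp_only(vips):
    """Same check, but only CARP VIPs are allowed to influence the role."""
    return pick_failover_role([v for v in vips if v.get("mode") == "carp"])

# Backup node: one CARP VIP with advskew 100 plus one 'other' VIP.
backup_vips = [{"mode": "carp", "advskew": "100"}, {"mode": "other"}]

print(pick_failover_role(backup_vips))            # role with every VIP considered
print(pick_failover_role_carp_only(backup_vips))  # role with 'other' VIPs ignored
```

If the real check behaves like the first function, skipping non-CARP entries would keep the 'other' VIPs in place without forcing the skew by hand.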

