Netgate Discussion Forum

    DHCP Failover with CARP - Both in Recover, Peer Unknown State

    DHCP and DNS
    26 Posts 8 Posters 29.7k Views
    • JoshW

      Apparently the DHCP status interface is somewhat flaky.

      The DHCP leases status page appears to update slowly or show incorrect data.  Look at the logs instead, as those appear to be correct.

      The service status of dhcpd is also incorrect.  The stop and start buttons DO work, but the dhcpd service is always shown as started.  Run top from a shell and show only the processes of user dhcpd to see this behavior.

      I had to reboot the firewalls after changing dhcpd settings to get things to work correctly.  Also, I noticed that the failover IP address must be on the same subnet as the pool or the code will set both servers to secondary.

      Check your configs in /var/dhcpd/etc/dhcpd.conf
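
A minimal sketch of the subnet check mentioned above, assuming Python 3 and only the standard library. The function name `in_pool_subnet` and the sample addresses are made up for illustration; this is not pfSense's own code, just a quick way to sanity-check the values you read out of dhcpd.conf.

```python
import ipaddress


def in_pool_subnet(failover_ip: str, subnet: str, netmask: str) -> bool:
    """Return True if failover_ip lies inside the subnet/netmask pair."""
    # strict=False tolerates a subnet address that has host bits set.
    network = ipaddress.ip_network(f"{subnet}/{netmask}", strict=False)
    return ipaddress.ip_address(failover_ip) in network


# Hypothetical values, as they might appear in a dhcpd.conf:
#   failover peer "dhcp0" { address 192.168.3.1; ... }
#   subnet 192.168.3.0 netmask 255.255.255.0 { ... }
print(in_pool_subnet("192.168.3.1", "192.168.3.0", "255.255.255.0"))
print(in_pool_subnet("10.0.0.1", "192.168.3.0", "255.255.255.0"))
```

Run it against both the `address` and the `peer address` lines of the failover section; if either one falls outside the pool's subnet, that would match the symptom described above.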

      • acherman

        Thanks for your input Josh.  I have not been able to get this working.  I do have the time between these machines synced now, and as you say the dhcpd status never changes.  haha  I have rebooted these boxes a number of times - even made the changes to dhcp on each box and rebooted the primary, then the secondary a few seconds later (reboots take the same time) to make the primary come online first.  Same issue right after boot.

        My addressing is okay - boxes are 10.61.32.250/24 and 251, CARP address is 254 (for this testing interface), pool is from 10 to 20.  dhcpd.conf snip:

        failover peer "dhcp0" {
          primary;
          address 10.61.32.250;
          port 519;
          peer address 10.61.32.251;
          peer port 520;
          max-response-delay 10;
          max-unacked-updates 10;
          split 128;
          mclt 600;
        |
        |
        |
        |
        subnet 10.61.32.0 netmask 255.255.252.0 {
        	pool {
        		option domain-name-servers 10.61.32.254;
        		deny dynamic bootp clients;
        		failover peer "dhcp0";
        		range 10.61.32.10 10.61.32.20;
        	}
        	option routers 10.61.32.254;
        	option domain-name-servers 10.61.32.254;
        }
        
        • JoshW

          Check the dhcpd log files on both ends to see if dhcpd is complaining about anything.

          • acherman

            The only bad log entries I see are like this:

            dhcpd: failover peer dhcp0: I move from recover to startup
            dhcpd: failover peer dhcp0: I move from startup to recover

            And when a request is made I see entries like this:

            dhcpd: DHCPREQUEST for 10.61.32.20 from 00:0b:db:7e:8e:5d via em1: not responding (recovering)
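
To spot this kind of loop without reading the log by eye, something like the following could be used. It is only a sketch: the regular expression is written against the two log lines quoted above, and the helper names `transitions` and `is_flapping` are made up for this example. Other dhcpd versions may word these messages differently.

```python
import re

# Matches lines such as:
#   dhcpd: failover peer dhcp0: I move from recover to startup
TRANSITION = re.compile(r"failover peer ([^:\s]+): I move from (\S+) to (\S+)")


def transitions(log_lines):
    """Return (peer, old_state, new_state) tuples found in the log lines."""
    found = []
    for line in log_lines:
        match = TRANSITION.search(line)
        if match:
            found.append(match.groups())
    return found


def is_flapping(found, states=("recover", "startup")):
    """True if every transition only moves between the given states."""
    seen = {state for _, old, new in found for state in (old, new)}
    return bool(found) and seen <= set(states)


sample = [
    "dhcpd: failover peer dhcp0: I move from recover to startup",
    "dhcpd: failover peer dhcp0: I move from startup to recover",
]
print(transitions(sample))
print(is_flapping(transitions(sample)))
```

If `is_flapping` stays true over a long stretch of log, the peers never reach a state in which they hand out leases, which fits the "not responding (recovering)" message above.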

            • richardsc

              Bumping this thread, as I've been facing the same issues described here.

              My setup:

              • 2x pfsense boxes doing CARP on 4 separate vlans.
              • DHCP configured on each VLAN

              Enabling failover DHCP, I would just get the same log messages as posted by acherman … and then dhcpd will not hand out leases while it's in the recover state!

              What I ended up finding:

              • check your dhcpd.conf file (/var/dhcpd/etc/dhcpd.conf) on your secondary pfsense server. I found that it was not properly receiving the "secondary" designation in the failover section.

              It seems this designation is assigned when the service is started / config is generated by the file /etc/inc/services.inc in the section beginning at line 139.

              I've not yet analyzed the code to try and figure out if there's a bug here … I think that there may be an issue with how the $skew value is being determined.

              I needed to get this working ASAP, so on my secondary firewall, I've just forced it to always be a secondary by modifying line 156 to be: $type = "secondary"; (it was always being set to primary, even though it shouldn't be…)

              Finally, I had to manually kill dhcpd on each box, remove the dhcpd.leases file on both, and then start dhcpd on the primary, then the secondary. After about 5 minutes, DHCP leases status was "normal", and now they've been running fine for several hours, after doling out nearly 100 leases.

              • dotdash

                Question for richardsc- do you have any 'other' type VIPs? I had an issue like this ages back, and it was due to the other VIPs throwing off the master/backup check. I also used a cheap hack to fix the issue. The problem went away when I only had CARP VIPs.

                • richardsc

                  @dotdash:

                  Question for richardsc- do you have any 'other' type VIPs? I had an issue like this ages back, and it was due to the other VIPs throwing off the master/backup check. I also used a cheap hack to fix the issue. The problem went away when I only had CARP VIPs.

                  Nope. I only have CARP VIPs.

                  If I can find time this week, I'm going to try and investigate further to find the root cause of the problem.

                  • acherman

                    Just for fun I upgraded both boxes to 1.2.3 RC3 today and tried this again.  I still cannot get it to work properly.  I may resort to the mod mentioned above to get this working.

                    • acherman

                      :(  Still no go.  I cannot get DHCP failover working.  I have accepted the fact that it is broken and I will have to manually start DHCP on the backup unit during a failure.  :'(

                      • dotdash

                        Check the dhcpd.conf on both boxes and verify the main is set to primary and the backup is set to secondary.
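
A rough way to automate that comparison, assuming you have copied the text of /var/dhcpd/etc/dhcpd.conf from each box. The helper names `failover_roles` and `roles_look_right` are made up for this sketch, and the regular expression only covers the simple `failover peer "name" { ... }` layout seen in this thread.

```python
import re

# A failover declaration: failover peer "dhcp0" { ... }
# The reference inside a pool (failover peer "dhcp0";) has no brace
# after the name, so it does not match.
PEER_BLOCK = re.compile(r'failover\s+peer\s+"([^"]+)"\s*\{(.*?)\}', re.DOTALL)
ROLE = re.compile(r"^\s*(primary|secondary)\s*;", re.MULTILINE)


def failover_roles(conf_text):
    """Map each failover peer name to 'primary', 'secondary' or None."""
    roles = {}
    for name, body in PEER_BLOCK.findall(conf_text):
        match = ROLE.search(body)
        roles[name] = match.group(1) if match else None
    return roles


def roles_look_right(main_conf, backup_conf):
    """True if every peer is primary on the main box and secondary on the backup."""
    main = failover_roles(main_conf)
    backup = failover_roles(backup_conf)
    return (
        bool(main)
        and main.keys() == backup.keys()
        and all(role == "primary" for role in main.values())
        and all(role == "secondary" for role in backup.values())
    )
```

For example, `roles_look_right(open("main.conf").read(), open("backup.conf").read())` would return False for the case reported earlier in the thread, where the backup box was also written out as primary. The file names here are placeholders.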

                        • Eugene

                          Today I tried to set it up and hit the same problem. A quick tcpdump showed how to fix it: I allowed TCP ports 519 and 520 from LAN net to the LAN interface (this rule is replicated to the passive unit) and restarted dhcpd on the active unit, and that was it. It is working properly.
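
A small sketch for checking from one box (or any LAN host with Python 3) whether the peer's failover port accepts TCP connections. The function name `tcp_port_open` is made up for this example, and the address in the usage comment is a placeholder. A successful connect only shows that the firewall rule and the listener are in place, not that the failover handshake itself succeeds.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


# Usage (placeholder address; 519 and 520 are the ports from this thread):
#   for port in (519, 520):
#       print(port, tcp_port_open("192.168.3.2", port))
```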

                          http://ru.doc.pfsense.org

                          • blackb1rd

                            I also had problems getting this to work with the pfSense 2.0 snapshots from May 9th and May 11th. After changing the line in services.inc as mentioned by richardsc (and removing another one), it worked for me. Somehow the skew counter isn't working correctly. I'm not sure exactly how it works, but I know both routers have the exact same time and timezone set. It seems to me there is some kind of bug.

                            • itsmorefun

                              Same issue with "2.0-BETA4 built on Mon Aug 2 21:49:34 EDT 2010 FreeBSD 8.1-RELEASE"

                              Does anyone have DHCP failover working?

                              Thanks

                              • jimp (Rebel Alliance, Developer, Netgate)

                                It works fine if you have a valid configuration. The problem is that certain invalid configurations can trick the logic so that it does not work.

                                The usual reason is that someone is using Proxy ARP VIPs which sync to the secondary as empty, which triggers a bug in the dhcp server logic that makes it think it's primary when it's not. I thought I committed a fix for that a week or two ago.

                                If you still have the bug, I need copies of /var/dhcpd/etc/dhcpd.conf from the primary and secondary, along with at least the <virtualip> section of the primary and secondary config.xml files.

                                The "skew" on the VIPs is used to trigger the logic for slave, so if you have manually set the skew on the secondary to less than 20, that would also break it.
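
To make the skew rule concrete, here is a sketch that reads a `<virtualip>` section and guesses which role each CARP VIP would produce. It is not the code from services.inc: the threshold of 20 is taken from the sentence above, the exact comparison pfSense uses is an assumption, and the function name `roles_by_interface` is made up for this example.

```python
import xml.etree.ElementTree as ET

# Assumption based on the post above: a skew below 20 is treated as primary.
SKEW_THRESHOLD = 20


def roles_by_interface(virtualip_xml):
    """Map each CARP VIP's interface to the role its advskew would imply."""
    root = ET.fromstring(virtualip_xml)
    roles = {}
    for vip in root.iter("vip"):
        if vip.findtext("mode") != "carp":
            continue  # only CARP VIPs carry a meaningful skew
        # A missing or empty <advskew> is treated as 0 in this sketch.
        skew = int(vip.findtext("advskew") or 0)
        role = "primary" if skew < SKEW_THRESHOLD else "secondary"
        roles[vip.findtext("interface")] = role
    return roles


example = """
<virtualip>
  <vip><mode>carp</mode><interface>lan</interface><advskew>100</advskew></vip>
  <vip><mode>carp</mode><interface>opt2</interface><advskew>10</advskew></vip>
</virtualip>
"""
print(roles_by_interface(example))
```

In the made-up example, `opt2` has a manually lowered skew of 10 on what is meant to be the backup box, so this sketch reports it as primary, which is the broken case described above.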


                                • itsmorefun

                                  Hello,
                                  @jimp:

                                  If you still have the bug, I need copies of /var/dhcpd/etc/dhcpd.conf from the primary and secondary, along with at least the <virtualip> section of the primary and secondary config.xml files.

                                  pfSense LEFT dhcpd.conf:

                                  option domain-name "localdomain";
                                  option ldap-server code 95 = text;
                                  option domain-search-list code 119 = text;

                                  default-lease-time 7200;
                                  max-lease-time 86400;
                                  log-facility local7;
                                  ddns-update-style none;
                                  one-lease-per-client true;
                                  deny duplicates;
                                  ping-check true;
                                  authoritative;
                                  failover peer "dhcp0" {
                                   primary;
                                   address 192.168.3.1;
                                   port 519;
                                   peer address 192.168.3.2;
                                   peer port 520;
                                   max-response-delay 10;
                                   max-unacked-updates 10;
                                   split 128;
                                   mclt 600;

                                  load balance max seconds 3;
                                  }
                                  authoritative;
                                  failover peer "dhcp1" {
                                   primary;
                                   address 192.168.4.1;
                                   port 519;
                                   peer address 192.168.4.2;
                                   peer port 520;
                                   max-response-delay 10;
                                   max-unacked-updates 10;
                                   split 128;
                                   mclt 600;

                                  load balance max seconds 3;
                                  }
                                  subnet 192.168.3.0 netmask 255.255.255.0 {
                                  pool {
                                  option domain-name-servers 192.168.3.10;
                                  deny dynamic bootp clients;
                                  failover peer "dhcp0";
                                  range 192.168.3.100 192.168.3.199;
                                  }
                                  option routers 192.168.3.10;
                                  option domain-name-servers 192.168.3.10;

                                  }
                                  subnet 192.168.4.0 netmask 255.255.255.0 {
                                  pool {
                                  option domain-name-servers 192.168.4.10;
                                  deny dynamic bootp clients;
                                  failover peer "dhcp1";
                                  range 192.168.4.100 192.168.4.199;
                                  }
                                  option routers 192.168.4.10;
                                  option domain-name-servers 192.168.4.10;

                                  }

                                  pfSense RIGHT dhcpd.conf:

                                  option domain-name "localdomain";
                                  option ldap-server code 95 = text;
                                  option domain-search-list code 119 = text;

                                  default-lease-time 7200;
                                  max-lease-time 86400;
                                  log-facility local7;
                                  ddns-update-style none;
                                  one-lease-per-client true;
                                  deny duplicates;
                                  ping-check true;
                                  authoritative;
                                  failover peer "dhcp0" {
                                   secondary;
                                   address 192.168.3.2;
                                   port 520;
                                   peer address 192.168.3.1;
                                   peer port 519;
                                   max-response-delay 10;
                                   max-unacked-updates 10;
                                   mclt 600;

                                  load balance max seconds 3;
                                  }
                                  authoritative;
                                  failover peer "dhcp1" {
                                   secondary;
                                   address 192.168.4.2;
                                   port 520;
                                   peer address 192.168.4.1;
                                   peer port 519;
                                   max-response-delay 10;
                                   max-unacked-updates 10;
                                   mclt 600;

                                  load balance max seconds 3;
                                  }
                                  subnet 192.168.3.0 netmask 255.255.255.0 {
                                  pool {
                                  option domain-name-servers 192.168.3.10;
                                  deny dynamic bootp clients;
                                  failover peer "dhcp0";
                                  range 192.168.3.100 192.168.3.199;
                                  }
                                  option routers 192.168.3.10;
                                  option domain-name-servers 192.168.3.10;

                                  }
                                  subnet 192.168.4.0 netmask 255.255.255.0 {
                                  pool {
                                  option domain-name-servers 192.168.4.10;
                                  deny dynamic bootp clients;
                                  failover peer "dhcp1";
                                  range 192.168.4.100 192.168.4.199;
                                  }
                                  option routers 192.168.4.10;
                                  option domain-name-servers 192.168.4.10;

                                  }

                                  pfSense LEFT config.xml:

                                  <virtualip>
                                    <vip>
                                      <mode>carp</mode>
                                      <interface>wan</interface>
                                      <vhid>1</vhid>
                                      <advskew>0</advskew>
                                      <password>wanpass</password>
                                      <descr/>
                                      <type>single</type>
                                      <subnet_bits>24</subnet_bits>
                                      <subnet>192.168.1.50</subnet>
                                    </vip>
                                    <vip>
                                      <mode>carp</mode>
                                      <interface>lan</interface>
                                      <vhid>2</vhid>
                                      <advskew>0</advskew>
                                      <password>lanpass</password>
                                      <descr/>
                                      <type>single</type>
                                      <subnet_bits>24</subnet_bits>
                                      <subnet>192.168.3.10</subnet>
                                    </vip>
                                    <vip>
                                      <mode>carp</mode>
                                      <interface>opt2</interface>
                                      <vhid>3</vhid>
                                      <advskew>0</advskew>
                                      <password>wifipass</password>
                                      <descr/>
                                      <type>single</type>
                                      <subnet_bits>24</subnet_bits>
                                      <subnet>192.168.4.10</subnet>
                                    </vip>
                                  </virtualip>

                                  pfSense RIGHT config.xml:

                                  <virtualip>
                                    <vip>
                                      <mode>carp</mode>
                                      <interface>wan</interface>
                                      <vhid>1</vhid>
                                      <advskew>100</advskew>
                                      <password>wanpass</password>
                                      <descr/>
                                      <type>single</type>
                                      <subnet_bits>24</subnet_bits>
                                      <subnet>192.168.1.50</subnet>
                                    </vip>
                                    <vip>
                                      <mode>carp</mode>
                                      <interface>lan</interface>
                                      <vhid>2</vhid>
                                      <advskew>100</advskew>
                                      <password>lanpass</password>
                                      <descr/>
                                      <type>single</type>
                                      <subnet_bits>24</subnet_bits>
                                      <subnet>192.168.3.10</subnet>
                                    </vip>
                                    <vip>
                                      <mode>carp</mode>
                                      <interface>opt2</interface>
                                      <vhid>3</vhid>
                                      <advskew>100</advskew>
                                      <password>wifipass</password>
                                      <descr/>
                                      <type>single</type>
                                      <subnet_bits>24</subnet_bits>
                                      <subnet>192.168.4.10</subnet>
                                    </vip>
                                  </virtualip>

                                  • jimp (Rebel Alliance, Developer, Netgate)

                                    @itsmorefun:

                                    editing…

                                    Those came through in e-mail before you edited them out. It looks like you might have hit a bug that I fixed the other day, which made them both show up as secondary instead of primary. That shouldn't have put them in the recover/peer-known state, though; it should have left both in the communications-interrupted state. It should be OK in current snapshots.


                                    • itsmorefun

                                      @jimp:

                                      @itsmorefun:

                                      editing…

                                      Those came through in e-mail before you edited them out. It looks like you might have hit a bug that I fixed the other day, which made them both show up as secondary instead of primary. That shouldn't have put them in the recover/peer-known state, though; it should have left both in the communications-interrupted state. It should be OK in current snapshots.

                                      Ok,

                                      Sorry my pfsense crashed… I am retesting :-).

                                      • itsmorefun

                                        Everything works now.

                                        Thanks

                                        Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.