Netgate Discussion Forum

    DHCP failover not working

    CE 2.7.0 Development Snapshots (Retired)
    • sef1414

      I've just set up a second instance of pfSense for HA. The pair is syncing, but I'm running into issues with DHCP.

      I've followed the guide: I entered the CARP VIP in the DNS Server and Gateway fields of the DHCP server, and I entered the failover peer IP.

      On the DHCP Pool Status page, interfaces reflect "recover" in the "My State" column, and "unknown-state" in the "Peer State" column.

      I've gone through this troubleshooting list and performed each step without any improvement:

      https://docs.netgate.com/pfsense/en/latest/troubleshooting/ha-dhcp-failover.html

      DHCP logs indicate an issue but don't provide much detail:

      Jan 25 06:38:08	dhcpd	95441	failover peer dhcp_opt2: I move from startup to recover
      Jan 25 06:37:53	dhcpd	95441	failover peer dhcp_opt2: host down
      Jan 25 06:37:53	dhcpd	95441	failover peer dhcp_opt2: I move from recover to startup
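
To follow the failover state machine across restarts, these dhcpd lines can be reduced to each peer's most recent state plus any "host down" / "host unreachable" events. A minimal Python sketch, assuming the lines are fed oldest first (the excerpt above is listed newest first, so it is reversed below) and that the message wording matches what is shown in this thread; `summarize_failover` is a hypothetical helper, not part of pfSense:

```python
import re
from typing import Dict, Iterable, List, Tuple

# Matches e.g. "failover peer dhcp_opt2: I move from startup to recover"
TRANSITION = re.compile(r"failover peer ([^:\s]+): I move from (\S+) to (\S+)")
# Matches the connectivity complaints seen in this thread
PROBLEM = re.compile(r"failover peer ([^:\s]+): (host down|host unreachable)")


def summarize_failover(lines: Iterable[str]) -> Tuple[Dict[str, str], List[Tuple[str, str]]]:
    """Return ({peer: last state}, [(peer, problem), ...]) from oldest-first log lines."""
    last_state: Dict[str, str] = {}
    problems: List[Tuple[str, str]] = []
    for line in lines:
        t = TRANSITION.search(line)
        if t:
            # Remember only the state the peer most recently moved to
            last_state[t.group(1)] = t.group(3)
            continue
        p = PROBLEM.search(line)
        if p:
            problems.append((p.group(1), p.group(2)))
    return last_state, problems


log = [
    "Jan 25 06:37:53\tdhcpd\t95441\tfailover peer dhcp_opt2: I move from recover to startup",
    "Jan 25 06:37:53\tdhcpd\t95441\tfailover peer dhcp_opt2: host down",
    "Jan 25 06:38:08\tdhcpd\t95441\tfailover peer dhcp_opt2: I move from startup to recover",
]
print(summarize_failover(log))
# ({'dhcp_opt2': 'recover'}, [('dhcp_opt2', 'host down')])
```

A pair that is talking to each other should eventually report `normal` as the last state; a peer that cycles between `startup` and `recover` with "host down" in between, as here, suggests the failover connection to the partner is never established.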
      

      Both nodes are running the same development snapshot (2.7.0-DEVELOPMENT (amd64), built on Fri Jan 20 03:01:02 UTC 2023), because I could not get the newer host, which has newer hardware, working on 2.6.

      Any suggestions would be appreciated.

      • johnpoz moved this topic from HA/CARP/VIPs.
      • jimp (Rebel Alliance Developer, Netgate)

        DHCP failover works on 2.7 in general; it's running fine on a pair of systems in my lab.

        You likely have some part of the configuration that isn't resulting in the correct/expected config parameters being in the right places.

        Something to remember is that DHCP communicates with its peer on each interface separately, so make sure the nodes can communicate freely on each local interface involved in DHCP failover, between TCP port 519 (primary) and TCP port 520 (secondary).
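
One quick way to confirm that a failover port is actually reachable on a given interface is a plain TCP connect test. A minimal Python sketch, assuming (as in ISC dhcpd's design) that the failover peers talk over TCP connections, and that you run it from the other node's side of the same subnet or from another host on that subnet; `port_open` is a hypothetical helper and the addresses in the comments are illustrative values taken from this thread:

```python
import socket


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port is accepted within `timeout` seconds."""
    try:
        # create_connection performs the full TCP handshake (or raises)
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, or no route: nothing reachable on that port
        return False


# Illustrative addresses: each node listens on its own interface address
# print(port_open("192.168.35.3", 519))  # primary's dhcpd failover listener
# print(port_open("192.168.35.2", 520))  # secondary's dhcpd failover listener
```

If the connect fails in one direction only, compare the address each node's dhcpd.conf says it listens on with the peer address the other node is configured to use.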

        If that doesn't help, post the /var/dhcpd/etc/dhcpd.conf from both nodes. You can sanitize the addresses a bit but please try to keep the last octet intact (e.g. replace 192.168.1.1 with x.x.x.1).
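
Masking addresses by hand across several failover blocks is easy to get wrong. A minimal Python sketch of the masking described above, assuming you have copied /var/dhcpd/etc/dhcpd.conf to a machine with Python; `sanitize` is a hypothetical helper:

```python
import re

# Four dotted decimal groups; \b keeps a match from starting or ending mid-number
IPV4 = re.compile(r"\b(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})\b")


def sanitize(text: str) -> str:
    """Replace the first three octets of each IPv4 address with x, keeping the last octet."""
    return IPV4.sub(lambda m: "x.x.x." + m.group(4), text)


print(sanitize("peer address 192.168.35.2;"))
# peer address x.x.x.2;
```

Because every subnet collapses to x.x.x, the peer names (dhcp_opt15 and so on) are what still tell a reader which interface each block belongs to.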

        Remember: Upvote with the 👍 button for any user/post you find to be helpful, informative, or deserving of recognition!

        Need help fast? Netgate Global Support!

        Do not Chat/PM for help!

        • sef1414

          @jimp

          So I was going to post the .conf files here, but before doing that, I wanted to triple-check the HA recipe and troubleshooting steps to make sure everything was set up properly. It does indeed appear to be. However, along the way I noticed that states do not appear to be syncing either.

          There is a line in the system logs:

          carp: demoted by 0 to 0 (pfsync bulk fail) 
          

          So I am not sure whether this is related to the DHCP sync issue or a separate problem.

          I am fairly sure I read on a forum that it is now OK to use different NICs in HA pairs, but that's no guarantee the info was accurate.

          Does your statement from this old forum post still hold?

          "The usual reason on 2.2.x for states to not sync is that the interfaces are mismatched. States in 2.2.x are interface-bound, meaning the interface is a part of the state. For example if the primary node has igb(4) NICs and the secondary has em(4), the states can't sync."

          If so, I suppose that is causing the state sync issues. If pfsync is not working, should I expect the culprit to be the same one that is breaking DHCP sync?

          • sef1414

            @jimp

            Here are the configs just in case:

            Master:

            
            option domain-name "localnet";
            option ldap-server code 95 = text;
            option domain-search-list code 119 = text;
            option arch code 93 = unsigned integer 16; # RFC4578
            
            default-lease-time 7200;
            max-lease-time 86400;
            log-facility local7;
            one-lease-per-client true;
            deny duplicates;
            update-conflict-detection false;
            authoritative;
            failover peer "dhcp_opt14" {
              primary;
              address 192.168.91.3;
              port 519;
              peer address 192.168.91.2;
              peer port 520;
              max-response-delay 10;
              max-unacked-updates 10;
              split 128;
              mclt 600;
            
              load balance max seconds 3;
            }
            
            failover peer "dhcp_opt15" {
              primary;
              address 192.168.35.3;
              port 519;
              peer address 192.168.35.2;
              peer port 520;
              max-response-delay 10;
              max-unacked-updates 10;
              split 128;
              mclt 600;
            
              load balance max seconds 3;
            }
            
            failover peer "dhcp_opt16" {
              primary;
              address 10.0.66.103;
              port 519;
              peer address 10.0.66.102;
              peer port 520;
              max-response-delay 10;
              max-unacked-updates 10;
              split 128;
              mclt 600;
            
              load balance max seconds 3;
            }
            
            failover peer "dhcp_opt17" {
              primary;
              address 192.168.56.103;
              port 519;
              peer address 192.168.56.102;
              peer port 520;
              max-response-delay 10;
              max-unacked-updates 10;
              split 128;
              mclt 600;
            
              load balance max seconds 3;
            }
            
            failover peer "dhcp_opt18" {
              primary;
              address 192.168.76.3;
              port 519;
              peer address 192.168.76.2;
              peer port 520;
              max-response-delay 10;
              max-unacked-updates 10;
              split 128;
              mclt 600;
            
              load balance max seconds 3;
            }
            

            Secondary:

            
            option domain-name "localnet";
            option ldap-server code 95 = text;
            option domain-search-list code 119 = text;
            option arch code 93 = unsigned integer 16; # RFC4578
            
            default-lease-time 7200;
            max-lease-time 86400;
            log-facility local7;
            one-lease-per-client true;
            deny duplicates;
            update-conflict-detection false;
            authoritative;
            failover peer "dhcp_lan" {
              secondary;
              address 192.168.1.2;
              port 520;
              peer address 192.168.1.3;
              peer port 519;
              max-response-delay 10;
              max-unacked-updates 10;
              
              load balance max seconds 3;
            }
            
            failover peer "dhcp_opt2" {
              secondary;
              address 192.168.20.1;
              port 520;
              peer address 192.168.20.3;
              peer port 519;
              max-response-delay 10;
              max-unacked-updates 10;
              
              load balance max seconds 3;
            }
            
            failover peer "dhcp_opt14" {
              secondary;
              address 192.168.91.1;
              port 520;
              peer address 192.168.91.3;
              peer port 519;
              max-response-delay 10;
              max-unacked-updates 10;
              
              load balance max seconds 3;
            }
            
            failover peer "dhcp_opt15" {
              secondary;
              address 192.168.35.1;
              port 520;
              peer address 192.168.35.3;
              peer port 519;
              max-response-delay 10;
              max-unacked-updates 10;
              
              load balance max seconds 3;
            }
            
            failover peer "dhcp_opt16" {
              secondary;
              address 10.0.66.1;
              port 520;
              peer address 10.0.66.103;
              peer port 519;
              max-response-delay 10;
              max-unacked-updates 10;
              
              load balance max seconds 3;
            }
            
            failover peer "dhcp_opt17" {
              secondary;
              address 192.168.56.1;
              port 520;
              peer address 192.168.56.103;
              peer port 519;
              max-response-delay 10;
              max-unacked-updates 10;
              
              load balance max seconds 3;
            }
            
            failover peer "dhcp_opt18" {
              secondary;
              address 192.168.76.1;
              port 520;
              peer address 192.168.76.3;
              peer port 519;
              max-response-delay 10;
              max-unacked-updates 10;
              
              load balance max seconds 3;
            }
            
            
            

            I didn't include all the static mappings with hostnames, but I can if those are needed.

            • jimp (Rebel Alliance Developer, Netgate)

              At the very least you have some mismatches in the config that are likely causing you problems. The config for the secondary has pools for dhcp_lan and dhcp_opt2 which are not on the primary.

              Also, in some of these pools the secondary has its own address as .1 but the primary has the peer address as .2. Normally you'd see .2/.3 on one node and .3/.2 on the other, as they should be using their own interface addresses in these cases, not VIPs.
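
The first mismatch above is that the two nodes do not declare the same set of failover peers. A minimal Python sketch that compares two dhcpd.conf texts by peer name, assuming the names are quoted as in the files posted here; `unmatched_peers` is a hypothetical helper, and the config strings are trimmed stand-ins for the posted files:

```python
import re
from typing import Dict, List, Set

# Matches both the top-level declaration and any reference inside a pool { } block
PEER_NAME = re.compile(r'failover peer "([^"]+)"')


def peer_names(conf_text: str) -> Set[str]:
    """Return the set of failover peer names mentioned in a dhcpd.conf text."""
    return set(PEER_NAME.findall(conf_text))


def unmatched_peers(primary_conf: str, secondary_conf: str) -> Dict[str, List[str]]:
    """List peer names present on only one node; both lists should be empty."""
    p, s = peer_names(primary_conf), peer_names(secondary_conf)
    return {"only_on_primary": sorted(p - s), "only_on_secondary": sorted(s - p)}


primary = 'failover peer "dhcp_opt14" {\n}\nfailover peer "dhcp_opt15" {\n}\n'
secondary = (
    'failover peer "dhcp_lan" {\n}\nfailover peer "dhcp_opt2" {\n}\n'
    'failover peer "dhcp_opt14" {\n}\nfailover peer "dhcp_opt15" {\n}\n'
)
print(unmatched_peers(primary, secondary))
# {'only_on_primary': [], 'only_on_secondary': ['dhcp_lan', 'dhcp_opt2']}
```

Using sets means a peer that is both declared and referenced from a pool is counted once.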

              • sef1414

                @jimp

                Thanks for the follow-up, much appreciated. I think the mismatch was due to some ill-timed testing where I had tried re-adding interfaces without re-enabling CARP.

                I went ahead and stripped it down to just one interface for simplicity, and ran through the troubleshooting steps again.

                Master:

                option domain-name "localnet";
                option ldap-server code 95 = text;
                option domain-search-list code 119 = text;
                option arch code 93 = unsigned integer 16; # RFC4578
                
                default-lease-time 7200;
                max-lease-time 86400;
                log-facility local7;
                one-lease-per-client true;
                deny duplicates;
                update-conflict-detection false;
                authoritative;
                failover peer "dhcp_opt15" {
                  primary;
                  address 192.168.35.3;
                  port 519;
                  peer address 192.168.35.2;
                  peer port 520;
                  max-response-delay 10;
                  max-unacked-updates 10;
                  split 128;
                  mclt 600;
                
                  load balance max seconds 3;
                }
                

                Secondary:

                option domain-name "localnet";
                option ldap-server code 95 = text;
                option domain-search-list code 119 = text;
                option arch code 93 = unsigned integer 16; # RFC4578
                
                default-lease-time 7200;
                max-lease-time 86400;
                log-facility local7;
                one-lease-per-client true;
                deny duplicates;
                update-conflict-detection false;
                authoritative;
                failover peer "dhcp_opt15" {
                  secondary;
                  address 192.168.35.1;
                  port 520;
                  peer address 192.168.35.3;
                  peer port 519;
                  max-response-delay 10;
                  max-unacked-updates 10;
                  
                  load balance max seconds 3;
                }
                

                I believe the "address" on the master should be the CARP VIP, but maybe I'm mistaken. I did follow the guide for setting up the DHCP server.

                Here are my settings for the interface / DHCP / CARP VIP on the master:

                [Screenshots: interface, DHCP server, and CARP VIP settings on the master]

                Here are the logs after starting the DHCP daemons:

                Master:

                Jan 26 14:29:44	dhcpleases	251	Sending HUP signal to dns daemon(86952)
                Jan 26 14:29:44	dhcpd	84618	failover peer dhcp_opt15: I move from startup to recover
                Jan 26 14:29:29	dhcpleases	251	Sending HUP signal to dns daemon(86952)
                Jan 26 14:29:29	dhcpd	84618	Server starting service.
                Jan 26 14:29:29	dhcpd	84618	failover peer dhcp_opt15: I move from recover to startup
                Jan 26 14:29:29	dhcpd	84618	Sending on Socket/fallback/fallback-net
                

                Secondary:

                Jan 26 14:30:02	dhcpd	41294	failover peer dhcp_opt15: I move from startup to recover
                Jan 26 14:29:47	dhcpleases	5555	Sending HUP signal to dns daemon(32665)
                Jan 26 14:29:47	dhcpleases	5555	Sending HUP signal to dns daemon(32665)
                Jan 26 14:29:47	dhcpd	41294	Server starting service.
                Jan 26 14:29:47	dhcpd	41294	failover peer dhcp_opt15: host unreachable
                Jan 26 14:29:47	dhcpd	41294	failover peer dhcp_opt15: I move from recover to startup
                Jan 26 14:29:47	dhcpd	41294	Sending on Socket/fallback/fallback-net
                
                • jimp (Rebel Alliance Developer, Netgate)

                  Something still isn't right there. If the VIP is .1 then neither of them should be using that as their "address" in the subnet for DHCP.

                  The config should show address <-> peer in both directions, like this:

                  Primary:

                  failover peer "dhcp_lan" {
                    primary;
                    address 10.11.0.2;
                    port 519;
                    peer address 10.11.0.3;
                    peer port 520;
                    max-response-delay 10;
                    max-unacked-updates 10;
                    split 128;
                    mclt 600;
                  
                    load balance max seconds 3;
                  }
                  

                  Secondary:

                  failover peer "dhcp_lan" {
                    secondary;
                    address 10.11.0.3;
                    port 520;
                    peer address 10.11.0.2;
                    peer port 519;
                    max-response-delay 10;
                    max-unacked-updates 10;
                  
                    load balance max seconds 3;
                  }
                  

                  Note that it's 10.11.0.2:519 <-> 10.11.0.3:520 both ways.
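
This mirror rule can be checked mechanically: each node's own address and port must equal the other node's peer address and peer port, and neither node should use the CARP VIP as its own address. A minimal Python sketch, assuming each argument is the text of a single failover peer block and that you supply the VIP yourself; `mirror_problems` is a hypothetical helper, and the blocks below are trimmed copies of sef1414's dhcp_opt15 pair:

```python
import re
from typing import Dict, List, Optional

# One pattern per field; "^\s*address" does not match the "peer address" line
FIELDS = {
    "address": re.compile(r"^\s*address\s+([\d.]+);", re.M),
    "port": re.compile(r"^\s*port\s+(\d+);", re.M),
    "peer_address": re.compile(r"^\s*peer address\s+([\d.]+);", re.M),
    "peer_port": re.compile(r"^\s*peer port\s+(\d+);", re.M),
}


def parse_peer(block: str) -> Dict[str, Optional[str]]:
    """Pull address, port, peer address and peer port out of one failover peer block."""
    parsed: Dict[str, Optional[str]] = {}
    for key, pattern in FIELDS.items():
        m = pattern.search(block)
        parsed[key] = m.group(1) if m else None
    return parsed


def mirror_problems(primary: str, secondary: str, vip: Optional[str] = None) -> List[str]:
    """Report where the two blocks fail to mirror each other (empty list = consistent)."""
    p, s = parse_peer(primary), parse_peer(secondary)
    problems: List[str] = []
    for mine, theirs in (("address", "peer_address"), ("port", "peer_port")):
        if p[mine] != s[theirs]:
            problems.append(f"primary {mine} {p[mine]} != secondary {theirs} {s[theirs]}")
        if s[mine] != p[theirs]:
            problems.append(f"secondary {mine} {s[mine]} != primary {theirs} {p[theirs]}")
    if vip is not None:
        for name, node in (("primary", p), ("secondary", s)):
            if node["address"] == vip:
                problems.append(f"{name} uses the CARP VIP {vip} as its own address")
    return problems


primary_block = """failover peer "dhcp_opt15" {
  primary;
  address 192.168.35.3;
  port 519;
  peer address 192.168.35.2;
  peer port 520;
}"""
secondary_block = """failover peer "dhcp_opt15" {
  secondary;
  address 192.168.35.1;
  port 520;
  peer address 192.168.35.3;
  peer port 519;
}"""
for problem in mirror_problems(primary_block, secondary_block, vip="192.168.35.1"):
    print(problem)
# secondary address 192.168.35.1 != primary peer_address 192.168.35.2
# secondary uses the CARP VIP 192.168.35.1 as its own address
```

Run against the dhcp_lan example above, the same check returns an empty list.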

                  I'm not sure how that secondary is pulling the VIP for its own address there.

                  • sef1414

                    @jimp

                    Hmm, OK. Changing it manually just gets overwritten, as I expected. Any thoughts on where to go from here?

                    For the "host unreachable" messages, do I need an explicit firewall rule to pass the traffic on that interface? I wouldn't think so, since it's not mentioned in the docs and the traffic stays on the same interface.

                    • jimp (Rebel Alliance Developer, Netgate)

                      Not sure why it's picking the VIP there, but it might be related to https://redmine.pfsense.org/issues/11545. I don't think I've ever seen that be triggered by a CARP VIP, though, especially not that reliably.

                      I'd take a closer look at the interface and VIP settings and see if anything stands out there.

                      • sef1414

                        @jimp

                        Alright. No joy after re-saving the interface / VIP. I'm pretty sure I have everything configured correctly; I've run through it too many times to count.

                        Any chance it's caused by different physical NIC models?

                        • jimp (Rebel Alliance Developer, Netgate)

                          No, the NIC models only affect state sync, not DHCP sync. And even then the state sync isn't affected anymore since we moved back away from interface-bound states.

                          Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.