Netgate Discussion Forum

DHCP failover not working

CE 2.7.0 Development Snapshots (Retired)
  • J
    jimp Rebel Alliance Developer Netgate
    last edited by Jan 25, 2023, 7:59 PM

    DHCP failover works on 2.7 in general; it's running fine on a pair of systems in my lab.

    Most likely some part of your configuration isn't putting the expected parameters in the right places in the generated config.

    Something to remember is that DHCP communicates with its peer on each interface separately, so make sure the nodes can freely communicate on each local interface involved in DHCP failover, between TCP port 519 (primary) and port 520 (secondary).

    If that doesn't help, post the /var/dhcpd/etc/dhcpd.conf from both nodes. You can sanitize the addresses a bit but please try to keep the last octet intact (e.g. replace 192.168.1.1 with x.x.x.1).
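The sanitizing step suggested above can be scripted. This is a minimal sketch, not a pfSense tool: the `sanitize` helper is a hypothetical name, and it assumes the config only contains plain dotted-quad IPv4 addresses. It masks the first three octets and keeps the last one, as requested.

```python
import re

# Matches a dotted-quad IPv4 address and captures only the last octet.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}(\d{1,3})\b")


def sanitize(text):
    """Replace the first three octets of each IPv4 address with x.x.x (hypothetical helper)."""
    return IPV4.sub(lambda m: "x.x.x." + m.group(1), text)


if __name__ == "__main__":
    sample = "address 192.168.1.1;\npeer address 192.168.1.3;"
    print(sanitize(sample))
```

Note that masking every subnet to the same `x.x.x` hides which interface a block belongs to, so the failover peer names should be left intact when posting.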

    Remember: Upvote with the 👍 button for any user/post you find to be helpful, informative, or deserving of recognition!

    Need help fast? Netgate Global Support!

    Do not Chat/PM for help!

    • S
      sef1414
      last edited by Jan 25, 2023, 11:48 PM

      @jimp

      So I was going to post the .conf files in here, but before I did that, I wanted to triple-check the HA recipe steps and troubleshooting steps and make sure everything was set up properly. It does indeed appear that it is. However, along the way I noticed that states do not appear to be syncing either.

      There is a line in the system logs:

      carp: demoted by 0 to 0 (pfsync bulk fail) 
      

      So, I am not sure whether this is related to the DHCP sync problem or a separate issue.

      I am fairly sure I read on a forum that it is now OK to use different NICs in HA pairs, but that's no guarantee the info was accurate.

      Does your statement from this old forum post still hold?

      "The usual reason on 2.2.x for states to not sync is that the interfaces are mismatched. States in 2.2.x are interface-bound, meaning the interface is a part of the state. For example if the primary node has igb(4) NICs and the secondary has em(4), the states can't sync."

      If so, I suppose that is causing the state sync issues. If pfsync is not working, should I expect the same culprit to be behind DHCP sync not working?

      • S
        sef1414 @jimp
        last edited by Jan 25, 2023, 11:51 PM

        @jimp

        Here are the configs just in case:

        Master:

        
        option domain-name "localnet";
        option ldap-server code 95 = text;
        option domain-search-list code 119 = text;
        option arch code 93 = unsigned integer 16; # RFC4578
        
        default-lease-time 7200;
        max-lease-time 86400;
        log-facility local7;
        one-lease-per-client true;
        deny duplicates;
        update-conflict-detection false;
        authoritative;
        failover peer "dhcp_opt14" {
          primary;
          address 192.168.91.3;
          port 519;
          peer address 192.168.91.2;
          peer port 520;
          max-response-delay 10;
          max-unacked-updates 10;
          split 128;
          mclt 600;
        
          load balance max seconds 3;
        }
        
        failover peer "dhcp_opt15" {
          primary;
          address 192.168.35.3;
          port 519;
          peer address 192.168.35.2;
          peer port 520;
          max-response-delay 10;
          max-unacked-updates 10;
          split 128;
          mclt 600;
        
          load balance max seconds 3;
        }
        
        failover peer "dhcp_opt16" {
          primary;
          address 10.0.66.103;
          port 519;
          peer address 10.0.66.102;
          peer port 520;
          max-response-delay 10;
          max-unacked-updates 10;
          split 128;
          mclt 600;
        
          load balance max seconds 3;
        }
        
        failover peer "dhcp_opt17" {
          primary;
          address 192.168.56.103;
          port 519;
          peer address 192.168.56.102;
          peer port 520;
          max-response-delay 10;
          max-unacked-updates 10;
          split 128;
          mclt 600;
        
          load balance max seconds 3;
        }
        
        failover peer "dhcp_opt18" {
          primary;
          address 192.168.76.3;
          port 519;
          peer address 192.168.76.2;
          peer port 520;
          max-response-delay 10;
          max-unacked-updates 10;
          split 128;
          mclt 600;
        
          load balance max seconds 3;
        }
        

        Secondary:

        
        option domain-name "localnet";
        option ldap-server code 95 = text;
        option domain-search-list code 119 = text;
        option arch code 93 = unsigned integer 16; # RFC4578
        
        default-lease-time 7200;
        max-lease-time 86400;
        log-facility local7;
        one-lease-per-client true;
        deny duplicates;
        update-conflict-detection false;
        authoritative;
        failover peer "dhcp_lan" {
          secondary;
          address 192.168.1.2;
          port 520;
          peer address 192.168.1.3;
          peer port 519;
          max-response-delay 10;
          max-unacked-updates 10;
          
          load balance max seconds 3;
        }
        
        failover peer "dhcp_opt2" {
          secondary;
          address 192.168.20.1;
          port 520;
          peer address 192.168.20.3;
          peer port 519;
          max-response-delay 10;
          max-unacked-updates 10;
          
          load balance max seconds 3;
        }
        
        failover peer "dhcp_opt14" {
          secondary;
          address 192.168.91.1;
          port 520;
          peer address 192.168.91.3;
          peer port 519;
          max-response-delay 10;
          max-unacked-updates 10;
          
          load balance max seconds 3;
        }
        
        failover peer "dhcp_opt15" {
          secondary;
          address 192.168.35.1;
          port 520;
          peer address 192.168.35.3;
          peer port 519;
          max-response-delay 10;
          max-unacked-updates 10;
          
          load balance max seconds 3;
        }
        
        failover peer "dhcp_opt16" {
          secondary;
          address 10.0.66.1;
          port 520;
          peer address 10.0.66.103;
          peer port 519;
          max-response-delay 10;
          max-unacked-updates 10;
          
          load balance max seconds 3;
        }
        
        failover peer "dhcp_opt17" {
          secondary;
          address 192.168.56.1;
          port 520;
          peer address 192.168.56.103;
          peer port 519;
          max-response-delay 10;
          max-unacked-updates 10;
          
          load balance max seconds 3;
        }
        
        failover peer "dhcp_opt18" {
          secondary;
          address 192.168.76.1;
          port 520;
          peer address 192.168.76.3;
          peer port 519;
          max-response-delay 10;
          max-unacked-updates 10;
          
          load balance max seconds 3;
        }
        
        
        

        Didn't include all the static mappings with hostnames, but I can if those are needed.

        • J
          jimp Rebel Alliance Developer Netgate
          last edited by Jan 26, 2023, 1:29 PM

          At the very least you have some mismatches in the config that are likely causing you problems. The config for the secondary has pools for dhcp_lan and dhcp_opt2 which are not on the primary.

          Also, in some of these pools the secondary has its own address as .1 but the primary has the peer address as .2. Normally you'd see it be .2/.3 and .3/.2, as they should be using their own interface addresses in these cases, not VIPs.
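The first mismatch described above (failover peers declared on one node but not the other) can be spotted mechanically. The sketch below is only an illustration: `peer_names` and `name_mismatches` are hypothetical helpers, and the two config strings are trimmed excerpts from this thread reduced to their declaration lines.

```python
import re

# Matches failover peer declarations ('failover peer "name" {'), not the
# 'failover peer "name";' references that appear inside pool blocks.
PEER_DECL = re.compile(r'failover\s+peer\s+"([^"]+)"\s*\{')

# Trimmed excerpts of the configs posted above (bodies omitted).
PRIMARY = '''
failover peer "dhcp_opt14" { primary; }
failover peer "dhcp_opt15" { primary; }
'''
SECONDARY = '''
failover peer "dhcp_lan" { secondary; }
failover peer "dhcp_opt2" { secondary; }
failover peer "dhcp_opt14" { secondary; }
failover peer "dhcp_opt15" { secondary; }
'''


def peer_names(conf):
    """Return the set of failover peer names declared in a dhcpd.conf text."""
    return set(PEER_DECL.findall(conf))


def name_mismatches(primary_conf, secondary_conf):
    """Return (only_on_primary, only_on_secondary) as sorted lists."""
    pri, sec = peer_names(primary_conf), peer_names(secondary_conf)
    return sorted(pri - sec), sorted(sec - pri)


if __name__ == "__main__":
    only_primary, only_secondary = name_mismatches(PRIMARY, SECONDARY)
    print("only on primary:  ", only_primary)
    print("only on secondary:", only_secondary)
```

Any name that shows up on only one side points at an interface where DHCP failover was enabled on one node but not the other.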

          • S
            sef1414
            last edited by Jan 26, 2023, 9:40 PM

            @jimp

            Thanks for the follow-up, much appreciated. I think the mismatch was due to some ill-timed testing where I had tried re-adding interfaces without re-enabling CARP.

            I went ahead and stripped it down to just one interface for simplicity, and ran through the troubleshooting steps again.

            Master:

            option domain-name "localnet";
            option ldap-server code 95 = text;
            option domain-search-list code 119 = text;
            option arch code 93 = unsigned integer 16; # RFC4578
            
            default-lease-time 7200;
            max-lease-time 86400;
            log-facility local7;
            one-lease-per-client true;
            deny duplicates;
            update-conflict-detection false;
            authoritative;
            failover peer "dhcp_opt15" {
              primary;
              address 192.168.35.3;
              port 519;
              peer address 192.168.35.2;
              peer port 520;
              max-response-delay 10;
              max-unacked-updates 10;
              split 128;
              mclt 600;
            
              load balance max seconds 3;
            }
            

            Secondary:

            option domain-name "localnet";
            option ldap-server code 95 = text;
            option domain-search-list code 119 = text;
            option arch code 93 = unsigned integer 16; # RFC4578
            
            default-lease-time 7200;
            max-lease-time 86400;
            log-facility local7;
            one-lease-per-client true;
            deny duplicates;
            update-conflict-detection false;
            authoritative;
            failover peer "dhcp_opt15" {
              secondary;
              address 192.168.35.1;
              port 520;
              peer address 192.168.35.3;
              peer port 519;
              max-response-delay 10;
              max-unacked-updates 10;
              
              load balance max seconds 3;
            }
            

            I believe the "address" on the master should be the CARP VIP, but maybe I'm mistaken. I did follow the guide for setting up the DHCP server.

            Here are my settings for the interface / DHCP / CARP VIP on the master:

            6e29f8a9-aa79-4e95-808d-920cc69ac189-image.png

            0702a3d5-dbb9-47fe-a23b-ec0716baa3d6-image.png

            9c0fa87a-4170-464e-bf70-401ca6a1a4b2-image.png

            Here are the logs after starting the DHCP daemons:

            Master:

            Jan 26 14:29:44	dhcpleases	251	Sending HUP signal to dns daemon(86952)
            Jan 26 14:29:44	dhcpd	84618	failover peer dhcp_opt15: I move from startup to recover
            Jan 26 14:29:29	dhcpleases	251	Sending HUP signal to dns daemon(86952)
            Jan 26 14:29:29	dhcpd	84618	Server starting service.
            Jan 26 14:29:29	dhcpd	84618	failover peer dhcp_opt15: I move from recover to startup
            Jan 26 14:29:29	dhcpd	84618	Sending on Socket/fallback/fallback-net
            

            Secondary:

            Jan 26 14:30:02	dhcpd	41294	failover peer dhcp_opt15: I move from startup to recover
            Jan 26 14:29:47	dhcpleases	5555	Sending HUP signal to dns daemon(32665)
            Jan 26 14:29:47	dhcpleases	5555	Sending HUP signal to dns daemon(32665)
            Jan 26 14:29:47	dhcpd	41294	Server starting service.
            Jan 26 14:29:47	dhcpd	41294	failover peer dhcp_opt15: host unreachable
            Jan 26 14:29:47	dhcpd	41294	failover peer dhcp_opt15: I move from recover to startup
            Jan 26 14:29:47	dhcpd	41294	Sending on Socket/fallback/fallback-net
            
            • J
              jimp Rebel Alliance Developer Netgate
              last edited by Jan 27, 2023, 4:21 PM

              Something still isn't right there. If the VIP is .1 then neither of them should be using that as their "address" in the subnet for DHCP.

              The config should show address <-> peer in both directions, like this:

              Primary:

              failover peer "dhcp_lan" {
                primary;
                address 10.11.0.2;
                port 519;
                peer address 10.11.0.3;
                peer port 520;
                max-response-delay 10;
                max-unacked-updates 10;
                split 128;
                mclt 600;
              
                load balance max seconds 3;
              }
              

              Secondary:

              failover peer "dhcp_lan" {
                secondary;
                address 10.11.0.3;
                port 520;
                peer address 10.11.0.2;
                peer port 519;
                max-response-delay 10;
                max-unacked-updates 10;
              
                load balance max seconds 3;
              }
              

              Note that it's 10.11.0.2:519 <-> 10.11.0.3:520 both ways.
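The "address <-> peer in both directions" rule can be checked with a short script. This is a sketch under stated assumptions, not anything pfSense ships: `parse_peers` and `mirror_problems` are hypothetical helpers, the parser only understands the simple statement layout shown in this thread, and the sample strings are trimmed copies of the configs posted above.

```python
import re

# One failover peer block; these blocks contain no nested braces.
BLOCK = re.compile(r'failover\s+peer\s+"([^"]+)"\s*\{(.*?)\}', re.S)

# Known-good example from this post (trimmed to the relevant statements).
GOOD_PRIMARY = 'failover peer "dhcp_lan" { primary; address 10.11.0.2; port 519; peer address 10.11.0.3; peer port 520; }'
GOOD_SECONDARY = 'failover peer "dhcp_lan" { secondary; address 10.11.0.3; port 520; peer address 10.11.0.2; peer port 519; }'
# The single-interface configs posted earlier in the thread (trimmed).
BAD_PRIMARY = 'failover peer "dhcp_opt15" { primary; address 192.168.35.3; port 519; peer address 192.168.35.2; peer port 520; }'
BAD_SECONDARY = 'failover peer "dhcp_opt15" { secondary; address 192.168.35.1; port 520; peer address 192.168.35.3; peer port 519; }'


def parse_peers(conf):
    """Map each failover peer name to its address/port statements."""
    peers = {}
    for name, body in BLOCK.findall(conf):
        fields = {}
        for stmt in body.split(";"):
            words = stmt.split()
            if len(words) == 2 and words[0] in ("address", "port"):
                fields[words[0]] = words[1]
            elif len(words) == 3 and words[0] == "peer" and words[1] in ("address", "port"):
                fields["peer " + words[1]] = words[2]
        peers[name] = fields
    return peers


def mirror_problems(primary_conf, secondary_conf):
    """List every place where one node's own value differs from the other's peer value."""
    pri, sec = parse_peers(primary_conf), parse_peers(secondary_conf)
    problems = []
    for name in sorted(pri.keys() & sec.keys()):
        p, s = pri[name], sec[name]
        for mine, theirs in (("address", "peer address"), ("port", "peer port")):
            if p.get(mine) != s.get(theirs):
                problems.append(f"{name}: primary {mine} {p.get(mine)} != secondary {theirs} {s.get(theirs)}")
            if s.get(mine) != p.get(theirs):
                problems.append(f"{name}: secondary {mine} {s.get(mine)} != primary {theirs} {p.get(theirs)}")
    return problems


if __name__ == "__main__":
    print(mirror_problems(GOOD_PRIMARY, GOOD_SECONDARY))
    for line in mirror_problems(BAD_PRIMARY, BAD_SECONDARY):
        print(line)
```

Run against the configs from the earlier post, the only flagged line is the secondary's own address (.1) not matching the primary's peer address (.2), which is the VIP problem described here.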

              I'm not sure how that secondary is pulling the VIP for its own address there.

              • S
                sef1414 @jimp
                last edited by Jan 28, 2023, 7:10 PM

                @jimp

                Hmm ok. Changing it manually just gets overwritten, as I expected. Any thoughts on where to go from here?

                For the "host unreachable" messages, do I need some explicit firewall rule to pass the traffic on that interface? I wouldn't think so, since it's not mentioned in the docs and the traffic is on the same interface.

                • J
                  jimp Rebel Alliance Developer Netgate
                  last edited by Jan 30, 2023, 1:53 PM

                  Not sure why it's picking the VIP there, but it might be related to https://redmine.pfsense.org/issues/11545 -- I don't think I've ever seen that be triggered by a CARP VIP though, especially not that reliably.

                  I'd take a closer look at the interface and VIP settings and see if anything stands out there.

                  • S
                    sef1414 @jimp
                    last edited by Jan 31, 2023, 8:18 PM

                    @jimp

                    Alright. No joy on re-saving the interface / VIP. I'm pretty sure I have everything configured correctly; I have run through it too many times to count.

                    Any shot it's caused by different physical NIC models?

                    • J
                      jimp Rebel Alliance Developer Netgate
                      last edited by Feb 1, 2023, 1:20 PM

                      No, the NIC models only affect state sync, not DHCP sync. And even then the state sync isn't affected anymore since we moved back away from interface-bound states.

                      Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.