Netgate Discussion Forum

DHCP failover not working

CE 2.7.0 Development Snapshots (Retired)
  • S
    sef1414
    last edited by Jan 25, 2023, 1:43 PM

    I've just set up a second instance of pfSense for HA. The pair is syncing, but I'm running into issues with DHCP.

    Following the guide, I entered the CARP VIP in the DNS Server and Gateway fields of the DHCP server, and entered the failover peer IP.

    On the DHCP Pool Status page, the interfaces show "recover" in the "My State" column and "unknown-state" in the "Peer State" column.

    I've gone through this troubleshooting list and performed each step without any improvement:

    https://docs.netgate.com/pfsense/en/latest/troubleshooting/ha-dhcp-failover.html

    DHCP logs indicate an issue but don't provide much detail:

    Jan 25 06:38:08	dhcpd	95441	failover peer dhcp_opt2: I move from startup to recover
    Jan 25 06:37:53	dhcpd	95441	failover peer dhcp_opt2: host down
    Jan 25 06:37:53	dhcpd	95441	failover peer dhcp_opt2: I move from recover to startup
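
The log view lists the newest entry first, so the failover state changes read backwards. A minimal sketch for putting them back in chronological order, assuming the lines are tab-separated exactly as pasted above; `parse_transitions` is a hypothetical helper, not part of pfSense or dhcpd:

```python
import re

# Matches the dhcpd message "failover peer NAME: I move from OLD to NEW".
TRANSITION = re.compile(r"failover peer (\S+): I move from (\S+) to (\S+)")

def parse_transitions(log_text):
    """Return (peer, old_state, new_state) tuples, oldest first.

    Assumes the input is ordered newest first, as in the paste above.
    """
    found = []
    for line in log_text.splitlines():
        match = TRANSITION.search(line)
        if match:
            found.append(match.groups())
    found.reverse()
    return found

# The three log lines from the post, newest first.
sample = """\
Jan 25 06:38:08\tdhcpd\t95441\tfailover peer dhcp_opt2: I move from startup to recover
Jan 25 06:37:53\tdhcpd\t95441\tfailover peer dhcp_opt2: host down
Jan 25 06:37:53\tdhcpd\t95441\tfailover peer dhcp_opt2: I move from recover to startup
"""

for peer, old, new in parse_transitions(sample):
    print(f"{peer}: {old} -> {new}")
```

Lines that are not state transitions, such as "host down", are skipped by this sketch, so they still need to be read in the raw log.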
    

    I am running the same development snapshot (2.7.0-DEVELOPMENT (amd64), built on Fri Jan 20 03:01:02 UTC 2023), as I could not get the newer host with newer hardware working on 2.6.

    Any suggestions would be appreciated.

    • J johnpoz moved this topic from HA/CARP/VIPs on Jan 25, 2023, 1:49 PM
    • J
      jimp Rebel Alliance Developer Netgate
      last edited by Jan 25, 2023, 7:59 PM

      DHCP failover works on 2.7 in general; it's running fine on a pair of systems in my lab.

      Most likely some part of your configuration is not producing the expected config parameters in the right places.

      Something to remember is that DHCP communicates with its peer on each interface separately, so make sure the two nodes can communicate freely on each local interface involved in DHCP failover, on TCP port 519 (primary) and port 520 (secondary).

      If that doesn't help, post the /var/dhcpd/etc/dhcpd.conf from both nodes. You can sanitize the addresses a bit but please try to keep the last octet intact (e.g. replace 192.168.1.1 with x.x.x.1).
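
A minimal sketch of that per-interface reachability check, assuming the failover channel is TCP (the dhcpd.conf manual describes the `port` statement as a TCP port). `can_connect` is a hypothetical helper, and it only tests whether a TCP connection can be opened, not whether the failover handshake succeeds:

```python
import socket

def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port opens within timeout."""
    try:
        # create_connection raises OSError (refused, unreachable, timed out)
        # when the connection cannot be established.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Run from the primary, something like `can_connect("x.x.x.3", 520)` would probe the secondary's failover port on one interface; from the secondary, probe the primary's address on port 519. The addresses are placeholders, and the check has to be repeated for every interface that has a failover pool.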

      Remember: Upvote with the 👍 button for any user/post you find to be helpful, informative, or deserving of recognition!

      Need help fast? Netgate Global Support!

      Do not Chat/PM for help!

      • S
        sef1414
        last edited by Jan 25, 2023, 11:48 PM

        @jimp

        So I was going to post the .conf files in here, but before I did that, I wanted to triple-check the HA recipe steps and troubleshooting steps and make sure everything was set up properly. It does indeed appear that it is. However, along the way I noticed that states do not appear to be syncing either.

        There is a line in the system logs:

        carp: demoted by 0 to 0 (pfsync bulk fail) 
        

        So, I am not sure whether this is related to the DHCP sync problem or a separate issue.

        I am fairly sure I read on a forum that it was now OK to use different NICs in HA pairs, but that's no guarantee the info was accurate.

        Does your statement from this old forum post still hold?

        "The usual reason on 2.2.x for states to not sync is that the interfaces are mismatched. States in 2.2.x are interface-bound, meaning the interface is a part of the state. For example if the primary node has igb(4) NICs and the secondary has em(4), the states can't sync."

        If so, I suppose that is causing the state sync issues. If pfsync is not working, should I expect the same culprit to be behind DHCP sync not working?

        • S
          sef1414 @jimp
          last edited by Jan 25, 2023, 11:51 PM

          @jimp

          Here are the configs just in case:

          Master

          
          option domain-name "localnet";
          option ldap-server code 95 = text;
          option domain-search-list code 119 = text;
          option arch code 93 = unsigned integer 16; # RFC4578
          
          default-lease-time 7200;
          max-lease-time 86400;
          log-facility local7;
          one-lease-per-client true;
          deny duplicates;
          update-conflict-detection false;
          authoritative;
          failover peer "dhcp_opt14" {
            primary;
            address 192.168.91.3;
            port 519;
            peer address 192.168.91.2;
            peer port 520;
            max-response-delay 10;
            max-unacked-updates 10;
            split 128;
            mclt 600;
          
            load balance max seconds 3;
          }
          
          failover peer "dhcp_opt15" {
            primary;
            address 192.168.35.3;
            port 519;
            peer address 192.168.35.2;
            peer port 520;
            max-response-delay 10;
            max-unacked-updates 10;
            split 128;
            mclt 600;
          
            load balance max seconds 3;
          }
          
          failover peer "dhcp_opt16" {
            primary;
            address 10.0.66.103;
            port 519;
            peer address 10.0.66.102;
            peer port 520;
            max-response-delay 10;
            max-unacked-updates 10;
            split 128;
            mclt 600;
          
            load balance max seconds 3;
          }
          
          failover peer "dhcp_opt17" {
            primary;
            address 192.168.56.103;
            port 519;
            peer address 192.168.56.102;
            peer port 520;
            max-response-delay 10;
            max-unacked-updates 10;
            split 128;
            mclt 600;
          
            load balance max seconds 3;
          }
          
          failover peer "dhcp_opt18" {
            primary;
            address 192.168.76.3;
            port 519;
            peer address 192.168.76.2;
            peer port 520;
            max-response-delay 10;
            max-unacked-updates 10;
            split 128;
            mclt 600;
          
            load balance max seconds 3;
          }
          

          Secondary:

          
          option domain-name "localnet";
          option ldap-server code 95 = text;
          option domain-search-list code 119 = text;
          option arch code 93 = unsigned integer 16; # RFC4578
          
          default-lease-time 7200;
          max-lease-time 86400;
          log-facility local7;
          one-lease-per-client true;
          deny duplicates;
          update-conflict-detection false;
          authoritative;
          failover peer "dhcp_lan" {
            secondary;
            address 192.168.1.2;
            port 520;
            peer address 192.168.1.3;
            peer port 519;
            max-response-delay 10;
            max-unacked-updates 10;
            
            load balance max seconds 3;
          }
          
          failover peer "dhcp_opt2" {
            secondary;
            address 192.168.20.1;
            port 520;
            peer address 192.168.20.3;
            peer port 519;
            max-response-delay 10;
            max-unacked-updates 10;
            
            load balance max seconds 3;
          }
          
          failover peer "dhcp_opt14" {
            secondary;
            address 192.168.91.1;
            port 520;
            peer address 192.168.91.3;
            peer port 519;
            max-response-delay 10;
            max-unacked-updates 10;
            
            load balance max seconds 3;
          }
          
          failover peer "dhcp_opt15" {
            secondary;
            address 192.168.35.1;
            port 520;
            peer address 192.168.35.3;
            peer port 519;
            max-response-delay 10;
            max-unacked-updates 10;
            
            load balance max seconds 3;
          }
          
          failover peer "dhcp_opt16" {
            secondary;
            address 10.0.66.1;
            port 520;
            peer address 10.0.66.103;
            peer port 519;
            max-response-delay 10;
            max-unacked-updates 10;
            
            load balance max seconds 3;
          }
          
          failover peer "dhcp_opt17" {
            secondary;
            address 192.168.56.1;
            port 520;
            peer address 192.168.56.103;
            peer port 519;
            max-response-delay 10;
            max-unacked-updates 10;
            
            load balance max seconds 3;
          }
          
          failover peer "dhcp_opt18" {
            secondary;
            address 192.168.76.1;
            port 520;
            peer address 192.168.76.3;
            peer port 519;
            max-response-delay 10;
            max-unacked-updates 10;
            
            load balance max seconds 3;
          }
          
          
          

          Didn't include all the static mappings with hostnames, but I can if those are needed.

          • J
            jimp Rebel Alliance Developer Netgate
            last edited by Jan 26, 2023, 1:29 PM

            At the very least you have some mismatches in the config that are likely causing you problems. The config for the secondary has pools for dhcp_lan and dhcp_opt2 which are not on the primary.

            Also, in some of these pools the secondary has its own address as .1 but the primary has the peer address as .2. Normally you'd see .2/.3 on one node and .3/.2 on the other, as they should be using their own interface addresses in these cases, not VIPs.
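
The pool-name mismatch can be spotted mechanically. A minimal sketch, assuming each failover pool is declared as `failover peer "NAME" { ... }` as in the files posted above; `failover_pool_names` and `compare_pools` are hypothetical helpers, and the sample text is trimmed to a few pool names from the thread:

```python
import re

# Matches the opening of a 'failover peer "NAME" {' declaration. References
# of the form 'failover peer "NAME";' inside a pool are not matched.
DECLARATION = re.compile(r'failover peer "([^"]+)"\s*\{')

def failover_pool_names(conf_text):
    """Return the set of failover peer names declared in a dhcpd.conf text."""
    return set(DECLARATION.findall(conf_text))

def compare_pools(primary_conf, secondary_conf):
    """Return (only_on_primary, only_on_secondary) as sorted lists."""
    primary = failover_pool_names(primary_conf)
    secondary = failover_pool_names(secondary_conf)
    return sorted(primary - secondary), sorted(secondary - primary)

# Trimmed samples: only the declarations matter for this check.
primary_conf = '''
failover peer "dhcp_opt14" {
  primary;
}
failover peer "dhcp_opt15" {
  primary;
}
'''
secondary_conf = '''
failover peer "dhcp_lan" {
  secondary;
}
failover peer "dhcp_opt2" {
  secondary;
}
failover peer "dhcp_opt14" {
  secondary;
}
failover peer "dhcp_opt15" {
  secondary;
}
'''

only_primary, only_secondary = compare_pools(primary_conf, secondary_conf)
print("only on primary:", only_primary)
print("only on secondary:", only_secondary)
```

In practice the two strings would be read from the copies of /var/dhcpd/etc/dhcpd.conf taken from each node.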

            • S
              sef1414
              last edited by Jan 26, 2023, 9:40 PM

              @jimp

              Thanks for the follow-up, much appreciated. I think the mismatch was due to some ill-timed testing where I had tried re-adding interfaces without re-enabling CARP.

              I went ahead and stripped it down to just one interface for simplicity, and ran through the troubleshooting steps again.

              Master

              option domain-name "localnet";
              option ldap-server code 95 = text;
              option domain-search-list code 119 = text;
              option arch code 93 = unsigned integer 16; # RFC4578
              
              default-lease-time 7200;
              max-lease-time 86400;
              log-facility local7;
              one-lease-per-client true;
              deny duplicates;
              update-conflict-detection false;
              authoritative;
              failover peer "dhcp_opt15" {
                primary;
                address 192.168.35.3;
                port 519;
                peer address 192.168.35.2;
                peer port 520;
                max-response-delay 10;
                max-unacked-updates 10;
                split 128;
                mclt 600;
              
                load balance max seconds 3;
              }
              

              Secondary:

              option domain-name "localnet";
              option ldap-server code 95 = text;
              option domain-search-list code 119 = text;
              option arch code 93 = unsigned integer 16; # RFC4578
              
              default-lease-time 7200;
              max-lease-time 86400;
              log-facility local7;
              one-lease-per-client true;
              deny duplicates;
              update-conflict-detection false;
              authoritative;
              failover peer "dhcp_opt15" {
                secondary;
                address 192.168.35.1;
                port 520;
                peer address 192.168.35.3;
                peer port 519;
                max-response-delay 10;
                max-unacked-updates 10;
                
                load balance max seconds 3;
              }
              

              I believe the "address" on the master should be the CARP VIP, but maybe I'm mistaken. I did follow the guide for setting up the DHCP server.

              Here are my settings for the interface / DHCP / CARP VIP on the master:

              6e29f8a9-aa79-4e95-808d-920cc69ac189-image.png

              0702a3d5-dbb9-47fe-a23b-ec0716baa3d6-image.png

              9c0fa87a-4170-464e-bf70-401ca6a1a4b2-image.png

              Here are the logs after starting the DHCP daemons:

              Master:

              Jan 26 14:29:44	dhcpleases	251	Sending HUP signal to dns daemon(86952)
              Jan 26 14:29:44	dhcpd	84618	failover peer dhcp_opt15: I move from startup to recover
              Jan 26 14:29:29	dhcpleases	251	Sending HUP signal to dns daemon(86952)
              Jan 26 14:29:29	dhcpd	84618	Server starting service.
              Jan 26 14:29:29	dhcpd	84618	failover peer dhcp_opt15: I move from recover to startup
              Jan 26 14:29:29	dhcpd	84618	Sending on Socket/fallback/fallback-net
              

              Secondary:

              Jan 26 14:30:02	dhcpd	41294	failover peer dhcp_opt15: I move from startup to recover
              Jan 26 14:29:47	dhcpleases	5555	Sending HUP signal to dns daemon(32665)
              Jan 26 14:29:47	dhcpleases	5555	Sending HUP signal to dns daemon(32665)
              Jan 26 14:29:47	dhcpd	41294	Server starting service.
              Jan 26 14:29:47	dhcpd	41294	failover peer dhcp_opt15: host unreachable
              Jan 26 14:29:47	dhcpd	41294	failover peer dhcp_opt15: I move from recover to startup
              Jan 26 14:29:47	dhcpd	41294	Sending on Socket/fallback/fallback-net
              
              • J
                jimp Rebel Alliance Developer Netgate
                last edited by Jan 27, 2023, 4:21 PM

                Something still isn't right there. If the VIP is .1 then neither of them should be using that as their "address" in the subnet for DHCP.

                The config should show address <-> peer in both directions, like this:

                Primary:

                failover peer "dhcp_lan" {
                  primary;
                  address 10.11.0.2;
                  port 519;
                  peer address 10.11.0.3;
                  peer port 520;
                  max-response-delay 10;
                  max-unacked-updates 10;
                  split 128;
                  mclt 600;
                
                  load balance max seconds 3;
                }
                

                Secondary:

                failover peer "dhcp_lan" {
                  secondary;
                  address 10.11.0.3;
                  port 520;
                  peer address 10.11.0.2;
                  peer port 519;
                  max-response-delay 10;
                  max-unacked-updates 10;
                
                  load balance max seconds 3;
                }
                

                Note that it's 10.11.0.2:519 <-> 10.11.0.3:520 both ways.

                I'm not sure how that secondary is pulling the VIP for its own address there.
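
The address <-> peer mirroring can be checked with a short script. A minimal sketch, assuming the block layout shown in this thread; `parse_failover` and `asymmetries` are hypothetical helpers, and the sample blocks are trimmed to the four address and port lines:

```python
import re

# Captures the name and body of each 'failover peer "NAME" { ... }' block.
BLOCK = re.compile(r'failover peer "([^"]+)"\s*\{(.*?)\}', re.DOTALL)

FIELDS = (
    ("address", r"^\s*address\s+(\S+);"),
    ("port", r"^\s*port\s+(\d+);"),
    ("peer_address", r"^\s*peer address\s+(\S+);"),
    ("peer_port", r"^\s*peer port\s+(\d+);"),
)

def parse_failover(conf_text):
    """Map pool name -> dict of address, port, peer_address, peer_port."""
    pools = {}
    for name, body in BLOCK.findall(conf_text):
        fields = {}
        for key, pattern in FIELDS:
            match = re.search(pattern, body, re.MULTILINE)
            fields[key] = match.group(1) if match else None
        pools[name] = fields
    return pools

def asymmetries(primary_conf, secondary_conf):
    """Return messages for pools whose address/port pairs are not mirrored."""
    primary = parse_failover(primary_conf)
    secondary = parse_failover(secondary_conf)
    problems = []
    for name in sorted(primary.keys() & secondary.keys()):
        p, s = primary[name], secondary[name]
        if (p["address"], p["port"]) != (s["peer_address"], s["peer_port"]):
            problems.append(f"{name}: primary is {p['address']}:{p['port']}, "
                            f"secondary expects {s['peer_address']}:{s['peer_port']}")
        if (s["address"], s["port"]) != (p["peer_address"], p["peer_port"]):
            problems.append(f"{name}: secondary is {s['address']}:{s['port']}, "
                            f"primary expects {p['peer_address']}:{p['peer_port']}")
    return problems

# dhcp_opt15 as posted earlier in the thread (trimmed).
thread_primary = '''
failover peer "dhcp_opt15" {
  primary;
  address 192.168.35.3;
  port 519;
  peer address 192.168.35.2;
  peer port 520;
}
'''
thread_secondary = '''
failover peer "dhcp_opt15" {
  secondary;
  address 192.168.35.1;
  port 520;
  peer address 192.168.35.3;
  peer port 519;
}
'''

for problem in asymmetries(thread_primary, thread_secondary):
    print(problem)
```

With the dhcp_opt15 blocks from the thread, this flags that the secondary's own address (192.168.35.1) is not the peer address the primary expects (192.168.35.2). Pools present on only one node are ignored by this check.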

                • S
                  sef1414 @jimp
                  last edited by Jan 28, 2023, 7:10 PM

                  @jimp

                  Hmm, OK. Changing it manually just gets overwritten, as I expected. Any thoughts on where to go from here?

                  For the host unreachable messages, do I need an explicit firewall rule to pass the traffic on that interface? I wouldn't think so, since it's not mentioned in the docs and the traffic is on the same interface.

                  • J
                    jimp Rebel Alliance Developer Netgate
                    last edited by Jan 30, 2023, 1:53 PM

                    Not sure why it's picking the VIP there, but it might be related to https://redmine.pfsense.org/issues/11545 -- I don't think I've ever seen that be triggered by a CARP VIP though, especially not that reliably.

                    I'd take a closer look at the interface and VIP settings and see if anything stands out there.

                    • S
                      sef1414 @jimp
                      last edited by Jan 31, 2023, 8:18 PM

                      @jimp

                      Alright. No joy on re-saving the interface / VIP. I'm pretty sure I have everything configured correctly; I've run through it too many times to count.

                      Any chance it's caused by different physical NIC models?

                      • J
                        jimp Rebel Alliance Developer Netgate
                        last edited by Feb 1, 2023, 1:20 PM

                        No, the NIC models only affect state sync, not DHCP sync. And even then the state sync isn't affected anymore since we moved back away from interface-bound states.

                        Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.