Netgate Discussion Forum

    DHCP Stuck in Recover

    HA/CARP/VIPs
    5 Posts 4 Posters 5.9k Views
    • Salmon

      The DHCP server on both nodes is stuck in the 'recover' state. I searched first and tried all the recommended stop/start sequences, but nothing seems to work.

      Following https://doc.pfsense.org/index.php/Configuring_pfSense_Hardware_Redundancy_(CARP) I set up HA between two pfSense VMs running in VirtualBox, with a Lubuntu VM on their network.

      The network is set up as follows:

      • WAN is NAT

      • LAN is Internal Network 'pfsense'

      • OPT1 is Internal Network 'pfsenseCARP'

      Version for both nodes:

      Version 2.2-RELEASE (amd64)
      built on Thu Jan 22 14:03:54 CST 2015
      FreeBSD 10.1-RELEASE-p4

      pfsense01.local: /var/dhcpd/etc/dhcpd.conf

      option domain-name "local";
      option ldap-server code 95 = text;
      option domain-search-list code 119 = text;
      option arch code 93 = unsigned integer 16; # RFC4578

      default-lease-time 7200;
      max-lease-time 86400;
      log-facility local7;
      one-lease-per-client true;
      deny duplicates;
      ping-check true;
      update-conflict-detection false;
      authoritative;
      failover peer "dhcp_lan" {
        primary;
        address 192.168.1.1;
        port 519;
        peer address 192.168.1.2;
        peer port 520;
        max-response-delay 10;
        max-unacked-updates 10;
        split 128;
        mclt 600;
        load balance max seconds 3;
      }

      subnet 192.168.1.0 netmask 255.255.255.0 {
        pool {
          option domain-name-servers 192.168.1.10;
          deny dynamic bootp clients;
          failover peer "dhcp_lan";
          range 192.168.1.100 192.168.1.245;
        }

        option routers 192.168.1.10;
        option domain-name-servers 192.168.1.10;
      }

      pfsense02.local: /var/dhcpd/etc/dhcpd.conf

      option domain-name "local";
      option ldap-server code 95 = text;
      option domain-search-list code 119 = text;
      option arch code 93 = unsigned integer 16; # RFC4578

      default-lease-time 7200;
      max-lease-time 86400;
      log-facility local7;
      one-lease-per-client true;
      deny duplicates;
      ping-check true;
      update-conflict-detection false;
      authoritative;
      failover peer "dhcp_lan" {
        secondary;
        address 192.168.1.2;
        port 520;
        peer address 192.168.1.1;
        peer port 519;
        max-response-delay 10;
        max-unacked-updates 10;
        load balance max seconds 3;
      }

      subnet 192.168.1.0 netmask 255.255.255.0 {
        pool {
          option domain-name-servers 192.168.1.10;
          deny dynamic bootp clients;
          failover peer "dhcp_lan";
          range 192.168.1.100 192.168.1.245;
        }

        option routers 192.168.1.10;
        option domain-name-servers 192.168.1.10;
      }
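For a quick sanity check on a pair like this, a short Python sketch can pull the `failover peer` block out of each dhcpd.conf and confirm that the two sides mirror each other: each node's `address`/`port` should equal the other node's `peer address`/`peer port`, and `split`/`mclt` should appear only on the primary, as in the two files above (and, as far as I know, as ISC dhcpd expects). The function names are hypothetical, not anything pfSense ships, and the parser assumes the block has no nested braces, which holds for these configs.

```python
import re


def parse_failover_peer(conf_text: str) -> dict:
    """Return the settings of the first `failover peer "name" { ... }` block.

    Keys are statement names ("address", "peer port", "split", ...);
    "name" holds the peer name and "role" holds primary/secondary.
    Assumes no nested braces inside the block (true for the posted configs).
    """
    m = re.search(r'failover peer\s+"([^"]+)"\s*\{([^}]*)\}', conf_text)
    if m is None:
        raise ValueError("no failover peer declaration found")
    settings = {"name": m.group(1)}
    for stmt in m.group(2).split(";"):
        stmt = " ".join(stmt.split())  # collapse newlines and indentation
        if not stmt:
            continue
        if stmt in ("primary", "secondary"):
            settings["role"] = stmt
            continue
        # The value is the last token and the key is everything before it,
        # so multi-word keys like "load balance max seconds" survive.
        key, _, value = stmt.rpartition(" ")
        settings[key] = value
    return settings


def check_failover_pair(primary: dict, secondary: dict) -> list:
    """List mismatches between the two sides; an empty list means they mirror."""
    problems = []
    if primary.get("role") != "primary":
        problems.append("first config is not declared primary")
    if secondary.get("role") != "secondary":
        problems.append("second config is not declared secondary")
    if primary.get("name") != secondary.get("name"):
        problems.append("failover peer names differ")
    # Each side's own address/port must be what the other side calls its peer.
    for own, peer in (("address", "peer address"), ("port", "peer port")):
        if primary.get(own) != secondary.get(peer):
            problems.append(f"primary {own} != secondary {peer}")
        if secondary.get(own) != primary.get(peer):
            problems.append(f"secondary {own} != primary {peer}")
    # mclt and split belong on the primary only (my reading of dhcpd.conf(5)).
    for key in ("mclt", "split"):
        if key not in primary:
            problems.append(f"primary has no {key}")
        if key in secondary:
            problems.append(f"secondary should not set {key}")
    return problems
```

Read each node's /var/dhcpd/etc/dhcpd.conf into a string, run it through parse_failover_peer(), and pass the two results to check_failover_pair(); for the pair posted above it returns an empty list.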

      pfsense01.local: /cf/conf/config.xml (VirtualIP section)

      <virtualip><vip><mode>carp</mode>
      <interface>lan</interface>
      <vhid>1</vhid>
      <advskew>0</advskew>
      <advbase>1</advbase>
      <password>pf</password>
      <descr><type>single</type>
      <subnet_bits>24</subnet_bits>
      <subnet>192.168.1.10</subnet></descr></vip></virtualip>

      pfsense02.local: /cf/conf/config.xml (VirtualIP section)

      <virtualip><vip><mode>carp</mode>
      <interface>lan</interface>
      <vhid>1</vhid>
      <advskew>100</advskew>
      <advbase>1</advbase>
      <password>pf</password>
      <descr><type>single</type>
      <subnet_bits>24</subnet_bits>
      <subnet>192.168.1.10</subnet></descr></vip></virtualip>
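The two VIP sections should differ only in `advskew`. A minimal Python sketch using the standard library's `xml.etree.ElementTree` can confirm that every CARP VIP exists on both nodes with the same interface, password and subnet. The `.//subnet` lookup finds the address whether it sits directly under `<vip>` or inside `<descr>` as in the paste above. The function names are illustrative, not part of pfSense.

```python
import xml.etree.ElementTree as ET


def load_carp_vips(xml_text: str) -> dict:
    """Map vhid -> selected settings for each CARP VIP in the given XML."""
    root = ET.fromstring(xml_text)
    vips = {}
    for vip in root.iter("vip"):
        if vip.findtext("mode") != "carp":
            continue  # skip non-CARP entries such as IP aliases
        vips[int(vip.findtext("vhid"))] = {
            "interface": vip.findtext("interface"),
            "password": vip.findtext("password"),
            "advskew": int(vip.findtext("advskew")),
            # ".//" searches at any depth, so this works whether <subnet>
            # is a direct child of <vip> or nested under <descr>.
            "subnet": vip.findtext(".//subnet"),
        }
    return vips


def compare_vip_sets(node_a: dict, node_b: dict) -> list:
    """List VIPs that are missing on one node or configured differently."""
    problems = []
    for vhid in sorted(set(node_a) | set(node_b)):
        if vhid not in node_a or vhid not in node_b:
            problems.append(f"vhid {vhid} exists on only one node")
            continue
        for key in ("interface", "password", "subnet"):
            if node_a[vhid][key] != node_b[vhid][key]:
                problems.append(f"vhid {vhid}: {key} differs")
    return problems
```

Fed the two `<virtualip>` sections posted here, compare_vip_sets() returns an empty list, and the parsed `advskew` values (0 and 100) are the only difference.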

      The only thread I found that addressed this issue directly is from six years ago and says the problem was resolved, but my issue seems to be different: https://forum.pfsense.org/index.php?topic=18285.0

      EDIT: One thing that seems off is the large number of entries like this in the firewall log:

      block/1000107060 Feb 8 12:01:19 lo0 192.168.1.1:519 192.168.1.10:59293 TCP:SA
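That entry is a blocked SYN-ACK (`TCP:SA`) from 192.168.1.1:519, the primary's failover port, on the loopback interface. To see how often this happens, a small Python filter could pick out blocked entries that touch the failover ports 519/520 configured above. It is a sketch that assumes lines are copied in exactly the format shown here (the log view as pasted, IPv4 only), not the raw filterlog syntax; the names are hypothetical.

```python
import re

# Matches lines shaped like the pasted entry:
#   action/rule  Mon D HH:MM:SS  iface  src:port  dst:port  PROTO[:FLAGS]
LOG_LINE = re.compile(
    r"^(?P<action>\w+)/(?P<rule>\d+)\s+"
    r"(?P<time>\w{3}\s+\d+\s+[\d:]+)\s+"
    r"(?P<iface>\S+)\s+"
    r"(?P<src>[\d.]+):(?P<sport>\d+)\s+"
    r"(?P<dst>[\d.]+):(?P<dport>\d+)\s+"
    r"(?P<proto>\w+)(?::(?P<flags>\w+))?$"
)
FAILOVER_PORTS = {519, 520}  # the port / peer port values from dhcpd.conf


def blocked_failover_entries(lines) -> list:
    """Return parsed fields of blocked entries to or from a failover port."""
    hits = []
    for line in lines:
        m = LOG_LINE.match(line.strip())
        if m is None or m["action"] != "block":
            continue  # unparsable line, or not a block
        if int(m["sport"]) in FAILOVER_PORTS or int(m["dport"]) in FAILOVER_PORTS:
            hits.append(m.groupdict())
    return hits
```

Grouping the hits by `iface` and (`src`, `dst`) would show whether the blocks are all on lo0 between the node's own address and the VIP, as in the example line.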

      • Salmon

        I don't know exactly what fixed the issue, but this is what I did:

        1. Disabled the firewall on both nodes (pfctl -d)
        2. Turned off the DHCP service on both nodes
        3. Turned on the DHCP service on node1
        4. Waited a long time (I forgot about it, so probably around 10 minutes)
        5. Turned on the DHCP service on node2
        6. Waited about 2 minutes
        7. Re-enabled the firewall (pfctl -e)

        The DHCP service now reports normal operation, and obtaining DHCP leases after a failover seems to work.
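The order of operations above can be written down as a checklist, which makes the stagger explicit: the primary's DHCP service comes up and settles before the secondary's, and pf is re-enabled only after both are running. A minimal Python sketch; "node1"/"node2" are placeholders, the DHCP steps are descriptions of the GUI action rather than commands, re-enabling pf on both nodes is an assumption, and the wait times are the rough values from this post, not tested minimums.

```python
from dataclasses import dataclass


@dataclass
class Step:
    node: str            # "node1" = primary, "node2" = secondary (placeholders)
    action: str          # pfctl command, or a description of the GUI action
    wait_after: int = 0  # seconds to let things settle before the next step


def staggered_restart(primary_settle: int = 600, secondary_settle: int = 120) -> list:
    """The order of operations from the post above, as data."""
    return [
        Step("node1", "pfctl -d"),
        Step("node2", "pfctl -d"),
        Step("node1", "stop DHCP service"),
        Step("node2", "stop DHCP service"),
        Step("node1", "start DHCP service", primary_settle),
        Step("node2", "start DHCP service", secondary_settle),
        # Re-enabling pf on both nodes is assumed, mirroring the first step.
        Step("node1", "pfctl -e"),
        Step("node2", "pfctl -e"),
    ]


def print_checklist(plan: list) -> None:
    """Print the steps in order, with the settle time after each one."""
    for number, step in enumerate(plan, 1):
        line = f"{number}. [{step.node}] {step.action}"
        if step.wait_after:
            line += f" (then wait ~{step.wait_after}s)"
        print(line)
```

print_checklist(staggered_restart()) prints the eight steps in order, with the waits attached to the two DHCP start steps.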

        • cthomas

          I occasionally have this issue as well. Generally, stopping the dhcpd service on both firewalls and then starting it on each one in turn, about 5-10 seconds apart, seems to do the trick.

          • ljorgensen

            It seems DHCP failover does not work properly when a large number of leases are in use.
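To see whether this applies to a given setup, one could count the leases currently in the active state. A rough Python sketch, assuming pfSense keeps the lease database at /var/dhcpd/var/db/dhcpd.leases (the chrooted dhcpd path) and relying on dhcpd.leases(5) semantics, where the last declaration for an address in the file is the current one. The parser is deliberately naive: a `}` inside a quoted string would confuse it, and the function name is hypothetical.

```python
import re

# One `lease <ip> { ... }` declaration; non-greedy body up to the first "}".
LEASE_DECL = re.compile(r"lease\s+(\d{1,3}(?:\.\d{1,3}){3})\s*\{(.*?)\}", re.S)
# "binding state active;" at the start of a line, so that
# "next binding state active;" does not count.
ACTIVE = re.compile(r"^\s*binding state active;", re.M)


def count_active_leases(leases_text: str) -> int:
    """Count addresses whose most recent lease declaration is active."""
    latest = {}
    for ip, body in LEASE_DECL.findall(leases_text):
        latest[ip] = body  # a later declaration replaces an earlier one
    return sum(1 for body in latest.values() if ACTIVE.search(body))
```

For example, count_active_leases(open("/var/dhcpd/var/db/dhcpd.leases").read()) on each node, with the path as assumed above; comparing the two counts also hints at whether the peers agree on the lease state.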

            • Nico37

              Another point worth checking when using DHCP failover, since it can affect whether the service ends up in recover or normal mode, is the CARP advskew (advertisement skew) setting.
              As mentioned in the GUI:

              Ensure one machine's advskew<20 (and the other is >20).

              On each CARP virtual IP, I would check that the primary firewall respects this.
              I previously had the DHCP service go into recover mode because of this; since I set all CARP VIPs on the primary node to an advskew of 0, everything has been stable.
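The GUI rule can be checked mechanically: collect each node's advskew per CARP VIP (for example from the `<advskew>` values in config.xml, as posted earlier in this thread) and confirm that every primary value is below 20 and every backup value above it. A minimal Python sketch; the threshold of 20 comes from the GUI hint quoted above, and the function name is hypothetical.

```python
def check_advskew_split(primary_skews: dict, backup_skews: dict,
                        threshold: int = 20) -> list:
    """primary_skews / backup_skews map a VIP label (e.g. its vhid) to advskew.

    Encodes the GUI hint: primary below the threshold, backup above it.
    Returns a list of violations; an empty list means the rule holds.
    """
    problems = []
    for vip, skew in sorted(primary_skews.items()):
        if not skew < threshold:
            problems.append(f"primary VIP {vip}: advskew {skew} is not below {threshold}")
    for vip, skew in sorted(backup_skews.items()):
        if not skew > threshold:
            problems.append(f"backup VIP {vip}: advskew {skew} is not above {threshold}")
    return problems
```

The values posted for vhid 1 (0 on pfsense01, 100 on pfsense02) pass: check_advskew_split({1: 0}, {1: 100}) returns an empty list.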

              Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.