<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[DHCP Servers in HA Recovery State]]></title><description><![CDATA[<p dir="auto">Hello, I had a working system in HA, but after some trivial address range changes to one of the DHCP servers, all of them are now stuck in recovery.</p>
<p dir="auto">I have 5 networks and 3 DHCP servers. All DHCP servers are on VLANs, but the untagged management network they are carried on is NOT offering DHCP.<br />
CARP is working well and I can shift Master behaviour back and forth between two units.</p>
<p dir="auto">I've read all the troubleshooting for DHCP issues with HA, restarted each side, stood on my head and wiped the leases file.</p>
<p dir="auto">After looking at the traffic on ports 519/520, I found that each side keeps sending SYNs to the other, with a number of TCP retransmissions (as seen when decoding in Wireshark, but not clearly shown here).<br />
I don't have a capture of known-good traffic to compare against, but this looks suspect.</p>
<pre><code>23:58:27.196467 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    10.20.0.2.5069 &gt; 10.20.0.3.519: Flags [S], cksum 0x145b (incorrect -&gt; 0xe5bb), seq 4108732373, win 65228, options [mss 1460,nop,wscale 7,sackOK,TS val 4215210363 ecr 0], length 0
23:58:43.114475 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    10.20.0.2.7616 &gt; 10.20.0.3.519: Flags [S], cksum 0x145b (incorrect -&gt; 0x3ef4), seq 450303092, win 65228, options [mss 1460,nop,wscale 7,sackOK,TS val 275027612 ecr 0], length 0
23:58:44.115396 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    10.20.0.2.7616 &gt; 10.20.0.3.519: Flags [S], cksum 0x145b (incorrect -&gt; 0x3b0b), seq 450303092, win 65228, options [mss 1460,nop,wscale 7,sackOK,TS val 275028613 ecr 0], length 0
23:58:45.059483 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    10.20.0.3.32726 &gt; 10.20.0.2.519: Flags [S], cksum 0x2e4a (correct), seq 34146524, win 65228, options [mss 1460,nop,wscale 7,sackOK,TS val 772295666 ecr 0], length 0
23:58:46.327090 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    10.20.0.2.7616 &gt; 10.20.0.3.519: Flags [S], cksum 0x145b (incorrect -&gt; 0x3268), seq 450303092, win 65228, options [mss 1460,nop,wscale 7,sackOK,TS val 275030824 ecr 0], length 0
23:58:50.527367 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    10.20.0.2.7616 &gt; 10.20.0.3.519: Flags [S], cksum 0x145b (incorrect -&gt; 0x2200), seq 450303092, win 65228, options [mss 1460,nop,wscale 7,sackOK,TS val 275035024 ecr 0], length 0
23:58:58.755973 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    10.20.0.2.7616 &gt; 10.20.0.3.519: Flags [S], cksum 0x145b (incorrect -&gt; 0x01db), seq 450303092, win 65228, options [mss 1460,nop,wscale 7,sackOK,TS val 275043253 ecr 0], length 0
</code></pre>
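<p dir="auto">As a rough aid for reading a capture like the one above, here is a minimal Python sketch that counts SYNs per destination port and flags retransmissions (a repeated source, source port and sequence number). The function name <code>summarize_syns</code> and the regular expression are my own and assume the verbose tcpdump text format shown here; the sample lines are copied from the capture.</p>

```python
import re
from collections import Counter

# Matches the second line of each packet in verbose tcpdump output, e.g.
#   10.20.0.2.7616 > 10.20.0.3.519: Flags [S], ... seq 450303092, ...
# Only pure SYNs (Flags [S]) are matched; SYN-ACKs ([S.]) are ignored.
SYN_LINE = re.compile(
    r"(?P<src>\d+\.\d+\.\d+\.\d+)\.(?P<sport>\d+) > "
    r"(?P<dst>\d+\.\d+\.\d+\.\d+)\.(?P<dport>\d+): Flags \[S\],.*?seq (?P<seq>\d+)"
)

def summarize_syns(capture_text):
    """Return (SYN count per destination port, number of retransmitted SYNs)."""
    per_port = Counter()
    attempts = Counter()
    for line in capture_text.splitlines():
        m = SYN_LINE.search(line)
        if m is None:
            continue
        per_port[int(m["dport"])] += 1
        # Same endpoints and same sequence number means the same SYN sent again.
        attempts[(m["src"], m["sport"], m["dst"], m["dport"], m["seq"])] += 1
    retransmits = sum(n - 1 for n in attempts.values())
    return dict(per_port), retransmits

# Three packet lines copied from the capture above.
SAMPLE = """\
    10.20.0.2.7616 > 10.20.0.3.519: Flags [S], cksum 0x145b (incorrect -> 0x3ef4), seq 450303092, win 65228, options [mss 1460,nop,wscale 7,sackOK,TS val 275027612 ecr 0], length 0
    10.20.0.2.7616 > 10.20.0.3.519: Flags [S], cksum 0x145b (incorrect -> 0x3b0b), seq 450303092, win 65228, options [mss 1460,nop,wscale 7,sackOK,TS val 275028613 ecr 0], length 0
    10.20.0.3.32726 > 10.20.0.2.519: Flags [S], cksum 0x2e4a (correct), seq 34146524, win 65228, options [mss 1460,nop,wscale 7,sackOK,TS val 772295666 ecr 0], length 0
"""

print(summarize_syns(SAMPLE))
```

<p dir="auto">If every SYN targets the same port in both directions and none is ever answered, that suggests neither side is listening where its peer expects.</p>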
<p dir="auto">Any suggestions for how to debug this (other than disabling and re-enabling)?</p>
<p dir="auto">Thanks in advance,<br />
Daryl</p>
]]></description><link>https://forum.netgate.com/topic/183007/dhcp-servers-in-ha-recovery-state</link><generator>RSS for Node</generator><lastBuildDate>Tue, 09 Jun 2026 21:55:44 GMT</lastBuildDate><atom:link href="https://forum.netgate.com/topic/183007.rss" rel="self" type="application/rss+xml"/><pubDate>Sat, 23 Sep 2023 07:31:40 GMT</pubDate><ttl>60</ttl><item><title><![CDATA[Reply to DHCP Servers in HA Recovery State on Sat, 23 Sep 2023 21:28:03 GMT]]></title><description><![CDATA[<p dir="auto">Solved. (Although I still have temporary outages after each DHCP configuration change.)</p>
<p dir="auto">TL;DR: I had the skew on my VIP addresses set to 100 and 200 instead of 0 and 100. Unfortunately, I hadn't noticed this because DHCP failover was working as expected before the config change. The notes in <a href="https://docs.netgate.com/pfsense/en/latest/troubleshooting/ha-dhcp-failover.html" target="_blank" rel="noopener noreferrer nofollow ugc">troubleshooting HA DHCP failover</a> are worth careful study.</p>
<p dir="auto">The smoking gun in my post above is that all the communication is on port 519 and not a mix of 519 and 520. This was caused by both of my pfSense units believing they were secondary (since both skews were high), so each one kept trying to reach a primary that did not exist. I found this by reading /var/dhcpd/etc/dhcpd.conf and looking at the clause marked "failover peer".</p>
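<p dir="auto">For anyone who wants to sanity-check their own skews, here is a small Python sketch of the reasoning above. It is only an illustration: the threshold of 20 and the exact port assignment (primary listens on 519, secondary on 520) are my understanding of how pfSense picks the role, not something confirmed in this thread, and the names <code>failover_role</code> and <code>check_pair</code> are made up for this example.</p>

```python
# Assumption: a node whose CARP VIP advskew is below this limit becomes the
# DHCP failover primary. Verify the value against the Netgate documentation.
PRIMARY_SKEW_LIMIT = 20

def failover_role(advskew, limit=PRIMARY_SKEW_LIMIT):
    """Return (role, listen_port, peer_port) for one node.

    Assumed mapping: the primary listens on 519 and connects to 520,
    the secondary listens on 520 and connects to 519.
    """
    if advskew < limit:
        return ("primary", 519, 520)
    return ("secondary", 520, 519)

def check_pair(skew_a, skew_b):
    """Describe whether two nodes with these skews can form a failover pair."""
    role_a, _listen_a, peer_a = failover_role(skew_a)
    role_b, _listen_b, _peer_b = failover_role(skew_b)
    if role_a == role_b:
        return (f"both nodes are {role_a}: each connects to port {peer_a}, "
                "where nobody listens")
    return "ok: one primary and one secondary"

print(check_pair(100, 200))  # my broken setup
print(check_pair(0, 100))    # the corrected skews
```

<p dir="auto">With skews of 100 and 200 both nodes come out as secondary, which matches a capture where every SYN goes to port 519 and none is answered.</p>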
<p dir="auto">Thanks! Daryl</p>
]]></description><link>https://forum.netgate.com/post/1126970</link><guid isPermaLink="true">https://forum.netgate.com/post/1126970</guid><dc:creator><![CDATA[splinegear]]></dc:creator><pubDate>Sat, 23 Sep 2023 21:28:03 GMT</pubDate></item></channel></rss>