DHCP Failover with CARP - Both in Recover, Peer Unknown State
-
Question for richardsc: do you have any 'other' type VIPs? I had an issue like this ages ago, and it was caused by the other VIPs throwing off the master/backup check. I also used a cheap hack to fix it. The problem went away once I only had CARP VIPs.
Nope. I only have CARP VIPs.
If I can find time this week, I'm going to try and investigate further to find the root cause of the problem.
-
Just for fun I upgraded both boxes to 1.2.3 RC3 today and tried this again. I still cannot get it to work properly. I may resort to the mod mentioned above to get this working.
-
:( Still no go. I cannot get DHCP failover working. I have accepted that it is broken and that I will have to start DHCP manually on the backup unit during a failure. :'(
-
Check the dhcpd.conf on both boxes and verify that the main unit is set to primary and the backup unit is set to secondary.
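For anyone who wants to script that check, here is a minimal sketch that reads the role out of each failover declaration in a dhcpd.conf. The function name failover_roles is made up for illustration, and the regex assumes the simple layout pfSense generates (no nested braces inside the failover block).

```python
import re

# Matches: failover peer "name" { ... } and captures the name and the block body.
# A pool-level reference such as: failover peer "dhcp0";  has no brace and is skipped.
FAILOVER_RE = re.compile(r'failover\s+peer\s+"([^"]+)"\s*\{(.*?)\}', re.DOTALL)


def failover_roles(conf_text):
    """Return {peer_name: 'primary' | 'secondary' | 'unknown'} for each failover block."""
    roles = {}
    for name, body in FAILOVER_RE.findall(conf_text):
        statements = [s.strip() for s in body.split(";")]
        if "primary" in statements:
            roles[name] = "primary"
        elif "secondary" in statements:
            roles[name] = "secondary"
        else:
            roles[name] = "unknown"
    return roles


# Example use on a box (path is where pfSense writes the file in this thread):
# with open("/var/dhcpd/etc/dhcpd.conf") as f:
#     print(failover_roles(f.read()))
```

Run it on both boxes: the main unit should report primary for every peer and the backup unit should report secondary.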
-
Today I tried to set it up and hit the same problem; a quick tcpdump showed how to fix it. I just allowed TCP ports 519 and 520 from LAN net to the LAN interface (this rule is replicated to the passive unit) and restarted dhcpd on the active unit, and that was it. It is working properly now.
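If you don't want to read a tcpdump, a rough way to see whether the firewall is blocking the failover ports is to try a plain TCP connection to the peer. This is only a sketch: tcp_port_open is a made-up helper name, and a successful connect only tells you the port is reachable and something is listening, not that the failover handshake itself succeeds.

```python
import socket


def tcp_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


# Example use, with placeholder addresses (substitute your own LAN IPs):
# from the primary:   tcp_port_open("192.168.3.2", 520)
# from the secondary: tcp_port_open("192.168.3.1", 519)
```

If this returns False in either direction while dhcpd is running on the peer, check the firewall rules first.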
-
I also had problems getting this to work with the pfSense 2.0 snapshots from May 9th and May 11th. After changing the line in services.inc (and removing another one) as mentioned by richard, it worked for me. Somehow the skew counter isn't working correctly. I'm not sure exactly how it works, but I know both routers have the exact same time and timezone set. It looks like some kind of bug to me.
-
Same issue with "2.0-BETA4 built on Mon Aug 2 21:49:34 EDT 2010 FreeBSD 8.1-RELEASE"
Does anyone have dhcp-failover working?
Thanks
-
It works fine if you have valid configurations, the problem is that certain invalid configurations can trick the logic to make it not work.
The usual reason is that someone is using Proxy ARP VIPs which sync to the secondary as empty, which triggers a bug in the dhcp server logic that makes it think it's primary when it's not. I thought I committed a fix for that a week or two ago.
If you still have the bug, I need copies of /var/dhcpd/etc/dhcpd.conf from the primary and secondary, along with at least the <virtualip>...</virtualip> section of the primary and secondary config.xml files.
The "skew" on the VIPs is used to trigger the logic for slave, so if you have manually set the skew on the secondary to less than 20, that would also break it.
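To make the skew point concrete, here is a small sketch that lists the CARP VIPs in a <virtualip> fragment whose advskew is below 20, which on the secondary would be the problem case described above. The function names are hypothetical and the threshold of 20 is taken from this post, not from the pfSense source; the real check in services.inc may be written differently.

```python
import xml.etree.ElementTree as ET

# Assumption: threshold as stated in the post above, not read from pfSense code.
SKEW_THRESHOLD = 20


def carp_skews(virtualip_xml):
    """Return [(interface, advskew)] for every CARP VIP in a <virtualip> fragment."""
    root = ET.fromstring(virtualip_xml)
    result = []
    for vip in root.iter("vip"):
        if vip.findtext("mode") == "carp":
            # A missing or empty <advskew> is treated as 0.
            skew = int(vip.findtext("advskew") or 0)
            result.append((vip.findtext("interface"), skew))
    return result


def low_skew_vips(virtualip_xml, threshold=SKEW_THRESHOLD):
    """CARP VIPs whose skew is below the threshold (suspicious on a secondary)."""
    return [(iface, skew) for iface, skew in carp_skews(virtualip_xml) if skew < threshold]
```

On the secondary, low_skew_vips should come back empty; on the primary, every CARP VIP is expected to show up in it.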
-
Hello,
@jimp: If you still have the bug, I need copies of /var/dhcpd/etc/dhcpd.conf from the primary and secondary, along with at least the <virtualip>...</virtualip> section of the primary and secondary config.xml files.
pfSense LEFT dhcpd.conf:
option domain-name "localdomain";
option ldap-server code 95 = text;
option domain-search-list code 119 = text;
default-lease-time 7200;
max-lease-time 86400;
log-facility local7;
ddns-update-style none;
one-lease-per-client true;
deny duplicates;
ping-check true;
authoritative;
failover peer "dhcp0" {
  primary;
  address 192.168.3.1;
  port 519;
  peer address 192.168.3.2;
  peer port 520;
  max-response-delay 10;
  max-unacked-updates 10;
  split 128;
  mclt 600;
  load balance max seconds 3;
}
authoritative;
failover peer "dhcp1" {
  primary;
  address 192.168.4.1;
  port 519;
  peer address 192.168.4.2;
  peer port 520;
  max-response-delay 10;
  max-unacked-updates 10;
  split 128;
  mclt 600;
  load balance max seconds 3;
}
subnet 192.168.3.0 netmask 255.255.255.0 {
  pool {
    option domain-name-servers 192.168.3.10;
    deny dynamic bootp clients;
    failover peer "dhcp0";
    range 192.168.3.100 192.168.3.199;
  }
  option routers 192.168.3.10;
  option domain-name-servers 192.168.3.10;
}
subnet 192.168.4.0 netmask 255.255.255.0 {
  pool {
    option domain-name-servers 192.168.4.10;
    deny dynamic bootp clients;
    failover peer "dhcp1";
    range 192.168.4.100 192.168.4.199;
  }
  option routers 192.168.4.10;
  option domain-name-servers 192.168.4.10;
}
pfSense RIGHT dhcpd.conf:
option domain-name "localdomain";
option ldap-server code 95 = text;
option domain-search-list code 119 = text;
default-lease-time 7200;
max-lease-time 86400;
log-facility local7;
ddns-update-style none;
one-lease-per-client true;
deny duplicates;
ping-check true;
authoritative;
failover peer "dhcp0" {
  secondary;
  address 192.168.3.2;
  port 520;
  peer address 192.168.3.1;
  peer port 519;
  max-response-delay 10;
  max-unacked-updates 10;
  mclt 600;
  load balance max seconds 3;
}
authoritative;
failover peer "dhcp1" {
  secondary;
  address 192.168.4.2;
  port 520;
  peer address 192.168.4.1;
  peer port 519;
  max-response-delay 10;
  max-unacked-updates 10;
  mclt 600;
  load balance max seconds 3;
}
subnet 192.168.3.0 netmask 255.255.255.0 {
  pool {
    option domain-name-servers 192.168.3.10;
    deny dynamic bootp clients;
    failover peer "dhcp0";
    range 192.168.3.100 192.168.3.199;
  }
  option routers 192.168.3.10;
  option domain-name-servers 192.168.3.10;
}
subnet 192.168.4.0 netmask 255.255.255.0 {
  pool {
    option domain-name-servers 192.168.4.10;
    deny dynamic bootp clients;
    failover peer "dhcp1";
    range 192.168.4.100 192.168.4.199;
  }
  option routers 192.168.4.10;
  option domain-name-servers 192.168.4.10;
}
pfSense LEFT config.xml:
<virtualip>
  <vip>
    <mode>carp</mode>
    <interface>wan</interface>
    <vhid>1</vhid>
    <advskew>0</advskew>
    <password>wanpass</password>
    <descr></descr>
    <type>single</type>
    <subnet_bits>24</subnet_bits>
    <subnet>192.168.1.50</subnet>
  </vip>
  <vip>
    <mode>carp</mode>
    <interface>lan</interface>
    <vhid>2</vhid>
    <advskew>0</advskew>
    <password>lanpass</password>
    <descr></descr>
    <type>single</type>
    <subnet_bits>24</subnet_bits>
    <subnet>192.168.3.10</subnet>
  </vip>
  <vip>
    <mode>carp</mode>
    <interface>opt2</interface>
    <vhid>3</vhid>
    <advskew>0</advskew>
    <password>wifipass</password>
    <descr></descr>
    <type>single</type>
    <subnet_bits>24</subnet_bits>
    <subnet>192.168.4.10</subnet>
  </vip>
</virtualip>
pfSense RIGHT config.xml:
<virtualip>
  <vip>
    <mode>carp</mode>
    <interface>wan</interface>
    <vhid>1</vhid>
    <advskew>100</advskew>
    <password>wanpass</password>
    <descr></descr>
    <type>single</type>
    <subnet_bits>24</subnet_bits>
    <subnet>192.168.1.50</subnet>
  </vip>
  <vip>
    <mode>carp</mode>
    <interface>lan</interface>
    <vhid>2</vhid>
    <advskew>100</advskew>
    <password>lanpass</password>
    <descr></descr>
    <type>single</type>
    <subnet_bits>24</subnet_bits>
    <subnet>192.168.3.10</subnet>
  </vip>
  <vip>
    <mode>carp</mode>
    <interface>opt2</interface>
    <vhid>3</vhid>
    <advskew>100</advskew>
    <password>wifipass</password>
    <descr></descr>
    <type>single</type>
    <subnet_bits>24</subnet_bits>
    <subnet>192.168.4.10</subnet>
  </vip>
</virtualip>
-
editing…
Those came through in e-mail before you edited them out. It looks like you might have hit a bug I fixed the other day that made both units show up as secondary instead of primary. That bug shouldn't have put them in recover/peer-known state, though; it would have left both in communications-interrupted state. It should be OK in current snapshots.
-
Ok,
Sorry, my pfSense crashed… I am retesting :-).
-
Everything works now.
Thanks