Secondary pfSense randomly setting itself as CARP MASTER
-
What shows on Status > CARP on the secondary?
-
@Derelict Okay, then since the VRRP and CARP ID seem to be different, there is no apparent need for me to change anything.
I agree that I don't need to change anything if there is no evidence.
The CARP status on the secondary now shows all Backup; when the problem happens, it shows the WAN as Master. I have 19 VLANs configured with CARP.
Here is the current status:
-
Any weird messages like a demotion set on that page or anything?
When a secondary that should be BACKUP goes to MASTER, it is pretty much invariably because it has stopped receiving heartbeats on that network from the master.
What does this show on both primary and secondary:
sysctl -a | grep carp
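To compare the two nodes without going through the output by eye, something like the following sketch may help. It assumes the output of the command above has been saved to one file per node; the function name `carp_sysctl_diff` and the file names in the comments are made up for illustration. Only lines starting with `net.` are compared, so the shell prompt and the `device carp` line are ignored.

```sh
#!/bin/sh
# Hypothetical helper, not part of pfSense.
# carp_sysctl_diff FILE_A FILE_B
# Prints the CARP-related sysctl lines that differ between two saved outputs.
# Returns 0 when the tunables match, 1 when they differ, 2 on setup failure.
carp_sysctl_diff() {
    csd_a=$(mktemp /tmp/carp_a.XXXXXX) || return 2
    csd_b=$(mktemp /tmp/carp_b.XXXXXX) || { rm -f "$csd_a"; return 2; }

    # Keep only the sysctl OIDs and sort them so ordering does not matter.
    grep '^net\.' "$1" | sort > "$csd_a"
    grep '^net\.' "$2" | sort > "$csd_b"

    diff "$csd_a" "$csd_b" && csd_rc=0 || csd_rc=$?
    rm -f "$csd_a" "$csd_b"
    return "$csd_rc"
}

# Example (file names are placeholders):
#   on each node:  sysctl -a | grep carp > /tmp/carp-$(hostname -s).txt
#   then, with both files on one machine:
#   carp_sysctl_diff carp-pfSense01.txt carp-pfSense02.txt
```

No output means the two nodes report the same values; any line printed by `diff` is a tunable worth a closer look.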
-
@Derelict No weird messages, and I don't recall seeing any such messages when the secondary is acting up. Nonetheless, I will reproduce the problem again tonight and show you if any message like that shows up.
This is why I was surprised to see (from a packet capture) that the secondary is picking up the messages from the primary while they are both showing as Master, and yet for some reason it is not going back to Backup.
Here is the result of the command on both. The primary is pfSense01 and the secondary is pfSense02.
[2.4.4-RELEASE][admin@pfSense01.localdomain]/root: sysctl -a | grep carp
device carp
net.inet.carp.ifdown_demotion_factor: 240
net.inet.carp.senderr_demotion_factor: 0
net.inet.carp.demotion: 0
net.inet.carp.log: 1
net.inet.carp.preempt: 1
net.inet.carp.allow: 1
net.pfsync.carp_demotion_factor: 0
[2.4.4-RELEASE][admin@pfSense02.localdomain]/root: sysctl -a | grep carp
device carp
net.inet.carp.ifdown_demotion_factor: 240
net.inet.carp.senderr_demotion_factor: 0
net.inet.carp.demotion: 0
net.inet.carp.log: 1
net.inet.carp.preempt: 1
net.inet.carp.allow: 1
net.pfsync.carp_demotion_factor: 0
-
That all looks perfectly normal.
The capture posted clearly shows the secondary using an advskew of 240, yet you say the advskews are all set at 100 as they should be.
Another interesting data point would be the output of
ifconfig -a
at the time.
-
@Derelict If you look at the other captures posted, they show a skew of 100 configured in the secondary's GUI; it's strange that it says 240 through the console.
Also, I logged into the primary, and that triggered the secondary to take over as Master, so I had to reboot it. Before rebooting, I took a screenshot and captured the output of the commands you asked for while the secondary was acting as Master.
This was while both the primary and the secondary were showing as Master for the WAN:
[2.4.4-RELEASE][admin@pfSense01.localdomain]/root: sysctl -a | grep carp
device carp
net.inet.carp.ifdown_demotion_factor: 240
net.inet.carp.senderr_demotion_factor: 0
net.inet.carp.demotion: 0
net.inet.carp.log: 1
net.inet.carp.preempt: 1
net.inet.carp.allow: 1
net.pfsync.carp_demotion_factor: 0
[2.4.4-RELEASE][admin@pfSense02.localdomain]/root: sysctl -a | grep carp
device carp
net.inet.carp.ifdown_demotion_factor: 240
net.inet.carp.senderr_demotion_factor: 0
net.inet.carp.demotion: 0
net.inet.carp.log: 1
net.inet.carp.preempt: 1
net.inet.carp.allow: 1
net.pfsync.carp_demotion_factor: 0
For the interface settings, I created a pastebin, since the output is huge and I didn't want to clog this thread.
Again, this was while the secondary was acting as Master: https://pastebin.com/4dek4qGJ
-
The most relevant part that I see is the WAN interface:
The primary:
hn1: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=48001b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,LINKSTATE,TXCSUM_IPV6>
ether 00:15:5d:34:44:15
hwaddr 00:15:5d:34:44:15
inet6 fe80::215:5dff:fe34:4415%hn1 prefixlen 64 scopeid 0x6
inet 205.251.108.165 netmask 0xffffffe0 broadcast 205.251.108.191
inet 205.251.108.169 netmask 0xffffffe0 broadcast 205.251.108.191 vhid 22
inet 205.251.108.170 netmask 0xffffffe0 broadcast 205.251.108.191 vhid 22
inet 205.251.108.171 netmask 0xffffffe0 broadcast 205.251.108.191 vhid 22
inet 205.251.108.172 netmask 0xffffffe0 broadcast 205.251.108.191 vhid 22
inet 205.251.108.173 netmask 0xffffffe0 broadcast 205.251.108.191 vhid 22
inet 205.251.108.174 netmask 0xffffffe0 broadcast 205.251.108.191 vhid 22
inet 205.251.108.175 netmask 0xffffffe0 broadcast 205.251.108.191 vhid 22
inet 205.251.108.176 netmask 0xffffffe0 broadcast 205.251.108.191 vhid 22
inet 205.251.108.177 netmask 0xffffffe0 broadcast 205.251.108.191 vhid 22
inet 205.251.108.178 netmask 0xffffffe0 broadcast 205.251.108.191 vhid 22
inet 205.251.108.179 netmask 0xffffffe0 broadcast 205.251.108.191 vhid 22
inet 205.251.108.180 netmask 0xffffffe0 broadcast 205.251.108.191 vhid 22
inet 205.251.108.181 netmask 0xffffffe0 broadcast 205.251.108.191 vhid 22
inet 205.251.108.182 netmask 0xffffffe0 broadcast 205.251.108.191 vhid 22
inet 205.251.108.183 netmask 0xffffffe0 broadcast 205.251.108.191 vhid 22
inet 205.251.108.184 netmask 0xffffffe0 broadcast 205.251.108.191 vhid 22
inet 205.251.108.164 netmask 0xffffffe0 broadcast 205.251.108.191 vhid 22
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
media: Ethernet autoselect (10Gbase-T <full-duplex>)
status: active
carp: MASTER vhid 22 advbase 1 advskew 0
The secondary:
hn1: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=48001b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,LINKSTATE,TXCSUM_IPV6>
ether 00:15:5d:b5:91:0f
hwaddr 00:15:5d:b5:91:0f
inet6 fe80::215:5dff:feb5:910f%hn1 prefixlen 64 scopeid 0x6
inet 205.251.108.166 netmask 0xffffffe0 broadcast 205.251.108.191
inet 205.251.108.169 netmask 0xffffffe0 broadcast 205.251.108.191 vhid 22
inet 205.251.108.170 netmask 0xffffffe0 broadcast 205.251.108.191 vhid 22
inet 205.251.108.172 netmask 0xffffffe0 broadcast 205.251.108.191 vhid 22
inet 205.251.108.173 netmask 0xffffffe0 broadcast 205.251.108.191 vhid 22
inet 205.251.108.174 netmask 0xffffffe0 broadcast 205.251.108.191 vhid 22
inet 205.251.108.175 netmask 0xffffffe0 broadcast 205.251.108.191 vhid 22
inet 205.251.108.176 netmask 0xffffffe0 broadcast 205.251.108.191 vhid 22
inet 205.251.108.177 netmask 0xffffffe0 broadcast 205.251.108.191 vhid 22
inet 205.251.108.178 netmask 0xffffffe0 broadcast 205.251.108.191 vhid 22
inet 205.251.108.179 netmask 0xffffffe0 broadcast 205.251.108.191 vhid 22
inet 205.251.108.180 netmask 0xffffffe0 broadcast 205.251.108.191 vhid 22
inet 205.251.108.181 netmask 0xffffffe0 broadcast 205.251.108.191 vhid 22
inet 205.251.108.182 netmask 0xffffffe0 broadcast 205.251.108.191 vhid 22
inet 205.251.108.183 netmask 0xffffffe0 broadcast 205.251.108.191 vhid 22
inet 205.251.108.184 netmask 0xffffffe0 broadcast 205.251.108.191 vhid 22
inet 205.251.108.164 netmask 0xffffffe0 broadcast 205.251.108.191 vhid 22
inet 205.251.108.171 netmask 0xffffffe0 broadcast 205.251.108.191
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
media: Ethernet autoselect (10Gbase-T <full-duplex>)
status: active
carp: MASTER vhid 22 advbase 1 advskew 254
For some reason, the last inet (205.251.108.171) does not show a vhid on the secondary. The first one is the interface IP address, so I think it's normal that it doesn't show a vhid.
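With this many aliases, a missing vhid is easy to overlook. The sketch below lists every IPv4 address on one interface that has no `vhid` tag, so the two nodes can be compared quickly. The function name `aliases_without_vhid` is made up for illustration, and the parsing assumes the `ifconfig` layout shown in this thread (a stanza header such as `hn1: flags=...` followed by `inet` lines).

```sh
#!/bin/sh
# Hypothetical helper, not part of pfSense.
# aliases_without_vhid IFACE < ifconfig-output
# Prints each IPv4 address on IFACE whose "inet" line carries no vhid.
aliases_without_vhid() {
    awk -v ifname="$1" '
        # Stanza header, e.g. "hn1: flags=8943<...> metric 0 mtu 1500"
        $1 ~ /:$/ && $2 ~ /^flags=/ {
            current = $1
            sub(/:$/, "", current)
            next
        }
        # IPv4 lines of the requested interface that lack " vhid N"
        current == ifname && $1 == "inet" && $0 !~ / vhid [0-9]+/ {
            print $2
        }
    '
}

# Example:
#   ifconfig -a | aliases_without_vhid hn1
```

The interface's own address is always expected in the result. Any other address listed is configured on that node without a CARP vhid, which is worth comparing against the virtual IP definition on the primary.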
-
The secondary thinks it has an interface down and has been demoted (hence advskew 254).
You have something screwed up somewhere. Sorry, but with what we have, that is the best I can do here.
My guess is something in Hyper-V. Hard to say. But these problems are almost always Layer 2 problems.
-
@Derelict Just so I understand: would this down interface be the virtual one ending in .171, since it's not showing the vhid? There is only one interface, the WAN, and it's showing as active. The others are IP aliases of that interface created in pfSense.
The only reason I doubt it could be anything in Hyper-V is that these same machines were all working fine until I switched to a different datacenter provider, so there has to be a change somewhere: either they are messing something up with some traffic, or I configured something wrong.
Even now that it's showing as Backup, it's showing an advskew of 254:
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
media: Ethernet autoselect (10Gbase-T <full-duplex>)
status: active
carp: BACKUP vhid 22 advbase 1 advskew 254
-
On a healthy system the primary would be showing skew 0, the secondary skew 100.
Check the system log for entries related to why it is changing the skew to 254.
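With 19 VLANs plus the WAN, checking every `carp:` line by hand is tedious. The sketch below reads `ifconfig -a` output and reports each CARP vhid whose advskew differs from the value you expect on that node (0 on the primary, 100 on the secondary, per the above). The function name `carp_skew_report` is made up for illustration, and the parsing assumes the `carp: STATE vhid N advbase N advskew N` layout shown in this thread.

```sh
#!/bin/sh
# Hypothetical helper, not part of pfSense.
# carp_skew_report EXPECTED_SKEW < ifconfig-output
# Prints one line per CARP vhid whose advskew is not EXPECTED_SKEW.
carp_skew_report() {
    awk -v want="$1" '
        # Remember which interface stanza we are in.
        $1 ~ /:$/ && $2 ~ /^flags=/ {
            ifname = $1
            sub(/:$/, "", ifname)
        }
        # e.g. "carp: BACKUP vhid 22 advbase 1 advskew 254"
        $1 == "carp:" {
            vhid = ""; skew = ""
            for (i = 2; i < NF; i++) {
                if ($i == "vhid")    vhid = $(i + 1)
                if ($i == "advskew") skew = $(i + 1)
            }
            if (skew != want) {
                printf "%s vhid %s: state %s, advskew %s (expected %s)\n",
                       ifname, vhid, $2, skew, want
            }
        }
    '
}

# Example, run on the secondary:
#   ifconfig -a | carp_skew_report 100
```

For the log side, and assuming pfSense 2.4.x still keeps the system log in its circular clog format, something like `clog /var/log/system.log | grep -i carp` should show the CARP-related entries around the time the skew changed.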