Both nodes believe they are master

dkitchen

I've been working on this for several hours, so any help would be much appreciated.

I have a two-node pfSense cluster running on two VMware hosts with several virtual NICs. These are attached to a vSwitch on various VLANs and trunked back to a Cisco 6500. The WAN interface is an L3 VLAN on the 6500, while the others are plain L2 VLANs.

It was all working fine until I removed a VIP, then we started seeing packet loss on the WAN interface. On further inspection, it appeared that the VIPs on the WAN interface were becoming master on both the primary and the secondary.

I thought we had run into the issue with VMware vSwitches where multicast packets come back to the sender because of NIC teaming, since we were seeing blocked multicast traffic in the firewall logs. I then enabled Net.ReversePathFwdCheckPromisc as suggested. The traffic no longer appears in the logs, but the problem is still occurring.

A packet capture on the backup shows it is receiving the CARP advertisements, so I have no idea why it would be trying to fail over.

16:38:54.718901 IP carp-virtual-ip > 224.0.0.18: VRRPv2, Advertisement, vrid 11, prio 240, authtype none, intvl 10s, length 36
16:38:54.718993 IP carp-virtual-ip > 224.0.0.18: VRRPv2, Advertisement, vrid 10, prio 240, authtype none, intvl 10s, length 36
16:38:54.718999 IP carp-virtual-ip > 224.0.0.18: VRRPv2, Advertisement, vrid 9, prio 240, authtype none, intvl 10s, length 36
16:38:54.719005 IP carp-virtual-ip > 224.0.0.18: VRRPv2, Advertisement, vrid 8, prio 240, authtype none, intvl 10s, length 36
16:38:54.719011 IP carp-virtual-ip > 224.0.0.18: VRRPv2, Advertisement, vrid 6, prio 240, authtype none, intvl 10s, length 36
16:38:54.719017 IP carp-virtual-ip > 224.0.0.18: VRRPv2, Advertisement, vrid 5, prio 240, authtype none, intvl 10s, length 36
16:38:54.719023 IP carp-virtual-ip > cisco-6500-gateway: ICMP echo request, id 22298, seq 3072, length 44
16:38:54.719029 IP carp-virtual-ip > cisco-6500-gateway: ICMP echo request, id 22298, seq 3328, length 44
16:38:54.719035 IP carp-virtual-ip > cisco-6500-gateway: ICMP echo request, id 22298, seq 3584, length 44
16:38:54.719041 IP carp-virtual-ip > cisco-6500-gateway: ICMP echo request, id 22298, seq 3840, length 44
16:38:54.719047 IP carp-virtual-ip > cisco-6500-gateway: ICMP echo request, id 22298, seq 4096, length 44
16:38:54.719053 ARP, Request who-has cisco-6500-gateway tell primary-firewall, length 28
16:38:54.719059 ARP, Request who-has cisco-6500-gateway tell primary-firewall, length 28
16:38:54.719065 ARP, Request who-has cisco-6500-gateway tell primary-firewall, length 28

The only thing I can note is that there is ICMP heading to the 6500 from the CARP virtual IP. Surely if this host is a backup and not a master, it should not be sending traffic that originates from that address?
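
One way to sanity-check a capture like this is to tally it by source. What follows is a minimal Python sketch, not a pfSense tool: it assumes tcpdump text output in the same shape as the lines above, and the names summarize and vip_name are made up for illustration. It counts advertisements per VHID (shown as "vrid" because tcpdump decodes CARP as VRRP) and counts any other IP packets whose source is the VIP.

```python
import re
from collections import Counter

# Advertisement lines, e.g.
# "16:38:54.718901 IP carp-virtual-ip > 224.0.0.18: VRRPv2, Advertisement, vrid 11, prio 240, ..."
ADVERT_RE = re.compile(
    r"^\S+ IP (?P<src>\S+) > 224\.0\.0\.18: VRRPv2, Advertisement, "
    r"vrid (?P<vrid>\d+), prio (?P<prio>\d+)"
)
# Any other IP packet: "<time> IP <src> > <dst>: <rest>"
IP_RE = re.compile(r"^\S+ IP (?P<src>\S+) > (?P<dst>[^:]+): (?P<rest>.*)$")


def summarize(lines, vip_name):
    """Tally a tcpdump text capture.

    Returns two Counters:
      adverts  - advertisements keyed by (vrid, source)
      from_vip - non-advertisement IP packets sourced from vip_name, keyed by destination
    vip_name is whatever tcpdump prints for the VIP (hypothetical argument).
    """
    adverts = Counter()
    from_vip = Counter()
    for line in lines:
        line = line.strip()
        m = ADVERT_RE.match(line)
        if m:
            adverts[(int(m.group("vrid")), m.group("src"))] += 1
            continue
        m = IP_RE.match(line)
        if m and m.group("src") == vip_name:
            from_vip[m.group("dst")] += 1
    return adverts, from_vip
```

If the same vrid shows up with more than one source, two nodes are advertising for it. Entries in from_vip are the kind of traffic questioned above: packets leaving with the VIP as their source.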

Any advice appreciated.

Thanks.

Dan

dkitchen

I've been looking at this further. Some other people are reporting similar issues where the CARP traffic is being NAT'ed.

I reset NAT to use the default rules, but it appears the CARP packets are still being NAT'ed to one of the VIPs on the outside.

Additionally, if I change the NAT rule to use the interface address, we have no connectivity at all.

Very bizarre. Any ideas, anyone?

cmb

Your outbound NAT rules must not NAT CARP. Make sure they are configured so that they do not match "any" as the source, or the interface IP as the source.
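
As a rough illustration of that check, here is a minimal Python sketch. It is not how pfSense evaluates rules: the function name, the "any"-or-CIDR source format, and the example addresses are all assumptions made for the example. It reports which of the firewall's own addresses a rule's source would cover, which is the case to avoid.

```python
import ipaddress


def rule_matches_firewall(rule_source, firewall_ips):
    """Return the firewall's own addresses that a NAT rule source would match.

    rule_source  - "any" or a CIDR string such as "192.168.10.0/24"
                   (hypothetical format, chosen for this sketch)
    firewall_ips - the firewall's own interface addresses, as strings
    """
    if rule_source == "any":
        # An "any" source matches everything, including the firewall itself.
        return list(firewall_ips)
    net = ipaddress.ip_network(rule_source, strict=False)
    return [ip for ip in firewall_ips if ipaddress.ip_address(ip) in net]


# Example with documentation-range addresses (not taken from this thread):
# a rule limited to an internal subnet does not cover the WAN interface IP.
print(rule_matches_firewall("192.168.10.0/24", ["203.0.113.2"]))
```

An empty result means the rule's source leaves the firewall's own traffic alone. Anything else means the rule could also translate packets the firewall itself sends from those addresses.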

mdima

Hey,
With 2.0.1, NAT is not a problem anymore. If you still have the problem, it could be a traffic shaping queue. I mean, the CARP traffic can be dropped under heavy load, and this can lead to an inconsistent CARP state between the master and the slave box.
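
To see whether advertisements are going missing, one option is to look for long gaps between the ones the backup receives. This is a minimal Python sketch under stated assumptions: the timestamps are arrival times in seconds taken from a capture on the backup, and timeout is a value you pick to match your own CARP settings. Neither the function name nor the numbers come from pfSense.

```python
def find_advert_gaps(timestamps, timeout):
    """Return (previous, next, gap) for consecutive advertisements
    that arrived more than `timeout` seconds apart.

    timestamps - arrival times in seconds for one VHID (assumed input)
    timeout    - how long the backup waits before it takes over
                 (assumption: set this from your own configuration)
    """
    ordered = sorted(timestamps)
    gaps = []
    for prev, nxt in zip(ordered, ordered[1:]):
        gap = nxt - prev
        if gap > timeout:
            gaps.append((prev, nxt, gap))
    return gaps


# Example with made-up arrival times: one advertisement run is missing
# between t=2.0 and t=6.5.
print(find_advert_gaps([0.0, 1.0, 2.0, 6.5, 7.5], timeout=3.0))
```

Gaps that line up with periods of heavy traffic would be consistent with the advertisements being dropped in a queue, though they would not prove it.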

I am not sure; I only figured out today that it could be a traffic shaping problem. This is my post:
http://forum.pfsense.org/index.php/topic,45045.0.html

Ciao,
Michele