Pinging CARP - ICMP DUP reply



  • Hello,

    I have a strange CARP issue resulting in DUP ICMP replies when pinging a CARP address.

    My Setup:

    I have two two-node pfSense 2.0.3 clusters, each configured with several CARP addresses and syncing properly across the pfsync interface.

    Both cluster pairs are running in VMware ESXi 5.1 and I've made sure to enable the Net.ReversePathFwdCheckPromisc = 1 switch on both ESXi nodes (part of vCenter cluster).  I'm using a VDS with port groups assigned to pfSense clusters, the port groups have promiscuous mode and the other security settings enabled.  The ESXi hosts have been rebooted after the addition of the net.reversepath… switch.

    Both firewall pairs have any-to-any rules on the interfaces involved in the ping tests described below.

    Let's call the two cluster pairs PUB and INFRA.

    From PUB, I ping an interface (not CARP) on INFRA and get a normal ping response.  From PUB, when I ping a CARP address on INFRA, I get a DUP! ping response.

    Similar to the following:

    64 bytes from 10.1.250.1: icmp_seq=195 ttl=64 time=0.241 ms (DUP!)
    64 bytes from 10.1.250.1: icmp_seq=196 ttl=64 time=0.269 ms
    64 bytes from 10.1.250.1: icmp_seq=196 ttl=64 time=0.376 ms (DUP!)
    64 bytes from 10.1.250.1: icmp_seq=197 ttl=64 time=0.304 ms
    64 bytes from 10.1.250.1: icmp_seq=197 ttl=64 time=0.376 ms (DUP!)
    64 bytes from 10.1.250.1: icmp_seq=198 ttl=64 time=0.234 ms
    64 bytes from 10.1.250.1: icmp_seq=198 ttl=64 time=0.262 ms (DUP!)
    64 bytes from 10.1.250.1: icmp_seq=199 ttl=64 time=0.271 ms
    64 bytes from 10.1.250.1: icmp_seq=199 ttl=64 time=0.377 ms (DUP!)
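
    A minimal sketch for summarizing output like the above, assuming the ping output has been saved as text. The function names are my own and only the `icmp_seq=` field is parsed; the sample lines are copied from the output above.

```python
import re
from collections import Counter
from typing import List

# Matches the sequence number in reply lines of the format shown above.
REPLY_RE = re.compile(r"icmp_seq=(\d+)")


def count_replies(ping_output: str) -> Counter:
    """Return a Counter mapping icmp_seq -> number of replies seen."""
    counts = Counter()
    for line in ping_output.splitlines():
        match = REPLY_RE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


def duplicated_sequences(ping_output: str) -> List[int]:
    """Return the sequence numbers that were answered more than once."""
    counts = count_replies(ping_output)
    return sorted(seq for seq, n in counts.items() if n > 1)


# Sample lines taken from the ping output in this post.
SAMPLE = """\
64 bytes from 10.1.250.1: icmp_seq=196 ttl=64 time=0.269 ms
64 bytes from 10.1.250.1: icmp_seq=196 ttl=64 time=0.376 ms (DUP!)
64 bytes from 10.1.250.1: icmp_seq=197 ttl=64 time=0.304 ms
64 bytes from 10.1.250.1: icmp_seq=197 ttl=64 time=0.376 ms (DUP!)
"""

if __name__ == "__main__":
    print(duplicated_sequences(SAMPLE))
```

    Counting by sequence number rather than by the `(DUP!)` marker also shows whether any request was answered three or more times.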
    

    From INFRA, I ping an interface (not CARP) on PUB and get a normal ping response.  From INFRA, when I ping a CARP address on PUB, I get a DUP! ping response.

    From either PUB or INFRA, when I ping its own CARP address I do not get a duplicate response.  I only get a DUP ICMP response when pinging the other cluster pair's CARP address.

    Looking for some guidance in understanding / resolving this.

    Thanks for your help.



  • I've made progress in troubleshooting this problem, no resolution yet.  Here are my findings:

    • I have two ESXi hosts running 5.1 Update 1.
    • If the active CARP node (master) is on the same host as a Windows 2008R2 Server VM I get DUP ping packets when VPNed through the firewall.
    • If the active CARP node (master) is NOT on the same host as a Windows 2008R2 Server VM pings come through normally.
    • I have the pfSense WAN and LAN interfaces setup in their own dedicated port groups with promiscuous modes enabled.
    • I have the Windows LAN interface in a port group on the same VLAN as the pfSense LAN, with promiscuous mode disabled.
    • Both ESX servers (I've reverified this yesterday) have the net.reverse… flag set and have been rebooted.  I've even reset the flags with reboots to make sure the change took effect.
    • The workaround for now is to keep the pfSense VMs and these Windows VMs on separate hosts.

    I would like to understand the root cause of this and ultimately seek some direction to a resolution.

    Thank you.



  • @deeepdish:

    • If the active CARP node (master) is on the same host as a Windows 2008R2 Server VM I get DUP ping packets when VPNed through the firewall.

    Sounds a bit like the reverse of this:
    http://doc.pfsense.org/index.php/CARP_Secondary_Unreachable_Over_VPN

    What does the ARP table say after those DUP packets? Is the Windows Server adding ARP for both VMs?
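
    A rough sketch of that ARP check, assuming the output of `arp -a` has been saved as text. CARP answers for a virtual IP with the virtual MAC 00:00:5e:00:01:<VHID>, so an entry for the CARP address that shows a node's real MAC instead would be worth a closer look. The sample table is made up for illustration (only 10.1.250.1 comes from this thread), and the helper names are my own.

```python
import re
from typing import Dict, Set

# CARP virtual MACs have this prefix; the last octet is the VHID.
CARP_MAC_PREFIX = "00:00:5e:00:01:"

IP_RE = re.compile(r"\b(\d{1,3}(?:\.\d{1,3}){3})\b")
MAC_RE = re.compile(r"\b([0-9A-Fa-f]{2}(?:[-:][0-9A-Fa-f]{2}){5})\b")


def parse_arp(text: str) -> Dict[str, Set[str]]:
    """Map each IP to the set of MACs listed for it (lowercase, colon-separated)."""
    table: Dict[str, Set[str]] = {}
    for line in text.splitlines():
        ip, mac = IP_RE.search(line), MAC_RE.search(line)
        if ip and mac:  # header lines have no MAC and are skipped
            normalized = mac.group(1).lower().replace("-", ":")
            table.setdefault(ip.group(1), set()).add(normalized)
    return table


def is_carp_virtual_mac(mac: str) -> bool:
    """True if the MAC is in the CARP virtual MAC range."""
    return mac.lower().replace("-", ":").startswith(CARP_MAC_PREFIX)


# Hypothetical Windows-style `arp -a` output, for illustration only.
SAMPLE = """\
Interface: 10.1.250.50 --- 0xb
  Internet Address      Physical Address      Type
  10.1.250.1            00-00-5e-00-01-01     dynamic
  10.1.250.2            00-50-56-aa-bb-01     dynamic
  10.1.250.3            00-50-56-aa-bb-02     dynamic
"""

if __name__ == "__main__":
    for ip, macs in parse_arp(SAMPLE).items():
        for mac in sorted(macs):
            kind = "CARP virtual MAC" if is_carp_virtual_mac(mac) else "real MAC"
            print(ip, mac, kind)
```

    Comparing a snapshot taken before the DUPs with one taken after would show whether the entry for the CARP address changes.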



  • I get dupe replies when I ping the Virtual CARP IP (internal and external interfaces in my setup) while running pfSense virtualized in SmartOS without any Windows hosts present.



  • I have a similar problem and a very similar setup: three ESXi hosts, with master and backup CARP pfSense firewalls.

    We have multiple subnets which all go through the firewall.
    If I ping anything outside the subnet, so that traffic has to go via the gateway (the master pfSense firewall), I get the DUP! issue when the master firewall is on the same ESXi host as the servers that are pinging each other.

    I have set promiscuous on all port-groups/switches and the net.reverse… flag is set on all ESXi hosts and rebooted.

    I cannot use your workaround of keeping the firewalls on a separate host, as I need to distribute my VMs across all hosts.

    Linux hosts are in one subnet and Windows hosts in another. I'm going to try this with Linux in both to see if this could be a Windows issue.

    Any thoughts?



  • AFAIK, Net.ReversePathFwdCheckPromisc = 1 should only be necessary when running redundant uplinks from the vSwitch to a physical switch.

    Either way, have you been able to obtain a packet capture from both cluster members, the Windows server, and your client when the problem occurs?
    That should help figure out who is doing what, and when.


  • Confirming via packet capture on each leg would be helpful. We've had a couple of reports of this but no solid leads; it either clears itself up when other things (e.g. switches) are changed/upgraded, or goes away before the source can be found.

    One customer's packet captures suggested it was the request being duplicated, and pfSense was just responding to each request it saw.
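
    To tell the two cases apart in a capture, a sketch along these lines may help. It assumes the ICMP echo packets seen at one capture point have already been pulled out as (type, id, seq) records; the record format, function names and sample data are all hypothetical, and reading a real pcap file is left out. ICMP type 8 is an echo request and type 0 is an echo reply.

```python
from collections import Counter
from typing import Iterable, NamedTuple, Set

ECHO_REPLY, ECHO_REQUEST = 0, 8  # ICMP type values


class IcmpRecord(NamedTuple):
    icmp_type: int  # 8 = echo request, 0 = echo reply
    ident: int      # ICMP identifier
    seq: int        # ICMP sequence number


def duplicated_seqs(records: Iterable[IcmpRecord], icmp_type: int) -> Set[int]:
    """Sequence numbers seen more than once for the given ICMP type."""
    counts = Counter(
        (r.ident, r.seq) for r in records if r.icmp_type == icmp_type
    )
    return {seq for (_ident, seq), n in counts.items() if n > 1}


def diagnose(records: Iterable[IcmpRecord]) -> str:
    """Classify what one capture point saw."""
    records = list(records)
    if duplicated_seqs(records, ECHO_REQUEST):
        # The request already arrived twice, so each reply is legitimate.
        return "request-duplicated"
    if duplicated_seqs(records, ECHO_REPLY):
        return "reply-duplicated"
    return "clean"


# Hypothetical records from a capture on the firewall.
FIREWALL_CAPTURE = [
    IcmpRecord(ECHO_REQUEST, 0x1234, 196),
    IcmpRecord(ECHO_REQUEST, 0x1234, 196),  # same request seen twice
    IcmpRecord(ECHO_REPLY, 0x1234, 196),
    IcmpRecord(ECHO_REPLY, 0x1234, 196),
]

if __name__ == "__main__":
    print(diagnose(FIREWALL_CAPTURE))
```

    Running the same check on the capture from each leg should narrow down where the second copy first appears.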



  • Hi,

    I can arrange packet captures today, I assume you need captures on the server sending the ping and on the end receiving the ping?

    Thanks



  • Since I opened this thread months ago, we have gone through many iterations / escalations with pfSense support as well as VMware.  For the record, we are working strictly with VMware ESXi 5.1; I'm not sure whether the same symptoms / solution apply to other platforms / versions.

    In short:  It's a vSwitch / VDS limitation.  It's a forward-only switch, and the issue is around multiple uplinks.  My network guys can provide a better analysis, but the solution was to ensure that any switch carrying CARP traffic has only one uplink to the outside.  Multiple uplinks (even in standby mode) resulted in packet duplication.

    We now have a dedicated firewall ESXi cluster running multiple instances of pfSense (CARP), and the respective uplink groups have only one port defined.  To provide redundancy we have more than one ESXi node, with HA / DRS enabled and appropriate affinity groups keeping our firewall services highly available.

    As a side note:  CARP, VRRP and HSRP are similar in their origin, operation and implementation.  We have a VRRP cluster (keepalived) running without any issues on the same vSwitch / VDS infrastructure where the pfSense CARP cluster suffered packet duplication.  Perhaps there's something in CARP that can / should be tweaked?



  • @deeepdish:

    This is interesting.
    We have basically the same setup here, but use the multiple uplinks for bandwidth and failover purposes and hence need them. Do you have any more info from the network guys about this or perhaps someone can pinpoint the issue further so that we can solve it while keeping multiple uplinks?



  • @deeepdish:

    I'm having a similar issue, but can't afford a dedicated cluster for the pfSense instances. So I configured the dvSwitch ports used by the pfSense VMs to use only one uplink (and only on those ports, since we need uplink redundancy for the other VMs), and the duplicate pings immediately stopped. So far so good!



  • Nevermind, DUPs are back…



  • I know this is an old thread. I had a similar issue and would like to share how I fixed it: I just disabled IPv4 and IPv6 on the host NIC that was causing the DUP ICMP replies.



  • In vSphere, edit your distributed port group, go to 'Teaming and failover', and set the failover order:

    • Active Uplinks : uplink 1
    • Standby uplinks : uplink 2


  • Hi,

    I had the same problem using VIP + CARP; I followed all the best practices for pfSense and still got DUP echo replies.
    Thanks to camembert: the problem for me was that uplinks 1 and 2 were both set to "active". After uplink 2 was set to standby, it worked fine.



  • Worked fine for me. I changed the teaming only in the port group used by CARP (not on the distributed vSwitch)!

    Thanks friends!



  • You can have both uplinks active if you enable this advanced host parameter: Net.ReversePathFwdCheckPromisc (see the pfSense troubleshooting guide).

    By the way, I discovered today that if your VM has "VM DirectPath IO" enabled, it bypasses this parameter and you will get duplicated packets again.