NAT issue with CARP cluster
-
Hello. I have 2 physical XenServer 6.5 hosts, each running a virtual machine with pfSense 2.2.4. Each pfSense has 4 network adapters attached: 1 WAN, 1 SYNC (a LAN used for pfsync), 1 LAN for the LAN network, and 1 internal network that is visible only inside the XenServer where that particular pfSense is running. Unfortunately I have no control over the WAN and LAN networks, as they are provided by my service provider. I can only assign particular IPs from those networks and use them.
I've configured a CARP cluster between the two firewalls. There are 2 virtual IPs: one for the WAN interfaces and one for the LAN interfaces. In addition, I've created a bridge between the LAN network and the internal network on each of the pfSense firewalls. Note that the internal networks use a different subnet than the LAN network. Finally, I've created another virtual IP that sits on the LAN interfaces of both firewalls but whose address is from the internal network (I'll call it the VLAN VIP).
Everything described above works as expected and my CARP cluster synchronizes properly. I've also configured my NAT rules so that outbound WAN traffic always goes out through the WAN VIP. I have 2 virtual machines (1 on each XenServer) whose only network adapter is on the XenServer internal network. They use the VLAN VIP as their gateway and they can see each other, because both internal networks are bridged with my LAN network on both firewalls. And here is the problem: the virtual machine behind my master pfSense has Internet access, i.e. NAT is working correctly, while the one behind my slave pfSense does not. What's even more interesting is that a traceroute to an Internet address succeeds from both virtual machines. Also, when I ping a WAN address from the working virtual machine, everything works fine. When I ping a WAN address from the non-working virtual machine, the first ping request succeeds and all the rest are lost.
Would you be able to point out anything I might be missing in this rather unusual and complicated configuration? I'll be happy to share more details if that would help solve the issue.
-
Any update on this?
We had something similar happen when the secondary gateway was acting as master. It would work for a short time and then fail.
I am leaning towards an ARP cache issue on the FIOS modem/router combo that we pass traffic through.
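For anyone who wants to test the ARP cache theory, here is a minimal Python sketch, not a definitive diagnostic. It takes text pasted from `arp -an` on a neighbouring device and reports whether a VIP resolves to the CARP virtual MAC, which has the form 00:00:5e:00:01:<VHID in hex>. The function names, the sample addresses and the VHID value are all placeholders I made up for illustration; substitute your own VIP and VHID.

```python
import re

# Matches arp -an style lines, e.g. "? (203.0.113.10) at 00:00:5e:00:01:01 on em0".
# Entries shown as "(incomplete)" do not match and are treated as missing.
ARP_LINE = re.compile(
    r"\((?P<ip>\d{1,3}(?:\.\d{1,3}){3})\)\s+at\s+"
    r"(?P<mac>[0-9A-Fa-f]{1,2}(?::[0-9A-Fa-f]{1,2}){5})"
)


def normalize_mac(mac):
    """Zero-pad and lowercase each octet so '0:c:29:...' equals '00:0c:29:...'."""
    return ":".join(f"{int(part, 16):02x}" for part in mac.split(":"))


def carp_mac(vhid):
    """Return the virtual MAC that CARP uses for a given VHID (1-255)."""
    if not 1 <= vhid <= 255:
        raise ValueError("VHID must be between 1 and 255")
    return f"00:00:5e:00:01:{vhid:02x}"


def parse_arp_table(text):
    """Build an {ip: mac} dictionary from pasted arp output."""
    table = {}
    for line in text.splitlines():
        match = ARP_LINE.search(line)
        if match:
            table[match.group("ip")] = normalize_mac(match.group("mac"))
    return table


def check_vip(text, vip, vhid):
    """Classify the ARP entry for vip as 'carp', 'other' or 'missing'."""
    seen = parse_arp_table(text).get(vip)
    if seen is None:
        return "missing"
    return "carp" if seen == carp_mac(vhid) else "other"


if __name__ == "__main__":
    # Hypothetical sample output; the addresses are documentation placeholders.
    sample = (
        "? (203.0.113.10) at 00:00:5e:00:01:01 on em0\n"
        "? (203.0.113.11) at 0:c:29:ab:cd:ef on em0\n"
    )
    print(check_vip(sample, "203.0.113.10", 1))
    print(check_vip(sample, "203.0.113.11", 1))
```

A result of "other" for the VIP would mean the device has cached a MAC that is not the CARP virtual MAC for that VHID (for example a physical NIC's address), which would be consistent with the ARP cache suspicion but does not prove it.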