Forcing traffic that comes in on an interface to leave on the same interface in Azure
-
I'm using pfSense in Azure, and I am using the HAProxy package to distribute traffic internally. Externally we have the MS load balancer that sprays traffic to the pfSense boxes, with Client IP affinity.
If I use HAProxy in a single-NIC configuration, that is, with just the single pfSense WAN interface tied to a single Azure network interface, it all works fine. In this configuration HAProxy listens on the WAN interface on an artificial port (1080) and sends traffic to the real servers over HTTP. The MS load balancers listen publicly on port 80 and spray the traffic across the two identically configured pfSense boxes on port 1080.
To summarise, we go [PUBLIC MS LOAD BALANCER] (port 80) -> [pfSense HAProxy] (port 1080) -> [real servers] (port 80), and if I connect to the website externally this all works fine. However, it only works in the single-NIC configuration; the problem comes when I add a second NIC and move the HAProxy service to listen on it.
What I see from the Microsoft side is that the pfSense box is not responding, so presumably they are sending the health probes to the 2nd NIC but not receiving a satisfactory response that port 1080 is open. When I'm connected to Azure, pfSense works fine on the 2nd NIC (if I connect directly to the 2nd NIC's IP on port 1080, a website comes up), but MS just doesn't like it.
A little diagnosis shows that when packets come in from MS to the 2nd NIC, the return packet carries the 2nd NIC's IP as its source but leaves through the primary NIC. From my limited understanding of Microsoft Azure's odd way of routing traffic, it probably doesn't like this. I saw this by running tcpdump: if I monitor the 2nd NIC for incoming health probes (tcpdump -i hn1), I see something like :-
10:52:42.475772 IP 168.63.129.16.55291 > [NIC2IP].1080: Flags [SEW], seq 3334240397, win 8192, options [mss 1440,nop,wscale 8,nop,nop,sackOK], length 0
That's a packet from Microsoft's standard health probe IP coming in to the 2nd NIC, but I never see a reply packet there.
Now if I monitor the primary NIC for similar conversations, I see this :-
10:52:30.493419 IP [NIC2IP].1080 > 168.63.129.16.55176: Flags [S.E], seq 1659822301, ack 2945126825, win 65228, options [mss 1460,nop,wscale 7,sackOK,eol], length 0
So I think the outgoing traffic, although it shows the correct source IP in tcpdump, is being sent out of the wrong NIC. This could be deemed MAC address spoofing, which isn't supported in Azure, or perhaps it's just some kind of asymmetric routing?
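A minimal sketch of how the two captures could be compared, assuming tcpdump's default text output as shown above. The function name count_probe_directions and the primary interface name hn0 are illustrative assumptions; only hn1 and the probe address come from the captures. It counts packets from and to the probe IP in whatever capture is piped in:

```shell
#!/bin/sh
# Azure health probe source address, as seen in the captures above.
PROBE_IP="168.63.129.16"

# Read tcpdump text from stdin and count packets coming from the probe IP
# (in) and packets going to the probe IP (out).
count_probe_directions() {
    awk -v probe="$PROBE_IP" '
        # tcpdump prints: <time> IP <src>.<port> > <dst>.<port>: ...
        $2 == "IP" {
            src = $3; dst = $5
            if (index(src, probe ".") == 1) inbound++
            else if (index(dst, probe ".") == 1) outbound++
        }
        END { printf "in=%d out=%d\n", inbound, outbound }
    '
}

# Usage sketch (hn0 as the primary NIC name is an assumption):
#   tcpdump -nl -i hn1 -c 20 host 168.63.129.16 | count_probe_directions
#   tcpdump -nl -i hn0 -c 20 host 168.63.129.16 | count_probe_directions
```

If replies really are leaving through the wrong interface, the 2nd NIC should show only inbound probe packets and the primary NIC only outbound ones.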
I don't really know, but is there anything we can specifically change on pfSense to force the traffic to go back out on the same NIC?
-
I've proved out that the problem lies with the split routing, by issuing the following commands :-
route add -host 168.63.129.16 -ifp hn1 [msdefaultGW]
route add -host [externalfixedIP] -ifp hn1 [msdefaultGW]
This forces any packets leaving the box towards the MS probe IP, and towards one fixed IP on the internet, to be routed through the 2nd interface.
This all works fine, except that I can't work with individual static routes: I need any traffic that comes in on the 2nd NIC to go back out of the 2nd NIC.
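To check which interface the kernel would pick for a given destination, before and after adding static routes like those above, a small helper along these lines may be useful. It is a sketch only: route_interface is a hypothetical name, and it assumes the FreeBSD `route -n get` output format, which includes an `interface:` line.

```shell
#!/bin/sh
# Print the interface named in `route -n get <destination>` output read
# from stdin. Assumes FreeBSD-style output with a line like "  interface: hn1".
route_interface() {
    awk '$1 == "interface:" { print $2 }'
}

# Usage sketch on the pfSense box:
#   route -n get 168.63.129.16 | route_interface
```

With the host route in place this should report hn1 for the probe IP. Any destination without its own route will still resolve to the default route's interface, which is why per-host routes don't scale to arbitrary clients.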