Azure Load Balancer Probe IP Routing
-
We're running into problems deploying an active/active pfSense+ configuration in Azure that sits behind both an internal and a public Azure Standard Load Balancer.
Azure uses the IP address 168.63.129.16 as the source of its load balancer health probes, as the endpoint for the VM agent (waagent), and for a few other platform services. By default, Azure adds a route to this IP via the primary VM interface (in this case, WAN). This route lets the VM fetch the waagent config, check in with Azure periodically over HTTP, and reply to probe requests from the public load balancer (WAN side). So far, so good.
The problem arises when we add the internal load balancer (ILB) into the mix. The ILB probes the LAN interfaces from this same 168.63.129.16 address. The firewalls receive the probe SYN on the LAN interface, but because the route to 168.63.129.16 points at WAN, they send the reply out the WAN interface. The probe never completes, so the ILB never marks the internal pool as up.
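To illustrate the asymmetry, here is a minimal Python sketch of destination-based route selection. It is only a model, not anything pfSense runs: the LAN subnet `10.0.1.0/24`, the table layout, and the helper names `egress_interface` and `probe_reply_path` are assumptions made up for the example.

```python
import ipaddress

PROBE_IP = ipaddress.ip_address("168.63.129.16")

# Hypothetical routing table: (destination network, outgoing interface).
ROUTES = [
    (ipaddress.ip_network("0.0.0.0/0"), "WAN"),         # default route
    (ipaddress.ip_network("10.0.1.0/24"), "LAN"),       # assumed LAN subnet
    (ipaddress.ip_network("168.63.129.16/32"), "WAN"),  # host route on the primary interface
]


def egress_interface(dst, routes=ROUTES):
    """Pick the outgoing interface by longest-prefix match on the destination only."""
    matches = [(net, iface) for net, iface in routes if dst in net]
    _, iface = max(matches, key=lambda m: m[0].prefixlen)
    return iface


def probe_reply_path(arrival_iface):
    """Show where the reply to a probe leaves, regardless of where the probe arrived."""
    reply_iface = egress_interface(PROBE_IP)
    return {
        "arrived_on": arrival_iface,
        "reply_leaves_on": reply_iface,
        "symmetric": arrival_iface == reply_iface,
    }


if __name__ == "__main__":
    for iface in ("WAN", "LAN"):
        print(probe_reply_path(iface))
```

Because the lookup only considers the destination address, the probe that arrives on LAN gets its reply sent out WAN, which matches what we see.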
In researching other posts about this issue, one common recommendation is to manually create a new gateway in pfSense that points to the Azure LAN subnet gateway IP, with monitoring disabled, and then create a static route for 168.63.129.16 via that gateway. You also have to delete the auto-created WAN routes to this IP from the shell and add new routes for it via the Azure LAN subnet gateway address. We created an rc.d script to perform these modifications automatically at boot.
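For reference, the route changes amount to something like the following. This is a dry-run sketch in Python that only prints FreeBSD `route` commands instead of running them. It is not our actual rc.d script; the gateway address `10.0.1.1` and the helper names `build_route_commands` and `render` are placeholders for illustration.

```python
import shlex

PROBE_IP = "168.63.129.16"


def build_route_commands(lan_gateway, probe_ip=PROBE_IP):
    """Return the route(8) invocations as argument lists (nothing is executed)."""
    return [
        # Remove the existing host route to the probe IP.
        ["route", "delete", "-host", probe_ip],
        # Re-add the host route via the Azure LAN subnet gateway (assumed address).
        ["route", "add", "-host", probe_ip, lan_gateway],
    ]


def render(commands):
    """Render argument lists as shell-safe command lines for review."""
    return [shlex.join(cmd) for cmd in commands]


if __name__ == "__main__":
    for line in render(build_route_commands("10.0.1.1")):
        print(line)
```

Printing the commands first makes it easy to check them by hand before putting anything into a boot script.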
This got both the LAN and WAN interfaces responding properly to the internal and public load balancer probes, but it also broke the Azure VM agent (waagent). The VM could no longer initiate new connections to 168.63.129.16 for agent communications, because that traffic was now routed out the secondary LAN interface instead of the VM's primary SDN interface (WAN). Without a functioning Azure VM agent, we can't perform VM backups, and a lot of other functionality is broken.
Our feeling at this point is that we need to leave the default Azure-assigned routes in place, so that new traffic to 168.63.129.16 is initiated via the WAN interface and the public load balancer and Azure agent continue to work. What we then need is for replies to inbound requests from 168.63.129.16 that arrive on the LAN interface to go back out the LAN interface via the Azure LAN subnet gateway IP.
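To make the goal concrete, here is a rough Python model of the behavior we're after: replies follow the interface the connection arrived on, while connections the firewall initiates itself still follow the routing table. This is a toy model only, not pfSense configuration; the `Firewall` class, its method names, and the port numbers are all hypothetical.

```python
from dataclasses import dataclass, field

PROBE_IP = "168.63.129.16"


@dataclass
class Firewall:
    # Destination-based routes, used for connections the firewall initiates.
    host_routes: dict = field(default_factory=lambda: {PROBE_IP: "WAN"})
    default_iface: str = "WAN"
    # State table: (remote_ip, remote_port, local_port) -> arrival interface.
    states: dict = field(default_factory=dict)

    def receive_syn(self, iface, src_ip, src_port, dst_port):
        """Record which interface an inbound connection arrived on."""
        self.states[(src_ip, src_port, dst_port)] = iface

    def route_lookup(self, dst_ip):
        """Plain destination-based lookup (what the agent's outbound traffic uses)."""
        return self.host_routes.get(dst_ip, self.default_iface)

    def reply_iface(self, src_ip, src_port, dst_port):
        """Send replies out the arrival interface; fall back to the routing table."""
        key = (src_ip, src_port, dst_port)
        return self.states.get(key) or self.route_lookup(src_ip)


if __name__ == "__main__":
    fw = Firewall()
    fw.receive_syn("LAN", PROBE_IP, 50000, 8080)  # ILB probe (hypothetical ports)
    print("probe reply leaves on:", fw.reply_iface(PROBE_IP, 50000, 8080))
    print("agent traffic leaves on:", fw.route_lookup(PROBE_IP))
```

In this model the ILB probe is answered on LAN while waagent traffic still leaves via WAN, which is the combination we haven't been able to get.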
We've tried policy-based routing (PBR), setting the gateway option on the LAN rule that matches incoming probe requests to the manually created Azure LAN subnet gateway, but this didn't work. (Does PBR only work with WAN interfaces?) We also tested some NAT and port forward configurations to trick the ILB into thinking the probe port is available, but didn't have any luck.
Documentation for other firewall platforms mentions using multiple virtual routers / routing tables to solve this issue, but pfSense doesn't have either option available.
Anything else we can try at this point?
Thanks,