Simple Firewall/OpenVPN/CARP/NAT/Hairpin/VLAN/Loopback question
-
OK, I lied. This question is not simple at all.
We have two HA/CARP-configured firewalls and two ISPs. Because of the two ISPs, I have OpenVPN listening on the loopback address, with an incoming port forward from a CARP address on each ISP (WAN side) to the loopback address. Remote users can connect using either address, and the client is configured with both public IP addresses (not DNS) for transparent and near-instantaneous failover if one of the ISPs goes down. It all works quite well.
We have a WLAN in our office that puts people on a BYOD VLAN without direct access to our company network. Internet only, like a guest network. The goal is to require VPN authentication for untrusted machines, and the allowed traffic is locked down through rules on the OpenVPN interface. Prior to changing the OpenVPN server to listen on the loopback address, users were able to connect to this WLAN, VPN into the WAN side of the firewall, then access the corporate network. It was a little weird, because the subnet of the BYOD VLAN is within the network space that exists behind the pfSense boxes, but it always worked. It also makes VPN deployment simple: users have only one VPN configuration, which they can use from home, and if they bring in a laptop and they're authorized, the same configuration gets them into the corporate network. Our corporate network provides WLAN access only to domain-joined machines.
So for this example, let's say our entire office network space is 10.0.0.0/19, our BYOD VLAN is 10.0.8.0/24, the VPN VLAN is 10.0.9.0/24, and the corporate network is 10.0.16.0/23. It makes remote routing easier, because all of the address space is behind a single network block, but it's weird, because I'm pushing a 10.0.0.0/19 route to a VPN client that has a VPN address within that space. But it works for remote users, and it worked recently for users on the BYOD VLAN. It doesn't seem to cause other problems, and I don't really think it's directly causing this issue, but maybe it's relevant.
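To make the overlap concrete, here is a quick sanity check of the example address plan using Python's standard `ipaddress` module. It only confirms that each example subnet falls inside the pushed 10.0.0.0/19 route; the labels and the helper name `covered_by` are just for illustration, and the addresses are the example values above, not our real ones.

```python
import ipaddress

# Example address plan from the post (illustrative values only).
office = ipaddress.ip_network("10.0.0.0/19")  # route pushed to VPN clients
subnets = {
    "BYOD VLAN": ipaddress.ip_network("10.0.8.0/24"),
    "VPN": ipaddress.ip_network("10.0.9.0/24"),
    "corporate": ipaddress.ip_network("10.0.16.0/23"),
}


def covered_by(route, nets):
    """Return, per named subnet, whether it lies entirely inside `route`."""
    return {name: net.subnet_of(route) for name, net in nets.items()}


result = covered_by(office, subnets)
for name, inside in result.items():
    print(f"{name}: {subnets[name]} inside {office}? {inside}")
```

All three subnets land inside the /19, so a client sitting on the BYOD VLAN is already inside the range that the VPN pushes to it.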
Since changing OpenVPN to bind to the loopback address, users on the BYOD VLAN can no longer connect to the VPN. The VPN client attempts to make a connection to the WAN CARP address, then just times out. I don't think that the VPN is attempting to respond, but TCPDump doesn't work on a loopback address, so I can't tell if the incoming request is even hitting the OpenVPN interface on port 1194. And it's UDP, so I can't learn anything from a port probe. I tried a temporary "allow all" on the BYOD VLAN, and it still didn't work, and all traffic is allowed on the WAN side on port 1194, so I don't think it's a firewall rule.
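One thing I could do from a BYOD client is send a recognizable UDP datagram at the WAN CARP address and then look for it in a packet capture or the state table on the firewall. Below is a minimal sketch of such a sender. The function name `send_udp_probe` and the payload are made up for illustration, and I would not expect OpenVPN to answer an arbitrary payload, so "no reply" proves nothing on its own. The point is only to generate known traffic to look for.

```python
import socket


def send_udp_probe(host, port=1194, payload=b"probe", timeout=2.0):
    """Send one UDP datagram and wait briefly for any reply.

    Returns the reply bytes, or None if nothing came back in time.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(payload, (host, port))
        try:
            data, _addr = sock.recvfrom(2048)
            return data
        except socket.timeout:
            # Silence is the normal outcome for UDP; it does not
            # distinguish "dropped" from "delivered but ignored".
            return None
        except ConnectionError:
            # Some platforms surface an ICMP "port unreachable" here.
            return None


# Usage (203.0.113.10 is a documentation placeholder, not a real address):
#   send_udp_probe("203.0.113.10", 1194)
```

If the datagram shows up on the BYOD interface but no state is ever created toward the loopback address, that would point at the port forward or reflection rather than at OpenVPN itself.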
I'm at a loss, and I don't know what tools I could use to track down the issue. I don't know if it's hitting port 1194 on the WAN side, and the return traffic isn't being routed correctly, or if there is some sort of problem with NAT reflection, maybe? I tried using different types of NAT reflection for that port forward, and none of them seemed to have any impact. Like I said, it worked when it was a direct interface listening on a WAN address, so maybe it has something to do with CARP?
Any direction would be greatly appreciated, and if there's any more information that would be helpful to know, I can add it.