Problems with OpenVPN routing with hub and spoke configuration
-
I have three sites. HQ, Branch 1, and Branch 2. HQ is the hub. There is a site to site connection between Branch 1 <> Hub and Branch 2 <> Hub.
Traffic can move between Branch 1 and Hub and Branch 2 and Hub. What is not working is traffic moving from Branch 1 -> Hub -> Branch 2. Also, traffic moving from Branch 2 -> Hub -> Branch 1 doesn't work either.
I do not have a single server at HQ for both Branch 1 and Branch 2. HQ has a separate server for each Branch. The reason I'm doing this is that I'm using data channel offload (DCO). DCO does not work when you have multiple clients per server according to the documentation.
Here is the server config for HQ <> Branch 1 Server:
IPv4 Tunnel Network: 172.31.4.0/24 (I understand I don't need /24, force of habit. /29 would work just as well, but it probably doesn't matter here).
IPv4 Local Networks: 10.2.0.0/16 (LAN in HQ), 10.8.0.0/16 (LAN in Branch 2)
IPv4 Remote Network: 10.4.0.0/16 (LAN in Branch 1)
The tunnel network IP address is 172.31.4.1
Server config for HQ <> Branch 2 Server:
IPv4 Tunnel Network: 172.31.8.0/24
IPv4 Local Networks: 10.2.0.0/16 (LAN in HQ), 10.4.0.0/16 (LAN in Branch 1)
IPv4 Remote Network: 10.8.0.0/16 (LAN in Branch 2)
The tunnel network IP address is 172.31.8.1
Client Config in Branch 1:
IPv4 Tunnel Network: blank, pushed from server at HQ
IPv4 Remote Networks: 10.2.0.0/16 (LAN in HQ), 10.8.0.0/16 (LAN in Branch 2)
The tunnel network IP address is 172.31.4.2
Client Config in Branch 2:
IPv4 Tunnel Network: blank, pushed from server at HQ
IPv4 Remote Networks: 10.2.0.0/16 (LAN in HQ), 10.4.0.0/16 (LAN in Branch 1)
The tunnel network IP address is 172.31.8.2
There are no client specific overrides in use on the router at HQ.
Now, the firewall actually does seem to pass the traffic in one direction. From a host in Branch 2, I can ping a host in Branch 1, and the router in Branch 1 records the ICMP traffic coming in and allows it. However, no replies to the pings come back to the host in Branch 2. It is also worth noting that the router in Branch 1 sees the incoming traffic as coming from 172.31.8.2, the tunnel IP address of the router at Branch 2. I would've expected to see the IP address of the host in Branch 2.
Pings from HQ to hosts in either Branch 1 or Branch 2 return as expected.
I presume this is some kind of reverse routing mismatch, but I don't see it looking at the routing tables.
Edited to add: It is worth noting that client to site vpn clients (also OpenVPN) connecting to HQ from a remote location can have their traffic routed correctly. If I initiate the client to site connection from my workstation at a hotel to HQ, then traffic traveling to HQ, Branch 1, or Branch 2 will work fine. Traffic coming in to HQ through a client to site tunnel routes as you would want it to into the site to site tunnels.
-
I believe I solved my own problem. Posting the solution here for anyone else who may encounter a similar problem in the future.
It occurred to me that each VPN server at HQ defined a separate tunnel network. Upon further examination, there were no routing table entries on the router at HQ to move traffic from the tunnel network for branch 1 to the tunnel network for branch 2, and vice versa.
Tunnel network for branch 1 is 172.31.4.0/24. For branch 2 it's 172.31.8.0/24.
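To illustrate why the ping replies went missing, here is a rough Python sketch of the route lookup the Branch 1 router would do before the fix. It assumes Branch 1 only knows the tunnel routes listed in its client config plus its own tunnel network, and that anything else follows the default route out the WAN. The helper name tunnel_route_for and the host 10.8.0.10 are made up for the example; this is a model of the idea, not how OpenVPN or the router implements it.

```python
import ipaddress

# Networks Branch 1 can reach through the tunnel before the fix:
# its own tunnel network plus the remote networks from its client config.
branch1_tunnel_routes = [
    ipaddress.ip_network("172.31.4.0/24"),  # Branch 1 <> HQ tunnel network
    ipaddress.ip_network("10.2.0.0/16"),    # LAN in HQ
    ipaddress.ip_network("10.8.0.0/16"),    # LAN in Branch 2
]

def tunnel_route_for(dest, routes):
    """Return the most specific route containing dest, or None (hypothetical helper)."""
    addr = ipaddress.ip_address(dest)
    matches = [net for net in routes if addr in net]
    return max(matches, key=lambda net: net.prefixlen, default=None)

# The pings arrived with source 172.31.8.2, so the reply is addressed there.
reply_to_tunnel_ip = tunnel_route_for("172.31.8.2", branch1_tunnel_routes)

# For comparison: a reply to a host in the Branch 2 LAN (10.8.0.10 is made up).
reply_to_lan_host = tunnel_route_for("10.8.0.10", branch1_tunnel_routes)

print(reply_to_tunnel_ip)  # None
print(reply_to_lan_host)   # 10.8.0.0/16
```

Under those assumptions, a reply addressed to 172.31.8.2 has no route into the tunnel, which matches the symptom of pings arriving at Branch 1 but never being answered.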
For both servers defined at HQ, in IPv4 local networks, I put in an additional entry: 172.31.0.0/16. This subnet covers all possible tunnel networks I might define that start with 172.31.X.X.
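A quick way to sanity check that kind of summary entry is Python's standard ipaddress module. This is only a sketch using the subnets from this thread; the variable names are made up. It confirms that 172.31.0.0/16 contains both tunnel networks and does not overlap any of the site LANs.

```python
import ipaddress

summary = ipaddress.ip_network("172.31.0.0/16")

tunnel_networks = {
    "branch1": ipaddress.ip_network("172.31.4.0/24"),
    "branch2": ipaddress.ip_network("172.31.8.0/24"),
}

site_lans = {
    "hq": ipaddress.ip_network("10.2.0.0/16"),
    "branch1": ipaddress.ip_network("10.4.0.0/16"),
    "branch2": ipaddress.ip_network("10.8.0.0/16"),
}

# Every tunnel network should fall inside the summary entry.
covered = {name: net.subnet_of(summary) for name, net in tunnel_networks.items()}

# The summary entry should not collide with any LAN.
lan_overlap = {name: summary.overlaps(net) for name, net in site_lans.items()}

print(covered)      # {'branch1': True, 'branch2': True}
print(lan_overlap)  # {'hq': False, 'branch1': False, 'branch2': False}
```

Any tunnel network added later under 172.31.X.X would pass the same check, so the entry would not need to change.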
This resolved the issue. Traffic can now move from branch 1, to HQ, to branch 2, and vice versa, without issue. I do not know for sure if this solution is "proper", but I do know that it works, and it does this by creating the needed routing table entries to move traffic from one tunnel network to another.
This was never an issue when I had a single server with many clients, because all clients existed in a single tunnel network, but when you have one client to one server, they all have separate tunnel networks, making the extra routing entries a necessity.
The only reason I bothered with this is to use DCO, and it does make a big difference for our offsite backups, so it was worth the trouble.