FRR Routing - force same way back for incoming traffic
-
I take care of a total of 8 firewalls spread across different sites. We connect them to each other so that we can access various resources from the other sites. We have some problems, so I have set up a test system that resembles the real one but is simpler, which also makes it easier to understand what is going on.
I have four firewalls (old ALIX machines) running pfSense 2.4.4-RELEASE-p3 (amd64). They sit on the same local network, so the WAN address of each is a local address (10.xx.xx.xx). I call them fw1, fw2, fw3 and fw4. Their internal networks have the LAN addresses 192.168.11.1, 192.168.12.1, 192.168.13.1 and 192.168.14.1, respectively.
As in the real system, I have connected them via VPN, using IPsec with Phase 2 in routed (VTI) mode.
I'm running FRR OSPF to share routing information. The configuration is:
fw1 is connected to fw2
fw2 is connected to fw3
fw3 is connected to fw4
fw4 is connected to fw1
giving a "full circle" with two paths between each pair of firewalls. I had written a long text describing my setup in more detail, but when I tested again, the error no longer occurred. It might be gone, or it might have been caused by mixing "Redistribute connected networks" with explicitly defined OSPF interfaces (which I've since fixed), or by a wrong interface setting. I'm not sure; it might also be intermittent.
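To make the "two paths between each pair" claim concrete, here is a small sketch that models the test ring as an undirected graph (just the four links listed above, nothing else) and enumerates every loop-free path between two firewalls. The function names are made up for illustration.

```python
# Ring test topology from the post: fw1-fw2, fw2-fw3, fw3-fw4, fw4-fw1.
RING_LINKS = [("fw1", "fw2"), ("fw2", "fw3"), ("fw3", "fw4"), ("fw4", "fw1")]


def build_adjacency(links):
    """Turn an undirected link list into a neighbour map."""
    adj = {}
    for a, b in links:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def simple_paths(adj, src, dst, path=None):
    """Yield every loop-free path from src to dst (depth-first search)."""
    path = (path or []) + [src]
    if src == dst:
        yield path
        return
    for nxt in sorted(adj[src]):
        if nxt not in path:  # never revisit a firewall on the same path
            yield from simple_paths(adj, nxt, dst, path)


if __name__ == "__main__":
    adj = build_adjacency(RING_LINKS)
    for p in simple_paths(adj, "fw1", "fw3"):
        print(" -> ".join(p))
    # fw1 -> fw2 -> fw3
    # fw1 -> fw4 -> fw3
```

Every pair of firewalls in the ring, adjacent or not, has exactly two such paths; the interesting pairs are fw1/fw3 and fw2/fw4, where both paths are two hops long.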
Short problem: I observed traffic taking a different route back than the route it took going in. I.e. SSH reply packets (source port 22) showed up on their own at a firewall that had never seen the incoming packet, so that firewall blocked them and the connection failed.
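The failure can be reproduced with a deliberately simplified model of a stateful firewall: a bare SYN may create state, and any other packet passes only if matching state already exists. This is roughly the idea behind pf's default `flags S/SA keep state`, but the classes below are hypothetical and ignore everything else pf does. The SYN goes fw1 -> fw2 -> fw3, the SYN/ACK comes back fw3 -> fw4 -> fw1, and fw4 drops it because it never saw the SYN.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Packet:
    src: str
    sport: int
    dst: str
    dport: int
    flags: str  # "S" for SYN, "SA" for SYN/ACK, "A" for ACK


@dataclass
class StatefulFirewall:
    name: str
    states: set = field(default_factory=set)

    @staticmethod
    def _key(pkt):
        # Direction-independent key, so a reply matches the state of its request.
        a = (pkt.src, pkt.sport)
        b = (pkt.dst, pkt.dport)
        return (min(a, b), max(a, b))

    def handle(self, pkt):
        """Return True if the packet passes, False if it is dropped."""
        key = self._key(pkt)
        if key in self.states:
            return True
        if pkt.flags == "S":  # only a bare SYN may create new state
            self.states.add(key)
            return True
        return False  # e.g. a SYN/ACK with no matching state


def send(pkt, hops):
    """Pass pkt through the firewalls in order; return the name of the one that drops it, or None."""
    for fw in hops:
        if not fw.handle(pkt):
            return fw.name
    return None


if __name__ == "__main__":
    fws = {n: StatefulFirewall(n) for n in ("fw1", "fw2", "fw3", "fw4")}
    syn = Packet("192.168.11.10", 50000, "192.168.13.10", 22, "S")
    synack = Packet("192.168.13.10", 22, "192.168.11.10", 50000, "SA")
    print(send(syn, [fws["fw1"], fws["fw2"], fws["fw3"]]))     # None
    print(send(synack, [fws["fw3"], fws["fw4"], fws["fw1"]]))  # fw4
```

The host addresses 192.168.11.10 and 192.168.13.10 and the client port are illustrative only.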
Has anyone had this problem, and does anyone know when and why it happens? I've seen it in my production system too. Is there a way to force packets to take the same way back, without figuring out which combination of cost metrics works? (In the production system this is not quite so easy; there's no "circle" there.)
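For the cost-metric route, one way to check a candidate set of OSPF costs is to look for node pairs that still have more than one lowest-cost path, since those are the pairs where the path choice is left open. The sketch below is a Dijkstra variant that also counts lowest-cost paths. It assumes each tunnel has the same cost in both directions; in OSPF the cost is configured per outgoing interface, so that only holds if both ends of each tunnel are set to the same value. The costs 10 and 15 are arbitrary examples.

```python
import heapq


def shortest_path_counts(costs, src):
    """Dijkstra that also counts how many distinct lowest-cost paths reach each node.

    costs maps (a, b) -> cost of an undirected link (assumed symmetric).
    """
    adj = {}
    for (a, b), c in costs.items():
        adj.setdefault(a, []).append((b, c))
        adj.setdefault(b, []).append((a, c))
    dist, count = {src: 0}, {src: 1}
    heap, done = [(0, src)], set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in done:
            continue  # stale heap entry
        done.add(node)
        for nxt, c in adj[node]:
            nd = d + c
            if nxt not in dist or nd < dist[nxt]:
                dist[nxt], count[nxt] = nd, count[node]  # strictly better path found
                heapq.heappush(heap, (nd, nxt))
            elif nd == dist[nxt]:
                count[nxt] += count[node]  # another path with the same total cost
    return dist, count


def tied_pairs(costs):
    """Return node pairs that have more than one lowest-cost path (path choice is ambiguous)."""
    nodes = sorted({n for link in costs for n in link})
    ties = []
    for i, a in enumerate(nodes):
        _, count = shortest_path_counts(costs, a)
        for b in nodes[i + 1:]:
            if count[b] > 1:
                ties.append((a, b))
    return ties


if __name__ == "__main__":
    ring = {("fw1", "fw2"): 10, ("fw2", "fw3"): 10, ("fw3", "fw4"): 10, ("fw4", "fw1"): 10}
    print(tied_pairs(ring))  # [('fw1', 'fw3'), ('fw2', 'fw4')]
    ring[("fw4", "fw1")] = 15  # make one tunnel slightly more expensive
    print(tied_pairs(ring))  # []
```

With symmetric costs and no ties, the unique lowest-cost path from A to B is also the unique one from B to A, so both directions agree in steady state. This says nothing about behaviour after a link failure, and with more sites the number of pairs to check grows quickly.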
-
A similar problem is described here
-
The problem with the ring setup is that a device connected to firewall 1 talking to a device connected to firewall 3 has an equal chance of using path 1->2->3 or path 1->4->3, so you get traffic asymmetry. A stateful firewall has difficulty with that, since it needs to see the whole SYN->SYN/ACK->ACK handshake.
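A rough illustration of why the two directions need not agree: the forward path is chosen on fw1 and the reply path on fw3, independently of each other. The sketch assumes per-flow hashing over the address/port tuple as seen by each router; the SHA-256-based hash is a hypothetical stand-in, not what FreeBSD or FRR actually use, and on some platforms only one of the equal-cost routes is installed at all. Either way, nothing ties the choice on fw3 to the choice made on fw1.

```python
import hashlib

FORWARD = [["fw1", "fw2", "fw3"], ["fw1", "fw4", "fw3"]]  # equal-cost choices made on fw1
REVERSE = [["fw3", "fw2", "fw1"], ["fw3", "fw4", "fw1"]]  # equal-cost choices made on fw3


def pick_path(paths, src_ip, dst_ip, sport, dport):
    """Toy per-flow choice: hash the tuple as seen by this router, pick one equal-cost path."""
    key = f"{src_ip},{dst_ip},{sport},{dport}".encode()
    h = int.from_bytes(hashlib.sha256(key).digest()[:4], "big")
    return paths[h % len(paths)]


def is_asymmetric(sport):
    """True if an SSH connection from this client port transits different firewalls each way."""
    fwd = pick_path(FORWARD, "192.168.11.10", "192.168.13.10", sport, 22)
    rev = pick_path(REVERSE, "192.168.13.10", "192.168.11.10", 22, sport)
    return fwd[1] != rev[1]  # compare the middle (transit) firewall


if __name__ == "__main__":
    asym = [p for p in range(50000, 50100) if is_asymmetric(p)]
    print(f"{len(asym)} of 100 connections take a different path back")
```

The exact count depends on the hash, which is why it is not shown; the point is that some connections work and others fail, which can easily look intermittent.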
If you set up your VPN as a full mesh (every node has a tunnel to every other node), then any site is always one hop away from any other site. Other routes "exist", but they are more than one hop away and therefore less preferred, so you don't typically have asymmetric traffic problems, unless something goes wrong.
With 4 nodes it's not too bad, you only need 6 tunnels, but at 15 nodes it is quite challenging, as you would need 105 distinct tunnels, and you'd certainly want some form of automation to create and maintain them. The formula is (num_nodes^2 - num_nodes) / 2.
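A quick check of the formula: a full mesh has one tunnel per unordered pair of sites, which is what `itertools.combinations(nodes, 2)` generates, and (n^2 - n) / 2 is the same as n(n - 1) / 2. The `fwN` site names are placeholders.

```python
from itertools import combinations


def full_mesh(nodes):
    """One tunnel per unordered pair of sites."""
    return list(combinations(nodes, 2))


def tunnels_needed(n):
    """(n^2 - n) / 2, i.e. n(n - 1) / 2."""
    return (n * n - n) // 2


if __name__ == "__main__":
    for n in (4, 8, 15):
        sites = [f"fw{i}" for i in range(1, n + 1)]
        assert len(full_mesh(sites)) == tunnels_needed(n)
        print(n, "sites ->", tunnels_needed(n), "tunnels")
    # 4 sites -> 6 tunnels
    # 8 sites -> 28 tunnels
    # 15 sites -> 105 tunnels
```

A list like `full_mesh(sites)` is also a natural starting point for the kind of automation mentioned above, since each pair maps to one tunnel definition.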
Yes, the ring setup is there to simplify things and to test possible solutions. The real system has 8 nodes, which would mean 28 tunnels according to your formula, so it is not a full mesh and some connections go over two hops, especially those that do not see a lot of traffic. Obviously, finding a cost distribution that fixes the issue is not that easy with 8 nodes either.
If it were possible to force the ACK traffic to go back the same way the SYN traffic came in, the problem would be solved. Jimp hinted at something in the other thread, with a solution in 2.5.0, but it doesn't seem to work yet (see here).
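The "same way back" idea can be sketched as: when a connection arrives, remember the tunnel it came in on, and send replies for that connection out of the same tunnel instead of consulting the routing table. This is the concept behind pf's `reply-to` rule option, but the class and tunnel names below are hypothetical; it is not how pfSense implements it, and whether it works on VTI interfaces is exactly the open question in this thread.

```python
from dataclasses import dataclass, field


@dataclass
class ReplyToFirewall:
    """Toy model: replies leave through the tunnel their connection arrived on."""
    routes: dict  # destination site -> tunnel chosen by routing (e.g. OSPF)
    ingress: dict = field(default_factory=dict)  # connection key -> tunnel it arrived on

    @staticmethod
    def conn_key(src, sport, dst, dport):
        # Same key for both directions of a connection.
        a, b = (src, sport), (dst, dport)
        return (min(a, b), max(a, b))

    def receive(self, src, sport, dst, dport, tunnel):
        """Record the ingress tunnel of an inbound connection (first packet wins)."""
        self.ingress.setdefault(self.conn_key(src, sport, dst, dport), tunnel)

    def egress_for_reply(self, src, sport, dst, dport, dst_site):
        """Use the remembered ingress tunnel if there is one, otherwise the routing table."""
        return self.ingress.get(self.conn_key(src, sport, dst, dport), self.routes[dst_site])


if __name__ == "__main__":
    # Routing on fw3 says "reach site 11 via fw4", but the SYN arrived from fw2.
    fw3 = ReplyToFirewall(routes={"site11": "ipsec-fw4"})
    fw3.receive("192.168.11.10", 50000, "192.168.13.10", 22, tunnel="ipsec-fw2")
    print(fw3.egress_for_reply("192.168.13.10", 22, "192.168.11.10", 50000, "site11"))  # ipsec-fw2
```

The trade-off is that the reply path now ignores routing changes for existing connections, which is usually what you want for stateful filtering but means a failed tunnel breaks those connections until they are re-established.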