Routing problem with site-to-site connection via multiple VPNs
I have a problem with 2 pfSense boxes that has been driving me crazy for a few days now. Hopefully one of the experts here will see immediately what is wrong or missing.
2 sites with pfSense-Boxes in identical configuration.
At each site we have one low-latency SDSL connection and 2 fast VDSL connections with relatively bad latency.
On site A we have a Windows-Domain with file server, mail server, etc. AND an ELASTIX-Voip-Server.
There are different subnets on each site.
The goal is to route the VoIP traffic between the two sites (elastix – site B and back) through the SDSL connection, and all other traffic between the 2 sites through the 2 VDSL connections.
What was done so far:
3 OpenVPN site-to-site connections have been established. The first over the SDSL-connections and two over the VDSL-connections. For every connection a gateway was set up to be able to handle the connections by policy-based routing. The 2 VDSL-OpenVPNs are bundled as a gateway-group for load-balancing and failover.
To have a stable test environment, I forced the SDSL-OpenVPN to be established first, so that I know the entry in the routing table sends traffic between the two subnets through this tunnel.
Two firewall rules under the “LAN” tab have been set up at each site.
First rule on site A: All traffic with source elastix-machine and destination subnet site B has to go through the gateway of the SDSL-OpenVPN connection. This rule is the topmost in the table.
Second rule on site A: All traffic with source subnet A and destination subnet B has to go through the OpenVPN gateway group.
Corresponding rules on site B:
First rule on site B: All traffic with destination elastix-machine and source subnet site B has to go through the gateway of the SDSL-OpenVPN connection. This rule is the topmost in the table.
Second rule on site B: All traffic with source subnet B and destination subnet A has to go through the OpenVPN gateway group.
What is happening:
Only some of the traffic seems to be handled by the rules, not all of it. But the errors are stable and reproducible, so I hope the reason can be figured out.
File transfer initiated from site A, files being pushed to site B: The data goes through the right connection (the OpenVPN gateway group), but I can already see in the traffic meter that the acknowledgment packets come back over the wrong connection (the SDSL-OpenVPN).
File transfer initiated from site A, files being pulled from site B: The data goes through the wrong connection (the SDSL-OpenVPN), while the acknowledgment packets go back over the right connection (the OpenVPN gateway group).
File transfers initiated from site B show exactly the same behavior. Pushing files: data over the right connection, acknowledgment packets over the wrong connection. Pulling files: data over the wrong connection, acknowledgment packets over the right connection.
The same happens with the VoIP traffic from or to the elastix-machine. Depending on which site the call was initiated from, incoming and outgoing traffic is split between SDSL and VDSL.
So what can I do to solve the problem? Any help is greatly appreciated!
The packets going in the opposite direction do not get processed through the rule set, because state has already been set up for them by the packets that first arrived across the OpenVPN link. So the rules you have on the opposite-end LAN, which feed into the appropriate gateway/group for the OpenVPN you want, are never applied to them.
The problem will be related to the "pf" "reply-to" option, which needs to be on the rules that allowed the traffic in to the opposite site on the OpenVPN - I suspect that the rule generation in pfSense does not do that quite right in some circumstances???
Have a look in /tmp/rules.debug for "reply-to" and you might be able to work out what is going on.
The pf.conf man page also has documentation about reply-to: https://www.freebsd.org/cgi/man.cgi?query=pf.conf&sektion=5
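As a rough aid for that check, here is a minimal Python sketch that lists every "pass in" rule in a ruleset dump and says whether it carries "reply-to". The sample lines, interface names and addresses are made up for illustration and are not taken from a real box; the function name is hypothetical too. It only does plain text matching, so treat the result as a hint, not as a parser for pf syntax.

```python
import re

def pass_rules_with_reply_to(ruleset_text):
    """Return (interface, has_reply_to) for every 'pass in' rule in the text."""
    found = []
    for line in ruleset_text.splitlines():
        line = line.strip()
        if not line.startswith("pass in"):
            continue  # skip block rules, macros, comments, etc.
        match = re.search(r"\bon\s+(\S+)", line)
        interface = match.group(1) if match else None
        found.append((interface, "reply-to" in line))
    return found

# Hypothetical sample, shaped like lines from a generated ruleset.
SAMPLE = """\
pass in quick on $OPT1 reply-to ( ovpnc1 10.0.8.1 ) inet from any to any keep state
pass in quick on $LAN inet from any to any keep state
block in log quick on $WAN inet from any to any
"""

for interface, has_reply_to in pass_rules_with_reply_to(SAMPLE):
    print(interface, "reply-to" if has_reply_to else "no reply-to")
```

To use it on real data, copy /tmp/rules.debug to a machine with Python and pass the file contents to `pass_rules_with_reply_to`.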
Thank you very much for your answer and hints, Phil!
I'll have a look at this today or tomorrow, and as soon as there is something new, I'll report back.
It seems that your guess was right, Phil.
As I could see in the rules.debug file, none of my rules related to the routing has a "reply-to".
Is there any possibility to force pfSense to add this option? Perhaps by adding it manually?
Or, as a workaround, to reset the states of the packets coming back, so that they are correctly handled by the rules on the other site?
I just had a play with my home pfSense site-to-site client up to one of our offices. Looking at the code in filter.inc, for a rule to get "reply-to":
- The rule needs to be on an interface that has a gateway.
- The gateway IP needs to be known.
I did this:
- Interfaces->(assign) an interface to the OpenVPN site-to-site client
- Enable the interface, give it an IP address and add a gateway that match the IP address and gateway that the underlying OpenVPN tunnel will end up with (e.g. in my case interface IP 10.49.255.2/24 and gateway 10.49.255.1)
- After saving/applying the interface changes, it did not work straight away. I edited the OpenVPN client (changing nothing) and saved.
- Firewall->Rules - add a pass rule on this new interface.
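Since the interface address and gateway are typed in by hand here, a quick sanity check that the gateway really sits in the interface's subnet can save some head-scratching. A minimal Python sketch using the standard `ipaddress` module; the function name is hypothetical, and the check says nothing about whether the values match what OpenVPN actually assigns to the tunnel.

```python
import ipaddress

def gateway_on_link(interface_cidr, gateway):
    """True if the gateway is inside the interface's subnet and is not the interface itself."""
    iface = ipaddress.ip_interface(interface_cidr)
    gw = ipaddress.ip_address(gateway)
    return gw in iface.network and gw != iface.ip

# Values from the example above.
print(gateway_on_link("10.49.255.2/24", "10.49.255.1"))
```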
On the dashboard the interface and gateway show up. The gateway has RTT and loss figures, so apinger is happily monitoring it.
There is now a rule in /tmp/rules.debug:
pass in quick on $OPT8 reply-to ( ovpnc1 10.49.255.1 ) inet from any to any tracker 1419949944 keep state label "USER_RULE: Allow all on OPT8 from ICO"
It seems a shame that the system cannot automatically work out the underlying OpenVPN tunnel IP addresses. After enabling the interface with IPv4 type none, a gateway automatically appears on-the-fly, but the code in filter.inc cannot find the underlying OpenVPN tunnel gateway end IP to use for the reply-to. This message appears in the system log:
/rc.filter_configure_sync: Could not find IPv4 gateway for interface (opt8).
and that is why I was forced to manually set an interface IPv4 address and gateway.
Note: Both ends of this are on 2.2-RC, so I am not sure if this will all work on 2.1.5, but I expect so.
I can still reach the office at the other end of the OpenVPN, so it did not break. If I had multiple OpenVPN links on different paths between home and office then I could test if the reply-to actually makes the traffic follow the required links.
Hopefully you can make something workable out of this.
First of all: Thank you so much for your efforts!
At first look, my configuration looks exactly the same as yours.
Every interface has a gateway. As the gateway address I always used the address of the tunnel endpoint at the other site.
I will also give an example: in my case interface IP 10.0.3.3/24 and gateway 10.0.3.2 (server site), interface IP 10.0.3.4/24 and gateway 10.0.3.1 (client site).
I also already knew about your point 3) in the configuration: I had to simply edit and save the OpenVPN configuration without making changes to make the interface work.
Also the pass rules for the interfaces are in place.
On these pass rules I can see the reply-to option in rules.debug as well.
But in my case the reply-to does not appear in any of the rules I established under the LAN-tab for forcing the specified traffic through the appropriate tunnel.
And that seems to be the reason for the problems I encounter.
Was my mistake to set up the rules under the LAN tab? To me that seems to be the logical place. Is there a knot in my brain somewhere?
Looking at a few debug.config files from pfSense machines now, I have to realize that there is no reply-to on LAN rules anywhere.
There should be no need for reply-to on LAN, because returning traffic for a LAN client has only 1 way to be delivered anyhow - directly on the LAN.
Put a pass rule on each of the OpenVPN interfaces you have created (not on the generic OpenVPN rules tab). That will allow traffic arriving from remote sites across the OpenVPN. As long as the OpenVPN interface has an IP address and gateway, those rule/s should get "reply-to" on them, and so the traffic back that matches the states created by those rule/s will be routed across the right OpenVPN link, as per the "reply-to".
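To make the idea concrete, here is a toy Python model of the behavior described in this thread. It is a deliberately simplified sketch, not how pf is implemented: the class, method names and tunnel labels ("sdsl", "vdsl_group") are all invented for illustration. It assumes that a reply matching an existing state skips the LAN policy rules, and that it leaves via the tunnel recorded by "reply-to" if there is one, otherwise via whatever the routing table says.

```python
from typing import Dict, Optional, Tuple

class ToyFirewall:
    """Very rough model of state matching vs. policy rules (illustration only)."""

    def __init__(self, routing_table_tunnel: str,
                 lan_policy: Dict[Tuple[str, str], str]):
        self.routing_table_tunnel = routing_table_tunnel  # tunnel the routing table picks
        self.lan_policy = lan_policy                      # (src, dst) -> tunnel, like LAN rules
        self.states: Dict[Tuple[str, str], Optional[str]] = {}

    def accept_from_tunnel(self, src: str, dst: str, tunnel: str, reply_to: bool) -> None:
        """A connection arrives over `tunnel`; the pass rule there creates state."""
        self.states[(src, dst)] = tunnel if reply_to else None

    def send_from_lan(self, src: str, dst: str) -> str:
        """Choose the tunnel for a packet going from the LAN to the remote site."""
        reverse = (dst, src)
        if reverse in self.states:
            # Reply to an existing state: the LAN policy rules are not consulted.
            return self.states[reverse] or self.routing_table_tunnel
        # New connection: the LAN policy rule decides, else the routing table.
        return self.lan_policy.get((src, dst), self.routing_table_tunnel)

fw = ToyFirewall("sdsl", {("siteB", "siteA"): "vdsl_group"})
fw.accept_from_tunnel("siteA", "siteB", "vdsl_group", reply_to=False)
print(fw.send_from_lan("siteB", "siteA"))  # reply follows the routing table
fw.accept_from_tunnel("siteA", "siteB", "vdsl_group", reply_to=True)
print(fw.send_from_lan("siteB", "siteA"))  # reply follows the recorded tunnel
```

In this model the first reply leaves over "sdsl" even though the LAN policy rule says "vdsl_group", which is the kind of split described at the start of the thread; with reply-to recorded in the state, the reply goes back over the tunnel the connection came in on.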
I have had these pass rules on every OpenVPN interface from the very beginning…
… and they all have reply-to, as I can see in rules.debug.
Could the OpenVPN protocol or device mode be of any relevance?