[SOLVED] Policy-Based Routing Not Consistently Going Out the Specified Gateway

Finger79

System: 2.4.0-RELEASE

Setup:

I have two OpenVPN client tunnels set up in a Tier 1 Gateway Group. Let's just call it the VPN Gateway.
I have LAN firewall rules set up so that traffic goes out the VPN Gateway.
I want to make an exception for a tablet I'm using. I set up a DHCP reservation so that this tablet always gets issued a .103 IP address.
I have another LAN firewall rule set up so that traffic with Source .103 goes out the WANGW gateway instead of the VPN Gateway. This exception rule sits above the rules that normally route traffic through the VPNs.
Upon a fresh reboot of pfSense, the exception appears to work, and all traffic from this tablet goes out the naked WAN. After some period of time, all traffic from the tablet goes out over the VPN, totally ignoring the firewall rule and policy-based routing.

Something is overriding the system routing table and the way policy-based routing should work.

luckman212

I've experienced a similar issue on 2.4.1. Still trying to track down whether it's a real issue or just something borked in my config. Either way, next time this happens to you, could you ssh in and grab a copy of /tmp/rules.debug and pastebin it?

Velcro

I was trying to policy route myself…here is a link of a conversation with some possible solutions. Not exactly the same but might help: https://forum.pfsense.org/index.php?topic=137498.msg752159#msg752159

Another suggestion is maybe making a "block rule" for your "WAN only" devices' IPs immediately after your "allow WAN device" rules?

I can't speak to whether this is a 2.4 upgrade issue. I have policy routing and haven't seen this issue; however, I mostly have separate interfaces.

V

Finger79

@luckman212:

I've experienced a similar issue on 2.4.1. Still trying to track down whether it's a real issue or just something borked in my config. Either way, next time this happens to you, could you ssh in and grab a copy of /tmp/rules.debug and pastebin it?

What part of rules.debug would help? I'd have to censor some of it.

Finger79

@V3lcr0:

Another suggestion is maybe making a "block rule" for your "WAN only" devices' IPs immediately after your "allow WAN device" rules?

Probably wouldn't work since the firewall rule has already been triggered.

In other words, I have an "Allow Tablet out Naked WAN" rule for .103 traffic to go out the WANGW gateway, and it shows up properly in my logs. It simply lies and sends it out the VPN gateway instead. :P

@V3lcr0:

I can't speak to whether this is a 2.4 upgrade issue. I have policy routing and haven't seen this issue; however, I mostly have separate interfaces.

This issue happened in 2.3.4-p1 as well. I was hoping 2.4.0 would fix it.

luckman212

@Finger79:

What part of rules.debug would help? I'd have to censor some of it.

Well definitely the #gateways section at least, probably the #aliases and #System aliases and # User Aliases as well.
You can change IPs or redact if you want

Finger79


#System aliases

loopback = "{ lo0 }"
WAN = "{ igb0 }"
LAN = "{ igb1 }"
PIA_1 = "{ ovpnc1 }"
PIA_2 = "{ ovpnc2 }"
OpenVPN = "{ openvpn }"

# Gateways
GWPIA_2_VPNV4 = " route-to ( ovpnc2 10.42.12.5 ) "
GWPIA_1_VPNV4 = " route-to ( ovpnc1 10.46.10.5 ) "
GWWANGW = " route-to ( igb0 [Public WAN IP] ) "
GWPIA_GROUP = "  route-to { ( ovpnc1 10.46.10.5 ) ( ovpnc2 10.42.12.5 )  }  round-robin  "

I'm not really using user aliases in any policy-based routing. Just the .103 for the tablet.

luckman212

That looks correct - next time the Policy Routing stops working (I'm guessing it's after one of the gateways flaps) & before doing any manual intervention, try to dump /tmp/rules.debug again and see how they compare to what you have above.
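
For what it's worth, here is a rough way to do that comparison without eyeballing two pastes. It's only a sketch in POSIX sh (if your prompt is tcsh, run it under sh). The function names and the /root/rules.good path are made up for illustration, and it assumes the gateway macros in rules.debug all start with GW, as in the paste above.

```sh
# Print only the gateway macro lines (GW... = "route-to ...") from a
# rules.debug-style file given as $1.
gateway_macros() {
    grep -E '^GW[A-Za-z0-9_]+[[:space:]]*=' "$1"
}

# Compare the gateway macros of two snapshots ($1 = known-good, $2 = current).
# Prints a diff and returns non-zero if they differ.
compare_gateways() {
    old=$(mktemp /tmp/gwold.XXXXXX) && new=$(mktemp /tmp/gwnew.XXXXXX) || return 2
    gateway_macros "$1" > "$old"
    gateway_macros "$2" > "$new"
    diff "$old" "$new"
    status=$?
    rm -f "$old" "$new"
    return $status
}
```

Usage would be something like: cp /tmp/rules.debug /root/rules.good right after a reboot while things still work, then compare_gateways /root/rules.good /tmp/rules.debug once it breaks. No output means the gateway macros did not change, so the problem is probably somewhere else.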

Finger79

@luckman212:

That looks correct - next time the Policy Routing stops working (I'm guessing it's after one of the gateways flaps) & before doing any manual intervention, try to dump /tmp/rules.debug again and see how they compare to what you have above.

It's currently not working. :)

luckman212

Hmm that is really odd. Is a screenshot of your LAN rules possible? Do you also have an outbound NAT rule to rewrite traffic for the .103 to GWWANGW?

You could also start a ping from the tablet and then run pfctl on pfSense to have a look at the states…

on tablet

ping 75.75.75.75

on pfSense

pfctl -vv -ss | grep -A2 '75.75.75.75'

It will show you the rule# that is being hit, let's say it was rule 193 - you can check that…

pfctl -vv -sr | grep '@193'
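
If you end up doing this a lot, the first step can be wrapped up so you only get the rule numbers back. This is a minimal sketch, assuming the verbose state output contains the text "rule N" within the two lines after the matching state (which is what the grep -A2 above relies on). rules_for_addr is just a name I picked, not anything built in.

```sh
# Read `pfctl -vv -ss` output on stdin and print the distinct rule numbers
# recorded for states that mention the address given as $1.
rules_for_addr() {
    grep -A2 -F -- "$1" | grep -oE 'rule [0-9]+' | awk '{print $2}' | sort -un
}
```

Run it as: pfctl -vv -ss | rules_for_addr 75.75.75.75 and then look up each number it prints with the pfctl -vv -sr | grep '@N' command above. If more than one number comes back for the same destination, different rules are catching the traffic at different times.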

Finger79

@luckman212:

Hmm that is really odd. Is a screenshot of your LAN rules possible? Do you also have an outbound NAT rule to rewrite traffic for the .103 to GWWANGW?

Screenshot of the rule attached. It's the one that gets triggered (as expected).

No outbound NAT rule just for the .103, but I do have 3 separate outbound NAT rules for the LAN subnet to go out WAN, VPN1, and VPN2.

@luckman212:

You could also start a ping from the tablet and then run pfctl on pfSense to have a look at the states…

on tablet

ping 75.75.75.75

on pfSense

pfctl -vv -ss | grep -A2 '75.75.75.75'

It will show you the rule# that is being hit, let's say it was rule 193 - you can check that…

pfctl -vv -sr | grep '@193'

My Tablet rule was only configured to pass IPv4 TCP/UDP (which covers the apps I want to go through the normal WAN), so I had to modify the rule to IPv4 * to include ICMP.

As soon as I applied that rule, everything worked as expected. Pinging 75.75.75.75 went through the tablet rule, and every "What is my IP" website on the tablet returned my ISP WAN's address.

I reset all states just to be sure, and then it reverted all traffic from the tablet through the VPN. Damn it!

Seems like the OpenVPN routing rules use stronger magic than policy-based routing rules and override any gateway I set.

[Attachment: tablet_lan_rule.png]

luckman212

Are your OpenVPN clients set to "Don't pull routes" ? (they should be…)

Finger79

Here's some of the LAN rules (farther down the rule list) for most of the traffic. For example, the "Allow Web Traffic" rule sends all 80/443 traffic out the VPN gateway.

[Attachment: lan_pass_rules.png]

Finger79

@luckman212:

Are your OpenVPN clients set to "Don't pull routes" ? (they should be…)

I've played with that setting in the past and had undesirable results. Similar to the guy in this thread, my real IP started leaking instead of going out the VPN.

I'll try it again.

luckman212

I think you definitely need route-nopull to be checked. If your DNS is "leaking" then you need to adjust the DNS settings on your Tablet so the lookups policy route via the tunnel. If you're using pfSense as the DNS server for the tablet, this isn't going to work since Unbound or dnsmasq upstream queries will be sourced from the firewall itself. The simple fix for that is to manually set a DNS server on the tablet and make sure your policy route traps that traffic (udp/53).

Finger79

No, not DNS leaking, all traffic leaking such as through 80/443. Even though I have a "NO_WAN_EGRESS" policy-based filtering setup, it doesn't seem to work when I don't pull routes from the VPN provider. (Very not cool!)

I'm getting inconsistent results right now on my laptop on various "What is my IP?" sites.

1) Real WAN IP address (undesirable)
2) VPN connection 1 (expected)
3) VPN connection 2 (expected)

My DNS does not appear to be leaking at all, which is good. All DNS through unbound goes out through the VPN, as configured.

luckman212

Ah ok, sorry, I saw "leaking" and jumped to the conclusion you were talking about DNS, since that is usually what people refer to when talking about leaks.

Did you try the pfctl commands from a few posts ago? And you double checked your outbound NAT rules?

Finger79

I reverted back to having both OpenVPN client connections pulling routes. Having them not pull routes was 100 times more undesirable since it exposed my entire LAN through the normal gateway and the VPN gateway randomly.

Finger79

@luckman212:

Ah ok, sorry, I saw "leaking" and jumped to the conclusion you were talking about DNS, since that is usually what people refer to when talking about leaks.

Did you try the pfctl commands from a few posts ago? And you double checked your outbound NAT rules?

Not sure what to look for in the outbound NAT rules.
LAN to WAN
LAN to VPN1
LAN to VPN2

I don't have a NAT rule just for the .103 exception… should be included in the above 3 rules.

I did the pfctl command and got inconsistent results. Sometimes it would go out the VPN and other times it would go out the WAN.

Derelict

You people who want to take a complicated setup like policy routing multiple OpenVPN connections, then blame the software when it doesn't work, yet obviously have no real grasp of what needs to happen to make it work, simply floor me.

In order for tagging and matching along the NO_WAN_EGRESS vein to work, EVERY packet that should go over the VPN must be tagged.

That is going to be a crap shoot without enabling "don't pull routes."

You are going to have to know exactly how to structure your rules in either case.
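
One way to sanity-check the tagging is to look through /tmp/rules.debug for rules that policy route to the VPN group but carry no tag. This is a sketch only: it assumes the generated rules reference the gateway as $GWPIA_GROUP (the macro from the paste earlier in the thread) and that tagged rules contain the text "tag NO_WAN_EGRESS". untagged_rules is a hypothetical helper name, written for POSIX sh.

```sh
# List rules in a rules.debug-style file that use a gateway macro but do not
# set a given tag.
#   $1 = rules file, $2 = gateway macro name without the leading $, $3 = tag
untagged_rules() {
    # First grep keeps lines that reference the macro as $NAME (this skips the
    # macro definition itself); second grep drops lines that already set the tag.
    grep -F -- "\$$2" "$1" | grep -v -F -- " tag $3"
}
```

For example: untagged_rules /tmp/rules.debug GWPIA_GROUP NO_WAN_EGRESS. Anything it prints is traffic that gets sent to the VPN group without being tagged, so a tag-matching block rule on WAN would never see it as VPN traffic.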

Two choices:

Enable "don't pull routes" and policy route VPN traffic

Don't enable "don't pull routes" and policy route clear internet traffic.

With multiple VPN providers, not enabling "don't pull routes" is going to be very complicated because both clients will want to install the 0.0.0.0/1 and 128.0.0.0/1 routes.
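
A quick way to tell which of the two situations you are actually in is to look for the 0.0.0.0/1 and 128.0.0.0/1 entries in the routing table. This is a sketch only: it assumes netstat prints those destinations in exactly that form at the start of the line, and def1_routes is a hypothetical helper name (POSIX sh).

```sh
# Read a routing table listing on stdin (e.g. from `netstat -rn -f inet`) and
# print any 0.0.0.0/1 or 128.0.0.0/1 entries, i.e. the two half-default routes
# a VPN client installs when it pulls a redirect from the provider.
def1_routes() {
    grep -E '^(0\.0\.0\.0/1|128\.0\.0\.0/1)[[:space:]]'
}
```

Run it as: netstat -rn -f inet | def1_routes. If it prints anything, the routing table itself prefers a tunnel for everything that is not explicitly policy routed, so the clear-internet traffic is what needs the policy rules. If it prints nothing, the default route is still the WAN and the VPN traffic is what needs them.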