[SOLVED] Policy-Based Routing Not Consistently Going Out the Specified Gateway



  • System:  2.4.0-RELEASE

    Setup:

    • I have two OpenVPN client tunnels set up in a Tier 1 Gateway Group.  Let's just call it the VPN Gateway.

    • I have LAN firewall rules set up so that traffic will go out the VPN Gateway gateway.

    • I want to make an exception for a tablet I'm using.  I set up a DHCP reservation so that this tablet always gets issued a .103 IP address.

    • I have another LAN firewall rule set up so that traffic with Source .103 goes out the WANGW gateway, as opposed to the VPN Gateway.  This exception rule sits above the rules that normally route traffic through the VPNs.

    • Upon a fresh reboot of pfSense, the exception appears to work, and all traffic from this tablet goes out the naked WAN.  After some period of time, all traffic from the tablet goes out over the VPN, totally ignoring the firewall rule and policy-based routing.

    Something is overriding the system routing table and the way policy-based routing should work.



  • I've experienced a similar issue on 2.4.1. Still trying to track down whether it's a real issue or just something borked in my config. Either way, next time this happens to you, could you ssh in and grab a copy of /tmp/rules.debug and pastebin it?



  • I was trying to policy route myself…here is a link of a conversation with some possible solutions. Not exactly the same but might help: https://forum.pfsense.org/index.php?topic=137498.msg752159#msg752159

    Another suggestion is maybe making a "block rule" for your "WAN only" device's IP immediately after your "allow WAN device" rules?

    I can't speak to whether this is a 2.4 upgrade issue. I have policy routing and haven't seen this issue; however, I mostly have separate interfaces.

    V



  • @luckman212:

    I've experienced a similar issue on 2.4.1. Still trying to track down whether it's a real issue or just something borked in my config. Either way, next time this happens to you, could you ssh in and grab a copy of /tmp/rules.debug and pastebin it?

    What part of rules.debug would help?  I'd have to censor some of it.



  • @V3lcr0:

    Another suggestion is maybe making a "block rule" for your "WAN only" device's IP immediately after your "allow WAN device" rules?

    Probably wouldn't work since the firewall rule has already been triggered.

    In other words, I have a "Allow Tablet out Naked WAN" rule for .103 traffic to go out the WANGW gateway, and it shows up properly in my logs.  It simply lies and sends it out the VPN gateway instead.  :P

    @V3lcr0:

    I can't speak to whether this is a 2.4 upgrade issue. I have policy routing and haven't seen this issue; however, I mostly have separate interfaces.

    This issue happened in 2.3.4-p1 as well.  I was hoping 2.4.0 would fix it.



  • @Finger79:

    What part of rules.debug would help?  I'd have to censor some of it.

    Well definitely the #gateways section at least, probably the #aliases and #System aliases and # User Aliases as well.
    You can change IPs or redact if you want



  • 
    #System aliases
    
    loopback = "{ lo0 }"
    WAN = "{ igb0 }"
    LAN = "{ igb1 }"
    PIA_1 = "{ ovpnc1 }"
    PIA_2 = "{ ovpnc2 }"
    OpenVPN = "{ openvpn }"
    
    # Gateways
    GWPIA_2_VPNV4 = " route-to ( ovpnc2 10.42.12.5 ) "
    GWPIA_1_VPNV4 = " route-to ( ovpnc1 10.46.10.5 ) "
    GWWANGW = " route-to ( igb0 [Public WAN IP] ) "
    GWPIA_GROUP = "  route-to { ( ovpnc1 10.46.10.5 ) ( ovpnc2 10.42.12.5 )  }  round-robin  "
    

    I'm not really using user aliases in any policy-based routing.  Just the .103 for the tablet.



  • That looks correct - next time the Policy Routing stops working (I'm guessing it's after one of the gateways flaps) & before doing any manual intervention, try to dump /tmp/rules.debug again and see how they compare to what you have above.
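
A minimal sketch of the "dump it again and compare" step suggested above. The path /tmp/rules.debug comes from the thread; the function names, the snapshot directory, and the assumption that gateway definitions are the lines beginning with `GW` (as in the excerpt posted earlier) are illustrative only.

```shell
#!/bin/sh
# Copy the generated ruleset to a timestamped file and print the copy's path.
# $1 = source file (defaults to the path named in the thread)
# $2 = snapshot directory (hypothetical default, adjust as needed)
snapshot_rules() {
    src="${1:-/tmp/rules.debug}"
    dest_dir="${2:-/root/rules-snapshots}"
    mkdir -p "$dest_dir" || return 1
    dest="$dest_dir/rules.debug.$(date +%Y%m%d-%H%M%S)"
    cp "$src" "$dest" || return 1
    echo "$dest"
}

# Diff only the gateway definitions (lines starting with "GW") of two snapshots.
# $1 = snapshot taken while routing works, $2 = snapshot taken while it misbehaves
# Exit status 0 means the gateway lines are identical.
compare_gateways() {
    old_tmp=$(mktemp) || return 1
    new_tmp=$(mktemp) || return 1
    grep -E '^[[:space:]]*GW' "$1" > "$old_tmp"
    grep -E '^[[:space:]]*GW' "$2" > "$new_tmp"
    diff "$old_tmp" "$new_tmp"
    status=$?
    rm -f "$old_tmp" "$new_tmp"
    return $status
}

# Usage on the firewall:
#   good=$(snapshot_rules)      # right after a reboot, while the exception works
#   bad=$(snapshot_rules)       # later, before any manual intervention
#   compare_gateways "$good" "$bad"
```

If the gateway lines are identical in both snapshots, the generated `route-to` definitions did not change, and the cause is more likely in the routing table or the state table than in the ruleset.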



  • @luckman212:

    That looks correct - next time the Policy Routing stops working (I'm guessing it's after one of the gateways flaps) & before doing any manual intervention, try to dump /tmp/rules.debug again and see how they compare to what you have above.

    It's currently not working.  :)



  • Hmm that is really odd. Is a screenshot of your LAN rules possible? Do you also have an outbound NAT rule to rewrite traffic for the .103 to GWWANGW?

    You could also start a ping from the tablet and then run pfctl on pfSense to have a look at the states…

    on tablet

    ping 75.75.75.75
    

    on pfSense

    pfctl -vv -ss | grep -A2 '75.75.75.75'
    

    It will show you the rule# that is being hit, let's say it was rule 193 - you can check that…

    pfctl -vv -sr | grep '@193'
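
The two manual steps above (find the rule number on the state, then look up that rule) can be chained. This is only a sketch: the helper name `rules_for_address` is made up, and it assumes the verbose state listing prints a `rule N` field within the two lines following each state line, which is what the `grep -A2` above relies on.

```shell
#!/bin/sh
# Read `pfctl -vv -ss` output on stdin and print the distinct rule numbers
# recorded on states that mention the given address.
# $1 = address to look for (matched as a fixed string)
rules_for_address() {
    addr="$1"
    grep -F -A2 -- "$addr" \
        | sed -n 's/.*rule \([0-9][0-9]*\).*/\1/p' \
        | sort -un
}

# Usage on the firewall (as root), while the ping is running:
#   for n in $(pfctl -vv -ss | rules_for_address 75.75.75.75); do
#       pfctl -vv -sr | grep -A3 "^@$n("
#   done
```

Anchoring the second grep on `^@$n(` avoids also matching rules whose number merely starts with the same digits (for example @1930 when looking for @193).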
    


  • @luckman212:

    Hmm that is really odd. Is a screenshot of your LAN rules possible? Do you also have an outbound NAT rule to rewrite traffic for the .103 to GWWANGW?

    Screenshot of the rule attached.  It's the one that gets triggered (as expected).

    No outbound NAT rule just for the .103, but I do have 3 separate outbound NAT rules for the LAN subnet to go out WAN, VPN1, and VPN2.

    @luckman212:

    You could also start a ping from the tablet and then run pfctl on pfSense to have a look at the states…

    on tablet

    ping 75.75.75.75
    

    on pfSense

    pfctl -vv -ss | grep -A2 '75.75.75.75'
    

    It will show you the rule# that is being hit, let's say it was rule 193 - you can check that…

    pfctl -vv -sr | grep '@193'
    

    My Tablet rule was only configured to pass IPv4 TCP/UDP (which covers the apps I want to go through the normal WAN), so I had to modify the rule to IPv4 * to include ICMP.

    As soon as I applied that rule, everything worked as expected.  Pinging 75.75.75.75 went through the tablet rule, and every "What is my IP" website on the tablet returned my ISP WAN's address.

    I reset all states just to be sure, and then it reverted all traffic from the tablet through the VPN.  Damn it!

    Seems like the OpenVPN routing rules use stronger magic than policy-based routing rules and override any gateway I set.




  • Are your OpenVPN clients set to "Don't pull routes" ? (they should be…)



  • Here's some of the LAN rules (farther down the rule list) for most of the traffic.  For example, the "Allow Web Traffic" rule sends all 80/443 traffic out the VPN gateway.




  • @luckman212:

    Are your OpenVPN clients set to "Don't pull routes" ? (they should be…)

    I've played with that setting in the past and had undesirable results.  Similar to the guy in this thread, my real IP started leaking instead of going out the VPN.

    I'll try it again.



  • I think you definitely need route-nopull to be checked. If your DNS is "leaking" then you need to adjust the DNS settings on your Tablet so the lookups policy route via the tunnel. If you're using pfSense as the DNS server for the tablet, this isn't going to work since Unbound or dnsmasq upstream queries will be sourced from the firewall itself. The simple fix for that is to manually set a DNS server on the tablet and make sure your policy route traps that traffic (udp/53).



  • No, not DNS leaking, all traffic leaking such as through 80/443.  Even though I have a "NO_WAN_EGRESS" policy-based filtering setup, it doesn't seem to work when I don't pull routes from the VPN provider.  (Very not cool!)

    I'm getting inconsistent results right now on my laptop on various "What is my IP?" sites.

    1)  Real WAN IP address (undesirable)
    2)  VPN connection 1 (expected)
    3)  VPN connection 2 (expected)

    My DNS does not appear to be leaking at all, which is good.  All DNS through unbound goes out through the VPN, as configured.



  • Ah ok sorry I saw "leaking" and jumped to conclusion you were talking about DNS since that is usually what people refer to when talking about leaks.

    Did you try the pfctl commands from a few posts ago? And you double checked your outbound NAT rules?



  • I reverted back to having both OpenVPN client connections pulling routes.  Having them not pull routes was 100 times more undesirable since it exposed my entire LAN through the normal gateway and the VPN gateway randomly.



  • @luckman212:

    Ah ok sorry I saw "leaking" and jumped to conclusion you were talking about DNS since that is usually what people refer to when talking about leaks.

    Did you try the pfctl commands from a few posts ago? And you double checked your outbound NAT rules?

    Not sure what to look for in the outbound NAT rules.
    LAN to WAN
    LAN to VPN1
    LAN to VPN2

    I don't have a NAT rule just for the .103 exception… should be included in the above 3 rules.

    I did the pfctl command and got inconsistent results.  Sometimes it would go out the VPN and other times it would go out the WAN.


  • Netgate

    You people who want to take a complicated setup like policy routing multiple openvpn connections then blame the software when it doesn't work yet obviously have no real grasp on what really needs to happen to make it work simply floor me.

    In order for tagging and matching along the NO_WAN_EGRESS vein to work, EVERY packet that should go over the VPN must be tagged.

    That is going to be a crap shoot without enabling "don't pull routes."

    You are going to have to know exactly how to structure your rules in either case.

    Two choices:

    Enable "don't pull routes" and policy route VPN traffic

    Don't enable "don't pull routes" and policy route clear internet traffic.

    With multiple VPN providers, not enabling "don't pull routes" is going to be very complicated because they will both want to enable the 0/1 and 128/1 rules.
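
One way to see which tunnel currently holds the pulled 0/1 and 128/1 routes mentioned above is to filter the routing table for them. A rough sketch: the helper name is hypothetical, and it assumes the interface name is the last field on each route line, which may not hold if an Expire value is printed.

```shell
#!/bin/sh
# Read `netstat -rn` output on stdin and print the two half-default routes
# that OpenVPN installs when routes are pulled, with the last field of each
# line (assumed here to be the interface name).
pulled_default_routes() {
    awk '$1 == "0.0.0.0/1" || $1 == "128.0.0.0/1" { print $1, $NF }'
}

# Usage on the firewall:
#   netstat -rn -f inet | pulled_default_routes
```

No output would mean neither tunnel has installed those routes, so traffic that is not policy routed follows the ordinary default gateway.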



  • Anything in your OpenVPN logs?

    My brain started working again & I remembered that you are trying to put your default LAN behind the OpenVPN tunnel and just make an exception for the Tablet.  That's the reverse of what I do (I have an alias for devices that I want "hidden" and everything else just uses default routing).  So in your case you should probably set the default route of your pfSense to the VPN gateway, otherwise your DNS traffic will leak – or, you can set DHCP to hand out an upstream DNS server to LAN clients e.g. 8.8.8.8, etc.

    As to why you are getting inconsistent results, I am at a bit of a loss. Maybe this needs a fresh set of eyes. Anyone else got any ideas?



  • @Derelict:

    You people who want to take a complicated setup like policy routing multiple openvpn connections then blame the software when it doesn't work yet obviously have no real grasp on what really needs to happen to make it work simply floor me.

    In order for tagging and matching along the NO_WAN_EGRESS vein to work, EVERY packet that should go over the VPN must be tagged.

    That is going to be a crap shoot without enabling "don't pull routes."

    I think you're misunderstanding.  The policy-based filtering floating rule works perfectly (matches the previously tagged packets).  That's not what this thread is about.

    @Derelict:

    In order for tagging and matching along the NO_WAN_EGRESS vein to work, EVERY packet that should go over the VPN must be tagged.

    Yep, no issues here.  Tagging is working as expected.

    @Derelict:

    You people who want to take a complicated setup like policy routing multiple openvpn connections

    The VPN connections are in one gateway group.  The policy-route is set to send all traffic out the VPN gateway.  That's working perfectly.

    The only thing that's not consistently working is the policy-route for one device on the LAN to go out the normal WAN gateway.


  • Netgate

    OK where is that rule in relation to all your other rules? You have yet to show that.

    I was more commenting on the nonsense like this:

    No, not DNS leaking, all traffic leaking such as through 80/443.  Even though I have a "NO_WAN_EGRESS" policy-based filtering setup, it doesn't seem to work when I don't pull routes from the VPN provider.  (Very not cool!)

    It works perfectly when configured correctly.



  • @Derelict:

    OK where is that rule in relation to all your other rules? You have yet to show that.

    I was more commenting on the nonsense like this:

    No, not DNS leaking, all traffic leaking such as through 80/443.  Even though I have a "NO_WAN_EGRESS" policy-based filtering setup, it doesn't seem to work when I don't pull routes from the VPN provider.  (Very not cool!)

    It works perfectly when configured correctly.

    Screenshots of that posted earlier.  The "Allow Web Traffic" rule sets the policy-based filtering tag "NO_WAN_EGRESS" and also sets the policy-based routing gateway to the VPN_Gateway.

    The only time that traffic leaked out the naked WAN was when I told both client VPN connections to not pull routes.  Then I got inconsistent results:  some traffic went out the VPN, and other traffic went out the WAN.  It was random.



  • @luckman212:

    So in your case you should probably set the default route of your pfSense to the VPN gateway, otherwise your DNS traffic will leak – or, you can set DHCP to hand out an upstream DNS server to LAN clients e.g. 8.8.8.8, etc.

    DNS seems to work perfectly.  Unbound sends all traffic through the VPN tunnels and never out the naked WAN.  (That interface is unchecked.)


  • Netgate

    You are all over the place. That rule routes traffic to PIA for, presumably, destinations 80 and 443.

    All other traffic will go out the default gateway (or the OpenVPN connection that happens to have been able to set the 0.0.0.0/1 and 128.0.0.0/1 rules, which as you found in the other thread, will be the first OpenVPN connection without "don't pull routes" set that connects. The other one will receive errors when trying to set those routes.)

    You are indicating there is a problem with some other host that is unable to go out WAN. Where is that rule in relation to all the other rules?

    Not blocking out things that really don't matter might help people help you.



  • @Derelict:

    You are indicating there is a problem with some other host that is unable to go out WAN.

    Negative.  It's not unable to go out WAN.  It just doesn't do it consistently.

    @Derelict:

    Where is that rule in relation to all the other rules?

    Answered earlier:
    @Finger79:

    Here's some of the LAN rules (farther down the rule list) for most of the traffic.  For example, the "Allow Web Traffic" rule sends all 80/443 traffic out the VPN gateway.

    The .103 tablet exception rule matches first.  It just doesn't consistently send traffic out the WAN.  Sometimes it does, other times it doesn't.  But the rule always fires.


  • Netgate

    Is WANGW flapping?

    System > Logs, Gateways



  • @Derelict:

    Is WANGW flapping?

    System > Logs, Gateways

    I don't see it mentioned anywhere in the Gateway logs.

    I had to turn off WANGW gateway monitoring (meaning it's always considered "Up").  The dpinger pings may have pissed it off.  I'll turn it back on and see if that helps.


  • Netgate

    Right. If that was happening when you were seeing "random" routing then the same principles that make "NO_WAN_EGRESS" necessary would apply equally to WANGW if it was flagged as down. In that case you would need "NO_VPN_EGRESS."



  • I get that. Fortunately, gateway flapping appears to not be the reason behind this.

    When I do a fresh reboot of pfSense, the tablet traffic consistently goes out the WAN, as expected.  Some time later (or some event later), it decides to go out the VPN only, even though the rule still fires that specifies WANGW.  It's just ignoring the gateway in the rule but still logging the rule as having fired.  Weird.


  • Netgate

    Doubtful. There is some other reason the traffic is not matching that policy routing rule - else it would be policy routed accordingly.



  • @Derelict:

    Doubtful. There is some other reason the traffic is not matching that policy routing rule - else it would be policy routed accordingly.

    Then why does the rule match in the Firewall Logs?

    As you can see, the rule is very simple.  If the source is .103 IPv4, then policy route it through WANGW.

    This works perfectly after a fresh reboot of pfSense.  Then like I said, after some time or some event, it no longer goes out the WANGW and goes out the VPN.  Something is overriding the routing portion of the firewall rule.



  • Just did some more, all from Firefox on the tablet:

    Shows real WAN IP
    Google "What is my IP"
    iplocation.com
    whatismyip.net
    privateinternetaccess.com
    whatismyip.org
    ExpressVPN.com
    MXtoolbox.com
    ip-address.org
    iplocation.net
    findipinfo.com
    myipaddress.com

    Shows VPN IP
    TorGuard.net
    DuckDuckGo "What is my IP"
    whatismyipaddress.com
    BearsMyIP.com
    ipchicken.com
    ipaddress.pro



  • @Finger79:

    Then why does the rule match in the Firewall Logs?

    As you can see, the rule is very simple.  If the source is .103 IPv4, then policy route it through WANGW.

    Do you have a kill switch rule below the policy route for .103 to block all traffic? You need this in case WANGW is down, because rules will be skipped if the GW is flapping, which could explain your inconsistent results…


  • Netgate

    When it stops working, run this:

    pfctl -vvsr | grep -A3 XX.XX.XX.103

    Here: I'll show you one of mine. I'm not afraid of leaking inside addresses:

    $ pfctl -vvsr | grep -A3 192.168.223.6

    @307(1493852191) pass in quick on igb1.223 route-to (ovpnc3 172.29.114.130) inet from 192.168.223.6 to <openvpn_lan:2> flags S/SA keep state label "USER_RULE: Route OpenVPN Addresses Through OpenVPN"
      [ Evaluations: 2386      Packets: 0        Bytes: 0          States: 0    ]
      [ Inserted: pid 21796 State Creations: 0    ]

    Anyway, that will show the EXACT rules in the active rule set that have anything to do with that address at that specific point in time.



  • @124(10000001) pass in log quick on igb1 inet from 192.168.100.103 to <negate_networks:0> flags S/SA keep state label "NEGATE_ROUTE: Negate policy routing for destination"
      [ Evaluations: 2572      Packets: 0        Bytes: 0          States: 0    ]
      [ Inserted: pid 54887 State Creations: 0    ]
    @125(1505701172) pass in log quick on igb1 route-to (igb0 xxx.xxx.xxx.xxx public IP) inet from 192.168.100.103 to any flags S/SA keep state label "USER_RULE: Tablet Out Naked WAN"
      [ Evaluations: 865      Packets: 55591    Bytes: 19139183    States: 5    ]
      [ Inserted: pid 54887 State Creations: 807  ]
    @126(1458032398) block return in log quick on igb1 inet from any to <pfb_africa_v4:6176> label "USER_RULE: pfb_Africa"


  • Netgate

    Looks good to me unless negate_networks includes the wrong destinations, which is pretty unlikely.

    Or if there is a rule that matches that source address that won't be shown there.

    I'd be happy to look at /tmp/rules.debug if you want to PM it.



  • Somewhat redacted and edited:

    < /tmp/rules.debug removed >



  • Wonder if this table should be empty:

    Diagnostics -> Tables
    negate_networks

    No entries exist in this table.