Weird Routing behaviour that I may need to work around [solved]
-
Simplified Network View:
The basic premise is that all traffic from specific hosts is policy routed at the pfSense firewall. At the moment this is a simple "anything from .32 or .36 is routed out of the OpenVPN Gateway, whilst stuff from (for example) the .10 is routed normally, unencrypted, onto the internet". I can then get more granular over time, but I need to start somewhere. The pfSense config is really quite simple at the moment.
1 LAN Interface, 1 WAN Interface
1 OpenVPN Tunnel, 1 VPN Interface, 1 VPN Gateway
Rules
MASQ rules on WAN and VPN Interface
PBR rule on LAN (and this may be the immediate issue) pointing certain ip addresses to VPN Gateway. Traffic is tagged
Floating Rule to block tagged traffic from going out of the WAN (a quick way to eyeball these rules from the pfSense shell is sketched just below)
With the network in this state, containers cannot ping other LAN devices, but they can ping the gateway. DNS resolution also does not work, as the TrueNAS DNS servers are LAN based. This is suboptimal (as in useless)
If I switch off the VPN Gateway (as in disable it) then containers work properly (but not encrypted obviously)
If I switch the VPN Gateway back on then containers don't work again. Note that this toggle and "Apply Changes" is all that is needed.
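A quick way to eyeball those rules from the pfSense shell - a sketch only, the grep patterns just go after the route-to / tag keywords and your rule text will differ:
# Dump the loaded filter rules and look for the policy-routing (route-to)
# rules and anything referencing the tag used by the floating block rule
pfctl -sr | grep -E 'route-to|tag'
# Dump the NAT rules (the MASQ / outbound NAT on WAN and the VPN interface)
pfctl -sn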
Running a traceroute from the Heimdall container (it has nslookup, ping and traceroute) I get
Traffic is leaving the container and going straight to the firewall, which then routes / redirects the traffic back to the actual destination. This implies that all traffic leaving a container goes to the firewall first and then back into the network. Surely this is wrong on all sorts of levels and not the way that traffic should be routed.
I think what is happening is that the firewall is, through the policy based route, grabbing the traffic and trying to push it out of the VPN Interface - which is entirely the wrong direction - and I think I have confirmed that with some other tests (confirmed)
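For anyone wanting to reproduce that check, this is the sort of capture that would show it - a sketch only, the LAN interface name (igb1 here) is an example and will differ per box:
# On pfSense (Diagnostics > Packet Capture in the GUI does the same job):
# capture on the LAN interface while pinging 192.168.38.10 from the container.
# If the hairpin theory is right, the ICMP shows up here even though both
# endpoints sit on the same LAN segment.
tcpdump -ni igb1 icmp and host 192.168.38.10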
However the bigger question is why is all the local traffic being sent to the firewall rather than being dumped onto the LAN properly - which I feel must be a bug in TrueNAS unless someone can correct me.
Let me be clear I don't think this is a bug in pfSense (although it might be PBR being over enthusiastic). I do think this is a bug somewhere in the TrueNAS Scale ecosystem, but I don't know enough about it to say where. However I may need to work around this in the short to medium term. I am posting here because:
a. It may interest someone
b. Looking for advice - does this sound right
c. Looking for advice on a pfsense workaround (I have some ideas) - but I am all up for simpler ones
d. I have spent several days tearing my almost non-existent hair out over this
-
@justconfused
A few general things to look at:
Everything on 192.168.38.0 should be able to reach each other without the firewall knowing about it. Are the firewall and switch separate? Can you unplug the pfSense from the switch and have the remaining devices ping/traceroute each other? How are the other devices configured, DHCP or static? Is the switch managed, and does it have any VLANs, port isolation, or routed ports? Ping/traceroute from the TN Scale hosts, if possible.
The first hop from the container is 172.16.0.1, so the container must have an address 172.16.0.X, right? Is that address visible to the rest of the network, or is TN performing NAT? If so, is that desired? Is there a way to look at TN's routing table (perhaps netstat -nr)?
I'm assuming you haven't subnetted 192.168.38.0? Are all the machines using the same netmask 255.255.255.0?
Does the traceroute change when you have the VPN on or off?
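If it helps, something like this on the SCALE host and inside the container would answer most of the above in one pass (assuming the tools are actually available in both places):
# On the TN SCALE host:
netstat -nr                 # routing table
ip -4 addr show             # addresses and netmasks
# Inside the container:
ip route                    # container's default gateway
traceroute 192.168.38.10    # run once with the VPN gateway enabled, once disabled, and compare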
-
@mariatech - some good questions
"Everything on 192.168.38.0 should be able to reach each other without the firewall knowing about it." - I agree - and this works normally for everything else
"Are the firewall and switch separate? " - Yes - everything here is physical (apart from the containers). There is VMWare - but not involved.
"Can you unplug the pfSense from the switch and have the remaining devices ping/traceroute each other?" - good logical question - I will test - see below
"How are the other devices configured, DHCP, static?" - All theses physical devices are static. There is DHCP, but not involved
"Is the switch managed, does it have any VLANs, port isolation, or routed ports?" - all switches are manageable (but I do very little with them). There are VLANs (iSCSI is on its own VLAN & switch) - but none are involved here. No port isolation or routed ports. The reason for multiple switches is that some of this kit is in the garage and the firewall is in the house. Some kit is 10Gb, some kit is 1Gb. There is both fibre and copper between the house and the garage
"Ping/traceroute from the TN Scale hosts, if possible." - normal and correct behaviour - goes straight to the desired host and not via the firewall.
"The first hop from the container is 172.16.0.1. Thus the container must have an address 172.16.0.X, right?" - Yes
"Is that address visible to the rest of the network, or is TN peforming NAT?" - TN is performing NAT - at least that was my understanding
"If so, is that desired?" - there is no option for anything else. This is not portainer (perhaps unfortunately)
"Is there a way to look at TN's routing table (perhaps netstat -nr)"? Yes - nothing unusual hereroot@NewNAS[~]# netstat -nr Kernel IP routing table Destination Gateway Genmask Flags MSS Window irtt Iface 0.0.0.0 192.168.38.15 0.0.0.0 UG 0 0 0 enp7s0f4 172.16.0.0 0.0.0.0 255.255.0.0 U 0 0 0 kube-bridge 172.22.16.0 0.0.0.0 255.255.255.0 U 0 0 0 enp7s0f4d1 192.168.38.0 0.0.0.0 255.255.255.0 U 0 0 0 enp7s0f4 root@NewNAS[~]#
172.22.16.0 = iSCSI network
172.16.0.0 is reserved for all things kubernetes / containery
"I'm assuming you haven't subnetted 192.168.38.0?" - Correct. No weird subnetting. I am lazy and don't like to work that out if I can avoid it, and I have no need for it
"Are all the machines using the same netmask 255.255.255.0?" - Yes
"Does the traceroute change when you have the VPN on or off?" - Yes. I am assuming by VPN On/Off that you mean when I toggle the gateway - see below.
Longer Answers below
Traceroute to 192.168.38.10 with Gateway On
Traceroute to 192.168.38.10 with Gateway disabled
Testing removing the firewall (physical unplug of internal NIC)
Traceroute (and ping) to 192.168.38.10
Heimdall container is 172.16.3.165 with a 255.255.0.0 subnet. Gateway is 172.16.0.1
My current conclusion is still the same as in my OP. The routing of the container traffic is / must be a bug in TrueNAS, as it's highly suboptimal and doubles up traffic to the firewall that doesn't need to be there. This bug is shown up by the PBR in pfSense grabbing the traffic (correctly, really) and pushing it into the VPN
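For what it's worth, a way to check whether SCALE is actually masquerading the container traffic - a sketch only, assuming the stock iptables backend and the kube-bridge / 172.16.0.0 names from my routing table above:
# On the SCALE host: look for a MASQUERADE/SNAT rule covering 172.16.0.0/16
# in the NAT POSTROUTING chain
iptables -t nat -L POSTROUTING -v -n | grep -E '172\.16|MASQUERADE'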
-
@justconfused said in Weird Routing behaviour that I may need to work around:
However the bigger question is why is all the local traffic being sent to the firewall rather than being dumped onto the LAN properly - which I feel must be a bug in TrueNAS
Yeah, your 172.16.0.1 is the container network, but that should be NATed to your docker host's IP..
So I run a heimdall container, and if I grab a console on it..
root@heimdall:/# ifconfig
eth0      Link encap:Ethernet  HWaddr 02:42:AC:11:00:02
          inet addr:172.17.0.2  Bcast:172.17.255.255  Mask:255.255.0.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:22 errors:0 dropped:0 overruns:0 frame:0
          TX packets:3 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:1916 (1.8 KiB)  TX bytes:238 (238.0 B)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)
This is on a host 192.168.9.10; if I traceroute to something on the 192.168.9/24 network I do show hitting 172.17.0.1, but then it goes directly to the 192.168.9.100 address, for example..
traceroute to 192.168.9.100 (192.168.9.100), 30 hops max, 46 byte packets
 1  172.17.0.1 (172.17.0.1)  0.008 ms  0.007 ms  0.004 ms
 2  i9-win.local.lan (192.168.9.100)  0.505 ms  *  0.337 ms
If I ping the pfsense box on the 192.168.9 network from the heimdall container, it looks like it's coming from the docker host 192.168.9.10
If I had to guess - and I am sure not a docker guru for sure, I run them on my synology nas - this seems to be something odd in your docker networking..
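Just a sketch, and not claiming the TrueNAS kube setup does the same thing, but on a plain docker host the NAT it creates looks roughly like this (bridge name and subnet will vary):
# List the NAT POSTROUTING rules docker installed - traffic from the bridge
# subnet leaving via anything other than the bridge gets masqueraded to the
# host IP
iptables -t nat -S POSTROUTING | grep docker0
# typically prints something like:
# -A POSTROUTING -s 172.17.0.0/16 ! -o docker0 -j MASQUERADE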
-
So, how can I use pfSense to work around this issue? (if I can)
I have discovered that if I disable the policy based routing down the tunnel then the router diverts the traffic back where it needs to go correctly, although it shouldn't need to
Try 1:
I have tried adding a LAN Gateway (sort of a dummy really)
and then adding a new PBR in front of the rule causing the issue
But this doesn't work - I suspect because the LAN_Gateway isn't really a gateway as it has no address
Try 2:
Basically the same but point the gateway at 192.168.38.11 (a different AD Server)
I wasn't expecting this to work - and I cannot say that I am surprised that it doesn't.
That unfortunately tapped me out for ideas at the moment (barring a major network redesign). Maybe put another firewall/router in the middle with no PBR and shove my real firewall out one network step. I have the kit available (the good news), and could just use pfSense with the FW turned off.
-
@justconfused not sure why you think this has anything to do with pfsense.
I think maybe you're coming at it from the wrong direction. Because you're having problems accessing it from your network?
With your example of heimdall - this is accessed from the network, it doesn't normally access stuff directly - does it? Mine sure isn't for anything I am doing with it. Well, maybe it does, since it shows, like my overseerr, how much it's processing.. And my tautulli, if any streams are running. But those are just set up in heimdall to the fqdn of those services. I have never looked into exactly how it gets that info - whether it initiates the connection from itself, or whether it just presents my browser with how to get that info.
So for example I access it from my networks via the docker hosts fqdn on port 80
http://nas.local.lan:8056/
This presents me with the interface..
Are you trying to run something on this heimdall docker that needs to access something on its own? A connection it started? Is something not specifically working?
Again, not a docker guru - but if the docker creates a connection to anything outside the host, be it the host's network or some other network, that should be NATed to the host's IP..
For example, if I get a ping going from the heimdall docker to something on the host's network 192.168.9, pfsense never even sees that traffic.. Because the docker host sends it on to my host on 192.168.9.100 directly..
You have something odd in your docker setup, in my opinion. And zero to do with pfsense.. Sure, if pfsense gets sent traffic it will try and route it.. But your host on your network X should never send traffic to pfsense if some docker is trying to talk to some IP the docker host is directly connected to.
If you're trying to get to something from your docker on another network other than the docker host's network, then sure, it would be sent to pfsense, from the docker host IP.. And pfsense would route it as best it can. If you have a policy route sending traffic somewhere, then it would go there vs just some other network on pfsense.
You need to allow normal routing on pfsense. This would be a policy route bypass
https://docs.netgate.com/pfsense/en/latest/multiwan/policy-route.html#bypassing-policy-routing
But from your drawing, the 192.168.38 devices talking to each other should never send any traffic to pfsense.. Devices on the same network talking to each other have nothing to do with pfsense - the gateway is only for getting off said network.
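Conceptually the bypass is just a pass rule on LAN matching traffic destined to your local subnets with NO gateway set, sitting above the rule that sends traffic to the VPN gateway. Once it's in, you can sanity check the ordering from the shell - a sketch, your rule text will differ:
# Dump the loaded ruleset with line numbers; the plain pass rule for local
# destinations should appear BEFORE the route-to rule that pushes traffic
# into the VPN (pfsense interface rules are "quick", so first match wins, top down)
pfctl -sr | grep -nE '192\.168\.38|route-to'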
-
@johnpoz
I don't think this is pfSense's fault at all. pfSense is doing nothing wrong. Also, don't get hung up on Heimdall; it's just an available container with nslookup, ping and traceroute available inside it (most don't have that much).
The fault IS with the kubernetes setup - but that isn't very configurable by the user (aka me), and whilst I have logged a ticket with IX I suspect this will take a while.
I am just hoping I can bodge around the issue with pfsense (which is configurable) until TN gets fixed.
I shall give your idea a try. Bypassing Policy Routing
And it works - it was what I was trying to achieve - but:
- Simpler
- Works - at least in initial testing
Thank you
-
@justconfused I agree. The only way pfSense could get hold of that traffic is by poisoning Scale's ARP cache, or MAC flooding the switch, and by physically unplugging it we have eliminated even that! A packet with 192.168.38.10 as a destination somehow ends up with pfSense's MAC address. There must be some policy in Scale that's bypassing the host's routing table.
Speaking of not being configurable, something like this must be happening on my Chromebook. Crostini has an address that looks like it's on the local prefix, but Chrome inserts itself as an extra hop between it and the rest of the LAN. And it's not reachable from the LAN until I create some state by initiating an outward connection from the VM.
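If you ever want to confirm the ARP theory above, checking which MAC the Scale host has cached for the destination would do it - pure sketch, assuming the usual iproute2 tools are on the host:
# On the SCALE host: is the cached MAC for 192.168.38.10 the real host's,
# or pfSense's LAN MAC?
ip neigh | grep 192.168.38.10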
Anyhoo, Good Luck!
-
Can this thread be marked as solved somehow?
From pfSense's PoV it is solved by enabling a workaround. The problem still exists - but it's not pfSense's issue
Thanks to all who contributed
-
@justconfused I'm on my mobile right now so I'll admit I didn't read this with a fine-tooth comb, but I will say: I ran into a very strange and similar-sounding issue some months ago involving docker containers and networking. It boiled down to the fact that I wasn't aware that the default internal docker network was 172.17.0.0/16. This was causing routing and VPN issues for me that I could not figure out for days. Eventually I ended up changing some of my VPN transit networks to work around that, and everything was smooth.
Not sure if this applies at all, but wanted to throw it out there.
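For reference, the other way around it (rather than renumbering the VPN transit networks) would have been moving docker off 172.17.0.0/16 via /etc/docker/daemon.json - example values only, and this is plain docker rather than the TrueNAS kube stack:
# Write /etc/docker/daemon.json (shift the default bridge and auto-created
# networks off 172.17.0.0/16), then restart docker
cat >/etc/docker/daemon.json <<'EOF'
{
  "bip": "10.200.0.1/24",
  "default-address-pools": [ { "base": "10.201.0.0/16", "size": 24 } ]
}
EOF
systemctl restart docker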
-
@justconfused I can mark it solved for you. So you're currently working? While, as I said, I really don't think this is a pfsense issue - we want to help in any way we can to get you working.