Netgate Discussion Forum

    Weird Routing behaviour that I may need to work around [solved]

    Routing and Multi WAN
    • JustConfused

      Simplified Network View:
      [image: simplified network diagram]
      The basic premise is that all traffic from specific hosts is policy-based routed at the pfSense firewall. At the moment this is simple: anything from .32 or .36 is routed out of the OpenVPN gateway, whilst traffic from other hosts (for example .10) is routed normally, unencrypted, onto the internet. I can then get more granular over time, but I need to start somewhere.

      The pfSense config is really quite simple at the moment:
      1 LAN interface, 1 WAN interface
      1 OpenVPN tunnel, 1 VPN interface, 1 VPN gateway
      Rules:
      MASQ (outbound NAT) rules on the WAN and VPN interfaces
      PBR rule on LAN (and this may be the immediate issue) pointing certain IP addresses to the VPN gateway; the matched traffic is tagged
      Floating rule to block tagged traffic from going out of the WAN
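      For anyone wanting to see the tag-and-block pattern concretely, it can be inspected from a pfSense shell with pfctl. A rough sketch of the kind of pf rules the GUI generates - the interface names (igb0/igb1), tunnel gateway (ovpnc1 / 10.8.0.1) and tag name (NO_WAN_EGRESS) are placeholders, not my actual config:

      # Dump the loaded ruleset from the pfSense console/SSH:
      pfctl -sr | grep -i tag

      # The GUI rules translate to pf rules roughly like these
      # (igb1 = LAN, igb0 = WAN, ovpnc1 = OpenVPN client interface):
      pass in quick on igb1 route-to (ovpnc1 10.8.0.1) inet from { 192.168.38.32 192.168.38.36 } to any tag NO_WAN_EGRESS keep state
      block return out quick on igb0 inet tagged NO_WAN_EGRESS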

      With the network in this state containers cannot ping other LAN devices, but can ping the gateway. DNS resolution also does not work as TrueNAS DNS Servers are LAN based. This is suboptimal (as in useless)

      If I switch off the VPN Gateway (as in disable it) then containers work properly (but not encrypted obviously)
      If I switch the VPN Gateway back on then containers don't work again.

      Note that toggling the gateway and clicking "Apply Changes" is all that is needed to flip between these two states.

      Running a traceroute from the Heimdall container (it has nslookup, ping and traceroute) I get
      [image: traceroute output from the Heimdall container]
      Traffic is leaving the container and going straight to the firewall which then routes / redirects the traffic back to the actual destination. This implies that all traffic leaving a container goes to the firewall first and then back into the network. Surely this is wrong on all sorts of levels and not the way that traffic should be routed.
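      In text form, the hop pattern is roughly this (addresses per the diagram above; the exact output is illustrative):

      traceroute to 192.168.38.10 (192.168.38.10), 30 hops max
       1  172.16.0.1     (kube-bridge gateway on the TrueNAS host)
       2  192.168.38.15  (pfSense LAN - the traffic hairpins through the firewall)
       3  192.168.38.10  (the actual LAN destination)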

      I think what is happening is that the firewall, through the policy-based route, is grabbing the traffic and trying to push it out of the VPN interface - which is entirely the wrong direction - and I have confirmed that with some other tests.

      However, the bigger question is why all the local traffic is being sent to the firewall rather than being put onto the LAN directly - which I feel must be a bug in TrueNAS, unless someone can correct me.

      Let me be clear: I don't think this is a bug in pfSense (although it might be PBR being over-enthusiastic). I do think this is a bug somewhere in the TrueNAS Scale ecosystem, but I don't know enough about it to say where. However, I may need to work around this in the short to medium term. I am posting here because:
      a. It may interest someone
      b. Looking for advice - does this sound right?
      c. Looking for advice on a pfsense workaround (I have some ideas) - but I am all up for simpler ones
      d. I have spent several days tearing my almost non-existent hair out over this

      • mariatech @JustConfused

        @justconfused
        A few general things to look at:

        Everything on 192.168.38.0 should be able to reach each other without the firewall knowing about it.
        Are the firewall and switch separate?
        Can you unplug the pfSense from the switch and have the remaining devices ping/traceroute each other?
        How are the other devices configured, DHCP, static?
        Is the switch managed, does it have any VLANs, port isolation, or routed ports?
        Ping/traceroute from the TN Scale hosts, if possible.

        The first hop from the container is 172.16.0.1, thus the container must have an address 172.16.0.X, right? Is that address visible to the rest of the network, or is TN performing NAT? If so, is that desired? Is there a way to look at TN's routing table (perhaps netstat -nr)?

        I'm assuming you haven't subnetted 192.168.38.0? Are all the machines using the same netmask 255.255.255.0?

        Does the traceroute change when you have the VPN on or off?

        • JustConfused @mariatech

          @mariatech - some good questions

          "Everything on 192.168.38.0 should be able to reach each other without the firewall knowing about it." - I agree - and this works normally for everything else
          "Are the firewall and switch separate? " - Yes - everything here is physical (apart from the containers). There is VMWare - but not involved.
          "Can you unplug the pfSense from the switch and have the remaining devices ping/traceroute each other?" - good logical question - I will test - see below
          "How are the other devices configured, DHCP, static?" - All theses physical devices are static. There is DHCP, but not involved
          "Is the switch managed, does it have any VLANs, port isolation, or routed ports?" - all switches are manageable (but I do very little with them). There are VLANs (iSCSI is on its own VLAN & switch) - but none are involved here. No port isolation or routed ports. The reason for multiple switches is that some of this kit is in the garage and the firewall is in the house. Some kit is 10Gb, some kit is 1Gb. There is both fibre and copper between the house and the garage
          "Ping/traceroute from the TN Scale hosts, if possible." - normal and correct behaviour - goes straight to the desired host and not via the firewall.
          "The first hop from the container is 172.16.0.1. Thus the container must have an address 172.16.0.X, right?" - Yes
          "Is that address visible to the rest of the network, or is TN peforming NAT?" - TN is performing NAT - at least that was my understanding
          "If so, is that desired?" - there is no option for anything else. This is not portainer (perhaps unfortunately)
          "Is there a way to look at TN's routing table (perhaps netstat -nr)"? Yes - nothing unusual here

          root@NewNAS[~]# netstat -nr
          Kernel IP routing table
          Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
          0.0.0.0         192.168.38.15   0.0.0.0         UG        0 0          0 enp7s0f4
          172.16.0.0      0.0.0.0         255.255.0.0     U         0 0          0 kube-bridge 
          172.22.16.0     0.0.0.0         255.255.255.0   U         0 0          0 enp7s0f4d1
          192.168.38.0    0.0.0.0         255.255.255.0   U         0 0          0 enp7s0f4
          root@NewNAS[~]#
          

          172.22.16.0 = iSCSI network
          172.16.0.0 is reserved for all things kubernetes / containery

          "I'm assuming you haven't subnetted 192.168.38.0?" - Correct. No wierd subnetting. I am lazy and don't like to work that out if I can avoid it and I have no need for it
          "Are all the machines using the same netmask 255.255.255.0?" - Yes
          "Does the traceroute change when you have the VPN on or off?" - Yes. I am assuming by VPN on/off that you mean when I toggle the gateway - see below.

          Longer Answers below
          Traceroute to 192.168.38.10 with Gateway On
          [image: traceroute output, gateway enabled]
          Traceroute to 192.168.38.10 with Gateway disabled
          [image: traceroute output, gateway disabled]

          Testing removing the firewall (physical unplug of internal NIC)
          Traceroute (and ping) to 192.168.38.10
          [image: traceroute and ping output with firewall unplugged]

          Heimdall container is 172.16.3.165 with a 255.255.0.0 subnet. Gateway is 172.16.0.1

          My current conclusion is still the same as in my OP. The routing of the container traffic is / must be a bug in TrueNAS, as it's highly suboptimal and doubles up traffic to the firewall that doesn't need to be there. This bug is exposed by the PBR in pfSense grabbing the traffic (correctly, really) and pushing it into the VPN.
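          One way to sanity-check the NAT side of that theory would be to dump the source-NAT rules on the Scale host - a sketch, assuming an iptables-based setup like the kube-bridge above (chain names and exact output will differ by release):

          # On the TrueNAS Scale host, list POSTROUTING source-NAT rules:
          iptables -t nat -S POSTROUTING | grep -i -e MASQ -e SNAT

          # A healthy setup should masquerade the container subnet when it
          # leaves for other networks, i.e. something along the lines of:
          # -A POSTROUTING -s 172.16.0.0/16 ! -d 172.16.0.0/16 -j MASQUERADE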

          • johnpoz @JustConfused

            @justconfused said in Weird Routing behaviour that I may need to work around:

            However the bigger question is why is all the local traffic being sent to the firewall rather than being dumped onto the LAN properly - which I feel must be a bug in TrueNAS

            Yeah, your 172.16.0.1 is the container network, but that should be NATed to your docker host's IP..

            So I run a heimdall container, and if I grab a console on it..

            root@heimdall:/# ifconfig
            eth0      Link encap:Ethernet  HWaddr 02:42:AC:11:00:02  
                      inet addr:172.17.0.2  Bcast:172.17.255.255  Mask:255.255.0.0
                      UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
                      RX packets:22 errors:0 dropped:0 overruns:0 frame:0
                      TX packets:3 errors:0 dropped:0 overruns:0 carrier:0
                      collisions:0 txqueuelen:0 
                      RX bytes:1916 (1.8 KiB)  TX bytes:238 (238.0 B)
            
            lo        Link encap:Local Loopback  
                      inet addr:127.0.0.1  Mask:255.0.0.0
                      UP LOOPBACK RUNNING  MTU:65536  Metric:1
                      RX packets:0 errors:0 dropped:0 overruns:0 frame:0
                      TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
                      collisions:0 txqueuelen:1 
                      RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)
            

            This is on a host 192.168.9.10. If I traceroute to something on the 192.168.9/24 network I do show hitting 172.17.0.1 as the first hop, but then it goes directly to the 192.168.9.100 address, for example..

            traceroute to 192.168.9.100 (192.168.9.100), 30 hops max, 46 byte packets
             1  172.17.0.1 (172.17.0.1)  0.008 ms  0.007 ms  0.004 ms
             2  i9-win.local.lan (192.168.9.100)  0.505 ms  *  0.337 ms
            

            If I ping the pfsense box on the 192.168.9 network from the heimdall container, it looks like it's coming from the docker host 192.168.9.10.

            If I had to guess - and I am sure not a docker guru, I just run them on my Synology NAS - this seems to be something odd in your docker networking..
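            If you want to see exactly what source address pfsense receives, a packet capture on its LAN interface while pinging from the container would settle it - a sketch, with igb1 as an assumed LAN interface name:

            # On pfSense (SSH or Diagnostics > Command Prompt), capture ICMP on LAN:
            tcpdump -ni igb1 icmp

            # With working NAT the echo requests should show the docker host's
            # LAN address (192.168.9.10 here) as the source, not 172.17.x.x.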


            • JustConfused @johnpoz

              So, how can I use pfSense to work around this issue? (if I can)

              I have discovered that if I disable the policy based routing down the tunnel then the router diverts the traffic back where it needs to go correctly, although it shouldn't need to

              Try 1:
              I have tried adding a LAN Gateway (sort of a dummy really)
              [image: gateway configuration screenshots]
              and then adding a new PBR in front of the rule causing the issue
              [image: firewall rule screenshot]

              But this doesn't work - I suspect because the LAN_Gateway isn't really a gateway as it has no address

              Try 2:
              Basically the same but point the gateway at 192.168.38.11 (a different AD Server)
              I wasn't expecting this to work - and I cannot say that I am surprised that it doesn't.

              That unfortunately taps me out of ideas for the moment (barring a major network redesign). Maybe put another firewall/router in the middle with no PBR and shove my real firewall out one network step. I have the kit available (the good news) and could just use pfSense with the firewall turned off.

              • johnpoz @JustConfused

                @justconfused not sure why you think this has anything to do with pfsense.

                I think maybe you're coming at it from the wrong direction. Is it because you're having problems accessing it from your network?

                With your example of heimdall - this is accessed from the network; it doesn't normally access stuff directly, does it? Mine sure isn't for anything I am doing with it. Well, maybe it does, since it shows, like my overseerr, how much it's processing.. and my tautulli, if any streams are running. But those are just set up in heimdall with the fqdn of those services. I have never looked into exactly how it gets that info - whether it initiates the connection itself, or just presents my browser with how to get that info.

                So for example I access it from my networks via the docker host's fqdn on port 8056

                http://nas.local.lan:8056/

                This presents me with the interface..

                [image: Heimdall interface screenshot]

                Are you trying to run something on this heimdall docker that needs to access something on its own - a connection it started? Is something specifically not working?

                Again, not a docker guru - but if the docker creates a connection to anything outside the host, be it the host's network or some other network, that should be natted to the host's IP..

                For example, if I get a ping going from the heimdall docker to something on the host's network 192.168.9, pfsense never even sees that traffic.. because the docker host sends it on to my host at 192.168.9.100 directly..

                You have something odd in your docker setup, in my opinion, and it has zero to do with pfsense.. Sure, if pfsense gets sent traffic it will try and route it.. but your host on network X should never send traffic to pfsense if some docker is trying to talk to some IP the docker host is directly connected to.

                If you're trying to get to something from your docker on a network other than the docker host's network, then sure, it would be sent to pfsense from the docker host IP.. and pfsense would route it as best it can. If you have a policy route sending traffic somewhere, then it would go there vs just some other network on pfsense.

                You need to allow normal routing on pfsense. This would be a policy route bypass:

                https://docs.netgate.com/pfsense/en/latest/multiwan/policy-route.html#bypassing-policy-routing
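                In pf terms the bypass is just a pass rule with no route-to, placed above the policy-route rule - a sketch, with igb1 as an assumed LAN interface name:

                # Matches traffic destined for the local subnet and lets the
                # normal routing table handle it (no route-to = no PBR):
                pass in quick on igb1 inet from any to 192.168.38.0/24 keep state

                # The existing route-to rule stays below, so only traffic leaving
                # the local network is still pushed down the VPN.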

                But from your drawing, the 192.168.38 devices talking to each other should never send any traffic to pfsense.. Devices on the same network talking to each other have nothing to do with pfsense - the gateway is only for getting off said network.


                • JustConfused @johnpoz

                  @johnpoz
                  I don't think this is pfSense's fault at all; pfSense is doing nothing wrong. Also, don't get hung up on Heimdall - it's just an available container with nslookup, ping and traceroute available inside the container (most don't have that much).

                  The fault IS with the kubernetes setup - but that isn't very configurable by the user (aka me), and whilst I have logged a ticket with iX I suspect this will take a while.

                  I am just hoping I can bodge around the issue with pfsense (which is configurable) until TN gets fixed.

                  I shall give your idea a try: bypassing policy routing.

                  And it works - it was what I was trying to achieve, but:

                  1. Simpler
                  2. It works - at least in initial testing

                  Thank you

                  • mariatech @JustConfused

                    @justconfused I agree. The only way pfSense could get hold of that traffic is by poisoning Scale's ARP cache, or MAC flooding the switch, and by physically unplugging it we have eliminated even that! A packet with 192.168.38.10 as a destination somehow ends up with pfSense's MAC address. There must be some policy in Scale that's bypassing the host's routing table.
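                    If you ever want to rule the ARP angle in or out directly, comparing the MAC the Scale host has cached for 192.168.38.10 against that device's real MAC would do it - for example:

                    # On the TrueNAS Scale host, show the cached MAC for the target:
                    ip neigh show 192.168.38.10

                    # If this shows pfSense's MAC instead of the target's own,
                    # something is answering ARP on the target's behalf.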

                    Speaking of not being configurable, something like this must be happening on my Chromebook. Crostini has an address that looks like it's on the local prefix, but Chrome inserts itself as an extra hop between it and the rest of the LAN. And it's not reachable from the LAN until I create some state by initiating an outward connection from the VM.

                    Anyhoo, Good Luck!

                    • JustConfused @mariatech

                      Can this thread be marked as solved somehow?

                      From pfSense's PoV it is solved by enabling a workaround. The problem still exists - but it's not pfSense's issue.

                      Thanks to all who contributed.

                      • luckman212 @JustConfused

                        @justconfused I'm on my mobile right now, so I'll admit I didn't read this with a fine-tooth comb, but I will say: I ran into a very strange and similar-sounding issue some months ago involving docker containers and networking. It boiled down to the fact that I wasn't aware that the default internal docker network was 172.17.0.0/16. This was causing routing and VPN issues for me that I could not figure out for days. Eventually I ended up changing some of my VPN transit networks to work around that, and everything was smooth.

                        Not sure if this applies at all, but I wanted to throw it out there.
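                        On a plain docker host you can also attack it from the other side by moving docker's default pools off 172.17.0.0/16 - a sketch, where the 10.210.x range is just an example (Scale's kubernetes networking is configured differently, so this applies to stock docker only):

                        # /etc/docker/daemon.json
                        {
                          "default-address-pools": [
                            { "base": "10.210.0.0/16", "size": 24 }
                          ]
                        }
                        # Then restart the daemon so new bridges allocate from that pool:
                        systemctl restart docker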

                        • johnpoz @JustConfused

                          @justconfused I can mark it solved for you. So you're currently working? While, as I said, I really don't think this is a pfsense issue, we want to help in any way we can to get you working.

