Netgate Discussion Forum

    Is this an Asymmetric Routing routing issue?

    L2/Switching/VLANs
    27 Posts 3 Posters 1.8k Views
    • Helmut101 @johnpoz

      @johnpoz
      Nice, thanks! I did not know I could do all of this, and I really feel I need to read up on packet captures, sniffing, etc. But the capture collected in pfSense in promiscuous mode looks different in Wireshark:
      [Screenshot: the pfSense capture opened in Wireshark]

      • johnpoz LAYER 8 Global Moderator @Helmut101

        Yeah, if there is no answer you will see retransmissions. Thought your problem child was .9?

        But if you click into a specific packet, you should see the tag, like in my example.

        edit: Maybe you have to enable showing 802.1Q in the dissector. Let me check my Wireshark settings; I use Wireshark a lot, so I might have turned it on a long time ago.
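For a command-line check of the same thing, tcpdump can show the 802.1Q tag too: with `-e` it prints the link-level header, which includes `vlan N` for tagged frames. A small sketch, assuming GNU or BSD `grep -o` and a capture on the parent (trunk) interface where the tags are still present; `vlan_ids` is a hypothetical helper name and `<trunk-if>` a placeholder:

```shell
#!/bin/sh
# Sketch: list the distinct 802.1Q VLAN IDs seen in `tcpdump -e` text on stdin.
# -e makes tcpdump print the link-level header, which contains "vlan N" for
# tagged frames. vlan_ids is a hypothetical helper name.
vlan_ids() {
  grep -oE 'vlan [0-9]+' | awk '{ print $2 }' | sort -un
}

# Usage (as root; <trunk-if> is a placeholder for the parent interface):
#   tcpdump -e -n -c 20 -i <trunk-if> vlan | vlan_ids
```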

        An intelligent man is sometimes forced to be drunk to spend time with his fools
        If you get confused: Listen to the Music Play
        Please don't Chat/PM me for help, unless mod related
        SG-4860 24.11 | Lab VMs 2.8, 24.11

        • Helmut101 @johnpoz

          @johnpoz
          Yes. Half an hour ago I changed the LXC container's IP to .17 to see if it has any effect: it doesn't. Same problem. I can reach .8, but not .17 (both on the same vmbridge in Proxmox). I can even reach a third container in a different VLAN subnet (60 instead of 40).

          • johnpoz LAYER 8 Global Moderator @Helmut101

            Well, if you're seeing the traffic go out, tagged correctly and to the right MAC, it has zero to do with pfSense.

            You don't have any static MAC set up in pfSense, do you? Maybe you set up static ARP for that MAC, and it changed?

            • Helmut101 @johnpoz

              @johnpoz said in Is this an Asymmetric Routing routing issue?:

              Maybe you set up static ARP for that MAC, and it changed?

              I do have static ARP/MAC: the container gets its IP via DHCP, and the address is assigned based on its MAC. However, I checked, and the MAC is OK (and I can also reach the container from pfSense/OpenVPN, or from the Management LAN 10, just not from the vlan 30).

              I am out for today. Many thanks so far; this is really helpful, and while I haven't gotten any further with my problem, I am learning a lot!

              • johnpoz LAYER 8 Global Moderator @Helmut101

                @helmut101 said in Is this an Asymmetric Routing routing issue?:

                just not from the vlan 30).

                That sure doesn't make any sense. Are you sure you don't have a firewall on this thing you're trying to reach, or odd routing for the 30 network on your destination device? So nothing in 30 can talk to it, but 30 can talk to other devices in the 40 VLAN.

                Can you sniff on 40.17 and validate it actually sees the traffic?

                • Helmut101 @johnpoz

                  @johnpoz said in Is this an Asymmetric Routing routing issue?:

                  Are you sure you don't have a firewall on this thing you're trying to reach

                  Yes. Tomorrow I'll have a bit more time; I will look into this carefully and do more sniffing at different points, including on the VM itself. Will report back.

                  If it weren't that strange, I would not have written here. I have been working on this issue for 4 days so far.

                  • Helmut101 @Helmut101

                    Alright, so this will get long.

                    TL;DR

                    I currently do not know why, but on that specific host there was/is a bridge (virtual NIC) with the subnet 192.168.16.0/20. That /20 covers 192.168.16.0 to 192.168.31.255, which includes my VLAN 30, so replies to VLAN 30 clients were sent into this bridge instead of to the gateway. I have never heard of this subnet and I don't know why this IP/bridge/link ended up there.

                    I solved the issue (for the moment) with:

                    ifconfig br-985a84259068 down
                    

                    But once the VM is restarted, the bridge appears again. I am still working on this.

                    Sleuthing (long)

                    This was a long walk down the rabbit hole, but I'll write it up here; perhaps someone else will find some of the commands useful for similar catch-the-rabbit tasks.

                    Here's the setup for testing:

                    • 60 is my IoT subnet

                    • 40 is my Service subnet

                    • 30 is my Consumer subnet

                    • Client 30.11, where 30 is the subnet/VLAN and 11 the host part of the IP

                    • Host 40.17: cannot be reached from subnet 30 clients

                    • Host 40.8: reachable without issues, can reach 40.17

                    • Host 60.10: reachable without issues, can reach 40.17

                    This already is really strange. In addition, I could reach 40.17 just fine from pfSense (ping) and when connected through OpenVPN.

                    1. Check Routing

                    • On VM 40.17
                    ip route
                    
                    default via 192.168.40.1 dev eth0
                    172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1
                    172.23.0.0/16 dev br-5acdb2ca8271 proto kernel scope link src 172.23.0.1 linkdown
                    172.28.0.0/16 dev br-2d547cdc7389 proto kernel scope link src 172.28.0.1
                    192.168.16.0/20 dev br-985a84259068 proto kernel scope link src 192.168.16.1
                    192.168.40.0/24 dev eth0 proto kernel scope link src 192.168.40.17
                    

                    The default route looks fine, but why are there other routes?

                    Compare the output to the other host, 40.8, which has no issues:

                    default via 192.168.40.1 dev eth0
                    192.168.40.0/24 dev eth0 proto kernel scope link src 192.168.40.8
                    

                    The 172.x routes may be explained by Docker running on 40.17, but 192.168.16.0/20 looks strange.
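Why 192.168.16.0/20 matters becomes clearer once the block is expanded: a /20 spans sixteen /24 networks, from 192.168.16.0 to 192.168.31.255, so it contains all of 192.168.30.0/24 but neither the 40 nor the 60 VLAN. A minimal POSIX sh sketch of that arithmetic (all function names are hypothetical helpers, not system tools):

```shell
#!/bin/sh
# Sketch: expand an IPv4 CIDR block and test whether an address lies inside it.
# Pure shell arithmetic; ip2int/int2ip/prefix_mask/cidr_range/in_cidr are
# hypothetical helper names.

ip2int() {  # dotted quad -> 32-bit integer
  local ip="$1" a b c d
  a=${ip%%.*}; ip=${ip#*.}
  b=${ip%%.*}; ip=${ip#*.}
  c=${ip%%.*}; d=${ip#*.}
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

int2ip() {  # 32-bit integer -> dotted quad
  echo "$(( ($1 >> 24) & 255 )).$(( ($1 >> 16) & 255 )).$(( ($1 >> 8) & 255 )).$(( $1 & 255 ))"
}

prefix_mask() {  # prefix length 0..32 -> netmask as an integer
  if [ "$1" -eq 0 ]; then echo 0; else echo $(( (0xFFFFFFFF << (32 - $1)) & 0xFFFFFFFF )); fi
}

cidr_range() {  # prints "first last" address of the block
  local mask first last
  mask=$(prefix_mask "${1#*/}")
  first=$(( $(ip2int "${1%/*}") & mask ))
  last=$(( first | (~mask & 0xFFFFFFFF) ))
  echo "$(int2ip "$first") $(int2ip "$last")"
}

in_cidr() {  # exit status 0 if address $1 is inside CIDR $2
  local mask
  mask=$(prefix_mask "${2#*/}")
  [ $(( $(ip2int "$1") & mask )) -eq $(( $(ip2int "${2%/*}") & mask )) ]
}

cidr_range 192.168.16.0/20        # 192.168.16.0 192.168.31.255
for host in 192.168.30.11 192.168.40.8 192.168.60.10; do
  if in_cidr "$host" 192.168.16.0/20; then echo "$host: inside"; else echo "$host: outside"; fi
done
```

The loop reports `inside` only for 192.168.30.11, which lines up with the symptom that only VLAN 30 clients were affected.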

                    2. Check Packets (tcpdump)

                    Now, as suggested, check whether packets really arrive at the host.

                    40.17:

                    tcpdump 'host 192.168.30.11 and port not 22'
                    

                    30.11:

                    ping 192.168.40.17
                    

                    Output (tcpdump):

                    01:50:58.811226 IP 192.168.30.11 > 192.168.40.17: ICMP echo request, id 1221, seq 1, length 64
                    01:50:59.816840 IP 192.168.30.11 > 192.168.40.17: ICMP echo request, id 1221, seq 2, length 64
                    01:51:00.820305 IP 192.168.30.11 > 192.168.40.17: ICMP echo request, id 1221, seq 3, length 64
                    01:51:01.823602 IP 192.168.30.11 > 192.168.40.17: ICMP echo request, id 1221, seq 4, length 64
                    01:51:02.827368 IP 192.168.30.11 > 192.168.40.17: ICMP echo request, id 1221, seq 5, length 64
                    01:51:03.831271 IP 192.168.30.11 > 192.168.40.17: ICMP echo request, id 1221, seq 6, length 64

                    They arrive, but nothing is returned.

                    Compare with the output of the same commands on the working host 40.8:

                    01:49:39.460155 IP 192.168.30.11 > 192.168.40.8: ICMP echo request, id 1217, seq 1, length 64
                    01:49:39.460184 IP 192.168.40.8 > 192.168.30.11: ICMP echo reply, id 1217, seq 1, length 64
                    01:49:40.461106 IP 192.168.30.11 > 192.168.40.8: ICMP echo request, id 1217, seq 2, length 64
                    01:49:40.461133 IP 192.168.40.8 > 192.168.30.11: ICMP echo reply, id 1217, seq 2, length 64
                    01:49:41.461886 IP 192.168.30.11 > 192.168.40.8: ICMP echo request, id 1217, seq 3, length 64
                    01:49:41.461918 IP 192.168.40.8 > 192.168.30.11: ICMP echo reply, id 1217, seq 3, length 64
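When comparing captures like these, a tally of requests versus replies makes the difference obvious at a glance. Note that tcpdump listens on a single interface unless told otherwise; capturing with `-i any` would also show traffic that leaves through some other interface. A minimal sketch, assuming the default tcpdump text format shown above; `icmp_summary` is a hypothetical helper:

```shell
#!/bin/sh
# Sketch: count ICMP echo requests and replies in tcpdump text output read
# from stdin. icmp_summary is a hypothetical helper name.
icmp_summary() {
  awk '/ICMP echo request/ { req++ }
       /ICMP echo reply/   { rep++ }
       END { printf "requests=%d replies=%d\n", req, rep }'
}

# Usage (as root on the target host; -c stops after 6 packets so the summary
# gets printed, -n skips DNS lookups, -i any captures on every interface):
#   tcpdump -n -l -c 6 -i any 'icmp and host 192.168.30.11' | icmp_summary
```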

                    3. Check Routing (ip route get)

                    At this point, I was pretty sure I had isolated the issue to host 40.17 itself.
                    Something is going on with the routing.

                    On host 40.17:

                    ip route get 192.168.30.11
                    

                    192.168.30.11 dev br-985a84259068 src 192.168.16.1 uid 0
                    cache

                    uh?

                    Compare on the working host 40.8:

                    ip route get 192.168.30.11
                    

                    192.168.30.11 via 192.168.40.1 dev eth0 src 192.168.40.8 uid 0
                    cache

                    Why is outgoing traffic routed through a bridge called br-985a84259068, with source address 192.168.16.1?
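The kernel settles this by longest-prefix match: among all routes that contain the destination, the most specific one wins, and 192.168.16.0/20 (20 bits) beats the default route (0 bits). The sketch below models only that step over `ip route` text; it ignores metrics, policy routing and link state, assumes plain unicast routes, and `lpm` and its helpers are hypothetical names, so `ip route get` remains the authoritative answer:

```shell
#!/bin/sh
# Simplified model of route selection: from `ip route` output on stdin, pick
# the most specific (longest prefix) route that contains the destination.
# Ignores metrics, policy rules and link state; assumes plain unicast routes.
# ip2int, prefix_mask and lpm are hypothetical helper names.

ip2int() {  # dotted quad -> 32-bit integer
  local ip="$1" a b c d
  a=${ip%%.*}; ip=${ip#*.}
  b=${ip%%.*}; ip=${ip#*.}
  c=${ip%%.*}; d=${ip#*.}
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

prefix_mask() {  # prefix length 0..32 -> netmask as an integer
  if [ "$1" -eq 0 ]; then echo 0; else echo $(( (0xFFFFFFFF << (32 - $1)) & 0xFFFFFFFF )); fi
}

lpm() {  # usage: ip route | lpm 192.168.30.11
  local dst best_len=-1 best_line="" line dest net len mask
  dst=$(ip2int "$1")
  while read -r line; do
    [ -n "$line" ] || continue
    dest=${line%% *}
    case $dest in
      default) net=0.0.0.0; len=0 ;;
      */*)     net=${dest%/*}; len=${dest#*/} ;;
      *)       net=$dest; len=32 ;;          # host route written without /32
    esac
    mask=$(prefix_mask "$len")
    if [ $(( dst & mask )) -eq $(( $(ip2int "$net") & mask )) ] && [ "$len" -gt "$best_len" ]; then
      best_len=$len; best_line=$line
    fi
  done
  echo "$best_line"
}

# Example with routes from 40.17 shown above; prints the br-985a84259068 line.
printf '%s\n' \
  'default via 192.168.40.1 dev eth0' \
  '192.168.16.0/20 dev br-985a84259068 proto kernel scope link src 192.168.16.1' \
  '192.168.40.0/24 dev eth0 proto kernel scope link src 192.168.40.17' |
  lpm 192.168.30.11
```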

                    On 40.17, check:

                    cat /etc/network/interfaces
                    

                    auto lo
                    iface lo inet loopback

                    auto eth0
                    iface eth0 inet dhcp

                    OK, let's check the routes further:

                    apt install net-tools
                    route -n
                    

                    Kernel IP routing table
                    Destination Gateway Genmask Flags Metric Ref Use Iface
                    0.0.0.0 192.168.40.1 0.0.0.0 UG 0 0 0 eth0
                    172.17.0.0 0.0.0.0 255.255.0.0 U 0 0 0 docker0
                    172.23.0.0 0.0.0.0 255.255.0.0 U 0 0 0 br-5acdb2ca8271
                    172.28.0.0 0.0.0.0 255.255.0.0 U 0 0 0 br-2d547cdc7389
                    192.168.16.0 0.0.0.0 255.255.240.0 U 0 0 0 br-985a84259068 <--- What is this??
                    192.168.40.0 0.0.0.0 255.255.255.0 U 0 0 0 eth0

                    Reading up on what network bridges are:
                    https://wiki.archlinux.org/index.php/Network_bridge
                    https://tldp.org/HOWTO/BRIDGE-STP-HOWTO/set-up-the-bridge.html

                    bridge link
                    

                    8: vetha2e5a47@if7: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master br-2d547cdc7389 state forwarding priority 32 cost 2
                    10: vethcd0643c@if9: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master br-985a84259068 state forwarding priority 32 cost 2 <-- Here it is
                    14: veth992d5b3@if13: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master br-2d547cdc7389 state forwarding priority 32 cost 2
                    16: vethb6721a9@if15: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master br-2d547cdc7389 state forwarding priority 32 cost 2
                    18: veth7dfb21f@if17: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master br-2d547cdc7389 state forwarding priority 32 cost 2
                    20: vethc3562b4@if19: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master br-2d547cdc7389 state forwarding priority 32 cost 2
                    22: veth9017e4e@if21: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master br-2d547cdc7389 state forwarding priority 32 cost 2
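With many veth ports it helps to group them by bridge. The bridge name is also the link back to Docker: Docker names user-defined bridge networks `br-` followed by the first 12 characters of the network ID. A minimal sketch that parses the `bridge link` output format shown above (`bridge_members` is a hypothetical helper):

```shell
#!/bin/sh
# Sketch: print "bridge port" pairs from `bridge link` output on stdin,
# sorted so that all ports of one bridge end up together.
# bridge_members is a hypothetical helper name.
bridge_members() {
  awk '{
    for (i = 1; i < NF; i++)
      if ($i == "master") {
        port = $2
        sub(/@.*/, "", port)    # drop the "@ifN:" peer suffix of veth names
        sub(/:$/, "", port)     # drop the trailing colon on plain names
        print $(i + 1), port
      }
  }' | sort
}

# Usage on the container:  bridge link | bridge_members
# The Docker network behind br-985a84259068 can then be looked up with:
#   docker network ls --filter id=985a84259068
```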

                    apt install bridge-utils
                    brctl show br-985a84259068
                    

                    bridge name bridge id STP enabled interfaces
                    br-985a84259068 8000.02428b97932d no vethcd0643c <-- Here, too

                    ifconfig
                    

                    vethcd0643c: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
                    inet6 fe80::dc58:85ff:fef0:eef1 prefixlen 64 scopeid 0x20<link>
                    ether de:58:85:f0:ee:f1 txqueuelen 0 (Ethernet)
                    RX packets 8644 bytes 680480 (664.5 KiB)
                    RX errors 0 dropped 0 overruns 0 frame 0
                    TX packets 7417 bytes 1041527 (1017.1 KiB)
                    TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

                    ifconfig vethcd0643c down
                    

                    Test route again:

                    ip route get 192.168.30.11
                    192.168.30.11 dev br-985a84259068 src 192.168.16.1 uid 0
                        cache
                    

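                    Taking the veth down did not help, because the route belongs to the bridge itself (it holds 192.168.16.1). So, down with the bridge as well: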

                    ifconfig br-985a84259068 down
                    

                    192.168.30.11 via 192.168.40.1 dev eth0 src 192.168.40.17 uid 0
                    cache

                    Yay!

                    That is it for the moment. If I restart the LXC container, the bridge is
                    added again with the same name. Who does this? I do not know yet.

                    • Helmut101 @Helmut101

                      What a nightmare... but it is finally solved. I ended up in a completely different hole: it wasn't the rabbit, it was Docker.

                      TL;DR

                      On the 40.17 LXC, I had Docker running with several configs.
                      Docker apparently picks the subnet for a new network from a list of default pools:
                      "172.17.0.0/16", "172.18.0.0/16", "172.19.0.0/16",
                      "172.20.0.0/14", "172.24.0.0/14", "172.28.0.0/14", "192.168.0.0/16"
                      (the last one is split into /20 networks, which is where 192.168.16.0/20 came from).

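                      It verifies whether the candidate range is already in use, but as far as I can tell it only compares against the routes that exist on the host itself. Inside the LXC, VLAN 30 is reached via the default gateway and has no route of its own, so Docker saw nothing wrong with 192.168.16.0/20, even though that range (192.168.16.0 to 192.168.31.255) contains my VLAN 30 (192.168.30.0/24).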
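That overlap check can be modelled in a few lines. This is a simplified model, not Docker's actual code: it assumes the check is purely against the host's routing table and ignores Docker's own bookkeeping of pools it has already handed out. All function names are hypothetical:

```shell
#!/bin/sh
# Simplified model (not Docker's code): return the first candidate pool that
# overlaps none of the routes present on the host. Networks reached only via
# the default gateway have no route of their own, so they are never checked.
# ip2int, prefix_mask, overlaps and first_free_pool are hypothetical names.

ip2int() {  # dotted quad -> 32-bit integer
  local ip="$1" a b c d
  a=${ip%%.*}; ip=${ip#*.}
  b=${ip%%.*}; ip=${ip#*.}
  c=${ip%%.*}; d=${ip#*.}
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

prefix_mask() {  # prefix length 0..32 -> netmask as an integer
  if [ "$1" -eq 0 ]; then echo 0; else echo $(( (0xFFFFFFFF << (32 - $1)) & 0xFFFFFFFF )); fi
}

overlaps() {  # exit 0 if CIDR blocks $1 and $2 share at least one address
  local l1="${1#*/}" l2="${2#*/}" m
  m=$(prefix_mask $(( l1 < l2 ? l1 : l2 )))   # compare on the shorter prefix
  [ $(( $(ip2int "${1%/*}") & m )) -eq $(( $(ip2int "${2%/*}") & m )) ]
}

first_free_pool() {  # usage: first_free_pool "<host routes>" candidate...
  local routes="$1" cand r clash
  shift
  for cand in "$@"; do
    clash=0
    for r in $routes; do
      if overlaps "$cand" "$r"; then clash=1; break; fi
    done
    if [ "$clash" -eq 0 ]; then echo "$cand"; return 0; fi
  done
  return 1
}

# Routes as seen inside the LXC: VLAN 30 is absent, so the /20 looks free.
first_free_pool "172.17.0.0/16 192.168.40.0/24" 192.168.16.0/20                    # -> 192.168.16.0/20
# If 192.168.30.0/24 were a route on the host, that candidate would be skipped.
first_free_pool "172.17.0.0/16 192.168.40.0/24 192.168.30.0/24" \
  192.168.16.0/20 192.168.64.0/20                                                  # -> 192.168.64.0/20
```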

                      Adding

                      {
                          "bip": "193.168.1.5/24",
                          "default-address-pools": [
                              {"base": "172.17.0.0/16", "size": 24}
                          ]
                      }
                      

                      to /etc/docker/daemon.json solved the problem. But first I had to identify which container used the offending network (docker_default, i.e. br-985a84259068), stop it, restart Docker, and start it again to refresh the network.

                      docker network list
                      
                      > NETWORK ID     NAME                DRIVER    SCOPE
                      > 4d17f9cc818b   bridge              bridge    local
                      > 985a84259068   docker_default      bridge    local   <-- this
                      > 2d547cdc7389   funkwhale_default   bridge    local
                      > abddd765db3e   host                host      local
                      > 5acdb2ca8271   iris_default        bridge    local
                      > be879c14dc73   none                null      local
                      
                      docker network inspect 985a84259068
                      
                      >         "Containers": {
                      >             "24f38ca4c3e1080f050b868f4b980f3616b8047be45809276e74e217bf2f7f57": {
                      >                 "Name": "Solaranzeige", <--- this
                      >                 "EndpointID": "8274319e9ec797c19bcd46d2aadf9277d249135a3fbf326abc14b4893f994081",
                      >                 "MacAddress": "02:42:c0:a8:10:02",
                      >                 "IPv4Address": "192.168.16.2/20",  <--- this
                      >                 "IPv6Address": ""
                      >             }
                      >         },
                      
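                      docker stop Solaranzeige     # stop the container attached to docker_default
                      systemctl daemon-reload      # reloads systemd unit files; not needed for daemon.json itself
                      systemctl restart docker     # restart dockerd so it reads the new /etc/docker/daemon.json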
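To catch this kind of clash up front, the subnets of all Docker networks can be compared with the site's VLANs. The `--format` template uses fields of `docker network inspect` output (`.Name`, `.IPAM.Config`, `.Subnet`); the VLAN list comes from this thread and the function names are hypothetical. Existing networks keep the subnet they were created with, so if one still shows a 192.168.x range after the daemon restart, it probably has to be removed and recreated (`docker network rm`, or `docker compose down` and `up` for a compose project).

```shell
#!/bin/sh
# Sketch: flag Docker networks whose IPv4 subnets overlap the site's VLANs.
# docker_subnets, flag_overlaps and the helpers are hypothetical names.

ip2int() {  # dotted quad -> 32-bit integer
  local ip="$1" a b c d
  a=${ip%%.*}; ip=${ip#*.}
  b=${ip%%.*}; ip=${ip#*.}
  c=${ip%%.*}; d=${ip#*.}
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

prefix_mask() {  # prefix length 0..32 -> netmask as an integer
  if [ "$1" -eq 0 ]; then echo 0; else echo $(( (0xFFFFFFFF << (32 - $1)) & 0xFFFFFFFF )); fi
}

overlaps() {  # exit 0 if CIDR blocks $1 and $2 share at least one address
  local l1="${1#*/}" l2="${2#*/}" m
  m=$(prefix_mask $(( l1 < l2 ? l1 : l2 )))
  [ $(( $(ip2int "${1%/*}") & m )) -eq $(( $(ip2int "${2%/*}") & m )) ]
}

docker_subnets() {  # one "name subnet..." line per network (needs dockerd)
  docker network inspect \
    --format '{{.Name}} {{range .IPAM.Config}}{{.Subnet}} {{end}}' \
    $(docker network ls -q)
}

flag_overlaps() {  # stdin: "name subnet..." lines; args: site CIDRs
  local name subnets s v
  while read -r name subnets; do
    for s in $subnets; do
      case $s in *:*) continue ;; esac       # skip IPv6 subnets
      for v in "$@"; do
        if overlaps "$s" "$v"; then echo "$name $s overlaps $v"; fi
      done
    done
  done
}

# Usage on the Docker host:
#   docker_subnets | flag_overlaps 192.168.30.0/24 192.168.40.0/24 192.168.60.0/24
```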
                      

                      Verify:

                      ip addr | grep 192
                          inet 192.168.40.17/24 brd 192.168.40.255 scope global eth0
                      

                      Only the native VLAN!
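With the new `default-address-pools` entry, new networks should come out of 172.17.0.0/16 in /24 steps: at most 256 networks, none of them in 192.168.0.0/16. A sketch that lists the blocks such an entry covers, assuming a plain split of the base into /size blocks (the order in which Docker actually hands them out may differ; `pool_subnets` is a hypothetical helper):

```shell
#!/bin/sh
# Sketch: list the first COUNT subnets a {"base": BASE, "size": SIZE} pool
# entry covers. pool_subnets and its helpers are hypothetical names.

ip2int() {  # dotted quad -> 32-bit integer
  local ip="$1" a b c d
  a=${ip%%.*}; ip=${ip#*.}
  b=${ip%%.*}; ip=${ip#*.}
  c=${ip%%.*}; d=${ip#*.}
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

int2ip() {  # 32-bit integer -> dotted quad
  echo "$(( ($1 >> 24) & 255 )).$(( ($1 >> 16) & 255 )).$(( ($1 >> 8) & 255 )).$(( $1 & 255 ))"
}

prefix_mask() {  # prefix length 0..32 -> netmask as an integer
  if [ "$1" -eq 0 ]; then echo 0; else echo $(( (0xFFFFFFFF << (32 - $1)) & 0xFFFFFFFF )); fi
}

pool_subnets() {  # usage: pool_subnets BASE/LEN SIZE COUNT
  local blen="${1#*/}" size="$2" count="$3" start step total i
  start=$(( $(ip2int "${1%/*}") & $(prefix_mask "$blen") ))
  step=$(( 1 << (32 - size) ))       # addresses per /SIZE block
  total=$(( 1 << (size - blen) ))    # number of /SIZE blocks in the base
  [ "$count" -le "$total" ] || count=$total
  i=0
  while [ "$i" -lt "$count" ]; do
    echo "$(int2ip $(( start + i * step )))/$size"
    i=$(( i + 1 ))
  done
}

pool_subnets 172.17.0.0/16 24 3   # 172.17.0.0/24, 172.17.1.0/24, 172.17.2.0/24 (one per line)
```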

                      • Helmut101 @Helmut101

                        Thanks so much to everybody involved here. I was entirely wrong in my initial suspicion, but the analysis helped me better understand how networks work, so I do not consider this lost time.

                        Some revelations:

                        • for incoming traffic, Wireshark, tcpdump and Packet Capture (pfSense) are king
                        • for outgoing traffic, ip route get [host ip] shows through which interface and gateway traffic leaves (or doesn't leave) the host
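The `ip route get` check from the second point can be reduced to the two fields that matter. When the output has no `via`, the kernel treats the destination as on-link and tries to reach it directly on that interface, which is exactly what happened with br-985a84259068. A minimal sketch (`route_summary` is a hypothetical helper):

```shell
#!/bin/sh
# Sketch: extract the egress device and gateway from `ip route get` output
# on stdin. No "via" means the destination is treated as on-link.
# route_summary is a hypothetical helper name.
route_summary() {
  awk '{
    dev = ""; via = "on-link"
    for (i = 1; i < NF; i++) {
      if ($i == "dev") dev = $(i + 1)
      if ($i == "via") via = $(i + 1)
    }
    if (dev != "") { print "dev=" dev " via=" via; exit }
  }'
}

# Usage: ip route get 192.168.30.11 | route_summary
```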
                        Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.