Is this an Asymmetric Routing routing issue?
-
I am posting in L2/Switching/VLANs, but I am not sure if my issue really belongs here:
- I have a pretty basic setup with a pfsense box (L3), a netgear switch (L2) and a hypervisor (Proxmox) for serving local services.
- Currently, I have two VLANs: VLAN 30 (Consumer subnet, all connected devices) and VLAN 40 (Service subnet, for serving Services)
- I can reach all Service hosts from the consumer network, except one (192.168.40.9)
- No firewall logs are shown when I try to reach 192.168.40.9 from 192.168.30.99 (i.e. traffic is passed);
- Diagnostics / States shows CLOSED:SYN_SENT
- I can establish a connection via netcat UDP, but not via netcat TCP
- This leads me to assume this is an issue of Asymmetric Routing
- However, it is strange that only one of the three services is affected. They are configured identically, apart from the IP.
- I've set the "Bypass firewall rules for traffic on the same interface" option located under System > Advanced on the Firewall/NAT tab, but no change
- I can connect to the Service if I connect via OpenVPN
I can exclude Firewall Rules. How can I further debug this?
Here is a log of the State Diagnostics output:
OPT2VLAN30 tcp 192.168.30.99:53491 -> 192.168.40.9:443 CLOSED:SYN_SENT 4 / 0 208 B / 0 B
OPT2VLAN40 tcp 192.168.30.99:53491 -> 192.168.40.9:443 SYN_SENT:CLOSED 4 / 0 208 B / 0 B
OPT2VLAN30 tcp 192.168.30.99:53492 -> 192.168.40.9:443 CLOSED:SYN_SENT 4 / 0 208 B / 0 B
OPT2VLAN40 tcp 192.168.30.99:53492 -> 192.168.40.9:443 SYN_SENT:CLOSED 4 / 0 208 B / 0 B
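Every entry in this state output shows 4 packets and 208 B in one direction and 0 in the other, so the SYN is retransmitted and never answered. A minimal Python sketch for picking such entries out of a longer state dump follows. It assumes the two numbers in the packets column are the per-direction counts; the helper name `one_way_states` is made up for illustration and is not a pfSense tool.

```python
import re

# One state entry: interface, protocol, source -> destination, state pair,
# then the packet counters "N / M". Byte counters are ignored.
STATE_RE = re.compile(
    r"(?P<iface>\S+)\s+(?P<proto>\S+)\s+"
    r"(?P<src>\S+)\s+->\s+(?P<dst>\S+)\s+"
    r"(?P<state>[A-Z_0-9]+:[A-Z_0-9]+)\s+"
    r"(?P<pkts_fwd>\d+)\s*/\s*(?P<pkts_rev>\d+)"
)

def one_way_states(text):
    """Return (iface, src, dst, state) for entries that saw no reply packets."""
    hits = []
    for m in STATE_RE.finditer(text):
        if int(m["pkts_fwd"]) > 0 and int(m["pkts_rev"]) == 0:
            hits.append((m["iface"], m["src"], m["dst"], m["state"]))
    return hits

# Two of the entries from the post, copied verbatim.
sample = """
OPT2VLAN30 tcp 192.168.30.99:53491 -> 192.168.40.9:443 CLOSED:SYN_SENT 4 / 0 208 B / 0 B
OPT2VLAN40 tcp 192.168.30.99:53491 -> 192.168.40.9:443 SYN_SENT:CLOSED 4 / 0 208 B / 0 B
"""

for hit in one_way_states(sample):
    print(hit)
```

Both sample entries are reported, which matches the picture of traffic leaving towards 192.168.40.9 with nothing coming back.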
This is a diagram showing my network:
-
Another 4 hours trying to debug this:
Packet Capture on pfsense shows similarly nothing:
13:41:30.106337 IP 192.168.30.99.55124 > 192.168.40.9.443: tcp 0
13:41:30.354496 IP 192.168.30.99.55125 > 192.168.40.9.443: tcp 0
13:41:31.106360 IP 192.168.30.99.55124 > 192.168.40.9.443: tcp 0
13:41:31.361582 IP 192.168.30.99.55125 > 192.168.40.9.443: tcp 0
From the pfsense, I can reach the port through port test:
Port test to host: 192.168.40.9 Port: 443 successful.
I also followed the pfsense docs to manually add firewall rules for traffic between VLAN30<->VLAN40. No effect.
-
Also, it does not matter which port. Protocol, however, appears to matter:
On 192.168.40.9 (Server):
netcat -u -l -p 1234
Client (192.168.30.99)
netcat -u 192.168.40.9 1234 <<< 1
Output on Server (UDP):
1
For TCP,
On 192.168.40.9 (Server):
netcat -l -p 1234
Client (192.168.30.99)
netcat 192.168.40.9 1234 <<< 1
No Output on Server.
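One caveat about this comparison: the UDP test only sends a datagram from client to server, so it proves the forward path and nothing about the return path. A TCP connect needs the server's SYN-ACK to make it back to the client. The following Python sketch illustrates the difference. The function names `udp_send` and `tcp_probe` are hypothetical helpers, not part of netcat or pfSense.

```python
import socket

def udp_send(host, port, payload=b"1"):
    """Send one datagram and return the number of bytes handed to the kernel.

    Success here says nothing about whether a reply could travel back.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        return s.sendto(payload, (host, port))

def tcp_probe(host, port, timeout=3.0):
    """Return True only if the TCP three-way handshake completes.

    That requires the server's SYN-ACK to reach the client, i.e. a working
    return path. Timeouts and refusals both return False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Run from the client, `udp_send("192.168.40.9", 1234)` would be expected to succeed in both the working and the broken case, while `tcp_probe("192.168.40.9", 1234)` would only return True once replies from the server actually arrive.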
-
@helmut101 said in Is this an Asymmetric Routing routing issue?:
a netgear switch (L2) and a hypervisor (Proxmox) for serving local services.
Since it is an L2 switch, it is not capable of separating VLANs.
Basically it's possible to send VLAN traffic over an L2 switch though, but you need end-devices, which can handle it correctly.
Is the wifi AP VLAN-capable?
Is Proxmox configured in a way so that it only accepts tagged packets?
@helmut101 said in Is this an Asymmetric Routing routing issue?:
Packet Capture on pfsense shows similarly nothing
Which interface?
-
@viragomann said in Is this an Asymmetric Routing routing issue?:
Since it is an L2 switch, it is not capable of separating VLANs.
Huh? You mean the switch is an unmanaged, ie dumb switch... Then you would be correct. But L2 switches can and do vlans just fine ;)
He lists his GSS116E switch - this is more than capable of doing vlans. Now does he have something amiss in the config.. That is quite possible sure.
edit:
So whatever port on his netgear is connected to his lan port on pfsense would be lan (vlan 1 on the switch I would assume) untagged, and 30 and 40 tagged. The port that goes to his AP would be vlan 30 untagged (access port).
Ports going to his hypervisor would be both access ports with 192.168.10 being vlan 1.. But he is calling it vlan 10? That would be untagged.. And other port connected to eth1 would be also access port untagged 40.
Really the only thing that comes into question is config of the ports. And not sure why calling it lan10 connected to proxmox.. And then just lan (native) connected to pfsense? What exactly is the native untagged vlan set in the netgear.. 10 or 1 ? Did he change it from the default vlan 1?
I've set Bypass firewall rules for traffic on the same interface
That for sure should NOT BE set..
edit: But then it looks like he has his vlans on opt2? Vs his lan - but he only shows 1 connection (labeled lan) from his netgear to pfsense. Which I would then assume means he has his lan port with a native untagged 192.168.10 network on it, and then his two vlans 30 and 40 on that same interface?
-
@johnpoz said in Is this an Asymmetric Routing routing issue?:
Huh? You mean the switch is an unmanaged, ie dumb switch
Yeah, that was my thinking.
The GSS116E is VLAN capable. Didn't notice.
-
@viragomann said in Is this an Asymmetric Routing routing issue?:
@helmut101 said in Is this an Asymmetric Routing routing issue?:
a netgear switch (L2) and a hypervisor (Proxmox) for serving local services.
Since it is an L2 switch, it is not capable of separating VLANs.
I was thinking the same route and I checked the Switch Manual - while it can forward VLANs, there is no mention that it can route VLAN traffic. I also verified in the settings, there is no option to set inter-VLAN routing on the switch.
Basically it's possible to send VLAN traffic over an L2 switch though, but you need end-devices, which can handle it correctly.
Is the wifi AP VLAN-capable?
No, it is a FRITZ!Box 7430 set as IP-Client.
Is Proxmox configured in a way so that it only accepts tagged packets?
Ha, now we get to something.. I was not sure so I checked. In Proxmox, I have two NICs, eth0 and eth1 - I use eth0 to connect to the Hypervisor, eth1 is used (DMZ like) for serving services.
The default gateway (untagged traffic) is for eth0 (192.168.10.254), which is my management native LAN. The individual containers all have 192.168.40.1 as gateway and "Tagged VLAN" enabled:
eth0/vmbridge0 (Management LAN for Hypervisor):
eth1/vmbridge1 (Service VLAN for Containers):
Container 40.9 (vmbridge1 settings):
But: All other containers have the same network settings. I can reach those other containers just fine.
@helmut101 said in Is this an Asymmetric Routing routing issue?:
Packet Capture on pfsense shows similarly nothing
Which interface?
I have tested 192.168.30.1 and 192.168.40.1. Both packet captures show that Client->Server traffic passes, but nothing is returned. Haven't tested the Management LAN (192.168.10.1) - will report back.
I'll see if I can modify the Proxmox Network settings and if this has any effect. Thank you for the hints!!
-
This is my Proxmox network config as text:
auto lo
iface lo inet loopback

iface eno1 inet manual

iface eno2 inet manual

auto vmbr0
iface vmbr0 inet static
        address 192.168.10.42/24
        gateway 192.168.10.254
        bridge-ports eno1
        bridge-stp off
        bridge-fd 0
#Management Network

auto vmbr1
iface vmbr1 inet static
        address 192.168.40.0/24
        bridge-ports eno2
        bridge-stp off
        bridge-fd 0
        bridge-vlan-aware yes
        bridge-vids 2-4094
#Service Network
It is mainly following the docs, but I will need to investigate further.
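One detail in this config that may be worth a second look: vmbr1 is given 192.168.40.0/24, which is the network address of that subnet rather than a host address. Whether this has anything to do with the problem is not established in this thread. As a sketch, Python's standard `ipaddress` module can flag such entries; the helper name `is_network_address` is made up for illustration.

```python
import ipaddress

def is_network_address(cidr):
    """Return True if the address part of `cidr` is the subnet's network address.

    /31 and /32 prefixes are skipped, since they have no separate
    network address in the usual sense.
    """
    iface = ipaddress.ip_interface(cidr)
    if iface.network.prefixlen >= 31:
        return False
    return iface.ip == iface.network.network_address

# The two bridge addresses from the config in the post.
for name, cidr in [("vmbr0", "192.168.10.42/24"), ("vmbr1", "192.168.40.0/24")]:
    kind = "network address" if is_network_address(cidr) else "host address"
    print(f"{name}: {cidr} is a {kind}")
```

Only the vmbr1 entry is flagged; the vmbr0 address is an ordinary host address.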
Well, after doing:
/etc/init.d/networking restart
in Proxmox, I lost connection to the other 2 services in VLAN 40, too. This setup was working for over a year now.. what is going on..
-
Oh my god.. solved! A simple restart of the Hypervisor!
These proxmox networking issues made me suspicious. I did a complete reboot of the Server and all my services are reachable & working now.
I can only speculate that there was some package update in Proxmox that confused things on the networking/firewall/routing side.
I hope this was a rare individual case. Actually, it is the first time I have had problems with this setup since running it for over a year.
Nevertheless, learned a lot about debugging network issues on various areas. Many thanks for your help!
-
Back to Zero: Service 192.168.40.9 stopped being reachable after about 1 hour. The other services still work.
Btw.: It is a mystery to me why everything still works through OpenVPN, but not through the VLAN.
-
@helmut101 said in Is this an Asymmetric Routing routing issue?:
I was thinking the same route and I checked the Switch Manual - while it can forward VLANs, there is no mention that it can route VLAN traffic. I also verified in the settings, there is no option to set inter-VLAN routing on the switch.
If I look into the manual I see chapter 4 describe how to configure VLANs on each port, either tagged or untagged.
That should be sufficient to separate the VLANs correctly. There is no need for routing traffic on the switch, this can be done by pfSense.
So configure the switch port which the wifi is connected to as untagged for VLAN30 and PVID for 30, so that incoming packets get tagged.
The port which pfSense is connected to has to be added to all VLANs as tagged.
On the ports facing Proxmox the packets can be tagged so that you don't need to change the Proxmox configuration.
-
setup since running it for over a year.
You sure you just don't have a duplicate IP then? If you're saying this worked for a year without issue, then I don't see how it's any sort of networking issue.. But something wrong with whatever that IP is, or something stepping on that IP?
-
@johnpoz said in Is this an Asymmetric Routing routing issue?:
You sure you just don't have a duplicate IP then? If you're saying this worked for a year without issue, then I don't see how it's any sort of networking issue.
That could be a reason for the strange behavior of course.
-
First, thanks a lot to all of you for responding here. I have the feeling this is Proxmox specific, and since this is a forum for pfsense, I cannot expect such help. Anyway, since the discussion is already going and I haven't found a solution yet.. I appreciate any hints.
@viragomann said in Is this an Asymmetric Routing routing issue?:
If I look into the manual I see chapter 4 describe how to configure VLANs on each port, either tagged or untagged.
That should be sufficient to separate the VLANs correctly. There is no need for routing traffic on the switch, this can be done by pfSense.
So configure the switch port which the wifi is connected to as untagged for VLAN30 and PVID for 30, so that incoming packets get tagged.
The port which pfSense is connected to has to be added to all VLANs as tagged.
On the ports facing Proxmox the packets can be tagged so that you don't need to change the Proxmox configuration.
Yes, that is basically how it is set. I have two trunk ports on the switch, one for Switch <> pfsense and one for Switch <> Proxmox.
Only these two ports support tagged traffic. All other ports are marked as untagged, and the switch itself assigns tags (e.g. for the WLAN). This has worked, so I am not expecting any problem here.
I tested whether my Proxmox VLAN settings may be the problem: I am back to the original configuration after 4 hours of testing various combinations. I am basically following the officially recommended default setting for VLAN/Proxmox:
VLAN awareness on the Linux bridge: In this case, each guest’s virtual network card is assigned to a VLAN tag, which is transparently supported by the Linux bridge.
With this configuration, two services on the 192.168.40.0 VLAN (.7 and .8) work, but not .9.
I am able to connect to/ping 192.168.40.9 from pfsense itself, or when I connect via OpenVPN, but not from the 192.168.30.0 subnet.
If I do a packet capture on an SSH connect 192.168.30.0 -> 192.168.40.9, I see (interfaces VLAN30 and VLAN40 show the same):
14:06:08.549507 IP 192.168.30.99.52908 > 192.168.40.9.22: tcp 0
14:06:09.549269 IP 192.168.30.99.52908 > 192.168.40.9.22: tcp 0
14:06:11.557236 IP 192.168.30.99.52908 > 192.168.40.9.22: tcp 0
I have also checked several times that 192.168.40.9 is not assigned anywhere else.
Next thing to check is whether this works if I change the IP to something else. However, this will require changing several settings of interconnections between services (IOT, Docker etc.).
-
Sniff on the pfsense vlan40 interface.. Do you see the traffic go out?
If so then it's not pfsense. You can also validate that it is sending to the correct mac for the .9 address. And that it's tagged..
But if you're saying you can talk to other 40.X stuff from your 30 network.. It's really unlikely it's anything to do with pfsense. But it doesn't hurt to check that you actually see the traffic go out to the correct mac, and that it's tagged correctly.
You don't have some rules on the 30 vlan or floating that could be doing anything weird with that IP? Say policy routing?
To view tags in sniffing traffic on pfsense you would need to use cmdline on pfsense with tcpdump -e
one sec and put up an example... sniffing on one of my interfaces with vlans on it.. BRB
-
Yes, I can see that the traffic is going out on pfsense, but not coming back to the client:
14:19:58.625064 IP 192.168.30.99.53151 > 192.168.40.17.22: tcp 0
14:19:59.624594 IP 192.168.30.99.53151 > 192.168.40.17.22: tcp 0
14:20:01.624796 IP 192.168.30.99.53151 > 192.168.40.17.22: tcp 0
Note: Above I have changed the LXC container's IP to 17 (instead of 9). This also has no effect.
This is how it looks for the other service on .8, successfully connecting via ssh:
14:28:40.372431 IP 192.168.40.8.22 > 192.168.30.99.53285: tcp 0
14:28:40.375857 IP 192.168.40.8.22 > 192.168.30.99.53285: tcp 452
14:28:40.383526 IP 192.168.30.99.53285 > 192.168.40.8.22: tcp 0
14:28:40.383573 IP 192.168.30.99.53285 > 192.168.40.8.22: tcp 16
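The difference between the two captures is that the failing flow only ever appears in the client-to-server direction, while the working one shows packets both ways. A minimal Python sketch for spotting this in a longer text capture follows. It assumes the "IP src.port > dst.port:" line format shown in this post; the function name `one_way_flows` is hypothetical.

```python
import re
from collections import defaultdict

# Matches the "IP a.b.c.d.port > e.f.g.h.port:" part of a capture line.
LINE_RE = re.compile(
    r"IP (\d+\.\d+\.\d+\.\d+)\.(\d+) > (\d+\.\d+\.\d+\.\d+)\.(\d+):"
)

def one_way_flows(capture):
    """Return flows (as a sorted pair of endpoints) seen in one direction only."""
    directions = defaultdict(set)
    for m in LINE_RE.finditer(capture):
        src = (m.group(1), int(m.group(2)))
        dst = (m.group(3), int(m.group(4)))
        key = tuple(sorted([src, dst]))      # same key for both directions
        directions[key].add((src, dst))      # remember which direction was seen
    return [key for key, seen in directions.items() if len(seen) == 1]

# Lines taken from the two captures in this post.
capture = """
14:19:58.625064 IP 192.168.30.99.53151 > 192.168.40.17.22: tcp 0
14:19:59.624594 IP 192.168.30.99.53151 > 192.168.40.17.22: tcp 0
14:28:40.372431 IP 192.168.40.8.22 > 192.168.30.99.53285: tcp 0
14:28:40.383526 IP 192.168.30.99.53285 > 192.168.40.8.22: tcp 0
"""

for flow in one_way_flows(capture):
    print("no reply seen for", flow)
```

Only the flow towards 192.168.40.17 is reported. In a very short capture a flow can also look one-way simply because the reply had not arrived yet, so this is a hint rather than proof.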
You don't have some rules on the 30 vlan or floating that could be doing anything weird with that IP? Say policy routing?
To view tags in sniffing traffic on pfsense you would need to use cmdline on pfsense with tcpdump -e
one sec and put up an example... sniffing on one of my interfaces with vlans on it.. BRB
I have checked rules for the VLAN 30 (and 40) over and over - but no, I do not see anything interfering here.
I'll check with tcpdump -e! Thank you.
My time is running out today.. will report tomorrow if I got further.
-
Ok you don't actually have to do it via cmd line... If you enable promiscuous mode, and sniff on the parent interface.. You can do it easier just from the gui, and then for easy reading just download and open with wireshark.
example
And here in wireshark
You can see traffic on the 192.168.4 network is tagged with vlan id 4, and traffic on the 192.168.2 is native and untagged.. Both of these networks are on my igb2 interface.
But if you are seeing traffic going out of pfsense and tagged correctly, then no it has nothing to do with pfsense.
edit: hiding that 73.x address - that is my son's connection. His unifi stuff talks to controller on my network.
you will want to make sure you look at outbound traffic from pfsense for your vlan tag, and that sending to whatever mac this .9 is actually at.. That is inbound traffic into mine.. But just an example of seeing the tags. You can see if you are seeing an answer, but maybe the answer is not tagged? Or tagged wrong, etc.
-
@johnpoz
Nice, thanks! I did not know that I can do all of this. And I really feel I need to read up on packet captures, sniffing etc.. But the cap collected in pfsense with promiscuous mode looks different in wireshark:
-
Yeah if there is no answer you will see retrans.. Thought your problem child was .9?
But if you click into a specific packet you should see the tag, like my example.
edit: Maybe you have to enable to show 802.1q in the dissector.. Let me check my wireshark settings. I use wireshark a lot, so might have turned it on long time ago..
-
@johnpoz
Yes, half an hour ago I changed the LXC container's IP to 17, to see if it has any effect: No, it doesn't. Same problem. I can reach .8, but cannot reach .17 (both on the same vmbridge in Proxmox). I can even reach a third container, on a different VLAN subnet (60 instead of 40).