Web sites not loading when accessing pfSense through VLAN trunk.
-
Pinging with packet size 1473-1474 always fails.
Starting to look more and more like an MTU problem.
As I recall, if you are using a NIC that doesn't support hardware VLAN tagging, then the tag has to be included in the packet, reducing the usable MTU. I believe it's only 4 bytes, but this could be combining with some other problem to produce the effect you are seeing.
Since you aren't seeing any problem on your LAN interface, I would suggest that this is happening in the VLAN stage of the transfer. Try reducing the MTU of your internal interface somehow. Perhaps:
http://forum.pfsense.org/index.php/topic,47387.0.html
Though I would expect that to be taken care of without issue. It's as if you have a 'do not fragment' flag set somewhere?
Steve
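To put numbers on the sizes discussed here: the value given to ping -s is only the ICMP payload, so the resulting IP packet is 28 bytes larger (an 8-byte ICMP header plus a 20-byte IPv4 header without options). The following is a minimal sketch of that arithmetic, assuming a 1500-byte MTU; the function names are made up for illustration and nothing is sent on the network.

```shell
#!/bin/sh
# Illustrative arithmetic only; function names are hypothetical.
# Assumes IPv4 without options (20-byte header) and an 8-byte ICMP header.
MTU=1500

# IP packet length produced by "ping -s <payload>"
ip_len() {
  echo $(( $1 + 8 + 20 ))
}

# Prints "fits" if the packet fits within the MTU, else "fragments"
fit_check() {
  if [ "$(ip_len "$1")" -le "$MTU" ]; then
    echo fits
  else
    echo fragments
  fi
}

for payload in 1472 1473 1474; do
  echo "ping -s $payload -> $(ip_len "$payload") byte IP packet: $(fit_check "$payload")"
done
```

Under these assumptions, 1472 is the largest payload that fits in a single packet, and 1473 and 1474 are the first sizes that have to be fragmented.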
-
Ok, so I did some testing and it looks like it may be a driver issue. I tried pinging hosts on my network with various packet sizes, and it looks like the firebox doesn't like to receive packets sized 1474 bytes and send them out via VLAN interfaces. Transmitting on normal interfaces is fine, as is receiving the ping replies.
Here's a log of me pinging the PE1900 via the old LAN and via the VLAN'ed network: http://pastebin.com/xvd2gd86
I tail the syslog to see if outgoing traffic causes watchdog timeouts. So far it looks like only incoming traffic does; more on that later. What we also see here is that the particular packet size is only a problem when going through a VLAN trunk. I also ran 'ifconfig' at the end.
I attach the results of pinging the pfSense box and my laptop from the PE1900, produced by running:
for (( size=1470; size<=1492; size++ )); do ping -s $size -c1 192.168.0.22; done > ping_laptop.txt
for (( size=1470; size<=1492; size++ )); do ping -s $size -c1 192.168.0.1; done > ping_pfsense.txt
for (( size=1470; size<=1492; size++ )); do ping -s $size -c1 192.168.10.1; done > ping_pfsenseVlan.txt
So it looks like even though it's possible to directly send out requests of 1474 bytes, the firebox has trouble receiving and processing them. Also, pinging the pfSense box with 1473/1474 bytes causes a watchdog timeout on the interface. Pinging VLANs causes timeouts on random LAGG interfaces which receive the packet.
Here's how it looks when I run tcpdump: http://pastebin.com/xkAcD1Q2
Just to be sure it's not an issue with the LACP/VLAN trunk on the switch, I connected the laptop to VLAN 16 and pinged it from the PE1900: http://pastebin.com/5pPuykqX
I'm still not sure if this has anything to do with websites not loading for me. Assuming it does, would changing the MTU on my VLAN interfaces help? Should I make it higher or lower? And more importantly, how do I change it?
[2.0.1-RELEASE][root@pfsense.bobnet]/root(90): ifconfig re3 mtu 1492
ifconfig: ioctl (set mtu): Invalid argument
[2.0.1-RELEASE][root@pfsense.bobnet]/root(91): ifconfig lagg0 mtu 1492
ifconfig: ioctl (set mtu): Invalid argument
I can't change the MTU on the LAGG interfaces, or on the LAGG itself. I know the interfaces themselves should support changing of the MTU:
[2.0.1-RELEASE][root@pfsense.bobnet]/root(92): ifconfig re0 mtu 1492
[2.0.1-RELEASE][root@pfsense.bobnet]/root(93): ifconfig re1 mtu 1492
[2.0.1-RELEASE][root@pfsense.bobnet]/root(94): ifconfig re2 mtu 1492
[2.0.1-RELEASE][root@pfsense.bobnet]/root(95): ifconfig re3 mtu 1492
ifconfig: ioctl (set mtu): Invalid argument
[2.0.1-RELEASE][root@pfsense.bobnet]/root(96): ifconfig re4 mtu 1492
ifconfig: ioctl (set mtu): Invalid argument
[2.0.1-RELEASE][root@pfsense.bobnet]/root(97): ifconfig re5 mtu 1492
ifconfig: ioctl (set mtu): Invalid argument
Is this a matter of tricking the ifconfig script by assigning the LAGG ports an IP addr? Or… is this a bigger issue?
-
This could be an important result in narrowing down the watchdog timeout errors that have plagued that box. Until now, about the best advice we have had is to use a managed switch connected to the box, based on anecdotal evidence that fragmented packets can cause a problem.
Anyway I think you will have to bring the interface down before changing the MTU or possibly remove it from the LAGG? Though I have no problem doing that on my box. :-\ You want to be doing this on the VLAN parent interface but this is the LAGG interface for you. Hmm. Trial and error I think!
Also, I see that the re driver and the chip support hardware VLAN tagging. It might be interesting to try disabling it, especially vlanmtu:
From ifconfig(8):
vlanmtu, vlanhwtag, vlanhwfilter, vlanhwtso
    If the driver offers user-configurable VLAN support, enable
    reception of extended frames, tag processing in hardware, frame
    filtering in hardware, or TSO on VLAN, respectively. Note that
    this must be issued on a physical interface associated with
    vlan(4), not on a vlan(4) interface itself.
-vlanmtu, -vlanhwtag, -vlanhwfilter, -vlanhwtso
    If the driver offers user-configurable VLAN support, disable
    reception of extended frames, tag processing in hardware, frame
    filtering in hardware, or TSO on VLAN, respectively.
Steve
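As a rough sketch of what trying this could look like, the helper below only prints the ifconfig commands for a list of physical parent ports, so they can be reviewed before anything is changed. The function name is made up, and the interface names passed to it are assumptions. The flags are the ones listed in the man page excerpt above; whether a given driver honours each of them is a separate question.

```shell
#!/bin/sh
# Dry run: only prints the commands, nothing is changed.
# The interface names passed in are assumptions; substitute your own
# physical VLAN parent ports.
vlan_offload_off_cmds() {
  for ifc in "$@"; do
    echo "ifconfig $ifc -vlanmtu -vlanhwtag -vlanhwfilter -vlanhwtso"
  done
}

vlan_offload_off_cmds re3 re4 re5
```

Printing rather than executing keeps the sketch harmless on a production firewall; the output can be pasted into a shell once the interface list has been checked.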
-
Some Realtek NICs have broken long frame support, so when you're trying to pass packets that have the full 1500 MTU and then add the VLAN tag, they refuse to send or receive them. Your symptoms match that scenario 100%. Not sure anyone has done VLANs on those boxes, but it'd be far from the first NIC issues people have seen on them.
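For reference, the "long frame" in question is only 4 bytes longer than a normal one. A minimal sketch of the on-the-wire sizes follows, assuming a standard Ethernet header (14 bytes), frame check sequence (4 bytes), and a single 802.1Q tag (4 bytes); the function name is hypothetical.

```shell
#!/bin/sh
# Illustrative arithmetic only; the function name is hypothetical.
# Ethernet header = 14 bytes, FCS = 4 bytes, 802.1Q tag = 4 bytes.
frame_len() {
  # $1 = IP packet length, $2 = 1 if the frame carries a VLAN tag, else 0
  echo $(( $1 + 14 + 4 + 4 * $2 ))
}

echo "untagged, 1500-byte packet: $(frame_len 1500 0) bytes on the wire"
echo "tagged,   1500-byte packet: $(frame_len 1500 1) bytes on the wire"
echo "tagged,   1496-byte packet: $(frame_len 1496 1) bytes on the wire"
```

A tagged frame carrying a full 1500-byte packet comes to 1522 bytes, while dropping the packet size to 1496 brings the tagged frame back to the usual 1518.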
-
I am having difficulty understanding the various flags reported by ifconfig but it seems to me that the vlanmtu flag is used by the VLAN driver to determine whether or not the interface supports VLAN frames larger than 1500. If it is set and the interface in fact does not support this then it could be the cause of many problems. There is a suggested solution to this:
The vlan driver automatically recognizes devices that natively support
long frames for vlan use and calculates the appropriate frame MTU based
on the capabilities of the parent interface. Some other interfaces not
listed above may handle long frames, but they do not advertise this
ability of theirs. The MTU setting on vlan can be corrected manually if used
in conjunction with such a parent interface.
My own X700 box has died completely so I can't check. :(
What MTU size has the VLAN driver determined is correct on your box? What flags are reported by ifconfig on the re interfaces?
Steve
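If the parent turns out not to handle long frames, a manual correction would mean taking the 4-byte tag out of the vlan interface's MTU. The sketch below only prints the resulting command. It assumes a 1500-byte parent MTU, uses a made-up function name, and borrows the interface name re3_vlan128 from the tcpdump output later in this thread.

```shell
#!/bin/sh
# Dry run: prints the command rather than running it.
# Assumption: the parent cannot carry frames longer than a standard
# 1500-byte-MTU frame, so the 4-byte 802.1Q tag must come out of the vlan MTU.
PARENT_MTU=1500
VLAN_TAG_LEN=4

vlan_mtu() {
  echo $(( $1 - VLAN_TAG_LEN ))
}

# re3_vlan128 is the vlan interface name that appears in this thread
echo "ifconfig re3_vlan128 mtu $(vlan_mtu "$PARENT_MTU")"
```

With these numbers the vlan interface would be set to 1496, which leaves the parent MTU untouched.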
-
Here's a quote from YongHyeon PYUN, the re(4) maintainer/author:
http://freebsd.1045724.n5.nabble.com/Abysmal-re-4-performance-under-8-1-STABLE-mid-August-td3946608.html:
I'm sure this has nothing to do that this issue.
If you want to disable checksum offloading of VLAN
interface, use vlan interface instead of parent interface
of the VLAN interface(i.e. ifconfig vlan0 -txcsum -rxcsum).
And you can't disable VLAN_MTU on re(4). There is no
reason to disable supporting VLAN oversized frames.
So perhaps a manual MTU reduction is necessary.
Steve
-
Thanks for all the insight!
So far I tried disabling all the hardware features on the LAGG interfaces and reducing the MTU. I'll try doing one thing at a time this weekend, just wanted to see the result of the extreme set of changes. I had to delete the LAGG and all the VLANs to be able to change the underlying interfaces.
There are two differences I noticed so far:
1 - now I can ping VLAN PCs with all the packet sizes with no packet loss, i.e. running ping -v -c 1 -g 1470 -G 1492 -S 192.168.10.1 192.168.10.64 on the pfSense box doesn't exhibit packet loss anymore.
2 - while pinging the pfSense box via the VLAN interfaces (with hw. features disabled), when using packet sizes that didn't work before (~1474, basically MTU - 28), there are no echo replies detected in tcpdump. Previously, echo replies were logged but nothing got out.
Also, no watchdog timeouts logged yet, but I've done no stress testing yet either.
Certain web sites are still inaccessible when connecting via VLAN interfaces. Actually, I think it even got worse since I can't even load imgur now, while previously it was just a matter of refreshing the page until the main pic loaded.
Next up I'll try disabling the LAGG. Even though I don't suspect it of inducing errors, it doesn't make changing interface settings any easier. It did inherit all the relevant changes I made to the interfaces, like MTU and hw. features.
-
2 - while pinging the pfSense box via the VLAN interfaces (with hw. features disabled), when using packet sizes that didn't work before (~1474, basically MTU - 28), there are no echo replies detected in tcpdump. Previously, echo replies were logged but nothing got out.
Hmm, anything in the firewall log? Did you reinstate the firewall rules? Easily overlooked. ;)
Looks like you're making some progress.
Steve
-
What I meant in point 2:
PE1900, pinging normal interface with HW acceleration enabled:
root@bobeus:~# ping -c3 -s 1474 -I 192.168.2.16 192.168.2.1
PING 192.168.2.1 (192.168.2.1) from 192.168.2.16 : 1474(1502) bytes of data.
^C
--- 192.168.2.1 ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 2015ms
tcpdump on pfSense box:
23:58:29.303739 IP 192.168.2.16 > 192.168.2.1: ICMP echo request, id 22611, seq 1, length 1480
23:58:29.303762 IP 192.168.2.16 > 192.168.2.1: icmp
23:58:29.303871 IP 192.168.2.1 > 192.168.2.16: ICMP echo reply, id 22611, seq 1, length 1480
23:58:29.303875 IP 192.168.2.1 > 192.168.2.16: icmp
23:58:30.316443 IP 192.168.2.16 > 192.168.2.1: ICMP echo request, id 22611, seq 2, length 1480
23:58:30.316464 IP 192.168.2.16 > 192.168.2.1: icmp
23:58:30.316517 IP 192.168.2.1 > 192.168.2.16: ICMP echo reply, id 22611, seq 2, length 1480
23:58:30.316521 IP 192.168.2.1 > 192.168.2.16: icmp
23:58:31.329564 IP 192.168.2.16 > 192.168.2.1: ICMP echo request, id 22611, seq 3, length 1480
23:58:31.329586 IP 192.168.2.16 > 192.168.2.1: icmp
23:58:31.329646 IP 192.168.2.1 > 192.168.2.16: ICMP echo reply, id 22611, seq 3, length 1480
23:58:31.329650 IP 192.168.2.1 > 192.168.2.16: icmp
PE1900, pinging a VLAN interface with hw acceleration disabled:
root@bobeus:~# ping -c3 -s 1468 -I 192.168.10.64 192.168.10.1
PING 192.168.10.1 (192.168.10.1) from 192.168.10.64 : 1468(1496) bytes of data.
^C
--- 192.168.10.1 ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 2015ms
tcpdump on pfSense box:
[2.0.1-RELEASE][root@pfsense.bobnet]/root(19): tcpdump -i re3_vlan128 host 192.168.10.64
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on re3_vlan128, link-type EN10MB (Ethernet), capture size 96 bytes
00:03:14.496355 IP 192.168.10.64 > 192.168.10.1: ICMP echo request, id 22616, seq 1, length 1476
00:03:15.500857 IP 192.168.10.64 > 192.168.10.1: ICMP echo request, id 22616, seq 2, length 1476
00:03:16.505921 IP 192.168.10.64 > 192.168.10.1: ICMP echo request, id 22616, seq 3, length 1476
00:03:19.529010 ARP, Request who-has 192.168.10.1 tell 192.168.10.64, length 42
00:03:19.529036 ARP, Reply 192.168.10.1 is-at 00:90:7f:2e:84:db (oui Unknown), length 28
^C
5 packets captured
5 packets received by filter
0 packets dropped by kernel
PE1900, pinging a VLAN interface with a smaller payload:
root@bobeus:~# ping -c3 -s 1452 -I 192.168.10.64 192.168.10.1
PING 192.168.10.1 (192.168.10.1) from 192.168.10.64 : 1452(1480) bytes of data.
1460 bytes from 192.168.10.1: icmp_req=1 ttl=64 time=0.503 ms
1460 bytes from 192.168.10.1: icmp_req=2 ttl=64 time=0.443 ms
1460 bytes from 192.168.10.1: icmp_req=3 ttl=64 time=0.434 ms
--- 192.168.10.1 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 1998ms
rtt min/avg/max/mdev = 0.434/0.460/0.503/0.030 ms
tcpdump on pfSense box:
[2.0.1-RELEASE][root@pfsense.bobnet]/root(25): tcpdump -i re3_vlan128 host 192.168.10.64
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on re3_vlan128, link-type EN10MB (Ethernet), capture size 96 bytes
00:08:47.328097 IP 192.168.10.64 > 192.168.10.1: ICMP echo request, id 22659, seq 1, length 1460
00:08:47.328201 IP 192.168.10.1 > 192.168.10.64: ICMP echo reply, id 22659, seq 1, length 1460
00:08:48.332135 IP 192.168.10.64 > 192.168.10.1: ICMP echo request, id 22659, seq 2, length 1460
00:08:48.332183 IP 192.168.10.1 > 192.168.10.64: ICMP echo reply, id 22659, seq 2, length 1460
00:08:49.336806 IP 192.168.10.64 > 192.168.10.1: ICMP echo request, id 22659, seq 3, length 1460
00:08:49.336850 IP 192.168.10.1 > 192.168.10.64: ICMP echo reply, id 22659, seq 3, length 1460
^C
6 packets captured
6 packets received by filter
0 packets dropped by kernel
I've also looked at the interface itself, no replies visible either.
Pinging with large payloads (2000+) works well, whenever the packets are fragmented.
Question - if I set the interface/VLAN interface MTU very low, say 300, should a ping of 1200 bytes directed to it be automatically split up into smaller chunks? Or should it always fail (like it does here)? I think I need to read up on the basics…
-
If you don't have 'do not fragment' set then it should simply fragment the packets into suitably sized frames. The problem is how it decides whether it needs to do that and how it decides what a suitable size is.
To be honest this is now well outside my own experience! ;)
Steve
-
Question - if I set the interface/VLAN interface MTU very low, say 300, should a ping of 1200 bytes directed to it be automatically split up into smaller chunks? Or should it always fail (like it does here)? I think I need to read up on the basics…
It'll get dropped, can't accept frames larger than your MTU. Nothing in the path to fragment it.
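Fragmentation is decided by whichever host or router transmits the packet, using the MTU of its own outgoing interface, so the question comes down to which side has the low MTU. As a minimal sketch of the sender-side arithmetic, assuming IPv4 without options, no 'do not fragment' flag, and made-up function names:

```shell
#!/bin/sh
# Illustrative arithmetic only; function names are hypothetical.
# Assumes IPv4 without options (20-byte header) and an 8-byte ICMP header.

# Largest data portion of one fragment: MTU minus the IP header,
# rounded down to a multiple of 8 (fragment offsets count 8-byte units).
frag_data_len() {
  echo $(( ($1 - 20) / 8 * 8 ))
}

# Number of fragments for "ping -s <payload>" sent out through <mtu>
frag_count() {
  payload=$1
  mtu=$2
  total=$(( payload + 8 ))            # ICMP header + payload
  per=$(frag_data_len "$mtu")
  echo $(( (total + per - 1) / per ))
}

echo "MTU 300, ping -s 1200: $(frag_count 1200 300) fragments"
echo "MTU 1500, ping -s 1474: $(frag_count 1474 1500) fragments"
```

A sender with a 300-byte MTU would split the 1200-byte ping into five fragments. A sender with a 1500-byte MTU would not split it at all, and the single 1228-byte packet would then arrive at an interface configured for 300. The second line matches the two-part requests (a 1480-byte piece plus a trailing "icmp" fragment) seen in the tcpdump output for ping -s 1474.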
-
I tried, tested, tuned and couldn't get it to work in a reasonable amount of time. So I dropped the idea of using VLANs on the pfSense/firebox combo.
Instead of aggregating four ports into a LAGG and passing VLANs through that, I just mapped four ports on the switch to different VLANs and set up the interfaces normally. It works fine this way, but I really liked the idea of having a theoretical throughput of 400Mb/s to play with, along with a flexible number of VLAN interfaces to control.
-
A disappointing result, but hopefully it will save someone else some time. ::)
I'm sure it could be made to work, but whether it would be worth the effort or not is debatable. It would probably be easier to just put an Intel gigabit card in the PCI slot, with the case mods that requires.
Steve