VLAN+Firewall MTU problem
-
Hello to all,
We have been running pfSense for about a year, and Mono for a few years before that. We have a series of web and mail servers behind a PFS firewall, reaching the internet via a pair of redundant routers doing VRRP on the inside and BGP on the outside. All has been running flawlessly for a year or so. We recently started selling LAN Extension to some of our clients, and as a result added a 3Com Layer 3 switch between the PFS and our routers to terminate the VLANs and route them to the internet. That also worked without any problems. The internet links are 100 Mb/s.
Then came our strange issue. When someone on the client side of a VLAN accesses one of our mail servers behind the PFS to download a large file (I tested with 10 MB files over HTTP), the download starts normally, stalls for 5 seconds, runs again for a few seconds, stalls again, and so on until the whole file has downloaded. The same download from the internet has no issues, and it also works from another location on my network between the router and the PFS. Bypassing the firewall (connecting the server directly in front of the firewall) also works flawlessly from a VLAN client.
Sniffing shows no checksum errors, but the last few packets are longer than 1500 bytes:
23:08:59.065432 00:0f:cb:a4:4a:e0 > 00:e0:81:46:25:47, ethertype IPv4 (0x0800), length 60: (tos 0x0, ttl 127, id 32252, offset 0, flags [none], proto: TCP (6), length: 40) 208.71.9.102.1277 > 208.71.9.128.80: ., cksum 0x8488 (correct), 1099:1099(0) ack 60833 win 18195
23:08:59.067557 00:0f:cb:a4:4a:e0 > 00:e0:81:46:25:47, ethertype IPv4 (0x0800), length 60: (tos 0x0, ttl 127, id 32253, offset 0, flags [none], proto: TCP (6), length: 40) 208.71.9.102.1277 > 208.71.9.128.80: ., cksum 0x8488 (correct), 1099:1099(0) ack 63753 win 15275
23:08:59.069680 00:0f:cb:a4:4a:e0 > 00:e0:81:46:25:47, ethertype IPv4 (0x0800), length 60: (tos 0x0, ttl 127, id 32254, offset 0, flags [none], proto: TCP (6), length: 40) 208.71.9.102.1277 > 208.71.9.128.80: ., cksum 0x8488 (correct), 1099:1099(0) ack 66673 win 12355
23:08:59.071805 00:0f:cb:a4:4a:e0 > 00:e0:81:46:25:47, ethertype IPv4 (0x0800), length 60: (tos 0x0, ttl 127, id 32255, offset 0, flags [none], proto: TCP (6), length: 40) 208.71.9.102.1277 > 208.71.9.128.80: ., cksum 0x8488 (correct), 1099:1099(0) ack 69593 win 9435
23:08:59.073555 00:0f:cb:a4:4a:e0 > 00:e0:81:46:25:47, ethertype IPv4 (0x0800), length 60: (tos 0x0, ttl 127, id 32256, offset 0, flags [none], proto: TCP (6), length: 40) 208.71.9.102.1277 > 208.71.9.128.80: ., cksum 0x8488 (correct), 1099:1099(0) ack 72513 win 6515
23:08:59.079553 00:0f:cb:a4:4a:e0 > 00:e0:81:46:25:47, ethertype IPv4 (0x0800), length 60: (tos 0x0, ttl 127, id 32257, offset 0, flags [none], proto: TCP (6), length: 40) 208.71.9.102.1277 > 208.71.9.128.80: ., cksum 0x8488 (correct), 1099:1099(0) ack 75433 win 3595
23:08:59.081302 00:0f:cb:a4:4a:e0 > 00:e0:81:46:25:47, ethertype IPv4 (0x0800), length 60: (tos 0x0, ttl 127, id 32258, offset 0, flags [none], proto: TCP (6), length: 40) 208.71.9.102.1277 > 208.71.9.128.80: ., cksum 0x8488 (correct), 1099:1099(0) ack 78353 win 675
23:08:59.087130 00:e0:81:46:25:47 > 00:0f:cb:a4:4a:e0, ethertype IPv4 (0x0800), length 60: (tos 0x0, ttl 128, id 53907, offset 0, flags [none], proto: TCP (6), length: 40) 208.71.9.128.80 > 208.71.9.102.1277: ., cksum 0x8b75 (correct), 78353:78353(0) ack 1099 win 64437
23:08:59.365591 00:0f:cb:a4:4a:e0 > 00:e0:81:46:25:47, ethertype IPv4 (0x0800), length 60: (tos 0x0, ttl 127, id 32262, offset 0, flags [none], proto: TCP (6), length: 40) 208.71.9.102.1277 > 208.71.9.128.80: ., cksum 0x872b (correct), 1099:1099(0) ack 78353 win 65535
23:08:59.366174 00:e0:81:46:25:47 > 00:0f:cb:a4:4a:e0, ethertype IPv4 (0x0800), length 1514: (tos 0x0, ttl 128, id 42688, offset 0, flags [none], proto: TCP (6), length: 1500) 208.71.9.128.80 > 208.71.9.102.1277: . 78353:79813(1460) ack 1099 win 64437
23:08:59.366288 00:e0:81:46:25:47 > 00:0f:cb:a4:4a:e0, ethertype IPv4 (0x0800), length 1514: (tos 0x0, ttl 128, id 41925, offset 0, flags [none], proto: TCP (6), length: 1500) 208.71.9.128.80 > 208.71.9.102.1277: . 79813:81273(1460) ack 1099 win 64437
23:08:59.366414 00:e0:81:46:25:47 > 00:0f:cb:a4:4a:e0, ethertype IPv4 (0x0800), length 1514: (tos 0x0, ttl 128, id 23547, offset 0, flags [none], proto: TCP (6), length: 1500) 208.71.9.128.80 > 208.71.9.102.1277: . 81273:82733(1460) ack 1099 win 64437
23:08:59.366538 00:e0:81:46:25:47 > 00:0f:cb:a4:4a:e0, ethertype IPv4 (0x0800), length 1514: (tos 0x0, ttl 128, id 49050, offset 0, flags [none], proto: TCP (6), length: 1500) 208.71.9.128.80 > 208.71.9.102.1277: . 82733:84193(1460) ack 1099 win 64437
This leads me to believe that the VLAN adds a header that is not removed after the L3 switch is done with the routing. Shouldn't the switch remove the VLAN header from the packets after routing them, before forwarding them elsewhere? I was able to reproduce this issue on all the VLAN links I have, but I noticed that one of my clients has a Checkpoint firewall at their end, and tests from behind the Checkpoint run fine (while a laptop connected in front of the Checkpoint shows the exact same issue): Web Servers <--> L3 switch <--> VLAN <--> Checkpoint...
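To make a capture like the one above easier to scan, here is a minimal Python sketch that pulls the frame length, IP length and advertised TCP window out of each line. The regular expression and the `parse_line` helper are illustrative assumptions written against the exact output format pasted here; other tcpdump versions print these fields differently.

```python
import re

# Assumed line format: the tcpdump output pasted in this post.
LINE_RE = re.compile(
    r"length (?P<frame>\d+): .*?"      # captured frame length (Ethernet header included)
    r"length: (?P<ip>\d+)\) "          # IP total length
    r"(?P<src>\S+) > (?P<dst>\S+): "   # source and destination (address.port)
    r".*?win (?P<win>\d+)"             # advertised TCP window
)

def parse_line(line):
    """Return the fields of interest from one capture line, or None if it does not match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    return {
        "frame_len": int(m.group("frame")),
        "ip_len": int(m.group("ip")),
        "src": m.group("src"),
        "dst": m.group("dst"),
        "win": int(m.group("win")),
    }

# Two lines copied verbatim from the capture.
SAMPLE = [
    "23:08:59.081302 00:0f:cb:a4:4a:e0 > 00:e0:81:46:25:47, ethertype IPv4 (0x0800), length 60: (tos 0x0, ttl 127, id 32258, offset 0, flags [none], proto: TCP (6), length: 40) 208.71.9.102.1277 > 208.71.9.128.80: ., cksum 0x8488 (correct), 1099:1099(0) ack 78353 win 675",
    "23:08:59.366174 00:e0:81:46:25:47 > 00:0f:cb:a4:4a:e0, ethertype IPv4 (0x0800), length 1514: (tos 0x0, ttl 128, id 42688, offset 0, flags [none], proto: TCP (6), length: 1500) 208.71.9.128.80 > 208.71.9.102.1277: . 78353:79813(1460) ack 1099 win 64437",
]

for rec in map(parse_line, SAMPLE):
    print(rec["src"], "->", rec["dst"],
          "frame", rec["frame_len"], "ip", rec["ip_len"], "win", rec["win"])
```

On the full-size data segments, `frame_len - ip_len` is the 14-byte Ethernet header. On the 60-byte ACKs the difference also includes padding up to the Ethernet minimum, so it is only meaningful for large frames.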
Are there any known PMTU issues with this version (1.2Beta) of PFS/FreeBSD? We had version 1.01 before and had the same issue. There are no black hole routers in this setup: I control the whole network from end to end, and all ICMP is allowed. Also, I noticed that a traceroute (TCP) done from the firewall makes it as far as the L3 switch but never reaches the server. The same traceroute works fine from a server or workstation through the firewall (but the ICMP traceroute works).
The NICs are Intel. I am bridging WAN and OPT1 (transparent firewall, no other options used). The VLANs are fully managed by the switch, not by PFS.
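One way to check path MTU by hand is to ping with the Don't Fragment bit set and a payload sized to fill the MTU exactly. Below is a small Python sketch of that arithmetic, assuming IPv4 without options (20-byte header) and an 8-byte ICMP echo header. The `df_ping_command` helper and the target address are hypothetical; the flags are those of the stock FreeBSD ping (`-D`) and Linux iputils ping (`-M do`).

```python
IPV4_HEADER = 20   # bytes, assuming no IP options
ICMP_HEADER = 8    # bytes, ICMP echo header

def icmp_payload_for_mtu(mtu):
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet of size `mtu`."""
    return mtu - IPV4_HEADER - ICMP_HEADER

def df_ping_command(host, mtu, system="freebsd"):
    """Build (but do not run) a ping command line with the Don't Fragment bit set."""
    size = icmp_payload_for_mtu(mtu)
    if system == "freebsd":
        return ["ping", "-D", "-c", "3", "-s", str(size), host]
    if system == "linux":
        return ["ping", "-M", "do", "-c", "3", "-s", str(size), host]
    raise ValueError("unsupported system: " + system)

# 192.0.2.10 is a documentation placeholder, not a host from this thread.
print(" ".join(df_ping_command("192.0.2.10", 1500)))   # probe that should fit a 1500 MTU
print(" ".join(df_ping_command("192.0.2.10", 1501)))   # probe one byte too large
```

For an MTU of 1500 this gives a 1472-byte payload. If that probe gets through in both directions and the one-byte-larger probe does not, the path is behaving like a plain 1500-byte path; if the 1472-byte probe already fails, some hop has a smaller effective MTU.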
Any ideas??
Thanks
-
Can you try upgrading to http://snapshots.pfsense.com/FreeBSD6/RELENG_1_2/updates/pfSense-Full-And-Embedded-Update-1.2-BETA-3.tgz and see if the problem persists?
-
We are running the Live-CD version (like I said, this is a pure firewall, no routing or anything else). We are currently running 1.2B2 dated July 2nd. I've found a release 1.2, dated July 11th, 2007, on your site. I am assuming this is it, and will download it and test.
-
OK, we are now running Beta3 - same result.
We also replaced the Layer 3 switch with another unit (same model), with the same result.
Any other suggestions? I find this one puzzling. Since I can't think of anything else, we are now following each wire and analysing each VLAN and router to see if we can find anything (although Spanning Tree is configured everywhere).
-
It appears that something on your network is using an MTU > 1500? If so, you will need to find the box and set its MTU back to 1500.
-
The longest packets I see there are 1514 bytes, which is the normal 1500 MTU plus the 14-byte Ethernet header.
What it sounds like is happening is that the NIC doesn't support VLANs in hardware, so it can't deal with the resulting 1518-byte packet after the 4-byte 802.1Q tag is added.
You said Intel NICs; what kind, specifically? I believe everything based on the fxp or em drivers should support hardware VLAN tagging.
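The frame-size arithmetic above can be sketched in a few lines of Python. The constants are the standard Ethernet and 802.1Q field sizes; the `frame_len` helper is only an illustration. Note that capture tools typically do not count the 4-byte frame check sequence, which is why an untagged full-size frame shows up as 1514 rather than 1518.

```python
ETH_HEADER = 14  # destination MAC (6) + source MAC (6) + EtherType (2)
DOT1Q_TAG = 4    # 802.1Q VLAN tag
FCS = 4          # frame check sequence, usually not included in a capture's length

def frame_len(ip_len=1500, tagged=False, with_fcs=False):
    """Ethernet frame size for an IP packet of `ip_len` bytes (no minimum-size padding)."""
    size = ip_len + ETH_HEADER
    if tagged:
        size += DOT1Q_TAG
    if with_fcs:
        size += FCS
    return size

for tagged in (False, True):
    label = "tagged" if tagged else "untagged"
    print(label,
          "as captured:", frame_len(tagged=tagged),
          "on the wire:", frame_len(tagged=tagged, with_fcs=True))
```

So a 1514-byte frame in a capture is an ordinary untagged full-size frame, while a tagged one would show as 1518.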
-
The NICs are 2x em and 2x fxp.
However, I am not using VLANs in pfSense (a 3Com switch manages those upstream, and then routes the traffic to the internet, or to the PFS if it's going to one of our protected servers).