Why MTU limit of 9000?



  • Obviously have to get the joke out of the way: we must allow pfsense to be over 9000!

    Ok but seriously, my Intel network adapters support jumbo frames, but the frame size is 9014. Most of the computers on my network use these adapters, and the switch they plug into supports 9014-byte jumbo frames as well (it's a 10-gig switch and NICs, Intel X540-T1s). However, I can't set the interface that connects the switch to pfSense to the same 9014 MTU that everything else supports, because the limit is 9000. So why does it have this limit? And is there anything I can do to raise it to support 9014?



  • Chipset vendors cap this limit based on the underlying silicon, so 9000 would be the strict minimum to officially support "jumbo" frames.  Some switch vendors support >9000 frame sizes because frames could be doubly (or more) encapsulated with VLANs or other such tunnelling protocols.  Some vendors even support an MTU of 12000, the point at which the CRC32 check value starts to lose effectiveness.

    Whatever MTU value you choose, all devices must agree to use the SAME value.  You will need to capture traffic to know for sure what size frames they are sending.

    https://en.wikipedia.org/wiki/Jumbo_frame



  • 9000 is common, but I have seen some gear up to 16000.  Regardless, just set the MTU to be 9000 or whatever is appropriate.  This can be done in pfSense or on the device when manual configuration is used.  That way, even though switches etc. may be capable of more than what pfSense can support, any connected devices will only use the specified MTU.



  • My understanding is that the MTU is the layer 3 max size. The 9014 frame size loses several bytes to Ethernet frame overhead, so the MTU will be below 9000.

    This also raises the question of multi-page frames. Pages are 4 KiB and a 9000-byte frame will need at least 2 pages. In most situations, jumbo frames are pointless. For SAN-like patterns it can be great, but PPS is not much of a worry anymore and the CPU cost of more packets is typically cheaper than the memory and CPU cost of larger frames.



  • Actually, jumbo frames are beneficial as the efficiency of the communication goes up from 94.93% to 99.14% due to less overhead, which is exactly why they are beneficial in SAN environments, and I'd argue in any environment where you are moving a lot of data around.

    Network cards today read/write directly from/to host memory, so if 2 pages need to be allocated per frame, so be it… RAM is cheap.
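    Those efficiency figures can be reproduced with simple arithmetic (a sketch, assuming TCP over IPv4 with no options, i.e. 40 bytes of L3/L4 headers, plus 38 bytes of per-frame wire overhead: 8-byte preamble, 14-byte Ethernet header, 4-byte FCS, and 12-byte inter-frame gap):

```python
# Rough TCP-over-Ethernet goodput efficiency for a given MTU.
# Assumes IPv4 + TCP with no options (40 bytes of L3/L4 headers)
# and the full per-frame cost on the wire:
# preamble 8 + Ethernet header 14 + FCS 4 + inter-frame gap 12 = 38 bytes.
L34_OVERHEAD = 40   # IPv4 (20) + TCP (20) headers
WIRE_OVERHEAD = 38  # preamble + Ethernet header + FCS + IFG

def tcp_efficiency(mtu: int) -> float:
    """Fraction of wire time carrying TCP payload."""
    return (mtu - L34_OVERHEAD) / (mtu + WIRE_OVERHEAD)

print(f"MTU 1500: {tcp_efficiency(1500):.2%}")  # 94.93%
print(f"MTU 9000: {tcp_efficiency(9000):.2%}")  # 99.14%
```

    At MTU 1500 the payload fraction is 1460/1538 ≈ 94.93%; at MTU 9000 it is 8960/9038 ≈ 99.14%, matching the figures above.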



  • @Harvy66:

    My understanding is that the MTU is the layer 3 max size. The 9014 frame size loses several bytes to Ethernet frame overhead, so the MTU will be below 9000.

    This also raises the question of multi-page frames. Pages are 4 KiB and a 9000-byte frame will need at least 2 pages. In most situations, jumbo frames are pointless. For SAN-like patterns it can be great, but PPS is not much of a worry anymore and the CPU cost of more packets is typically cheaper than the memory and CPU cost of larger frames.

    The MTU refers to the Ethernet frame payload.  Ethernet headers are in addition to that.  Also, jumbo frames are used to improve network efficiency.  While more data per header provides a small increase, the real benefit is the CPU power required to handle a frame.  Multiple smaller frames will take more CPU than a single large frame.  Many data centres run jumbo frames internally.

    Incidentally, years ago, it was commonplace to run much larger frames on token ring than on Ethernet.  As I recall, I used 4K frames when I worked at IBM in the late '90s.
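    That payload/header split can be sketched in a couple of lines (assuming a plain untagged Ethernet II frame; quoted "frame size" figures such as Intel's 9014 typically count the 14-byte header but not the 4-byte FCS):

```python
# The MTU counts only the Ethernet payload (the IP packet).
# The on-wire frame adds the 14-byte Ethernet II header
# (6-byte dst MAC + 6-byte src MAC + 2-byte EtherType);
# vendor "frame size" figures like 9014 usually exclude the FCS.
ETH_HEADER = 6 + 6 + 2  # dst MAC + src MAC + EtherType

def frame_size(mtu: int) -> int:
    """Untagged Ethernet II frame size (excluding FCS) for a given MTU."""
    return mtu + ETH_HEADER

print(frame_size(9000))  # 9014 -- matches the adapter spec in the question
print(frame_size(1500))  # 1514 -- the standard-frame equivalent
```

    So a 9014-byte frame limit and a 9000 MTU describe the same thing; the MTU is not "below 9000" in that case.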



  • …the real benefit is the CPU power required to handle a frame…

    Especially relevant in this day and age of Spectre/Meltdown, where CPU context switches are much more expensive; you want as few of those as possible!

    128K MTU…let's go!




    One problem with much larger MTUs is that the CRC is not big enough to ensure detection of frame errors. 9K Ethernet is not far removed from the possible token ring MTUs, which used the same CRC.



  • @awebster:

    Actually, jumbo frames are beneficial as the efficiency of the communication goes up from 94.93% to 99.14% due to less overhead, which is exactly why they are beneficial in SAN environments, and I'd argue in any environment where you are moving a lot of data around.

    Network cards today read/write directly from/to host memory, so if 2 pages need to be allocated per frame, so be it… RAM is cheap.

    RAM is cheap, but memory bandwidth, cache, memory fragmentation, and many other things are not. I've seen benchmarks where, under very high loads like 100Gb, jumbo frames are quite a bit slower due to memory bandwidth issues. In certain latency-sensitive cases, larger frames hurt the cache.

    And most of the benefit of jumbo frames for a SAN is having the packet of data be the same size as a disk block, which in some implementations will cause extra IO on the SAN device if there is a mismatch.

    Larger frames also have longer head-of-line blocking. If you're saturating your link to the point of congestion, the efficiency gain is less useful than the increased latency and additional bloat.



  • @JKnott:

    @Harvy66:

    My understanding is that the MTU is the layer 3 max size. The 9014 frame size loses several bytes to Ethernet frame overhead, so the MTU will be below 9000.

    This also raises the question of multi-page frames. Pages are 4 KiB and a 9000-byte frame will need at least 2 pages. In most situations, jumbo frames are pointless. For SAN-like patterns it can be great, but PPS is not much of a worry anymore and the CPU cost of more packets is typically cheaper than the memory and CPU cost of larger frames.

    The MTU refers to the Ethernet frame payload.  Ethernet headers are in addition to that.  Also, jumbo frames are used to improve network efficiency.  While more data per header provides a small increase, the real benefit is the CPU power required to handle a frame.  Multiple smaller frames will take more CPU than a single large frame.  Many data centres run jumbo frames internally.

    Incidentally, years ago, it was commonplace to run much larger frames on token ring than on Ethernet.  As I recall, I used 4K frames when I worked at IBM in the late '90s.

    It was commonplace to use larger frames because of old interrupt-based technology. Newer tech allows soft interrupts and interrupt aggregation, so CPU time is no longer an issue in most cases. My crappy quad-core Haswell is currently capable of doing line-rate gigabit firewall+routing+NAT+shaping below 20% CPU with pfSense. That was with me sending empty UDP packets, so pretty small packets. A Xeon should be faster yet, and PF in FreeBSD is known to be quite slow compared to more recent firewall designs. Even Netflix is memory-bound with octa-channel DDR4 and 100Gb NICs trying to stream SSL; their CPUs are largely idle.

    A few years back I was looking into designing a home fileserver and did some research on jumbo frames. What I found was that jumbo frames are mostly a thing of the past, and most people only use them because of archaic knowledge of historic problems with networks, not to mention that they are mindlessly regurgitated everywhere as a "best practice". I was mostly reading that jumbo frames cause more harm than good. In some cases it's not clear-cut: micro-benchmarks may show increased performance, but real-world heavy load shows reduced performance.

    If you have a SAN that needs the frames to be the same size as the blocks, it's definitely a huge win, but mostly because of a poorly designed SAN or a special purpose. There is no one correct answer; you need to do your own research and test it.



  • Larger frames also have longer head-of-line blocking. If you're saturating your link to the point of congestion, the efficiency gain is less useful than the increased latency and additional bloat.

    Compare modern gigabit or even 10 Gb networks with the 10 Mb, half-duplex networks of years ago.  Which do you think will have greater blocking?  6x the frame size vs 100x or 1000x the bandwidth!
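    To put numbers on that comparison (a sketch; serialization delay is just frame bits divided by link rate, ignoring preamble and inter-frame gap):

```python
# Serialization delay: how long one frame occupies the wire.
def serialization_us(frame_bytes: int, link_bps: float) -> float:
    """Time in microseconds to clock one frame onto the link."""
    return frame_bytes * 8 / link_bps * 1e6

for rate, name in [(10e6, "10 Mb/s"), (1e9, "1 Gb/s"), (10e9, "10 Gb/s")]:
    d1514 = serialization_us(1514, rate)  # standard frame
    d9014 = serialization_us(9014, rate)  # jumbo frame
    print(f"{name}: 1514 B = {d1514:.1f} us, 9014 B = {d9014:.1f} us")
```

    A 9014-byte frame ties up a 10 Gb/s link for about 7 µs, versus over 7 ms for the same frame on old 10 Mb/s Ethernet, so the blocking concern has shrunk by orders of magnitude.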



  • @JKnott:

    Larger frames also have longer head-of-line blocking. If you're saturating your link to the point of congestion, the efficiency gain is less useful than the increased latency and additional bloat.

    Compare modern gigabit or even 10 Gb networks with the 10 Mb, half-duplex networks of years ago.  Which do you think will have greater blocking?  6x the frame size vs 100x or 1000x the bandwidth!

    I admit, I was just parroting some of the issues I've heard to some degree. I'm not quite sure why some people are so concerned about head-of-line blocking on a 10Gb interface, but there are problem domains where it matters. Probably too specialized to matter in this discussion. I probably shouldn't have mentioned it since we're mostly talking about SANs.



  • It was commonplace to use larger frames because of old interrupt-based technology.

    Also, there were differences between token ring and Ethernet in the access method.  With token ring, a NIC could only transmit when it held the token, preventing any chance of collision, but with Ethernet, collisions were to be expected and it became a tradeoff between data retransmission and efficiency, along with blocking in a non-deterministic network.  This blocking also resulted in a capture effect, where a device that just successfully transmitted was more likely to win the next transmission attempt.  That sort of thing couldn't happen with token ring.  It also doesn't happen with Ethernet switches, as collisions no longer occur.

    I was mostly reading that jumbo frames cause more harm than good. In some cases it's not clear-cut.

    So, I guess that's why pretty much all gigabit NICs, and even many 100 Mb ones, support jumbo frames, and why large data centres use them.  With NICs, an interrupt is generated when data has to be transferred to/from memory.  The CPU time needed to handle the interrupt does not change with frame size, so fewer, larger frames reduce the load on the CPU compared to more, smaller frames.
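    The per-frame CPU argument can be sketched as a frame-rate calculation (assuming back-to-back full-size frames and 38 bytes of wire overhead per frame):

```python
# Maximum full-size frame rate on a link.  Every frame costs roughly
# the same fixed per-frame work (interrupt/DMA descriptor handling),
# so fewer, larger frames mean less CPU for the same throughput.
WIRE_OVERHEAD = 38  # preamble + Ethernet header + FCS + inter-frame gap

def frames_per_second(mtu: int, link_bps: float) -> float:
    """Frame rate when the link is saturated with full-size frames."""
    return link_bps / ((mtu + WIRE_OVERHEAD) * 8)

print(f"MTU 1500 @ 10 Gb/s: {frames_per_second(1500, 10e9):,.0f} frames/s")
print(f"MTU 9000 @ 10 Gb/s: {frames_per_second(9000, 10e9):,.0f} frames/s")
```

    At a saturated 10 Gb/s link that works out to roughly 813k frames/s at MTU 1500 versus roughly 138k at MTU 9000, about a 6x reduction in fixed per-frame work.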



  • I admit, I was just parroting some of the issues I've heard to some degree. I'm not quite sure why some people are so concerned about head-of-line blocking on a 10Gb interface, but there are problem domains where it matters. Probably too specialized to matter in this discussion. I probably shouldn't have mentioned it since we're mostly talking about SANs.

    Blocking tends to be an issue with time-sensitive traffic, but really doesn't make much of a difference with things like file transfer or email.  On the other hand, it does with things like VoIP, where the delay can be noticeable.  In fact, I'll be dealing with that issue next week, at a customer where one user is apparently moving so much data it's interfering with VoIP phones.  I'll be working with a 48-port TP-Link switch and probably be configuring that user for a lower priority and perhaps throttling him (his port, not him  ;) ) to resolve this issue.



  • Thank you for the great discussion everyone. Lots of good info.

