10gigabit routing performance, jumbo frames, intel x710 observations
-
However, there is this information on setting MTU, which seems to conflict between local and internet connection configurations:
https://homenetworkgeek.com/mtu-size/ -
Back in the late 90s, IBM was encouraging employees to start working with Linux. I liked Mandrake and got it going on a ThinkPad. One thing I had to do was change a config file, so it would work with token ring. I also used to support a 3270/5250 terminal emulator called "Personal Communications" which could use SNA over TR to talk to the "big iron" computers. Back then IP was just a small part of network traffic, with SNA & NetBIOS also going over the wire.
-
@buggz said in 10gigabit routing performance, jumbo frames, intel x710 observations:
However, there is this information on setting MTU, which seems to conflict between local and internet connection configurations:
Some old info in that article. 1500 used to be the maximum MTU on Ethernet, but hasn't been since the late 90s, when frame expansion arrived to allow for things like VLAN tags. Later on, MTU was increased considerably to carry more data; as I mentioned, I have a switch that can handle a 16K MTU. Switches and NICs are also more efficient with larger frames, as they process a frame at a time, not a stream of bytes, so it takes just as much effort to handle a 1500-byte frame as a 16KB one. This is why data centres often use a large MTU.
The article also mentions errors. Error rate is not the issue it was years ago, and if there's a loss it's more likely due to overloading a switch port than signal error. When I first started using the Internet, I had a dial-up modem and was supposed to set the MTU to 576, IIRC, because phone lines were more likely to cause errors. Another reason for smaller frames on Ethernet was collisions, which are unlikely these days. Other networks, such as token ring, had a much larger MTU because collisions didn't happen.
Of course, IP has been designed from the start to handle varying MTUs; it initially used fragmentation to handle the difference, but now relies more on path MTU discovery (PMTUD).
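The per-frame efficiency point can be illustrated with some quick arithmetic (a hypothetical sketch, assuming plain 20-byte IPv4 and 20-byte TCP headers with no options): the number of frames a NIC must process to move 1 GB of payload drops sharply as the MTU grows.

```shell
# Frames needed to carry 1 GB of TCP payload at various MTUs.
# Assumes 20-byte IPv4 + 20-byte TCP headers, no options (illustrative only).
for mtu in 1500 9000 16384; do
  mss=$((mtu - 40))                            # payload bytes per frame
  frames=$(( (1000000000 + mss - 1) / mss ))   # frames per 1 GB (ceiling)
  echo "MTU $mtu: MSS $mss, $frames frames/GB"
done
```

At a 16K MTU the NIC handles roughly a tenth of the frames it would at 1500, which is the per-frame efficiency described above.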
Other than the original Ethernet hardware, there is no hard frame-size limit on DIX II Ethernet, but 802.3 has a 1500 limit: when the Ethertype/length field is 1500 or less, the frame is 802.3 and the field is the payload length. If it's 1536 or above, it's DIX II, where the field is an Ethertype and the frame itself determines the length.
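That 1500/1536 split is how a receiver tells the two framings apart. A small sketch of the decision (the example values are well-known Ethertypes, not taken from this thread):

```shell
# Interpret the 2-byte Ethertype/length field of an Ethernet header.
classify() {
  v=$1
  if [ "$v" -le 1500 ]; then
    echo "802.3: field is payload length ($v bytes)"
  elif [ "$v" -ge 1536 ]; then
    printf 'DIX II: field is Ethertype 0x%04x\n' "$v"
  else
    echo "undefined: values 1501-1535 are reserved"
  fi
}
classify 46      # a small 802.3 payload length
classify 2048    # 0x0800 = IPv4
classify 34525   # 0x86dd = IPv6
```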
BTW, back in 1989, I hand-wired a couple of Ethernet controllers on prototype boards for Data General Eclipse computers. Back then we had some DEC VAX 11/780 computers at work, connected with DECnet over 10base5 "ThickNet", and I was working with an engineer to develop the boards so that Eclipse computers could connect to the network.
-
When I see degradation from 9.4Gb/sec to 3Gb/sec for this case (MTU 1500):
iperf3 -c 192.168.60.11 -B 192.168.60.1 -t 30 -R
I see an opportunity for improvement. Something significant is happening there, which may be possible to optimize (or it could be an oversight of some sort).
I just tested a vanilla 14.0 BSD build with this case:
At MTU 9000+, performance jumps to 9.9Gb/sec send, but only 7.2Gb/sec receive (-R -P 1). Increasing to -P 4, -P 8, etc. does not help exceed 7.2Gb, unlike at MTU 1500. Disabling forwarding (forwarding=0) also doesn't help, so maybe a driver issue (will try a different driver later).
The "vanilla" BSD install shows the same ~7.2Gb/sec limit at MTU 9000 for -R, but using -P 2 drives the aggregate to the expected 9.8Gb/sec.
Re: loopback, it's an interesting way to show differences in the networking stack.
The Ubuntu MTU on the loopback adapter is 65536.
I increased the MTU on the pfSense loopback interface to 49152 (the max). You can drive the throughput above 10Gb/sec this way, using say -P 2, but it's still lower than other systems.
You can also set the MTU to say 1500 on loopback, and it exhibits interesting scaling issues. So you wouldn't need 10G NICs to test some of what I've called out here :)
The pfSense lo0 netmask is set to zero, is that expected? It's 255.0.0.0 on other systems I checked.
I've confirmed routing through pfSense with MTU 9000 works at > 9.5Gb/sec (even using iperf3 --bidir).
-
Sure, you can improve server-like performance: enable some of the offloading options, switch to a different TCP CC algorithm, increase the TCP window buffers, etc. But that may reduce router performance.
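For example (a hypothetical FreeBSD tuning sketch: the sysctl names and offload flags are standard, but the values are illustrative, and ixl0 assumes the X710's ixl driver; each of these can cost routing latency or packets-per-second):

```shell
# Server-style throughput tuning on FreeBSD -- illustrative values only.
sysctl net.inet.tcp.cc.algorithm=cubic      # try a different congestion control
sysctl net.inet.tcp.sendbuf_max=16777216    # raise TCP send buffer ceiling (16 MB)
sysctl net.inet.tcp.recvbuf_max=16777216    # raise TCP receive buffer ceiling
sysctl kern.ipc.maxsockbuf=16777216         # raise the overall socket buffer cap
ifconfig ixl0 rxcsum txcsum tso4 lro        # enable offloads on the NIC
```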
-
Okay, however, if you ping an internet URL, e.g. google.com, can you use the entire MTU of 9K, or do you get packet errors?
Heck, I can't seem to use the default of 1500, sigh...
ping google.com -f -l 1500
Pinging google.com [142.250.189.142] with 1500 bytes of data:
Packet needs to be fragmented but DF set.
Packet needs to be fragmented but DF set.
Packet needs to be fragmented but DF set.
Packet needs to be fragmented but DF set.
Am I missing something?
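A likely explanation for the fragmentation above (assuming Windows ping, where -l sets the ICMP payload size rather than the total packet size): the 20-byte IPv4 header plus the 8-byte ICMP header push a 1500-byte payload past the 1500-byte MTU.

```shell
# Largest ping payload that fits in a 1500-byte MTU without fragmenting.
mtu=1500
overhead=$((20 + 8))               # IPv4 header + ICMP echo header
max_payload=$((mtu - overhead))
echo "max unfragmented -l value: $max_payload"   # 1472
# i.e. "ping google.com -f -l 1472" should succeed where "-l 1500" fragments
```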
Hmm, can you enable fragmentation?
Though, I don't know if that is something one wants... -
@buggz said in 10gigabit routing performance, jumbo frames, intel x710 observations:
Okay, however, if you ping an internet URL, e.g. google.com, can you use the entire MTU of 9K, or do you get packet errors?
You should get an ICMP "fragmentation needed" (too big) message, which will include the MTU to use. Failing that, on IPv4 without DF set, fragmentation will kick in and you won't see any difference, as it will take place at some router without any feedback.
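PMTUD can be sketched as a toy loop (the link MTUs here are hypothetical, not measured): each "fragmentation needed" ICMP reports the next hop's MTU, so the sender converges on the smallest link MTU along the path.

```shell
# Toy PMTUD model: the discovered path MTU is the minimum link MTU en route.
path_mtus="9000 1500 1492"        # hypothetical link MTUs along the path
pmtu=65535                        # sender starts from its own interface MTU
for link in $path_mtus; do
  if [ "$link" -lt "$pmtu" ]; then
    pmtu=$link                    # each ICMP "frag needed" lowers the estimate
  fi
done
echo "discovered path MTU: $pmtu"   # 1492
```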
-
I've tried several of the CC options; that makes no difference. The offload options also don't materially impact this.
Keep in mind there is zero other traffic and no other disturbance during these operations. I have no rate limiting or traffic shaping configured.
It seems there may be some artificial (?) throttling happening in the networking stack. The behavior of loopback is suspect, as that seems capped at almost exactly 10.0Gb/sec (even -P 2, -P 4 just end up at ~10.0 aggregate).
Anybody aware of something artificially limiting this?
As another tidbit, it looks like the loopback interface can be built with 131072 MTU support, but other parts of the network stack don't allow that to work. (MTU 49152 doesn't exceed 10Gb/sec either.)
I also tried an OpenWrt image on the same system in an identical config, and the same test works as expected: MTU 9000, --bidir yields 9.89Gb/sec TX and 9.57Gb/sec RX to the firewall/host (tested for 30 minutes).
OpenWrt also does > 30Gb/sec on the loopback interface, and it scales linearly if you add more streams: -P 2 (> 60Gb/sec aggregate), -P 4 (> 100Gb/sec aggregate). Whereas something in FreeBSD won't allow the aggregate to exceed 10Gb/sec. -
@PixieDust said in 10gigabit routing performance, jumbo frames, intel x710 observations:
As another tidbit, it looks like the loopback interface can be built with 131072 MTU support, but other parts of the network stack don't allow that to work. (MTU 49152 doesn't exceed 10Gb/sec either.)
Everything on the LAN has to support the same MTU. You can't use different MTU unless there's a router in between.
-
@JKnott said in 10gigabit routing performance, jumbo frames, intel x710 observations:
@PixieDust said in 10gigabit routing performance, jumbo frames, intel x710 observations:
As another tidbit, it looks like the loopback interface can be built with 131072 MTU support, but other parts of the network stack don't allow that to work. (MTU 49152 doesn't exceed 10Gb/sec either.)
Everything on the LAN has to support the same MTU. You can't use different MTU unless there's a router in between.
I'm not referring to different network elements having incompatible MTU values.
I'll expand the loopback scenario listed above:
Loopback test
On the pfSense node, run the test at 48K MTU:
ifconfig lo0 127.0.0.1 netmask 255.0.0.0 mtu 49152
iperf3 -s -D -B 127.0.0.1
iperf3 -c 127.0.0.1 -B 127.0.0.1
Performance appears capped at about 9Gb/sec. Expected?
Same test on Ubuntu 22.04, I see > 30Gb/sec.
On the pfSense node, run the test at 1500-byte MTU:
ifconfig lo0 127.0.0.1 netmask 255.0.0.0 mtu 1500
iperf3 -s -D -B 127.0.0.1
iperf3 -c 127.0.0.1 -B 127.0.0.1
Performance is about 3Gb/sec. Expected?
Same test on Ubuntu 22.04, I see > 30Gb/sec.
You cannot set the loopback (lo0) MTU to 131072, nor to 65536.