10gigabit routing performance, jumbo frames, intel x710 observations
-
Environment
pfSense 2.7.2
Intel X710-T4L NIC
Port OPT1 (port 0 on the X710-T4L), set to a static IPv4-only address, 192.168.60.1, directly connected via a Cat 8 cable to a system running Ubuntu 22.04 LTS.
Ubuntu 22.04, set to a static IPv4-only address, 192.168.60.11, also using an Intel X710-T4 NIC.
Both machines are basically idle, with firewall rules configured to allow any traffic, any protocol, any direction. (Setting pfSense to router-only mode didn't materially change performance.)
sysctl net.inet.tcp.sack.enable=0 appears to help the X710.
sysctl net.inet.tcp.mssdflt=1460 for MTU 1500; mssdflt=8960 for MTU 9000.
A few other tunables to bump: maxsockbuf, recvbuf_max, recvspace, sendbuf_inc, sendbuf_max, sendspace.
At MTU 1500, txpause and rxpause appear mandatory with the FreeBSD X710 driver (part of the FreeBSD build); otherwise there are significant issues with retransmissions and slow recovery to full throughput:
- On the FreeBSD side: sysctl dev.ixl.0.fc=3
- On the Ubuntu side: ethtool -A enp1s0 rx on tx on
- On the FreeBSD side, in the output of dmesg | grep -i ixl0 you should see "Link is up, ... Flow Control: Full"
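A minimal sketch collecting the settings above into one place (the buffer values are illustrative assumptions, not numbers given in this thread; interface names are from this setup):

```shell
#!/bin/sh
# FreeBSD/pfSense side: tunables discussed above
sysctl net.inet.tcp.sack.enable=0        # appears to help the X710
sysctl net.inet.tcp.mssdflt=1460         # use 8960 at MTU 9000
sysctl kern.ipc.maxsockbuf=16777216      # illustrative buffer bumps
sysctl net.inet.tcp.recvbuf_max=16777216
sysctl net.inet.tcp.sendbuf_max=16777216
sysctl dev.ixl.0.fc=3                    # full rx/tx flow control

# On the Ubuntu peer:
#   ethtool -A enp1s0 rx on tx on

# Verify negotiated flow control on the FreeBSD side
dmesg | grep -i "ixl0" | grep -i "flow control"
```

Note most of these reset at reboot; to persist them, use the System Tunables page in the pfSense GUI.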
Testing
On the Ubuntu side, start up iperf3 as daemon: iperf3 -s -D
On the pfSense side, test the send rate. I get 9.4Gb/sec. At the console, run:
iperf3 -c 192.168.60.11 -B 192.168.60.1 -t 30
On the pfSense side, test the receive rate. I get about 3Gb/sec. At the console, run:
iperf3 -c 192.168.60.11 -B 192.168.60.1 -t 30 -R
Using more parallel threads, I can hit 9.4Gb/sec total:
iperf3 -c 192.168.60.11 -B 192.168.60.1 -t 30 -R -P 8
Disabling routing on pfSense, I also hit 9.4Gb/sec for receive, with only one thread:
sysctl net.inet.ip.forwarding=0
iperf3 -c 192.168.60.11 -B 192.168.60.1 -t 30 -R
In the case of -R, what is contributing so much to the performance degradation, particularly when the firewall itself is the ultimate target?
(I would like to run some code on the firewall that receives bulk data.) Memory and core count aren't limiting factors in this case.
Jumbo frames
The X710 supports jumbo frames up to ~9700 bytes at 10G.
The pfSense web GUI artificially limits the MTU to 9000, even though setting it to 9700 manually via ifconfig works. Bug?
At MTU 9000+, performance jumps to 9.9Gb/sec send and 7.2Gb/sec receive (-R -P 1). Increasing to -P 4, -P 8, etc. does not help exceed 7.2Gb, unlike at MTU 1500. Disabling forwarding also doesn't help, so maybe a driver issue (will try a different driver later).
Ubuntu 22.04 to Ubuntu 22.04 (boot-swapped with pfSense, so everything else identical) has no problem hitting 9.9Gb/sec in both directions at MTU 9000+.
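For reference, a sketch of putting both ends at MTU 9000 and verifying it end to end (interface names from this setup; on pfSense the interface MTU would normally be set in the GUI, as noted above):

```shell
# FreeBSD/pfSense side (X710 port 0)
ifconfig ixl0 mtu 9000

# Ubuntu side
ip link set dev enp1s0 mtu 9000

# Verify with an unfragmentable jumbo ping from Ubuntu:
# 9000 - 20 (IPv4) - 8 (ICMP) = 8972-byte payload
ping -M do -s 8972 -c 3 192.168.60.1
```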
Loopback test
On pfSense, run:
iperf3 -s -D -B 127.0.0.1
iperf3 -c 127.0.0.1 -B 127.0.0.1
Performance appears capped at about 9Gb/sec. Expected?
Same test on Ubuntu 22.04, I see > 30Gb/sec.
-
9.41Gbps is the theoretical limit at 1500B so that is hitting what is effectively 'line rate'.
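That 9.41Gbps figure can be reproduced with a quick back-of-the-envelope calculation, assuming IPv4 plus TCP with the 12-byte timestamp option and no VLAN tag, plus 38 bytes of per-frame Ethernet overhead (preamble, SFD, header, FCS, inter-frame gap):

```shell
# Single-stream TCP goodput on 10GbE as a function of MTU
goodput() {  # $1 = MTU in bytes; prints Gb/s
    awk -v mtu="$1" 'BEGIN {
        payload = mtu - 20 - 20 - 12         # minus IPv4, TCP, timestamp option
        wire    = mtu + 7 + 1 + 14 + 4 + 12  # preamble+SFD, header, FCS, IFG
        printf "%.2f\n", 10 * payload / wire
    }'
}
goodput 1500   # → 9.41
goodput 9000   # → 9.90
```

The same arithmetic explains the 9.9Gb/sec seen at MTU 9000 earlier in the thread.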
pfSense is optimised as a router/firewall and not a server. It will always perform worse for connections it has to terminate directly.
I don't think I've ever seen anything using an MTU above 9000. It's likely limited to that to prevent foot shooting that would break compatibility with anything else.
Steve
-
@PixieDust said in 10gigabit routing performance, jumbo frames, intel x710 observations:
The pfSense web GUI articially limits the MTU to 9000, even though setting it to 9700 manually via ifconfig works. Bug?
At MTU 9000+, performance jumps to 9.9Gb/sec send, and 7.2Gb/sec for recv (-R -P 1).
9000 MTU is commonly used, though I have a switch that can do 16K. I don't know of any limit in pfSense or FreeBSD. This is an example of why it might be a good idea for the Internet to move to a larger MTU. Years ago, token ring networks supported 17914 bytes.
-
@stephenw10 said in 10gigabit routing performance, jumbo frames, intel x710 observations:
I don't think I've ever seen anything using an MTU above 9000.
I guess you've never worked with token ring. That's what we had when I worked at IBM Canada in the late 90s. The next time I was there, in 2005, it was all Ethernet.
-
Ha, indeed. Definitely not in pfSense!
I do have some token ring gear but I don't think I ever got it working.
-
However, there is this information on setting MTU, which gives conflicting guidance for local vs. internet connections:
https://homenetworkgeek.com/mtu-size/
-
Back in the late 90s, IBM was encouraging employees to start working with Linux. I liked Mandrake and got it going on a ThinkPad. One thing I had to do was change a config file, so it would work with token ring. I also used to support a 3270/5250 terminal emulator called "Personal Communications" which could use SNA over TR to talk to the "big iron" computers. Back then IP was just a small part of network traffic, with SNA & NetBIOS also going over the wire.
-
@buggz said in 10gigabit routing performance, jumbo frames, intel x710 observations:
However, there is this information on setting MTU which conflicts with configurations of local vs internet connections?
Some old info in that article. 1500 used to be the maximum MTU on Ethernet, but hasn't been for something around 35 years or so. Frame expansion arrived in the late 90s, to allow for things like VLANs. Later on, MTU was increased considerably to carry more data. As I mentioned, I have a switch that can handle 16K MTU. Also, switches and NICs are more efficient with larger frames, as they process a frame at a time, not a stream of bytes, so it takes just as much effort to handle a 1500-byte frame as 16KB. This is why data centres often use large MTU.
Also, the article mentions errors. Error rate is not the issue it was years ago, and if there's a loss it's more likely due to overloading a switch port than signal error. When I first started using the Internet, I had a dial-up modem and was supposed to set the MTU to 573, IIRC. This is because phone lines were more likely to cause errors. Another reason for smaller frames on Ethernet was collisions, which are not likely these days. Other networks, such as token ring, had much larger MTU because collisions didn't happen.
Of course, IP has been designed from the start to handle varying MTUs; it initially used fragmentation to handle the difference, but now relies more on path MTU discovery (PMTUD).
Other than the original Ethernet hardware, there is no hard limit in DIX II Ethernet, but 802.3 has a 1500 limit, because when the Ethertype/length field is 1500 or less, the frame is 802.3 and the field is a length. If it's 1536 or above, it's DIX II, which relies entirely on the frame itself to determine length.
BTW, back in 1989, I hand-wired a couple of Ethernet controllers on prototype boards for Data General Eclipse computers. Back then we had some DEC VAX 11/780 computers connected with DECnet over 10base5 "ThickNet" at work, and I was working with an engineer to develop the boards so that Eclipse computers could connect to the network.
-
When I see degradation from 9.4Gb/sec to 3Gb/sec for this case (MTU 1500):
iperf3 -c 192.168.60.11 -B 192.168.60.1 -t 30 -R
I see an opportunity for improvement. Something significant is happening there, which it may be possible to optimize (or it could be an oversight of some sort).
I just tested a vanilla FreeBSD 14.0 build with this case:
At MTU 9000+, performance jumps to 9.9Gb/sec send, and 7.2Gb/sec for recv (-R -P 1). Increasing to -P 4, -P 8, etc does not help with exceeding 7.2Gb, unlike at MTU 1500. Disabling forwarding=0 also doesn't help, so maybe a driver issue (will try different driver later).
The "vanilla" FreeBSD install shows the same ~7.2Gb/sec limit at MTU 9000 for -R, but using -P 2 drives the aggregate to the expected 9.8Gb/sec.
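For reference, the two receive-direction runs being compared here (iperf3 server already running on the Ubuntu side, addresses as used earlier in the thread):

```shell
# MTU 9000, single stream: plateaus around 7.2Gb/sec on this hardware
iperf3 -c 192.168.60.11 -B 192.168.60.1 -t 30 -R

# MTU 9000, two streams: aggregate reaches the expected ~9.8Gb/sec
iperf3 -c 192.168.60.11 -B 192.168.60.1 -t 30 -R -P 2
```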
Re: loopback, it's an interesting way to show differences in the networking stack.
The Ubuntu MTU on the loopback adapter is 65536.
I increased the MTU on the pfSense loopback interface to 49152 (the max). You can drive the throughput above 10Gb/sec this way, using say -P 2, but it's still lower than on other systems.
You can also set the MTU to say 1500 on loopback, and it exhibits interesting scaling issues. So you wouldn't need 10G NICs to test some of what I've called out here :)
The pfSense lo0 netmask is set to zero, is that expected? It's 255.0.0.0 on other systems I checked.
I've confirmed routing through pfSense with MTU 9000 works at > 9.5Gb/sec (even using iperf3 --bidir).
-
Sure, you can improve server-like performance: enable some of the offloading options, switch to a different TCP CC algorithm, increase the TCP window buffers, etc. But that may reduce router performance.
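On FreeBSD those suggestions look roughly like the following sketch (module, interface name, and buffer values are assumptions; defaults vary by release, and pfSense intentionally disables some offloads for filtering reasons):

```shell
# Switch TCP congestion control to CUBIC (ships as a kernel module)
kldload cc_cubic
sysctl net.inet.tcp.cc.algorithm=cubic

# Grow default socket buffer space (illustrative values)
sysctl kern.ipc.maxsockbuf=16777216
sysctl net.inet.tcp.recvspace=262144
sysctl net.inet.tcp.sendspace=262144

# Re-enable hardware offloads on the interface
ifconfig ixl0 rxcsum txcsum tso4 lro
```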
-
Okay, however, if you ping an internet URL, google.com, can you use the entire MTU of 9K, or do you get packet errors?
Heck, I can't seem to use the default of 1500, sigh...
ping google.com -f -l 1500
Pinging google.com [142.250.189.142] with 1500 bytes of data:
Packet needs to be fragmented but DF set.
Packet needs to be fragmented but DF set.
Packet needs to be fragmented but DF set.
Packet needs to be fragmented but DF set.
Am I missing something?
Hmm, can you set fragmentation?
Though, I don't know if that is something one wants...
-
@buggz said in 10gigabit routing performance, jumbo frames, intel x710 observations:
Okay, however, if you ping an internet URL, google.com, can you use the entire MTU of 9K, or do you get packet errors?
You should get an ICMP too big message which will include the MTU to use. Failing that, on IPv4, fragmentation will kick in and you won't see any difference as it will take place at some router without any feedback.
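A concrete way to probe the path MTU is to pin the DF bit and vary the payload; 1472 = 1500 minus the 20-byte IPv4 header and 8-byte ICMP header (Windows and Linux forms shown):

```shell
# Windows: -f sets DF, -l is the ICMP payload size
ping -f -l 1472 google.com    # fits a 1500-byte path MTU
ping -f -l 1473 google.com    # expect "Packet needs to be fragmented but DF set"

# Linux equivalent: -M do forbids fragmentation, -s is the payload size
ping -M do -s 1472 -c 3 google.com
```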
-
I've tried several of the CC options; that makes no difference. The offload options also don't materially impact this.
Keep in mind there is zero other traffic and no other disturbance during these operations. I have no rate limiting or traffic shaping configured.
It seems there may be some artificial (?) throttling happening in the networking stack. The behavior of loopback is suspect, as that seems capped at almost exactly 10.0Gb/sec (even -P 2, -P 4 just end up at ~10.0 aggregate).
Anybody aware of something artificially limiting this?
As another tidbit, it looks like the loop interface can be built with 131072 MTU support, but other parts of the network stack don't allow that to work. (MTU 49152 doesn't exceed 10Gb/sec either.)
I also tried an OpenWrt image on the system in an identical config, and the same test works as expected (MTU 9000, --bidir yields, to the firewall/host, 9.89Gb/sec TX, 9.57Gb/sec RX; tested for 30 minutes).
OpenWrt also does > 30Gb/sec on the loopback interface. It scales linearly if you add more threads too: -P 2 (>60Gb/sec aggregate), -P 4 (>100Gb/sec aggregate). Whereas something in FreeBSD won't allow the aggregate to exceed 10Gb/sec.
-
@PixieDust said in 10gigabit routing performance, jumbo frames, intel x710 observations:
As another tidbit, it looks like loop interface can be built with 131072 MTU support, but other parts of the network stack don't allow that to work. (MTU 49152 doesn't exceed 10Gb/sec either).
Everything on the LAN has to support the same MTU. You can't use different MTU unless there's a router in between.
-
@JKnott said in 10gigabit routing performance, jumbo frames, intel x710 observations:
@PixieDust said in 10gigabit routing performance, jumbo frames, intel x710 observations:
As another tidbit, it looks like loop interface can be built with 131072 MTU support, but other parts of the network stack don't allow that to work. (MTU 49152 doesn't exceed 10Gb/sec either).
Everything on the LAN has to support the same MTU. You can't use different MTU unless there's a router in between.
I'm not referring to different network elements having incompatible MTU values.
I'll expand the loopback scenario listed above:
Loopback test
On the pfSense node, run the test at 48K MTU:
ifconfig lo0 127.0.0.1 netmask 255.0.0.0 mtu 49152
iperf3 -s -D -B 127.0.0.1
iperf3 -c 127.0.0.1 -B 127.0.0.1
Performance appears capped at about 9Gb/sec. Expected?
Same test on Ubuntu 22.04, I see > 30Gb/sec.
On the pfSense node, run the test at 1500B MTU:
ifconfig lo0 127.0.0.1 netmask 255.0.0.0 mtu 1500
iperf3 -s -D -B 127.0.0.1
iperf3 -c 127.0.0.1 -B 127.0.0.1
Performance is about 3Gb/sec; expected?
Same test on Ubuntu 22.04, I see > 30Gb/sec.
You cannot set the loopback (lo0) MTU to 131072, nor 65536.