2.2 on Atom 2750 NAT performance report
-
The reason I'm testing 2.2 is a throughput limitation on my firewall under 2.1.5: single-threaded pf on an SMC Atom 2750 box can't keep up with a real workload.
Before I deployed the Atoms I tested NAT throughput with iperf using just a small number of streams and got around 930 Mbps on a 1 Gbps link, which matches what Netgate publishes for their similar firewall box. However, once we deployed the box under our real workload, with many more connections, real throughput dropped to about 520 Mbps, and 1 CPU of the 8 sat at 100% interrupt. Our real workload is a performance test consisting of 200-250 NAT connections.
It was suggested on IRC that I try 2.2, as the person was interested in how multi-threaded pf performs. I don't recall who that was.
To start, I opened up the igb settings from 1 queue to 4; I'll be moving that to 8 queues for the next test.
```
kern.ipc.nmbclusters="131072"
hw.igb.num_queues=4
```
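For anyone reproducing this: those are boot-time loader tunables (I keep them in /boot/loader.conf.local). A rough sketch of how to confirm the queue count took effect after a reboot; exact output varies by driver version, so treat this as a hint rather than gospel:

```sh
# one interrupt line per queue should show up for each igb port
vmstat -i | grep igb

# the queue tunable the kernel actually booted with
sysctl hw.igb.num_queues
```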
When we started running the workload we saw a consistent 850 Mbps outbound / 15 Mbps inbound on a 1 Gbps link.
Top looks like this:
```
last pid: 35129;  load averages: 2.53, 2.52, 2.45    up 3+10:49:08  07:05:24
163 processes: 12 running, 106 sleeping, 45 waiting
CPU 0:  0.0% user,  0.0% nice,  0.0% system, 62.2% interrupt, 37.8% idle
CPU 1:  0.0% user,  0.0% nice,  0.0% system, 59.5% interrupt, 40.5% idle
CPU 2:  0.0% user,  0.0% nice,  0.0% system, 66.7% interrupt, 33.3% idle
CPU 3:  0.0% user,  0.0% nice,  0.0% system, 62.5% interrupt, 37.5% idle
CPU 4:  0.0% user,  0.0% nice,  2.7% system,  2.7% interrupt, 94.6% idle
CPU 5:  0.3% user,  0.0% nice,  4.5% system,  2.4% interrupt, 92.8% idle
CPU 6:  0.0% user,  0.0% nice,  7.2% system,  1.5% interrupt, 91.3% idle
CPU 7:  0.0% user,  0.0% nice,  2.7% system,  2.7% interrupt, 94.6% idle
Mem: 16M Active, 77M Inact, 294M Wired, 136M Buf, 7496M Free
Swap: 16G Total, 16G Free

  PID USERNAME PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
   11 root     155 ki31     0K   128K CPU7    7  82.6H 100.00% idle{idle: cpu7}
   11 root     155 ki31     0K   128K CPU5    5  82.6H 100.00% idle{idle: cpu5}
   11 root     155 ki31     0K   128K CPU6    6  82.6H  95.26% idle{idle: cpu6}
   11 root     155 ki31     0K   128K RUN     4  82.6H  94.68% idle{idle: cpu4}
   12 root     -92    -     0K   768K WAIT    2  41:18  64.16% intr{irq259: igb0:que}
   12 root     -92    -     0K   768K CPU0    0  40:37  63.48% intr{irq257: igb0:que}
   12 root     -92    -     0K   768K CPU3    3  41:32  62.99% intr{irq260: igb0:que}
   12 root     -92    -     0K   768K CPU1    1  41:45  62.26% intr{irq258: igb0:que}
   11 root     155 ki31     0K   128K RUN     0  81.5H  44.38% idle{idle: cpu0}
   11 root     155 ki31     0K   128K RUN     1  82.0H  43.46% idle{idle: cpu1}
   11 root     155 ki31     0K   128K RUN     2  82.0H  42.68% idle{idle: cpu2}
   11 root     155 ki31     0K   128K RUN     3  82.0H  41.46% idle{idle: cpu3}
   12 root     -72    -     0K   768K WAIT    5   5:06   6.05% intr{swi1: pfsync}
    0 root     -92    0     0K   416K -       7   4:56   3.47% kernel{igb0 que}
    0 root     -92    0     0K   416K -       6   4:54   3.47% kernel{igb0 que}
    0 root     -92    0     0K   416K -       7   4:39   3.17% kernel{igb0 que}
    0 root     -92    0     0K   416K -       5   4:58   2.98% kernel{igb0 que}
   15 root     -16    -     0K    16K -       7   3:22   0.68% rand_harvestq
   12 root     -60    -     0K   768K WAIT    3  29:12   0.00% intr{swi4: clock}
14885 root      20    0 16812K  2304K bpf     7   1:54   0.00% filterlog
   12 root     -72    -     0K   768K WAIT    4   1:16   0.00% intr{swi1: netisr 4}
 5453 root      20    0 14664K  2316K select  5   1:08   0.00% syslogd
    5 root     -16    -     0K    16K pftm    6   1:04   0.00% pf purge
    0 root     -16    0     0K   416K swapin  4   0:57   0.00% kernel{swapper}
```
I'm not sure whether pf or our test rig is leaving the extra 70 Mbps on the floor, but we're going to be tuning both to push it as high as it will go.
If you have any additional questions, or want me to report other data, let me know.
![Screen Shot 2014-11-07 at 10.08.11 AM.png](/public/imported_attachments/1/Screen Shot 2014-11-07 at 10.08.11 AM.png)
![Screen Shot 2014-11-07 at 10.07.55 AM.png](/public/imported_attachments/1/Screen Shot 2014-11-07 at 10.07.55 AM.png)
-
> To start I opened up the IGB settings from 1 queue to 4, I'll be moving that to 8 queues for the next test.
> `kern.ipc.nmbclusters="131072"` `hw.igb.num_queues=4`
I've seen in another thread that those settings are not necessary in 2.2, because FreeBSD 10 allocates as needed.
Perhaps a test letting FreeBSD allocate dynamically would be worth a try.
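For what it's worth, a quick way to tell whether the cluster tunable is doing anything is to watch mbuf usage under load. This is just a sketch from memory, not something measured on this box:

```sh
# current/cache/total clusters; non-zero "denied" means the pool ran dry
netstat -m | grep -E 'mbuf clusters|denied'

# the autotuned (or manually tuned) cluster limit
sysctl kern.ipc.nmbclusters
```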
-
I ran our testing with 2, 4, 6, and 8 queues, and with no queue settings. It looks like 4 or 6 queues give us the best (and equal) performance.
-
Duh, I just realized that my LAN and WAN ports are VLANs on the same physical 1 Gbps port, and that is not ideal when your WAN link is 1 Gbps. I'll update when I move the LAN port to its own interface.
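For context on why the shared port matters, here's a hedged back-of-envelope (my own arithmetic, not a measurement): with LAN and WAN as VLANs on one gigabit port, each direction of the full-duplex link has to carry both VLANs' traffic in that direction, and 802.1Q framing shaves a little more off the top.

```python
# Back-of-envelope for "router on a stick": LAN and WAN are VLANs on one
# physical 1 Gbps port, so every forwarded packet crosses the same wire
# twice (once per direction). The receive side carries LAN-originated and
# WAN-originated traffic combined, and likewise for transmit.

LINE_RATE_MBPS = 1000          # full-duplex gigabit, per direction

# Per-packet overhead for a 1500-byte payload in a VLAN-tagged frame:
# preamble 8 + Ethernet header 14 + 802.1Q tag 4 + FCS 4 + inter-frame gap 12
PAYLOAD = 1500
OVERHEAD = 8 + 14 + 4 + 4 + 12

wire_efficiency = PAYLOAD / (PAYLOAD + OVERHEAD)
max_goodput = LINE_RATE_MBPS * wire_efficiency   # ~973 Mbps per direction

# Observed load on the shared port's receive side (numbers from this test):
outbound = 850                 # LAN->WAN traffic entering on the LAN VLAN
inbound = 15                   # WAN->LAN traffic entering on the WAN VLAN
rx_load = outbound + inbound   # both VLANs share the single RX path

print(f"theoretical max goodput per direction: {max_goodput:.0f} Mbps")
print(f"current RX load on the shared port:    {rx_load} Mbps")
```

So the combined 865 Mbps is already close to the practical ceiling of the shared port; moving LAN to its own interface frees a full 1 Gbps in each direction per link.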