Unable to get 1 Gb NAT throughput with new Jetway NUC build
-
I am a brand new pfSense user converting from an old Tomato router. I just upgraded to AT&T Gigapower. Directly hardwired to the AT&T RG (residential gateway) I am able to get 930/930 Mbit/s. However, with my new Jetway pfSense firewall I am only getting 780/780 when hardwired to the LAN port on the Jetway. From all the research I did before purchasing, this hardware should have more than enough horsepower to push 1 Gb NAT.
Today I built a pfSense firewall with the following hardware and configuration:
Jetway HBJC311U93W-2930-B
http://www.amazon.com/gp/product/B00OY8Q0QC
Kingston HyperX Impact Black 4GB 1600MHz DDR3L CL9 SODIMM 1.35V Laptop Memory
http://www.amazon.com/gp/product/B00KQCOSLY
Kingston Digital 60GB SSDNow mS200 mSATA (6Gbps) Solid State Drive
http://www.amazon.com/gp/product/B00COFMPAM

Configuration:
2.2.6-RELEASE
No packages installed

Here are all the things I have tried to troubleshoot the issue:
-
Swapping ethernet cables
-
Tested with the ethernet cable connected directly to the AT&T RG and got 930/930 throughput
-
Unchecked "Disable hardware TCP segmentation offload" (restarted the Jetway)
-
Unchecked "Disable hardware large receive offload" (restarted the Jetway)
-
Checked CPU usage during the speed test; it only hit 46%
Please help me figure out why I am not getting full 1 Gb NAT!
-
-
Jetway HBJC311U93W-2930-B: $208.00. Has a 1.83 GHz Celeron, no AES-NI.
RAM: $24
Kingston m-SATA: $42
Total: $274

Netgate RCC-DFF 2220: $275 (http://store.netgate.com/ADI/RCC-DFF-2220.aspx)
Internal benchmarks: https://www.reddit.com/r/PFSENSE/comments/3xqhqo/thinking_of_switching_to_pfsense/cy7evhu
I actually run 2.3 on a 4860 at home (1Gbps/1Gbps fiber connection).
using iperf3 to a machine at work (work = pfSense HQ):
[SUM] 0.00-10.00 sec 1.03 GBytes 882 Mbits/sec 4097 sender
[SUM] 0.00-10.00 sec 1.02 GBytes 880 Mbits/sec receiver

See the 4097 retransmissions? We're looking to find/fix that.
-
Under your link, another success story was posted as a review comment on Amazon. Please look at that part of the comment and compare the numbers with what @jwt was getting. With SquidGuard & Snort together, he reports:

Running an iperf test, I get around 850mbps, again with SquidGuard and Snort running.

The 930/930 you measured, plus the TCP/IP overhead, is already 100% of a 1 GBit/s line; that is the whole throughput that will ever be available on the line. You will never see the full raw number in real life. Please trust me, and here is why:
- SPI/NAT will "eat" 3% - 5% of your 1 GBit/s
- The protocol overhead is also there; look at the 930/930 and consider how much of the line it already accounts for
- Firewall rules will drop the throughput further
- Each installed package will drop it again

So a throughput of ~800 - 850 is a fair number, because the things named above whittle the 1 GBit/s down step by step, never up. The only things you will be able to do are the following:
- Enable PowerD (hiadaptive)
- Raise the mbuf size
- Enable TRIM support for the SSD/mSATA drive

Please measure only with iPerf or NetIO, so that we can all confirm, reproduce, or verify the result. So: the 780/780 you got + tuning tips = ~800 - 850, and then the ~5% for SPI/NAT and the overhead on top, and all will be fine for you.

Netgate RCC-DFF 2220: $275 (http://store.netgate.com/ADI/RCC-DFF-2220.aspx)

1. Then please also write there that this device is able to push 1 GBit/s; that is only written for the other SG and RCC boxes, but not for the 2220 variant, so customers might be steered away from this device for their 1 GBit/s WAN line.
2. We are all only normal users and will never be able to tune our pfSense the way you can! Please don't forget this too.

Good move into 2016, and stay healthy in 2016.
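For reference, the first two of those tips map onto concrete pfSense knobs (a sketch; the 131072 value appears later in this thread and is only illustrative, not a tuned recommendation). PowerD is enabled under System > Advanced > Miscellaneous, and the mbuf count is raised with a loader tunable, e.g. in /boot/loader.conf.local:

```sh
# /boot/loader.conf.local -- raise the network mbuf cluster limit (illustrative value)
kern.ipc.nmbclusters="131072"
```

A reboot is needed for loader tunables to take effect.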
-
Just to follow up on the above post: the parameters of a test can play a big role in bandwidth calculations. Packet size, for example. Ethernet packets have a fixed number of bytes of overhead per packet (roughly 12 bytes), there is an interpacket gap of about 12 bytes, and the line speed is typically the raw number of bits. So if you are sending a 64-byte payload, the actual number of bytes on the wire is (ignoring the interpacket gap) 64+12=76 bytes. That's the overhead part of what Frank is talking about.
Just make sure you understand whatever tools you use to measure. A simple ping test of 64-byte packets may give completely different results than something like iperf. Heck, ping should be able to send different packet sizes, so see what you get by using packets from 64 bytes up to MTU size (maybe 1460 to cover link overhead and avoid fragmentation). Try 64, 128, 256, 512, 1024, 1400.
NAT: this actually modifies packets, rewriting the address portions of the packet. Depending on what is rewritten, CRCs need to be recalculated or external devices may simply drop the packet. Even with hardware assist, this recalculation takes a finite amount of time, so it will automatically drop the throughput.
Physical link: make sure both ends agree on the configuration: same speed, same duplex. If one side is set to autonegotiate and the other to a fixed config (say 100M/full duplex), you can run into problems because the autoneg side may not get the duplex right. A full/half duplex mismatch is a killer. The output of ifconfig will tell you the configuration.
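The ping sweep above can be scripted; a sketch, where the target address and interface name (192.168.1.1, igb0) are placeholders for your own:

```sh
# Ping with increasing ICMP payload sizes (the sizes suggested above)
for size in 64 128 256 512 1024 1400; do
    ping -c 10 -s "$size" 192.168.1.1
done
# Confirm that speed and duplex match on both ends of the link
ifconfig igb0 | grep media
```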
-
@BlueKobold:
Netgate RCC-DFF 2220: $275 (http://store.netgate.com/ADI/RCC-DFF-2220.aspx)
1. Then please also write there that this device is able to push 1 GBit/s; that is only written for the other SG and RCC boxes, but not for the 2220 variant, so customers might be steered away from this device for their 1 GBit/s WAN line.

We do about 1.1Mpps on a 2220 today with netmap-fwd.
A full 1G Ethernet (minimum-size 64-byte frames) is 1.488Mpps.
If your packet size averages even 90 bytes, we'll be able to run line rate. Maybe 128 byte averages with NAT.
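That packet-rate figure can be reproduced with a quick calculation (a sketch; the function name is mine, and the 8-byte preamble/SFD plus 12-byte inter-packet gap are standard Ethernet constants):

```python
def line_rate_pps(frame_bytes: int, link_bps: int = 1_000_000_000) -> int:
    """Maximum packets/second for a given Ethernet frame size.

    Besides the frame itself, each packet occupies 8 bytes of
    preamble/SFD and a 12-byte inter-packet gap on the wire.
    """
    wire = max(frame_bytes, 64) + 8 + 12  # frames are padded to the 64-byte minimum
    return link_bps // (wire * 8)

print(line_rate_pps(64))    # 1488095 -- the 1.488 Mpps quoted above
print(line_rate_pps(1518))  # full-MTU frames: roughly 81k pps
```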
-
@mer:
Just to follow up on the above post: the parameters of a test can play a big role in bandwidth calculations. Packet size, for example. Ethernet packets have a fixed number of bytes of overhead per packet (roughly 12 bytes), there is an interpacket gap of about 12 bytes, and the line speed is typically the raw number of bits. So if you are sending a 64-byte payload, the actual number of bytes on the wire is (ignoring the interpacket gap) 64+12=76 bytes. That's the overhead part of what Frank is talking about.
Just make sure you understand whatever tools you use to measure. A simple ping test of 64-byte packets may give completely different results than something like iperf. Heck, ping should be able to send different packet sizes, so see what you get by using packets from 64 bytes up to MTU size (maybe 1460 to cover link overhead and avoid fragmentation). Try 64, 128, 256, 512, 1024, 1400.
NAT: this actually modifies packets, rewriting the address portions of the packet. Depending on what is rewritten, CRCs need to be recalculated or external devices may simply drop the packet. Even with hardware assist, this recalculation takes a finite amount of time, so it will automatically drop the throughput.
Physical link: make sure both ends agree on the configuration: same speed, same duplex. If one side is set to autonegotiate and the other to a fixed config (say 100M/full duplex), you can run into problems because the autoneg side may not get the duplex right. A full/half duplex mismatch is a killer. The output of ifconfig will tell you the configuration.
Actual size of a 64-byte-payload Ethernet packet on the wire is 102 bytes; a 46-byte payload will generate 84 bytes on the wire (both counting the IFG).
Preamble is 7 bytes (octets, really)
Start of Frame is 1 byte
MAC dest is 6 bytes
MAC source is 6 bytes
Ethertype is 2 bytes
payload (min = 46 bytes, max = 1500 without VLAN tagging.)
FCS is 4 bytes
Inter-packet gap is 12 bytes of time (rather than actual bits on the wire)

CRC is generated (and checked) by the NIC, but IP / UDP / TCP header checksum(s) typically are not (they can be, depending on the NIC and driver). CRC will not affect throughput; CSUM might.
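The field breakdown above can be turned into a small calculator (a sketch; the function name is mine):

```python
def wire_bytes(payload: int, ifg: bool = True) -> int:
    """On-the-wire size of an Ethernet frame, per the breakdown above:
    preamble(7) + SFD(1) + dst MAC(6) + src MAC(6) + EtherType(2)
    + payload + FCS(4), optionally plus the 12-byte inter-packet gap time."""
    payload = max(payload, 46)  # short payloads are padded up to the 46-byte minimum
    frame = 7 + 1 + 6 + 6 + 2 + payload + 4
    return frame + 12 if ifg else frame

print(wire_bytes(64))    # 102, matching the figure above
print(wire_bytes(1500))  # 1538 for a full-MTU frame
```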
-
Frankly, I'm more impressed with our ability to run 500Mbps of IPsec (AES-GCM) over the same link, with a 4860 at one end and a C2758 at the other.
-
With my pfSense box, I get exactly the same iperf throughput through pfSense as directly between my two clients: Client1 (10.x.x.x) <-> LAN <-> pfSense (NAT) <-> TestWAN <-> Client2 (192.168.x.x). No jumbo frames; normal Ethernet MTU sizes. I think it was about 946Mb/s reported by iperf.
Totally OP build for home: Haswell 3.2GHz and an Intel i350-T2. My two clients capped out around 1.5Gb/s with the iperf bidirectional test, with only 5% CPU usage on pfSense.
-
@jwt:
using iperf3 to a machine at work (work = pfSense HQ):
[SUM] 0.00-10.00 sec 1.03 GBytes 882 Mbits/sec 4097 sender
[SUM] 0.00-10.00 sec 1.02 GBytes 880 Mbits/sec receiver

See the 4097 retransmissions? We're looking to find/fix that.
I am completely new to pfSense and using iperf3. I don't see what the problem is that you are pointing to when saying "See the 4097 retransmissions? We're looking to find/fix that."
Would you please explain what the problem is that you are saying needs to be fixed?
-
@BlueKobold:
The 930/930 you measured, plus the TCP/IP overhead, is already 100% of a 1 GBit/s line; that is the whole throughput that will ever be available on the line. You will never see the full raw number in real life. Please trust me, and here is why:
- SPI/NAT will "eat" 3% - 5% of your 1 GBit/s
- The protocol overhead is also there; look at the 930/930 and consider how much of the line it already accounts for
- Firewall rules will drop the throughput further
- Each installed package will drop it again
So a throughput of ~800 - 850 is a fair number, because the things named above whittle the 1 GBit/s down step by step, never up.

I am not running any additional packages like Snort, so unless something for SPI is included out of the box, I don't think I have SPI running and shouldn't be taking that hit on NAT performance.
Also, the AT&T RG is not just a modem; it is a gateway itself. The RG does NAT too, and I am able to achieve 936/912 when directly connected to it. So I don't agree with your logic that normal NAT will drop things down to 800-850 as the norm.
I forgot to mention above how I am testing: I am using the att.com/speedtest/ site to test my gigabit connection. I have set up the AT&T RG in IP Passthrough mode, where it passes its public IP to pfSense.
@BlueKobold:
The only things you will be able to do are the following:
- Enable PowerD (hiadaptive)
- Raise the mbuf size
- Enable TRIM support for the SSD/mSATA drive

I found and enabled PowerD using hiadaptive and that instantly made things jump up from 780/780 to 900/920. The weird thing after many tests is that upload consistently gets 920, 20 Mbit/s more than download. Also interesting: when I am directly connected to the RG the max upload I can get is 912, but via pfSense I can get 920.
I tried increasing mbuf to 131072 but that made performance inconsistent and erratic with tests. It just seemed overall much worse and I removed the setting from System Tunables and performance went back to 900/920.
Is there a way to enable TRIM for the SSD via the GUI or does that have to be done via the command line?
Also any other ideas for how to increase throughput to get the full 936 mbits for the downstream connection?
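For what it's worth, on the FreeBSD base underneath pfSense, TRIM on a UFS filesystem is typically toggled from the command line with tunefs, which has to run while the filesystem is not mounted read-write, hence booting to single-user mode. A sketch, assuming a UFS root on /dev/ada0p2 (the device name is an assumption; check yours first):

```sh
# From single-user mode, with the filesystem not mounted read-write:
tunefs -t enable /dev/ada0p2   # set the TRIM flag (device name assumed)
tunefs -p /dev/ada0p2          # print flags; look for "trim: (-t) enabled"
reboot
```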
-
@jwt:
@mer:
Just to follow up on the above post: the parameters of a test can play a big role in bandwidth calculations. Packet size, for example. Ethernet packets have a fixed number of bytes of overhead per packet (roughly 12 bytes), there is an interpacket gap of about 12 bytes, and the line speed is typically the raw number of bits. So if you are sending a 64-byte payload, the actual number of bytes on the wire is (ignoring the interpacket gap) 64+12=76 bytes. That's the overhead part of what Frank is talking about.
Just make sure you understand whatever tools you use to measure. A simple ping test of 64-byte packets may give completely different results than something like iperf. Heck, ping should be able to send different packet sizes, so see what you get by using packets from 64 bytes up to MTU size (maybe 1460 to cover link overhead and avoid fragmentation). Try 64, 128, 256, 512, 1024, 1400.
NAT: this actually modifies packets, rewriting the address portions of the packet. Depending on what is rewritten, CRCs need to be recalculated or external devices may simply drop the packet. Even with hardware assist, this recalculation takes a finite amount of time, so it will automatically drop the throughput.
Physical link: make sure both ends agree on the configuration: same speed, same duplex. If one side is set to autonegotiate and the other to a fixed config (say 100M/full duplex), you can run into problems because the autoneg side may not get the duplex right. A full/half duplex mismatch is a killer. The output of ifconfig will tell you the configuration.
Actual size of a 64-byte-payload Ethernet packet on the wire is 102 bytes; a 46-byte payload will generate 84 bytes on the wire (both counting the IFG).
Preamble is 7 bytes (octets, really)
Start of Frame is 1 byte
MAC dest is 6 bytes
MAC source is 6 bytes
Ethertype is 2 bytes
payload (min = 46 bytes, max = 1500 without VLAN tagging.)
FCS is 4 bytes
Inter-packet gap is 12 bytes of time (rather than actual bits on the wire)

CRC is generated (and checked) by the NIC, but IP / UDP / TCP header checksum(s) typically are not (they can be, depending on the NIC and driver). CRC will not affect throughput; CSUM might.
Thanks for correcting me. I forgot about the SOF and ether header.
-
Also the AT&T RG is not just a modem; it is a gateway itself. The RG does NAT too and is able to achieve 936/912 directly connected to the RG.

These ISP-sponsored boxes often do their job in silicon (FPGA/ASIC), while pfSense is a pure software firewall. Depending on the double-NAT or router cascade you have created, it might not go as fast as you want, but rather less fast, owing to the double-NAT situation and the 2 x 5% you may now be losing. So the ISP box might be faster simply because it only does NAT once.

So I don't agree with your logic that normal NAT will drop things down to 800-850 to be normal.

You might not agree with me, but spending €200 also delivers you only €200 worth of speed!

I found and enabled PowerD using hiadaptive and that instantly made things jump up from 780/780 to 900/920.

- minus double NAT
- minus overhead
- minus packet filter (pf) firewall rules

One thing I forgot to count on top of all of this in my first post: at this time (surely not forever, they are working hard on it) pfSense will use only a single CPU core on the WAN interface if PPPoE is done there. This might also narrow down the entire WAN speed. But as I see it, your consumer router is not passing any firewall rules, which also slow down the entire WAN throughput. You got what you paid for, and with 900/920 the full maximum for your hardware is reached here. So I promise you: you will get lower speed if you turn on more firewall rules or install some packages.

The weird thing after many tests is that upload consistently gets 920, 20 Mbit/s more than download. Also interesting: when I am directly connected to the RG the max upload I can get is 912, but via pfSense I can get 920.

Speed tests done with iPerf or NetIO will sometimes differ from each other, but if you only run a speed test on some internet-based website, you may also be counting in the time needed to run it.

I tried increasing mbuf to 131072 but that made performance inconsistent and erratic with tests. It just seemed overall much worse and I removed the setting from System Tunables and performance went back to 900/920.

4 CPU cores * 2 LAN ports = 8 queues; that is not too high, and perhaps in this case it is not really needed to raise the mbuf size to a matching value.

Is there a way to enable TRIM for the SSD via the GUI or does that have to be done via the command line?

You should boot from a USB pen drive into single-user mode, activate TRIM right there, and then reboot from the mSATA or SSD. Here is a thread that explains the entire rest.

Also any other ideas for how to increase throughput to get the full 936 mbits for the downstream connection?

Please trust me, you will never really see that number; it is more down to the single-core CPU usage than to other circumstances. If you had a C2758, D-15x8 or Xeon E3 based pfSense box and used a static IP on the WAN interface, you would see this number with ease, but not with the hardware you are using. Your consumer router is a pure router, while pfSense is a firewall that can be turned into a full-featured UTM device, and if you compare those prices on the global market you will also know that ~€200 is not very much for a 1 GBit/s internet connection.
-
"Also any other ideas for how to increase throughput to get the full 936 mbits for the downstream connection?"
Sure: eliminate the packet loss
Assuming that you're seeing 1538-byte packets on the wire (1500 + 7 + 1 + 6 + 6 + 2 + 4 + 12), these are 12304 bits long (x 8).
1,000,000,000 / 12304 = 81274 packets/second
936,000,000 / 12304 = 76072 packets/second
920,000,000 / 12304 = 74772 packets/second
So you've got something like 1.7% packet loss along the route, or in the end application (the AT&T speed test application), or in the ability of your 82583V NICs, or the driver thereof, to actually deal with those packet rates.
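That arithmetic can be reproduced directly (a sketch of the same calculation):

```python
WIRE_BITS = 1538 * 8  # full-MTU frame on the wire, in bits (12304)

def pps(link_bps: int) -> int:
    """Packets per second at a given bit rate, assuming full-MTU frames."""
    return link_bps // WIRE_BITS

line = pps(1_000_000_000)  # raw 1 GBit/s line rate in packets/second
rg   = pps(936_000_000)    # rate observed directly behind the AT&T RG
pf   = pps(920_000_000)    # rate observed behind pfSense

# Shortfall of the pfSense path relative to the RG path
print(f"{1 - pf / rg:.1%}")  # 1.7%
```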
The 82583Vs don't support RSS or any hardware queues.
The i210/i211/i35x (as used on the C2758, RCC-VE and RCC-DFF) do. BlueKobold assumes (above) that "4 CPU cores * 2 LAN ports = 8 queues", but your NICs have one queue each.
(The math is really that you want the queue count to match the core count. It doesn't matter how many NICs you have.)
-
it doesn't matter how many NICs you have.)
Happy new year to all!
Yes, you are right. At the beginning of my time here in the forum I read something saying that each CPU core would create one queue for each NIC on the board, or inside pfSense; if that is not so, the mistake is mine! Sorry about that.
-
Well, thanks for everyone's help. It's really disappointing to see that this hardware can't push 1 Gb NAT after all, and I'll be returning the equipment.
My original intent was to buy something very small, compact, and low power draw that can definitely push 1 Gb NAT. Before I purchased my Jetway I saw the pfSense SG-2220, but that model did not state it would be able to do 1 Gb NAT like other models. Also, based upon previous forum posts I found, it sounded questionable whether it would be able to push 1 Gb NAT based upon people's real-world experience.
Does anyone have any recommendations for hardware that would fit that bill?
-
well, Nephi (born of goodly parents?)
As I said, you'll need to eliminate the packet loss.
-
@jwt:
well, Nephi (born of goodly parents?)
As I said, you'll need to eliminate the packet loss.
Yes, I understand which is why I am asking about things from a hardware perspective. You guys know more about that than me. Previously you said.
@jwt:
The 82583Vs don't support RSS or any hardware queues.
The i210/i211/i35x (as used on the C2758, RCC-VE and RCC-DFF) do. BlueKobold assumes (above) that "4 CPU cores * 2 LAN ports = 8 queues", but your NICs have one queue each.
(The math is really that you want the queue count to match the core count. It doesn't matter how many NICs you have.)

I tried out my Jetway build with Sophos UTM and was able to get full 1 Gb NAT performance. So apparently Sophos has better drivers or optimizations that take advantage of the Jetway hardware. However, I like pfSense more from what I have seen so far. That is why I am asking more questions about what pfSense hardware like the SG-2220 can handle.
I only have moderate networking experience, unlike you guys who are experts. I know the basics of how NAT works and have done plenty of Wireshark captures to troubleshoot issues at work. Years ago when I was in college I did tier 2 VPN support, for example. So I know enough to get around. But I am completely new to pfSense and especially the ins and outs of network hardware above regular consumer gear.
Before all this research I had never known about NIC RSS or AES-NI. But now you guys are helping me out learning and I appreciate that.
So going back to the SG-2220: I noticed today when I looked at the pfSense store product page that it now says, under the "Best For" section heading, "Anyone with High-Speed Gigabit Connections". I am pretty sure it didn't say that a couple weeks ago when I was first researching hardware for a pfSense firewall. I also learned about the Intel Atom Rangeley series, which from what I have briefly read today is a server-class series of Intel Atom processors.
So, based upon the fact that with Sophos on my Jetway I could get full gigabit NAT when testing with speedtest.net and att.com/speedtest, do you think I would be able to get full gigabit NAT with the SG-2220?
If so, would I be severely performance constrained with the Intel Atom C2338 to add packages later when I want to become more adventurous with pfSense?
Are there other hardware acceleration benefits other than RSS and AES-NI that I would get with the SG-2220?
Thanks in advance!
-
Well thanks for everyone's help.
Happy new year!
It's really disappointing to see that this hardware can't push 1 Gb NAT after all, and I'll be returning the equipment.
900/920 MBit/s + overhead + firewall rules + NAT is, for me, nearly 1 GBit/s, and please don't forget it is done with only one CPU core! The N2930 is a 4-core CPU; if you get a static public IP address from your ISP and don't need PPPoE, the WAN side will be worked by all 4 CPU cores and not by only one! And for sure this is not a problem of the vendor Jetway or of pfSense.

Also based upon previous forum posts I found, it sounded questionable that it would be able to push 1 Gb NAT based upon people's real-world experience.

Are they using PPPoE, and therefore also only a single CPU core on the WAN side, or do they have their own static public IP address from their ISP? And what is "real-world experience" for you? If I got 900/920 MBit/s out of a ~€200 device like yours, I would be glad: count NAT + overhead + firewall rules on top of this and I am at nearly the full 1 GBit/s. So no real problems there, as I see it. If you got 100% of the 1 GBit/s throughput, where would the time come from to perform NAT, pass the firewall rules, and pay the overhead on top? This is not done in 0.0 seconds by a lower-end CPU based appliance!!! If you were using an Intel Atom C2000 SoC, Xeon D-15x8 or Xeon E3-1200 based appliance I would be on your side and with you, but spending 200 bucks and then starting a thread about why you are not getting everything is something only you should think about.

You cannot buy a small, fuel-saving car and then wonder why the hell it is not as fast as a Porsche Cayenne!

Please have a look at this device here, the Jetway N2930: it comes with 4 x Intel i211AT LAN ports and pushes something around ~950/970 MBit/s. Is that only down to the better LAN ports, or would more RAM to raise the mbuf size bring up some more WAN speed?

Why not save the money and go with an SG-4860 unit that is capable of delivering this speed? Together with a pre-tuned ADI image you would be on the safe side, as I see it.
-
I know there is some overhead, and when I say I want to get full 1-gigabit NAT (which, by the way, I got on the same Jetway with Sophos UTM), I mean getting the full 936/936 I get directly from the AT&T RG.
If I got 900/920 MBit/s out of a ~€200 device like yours, I would be glad: count NAT + overhead + firewall rules on top of this and I am at nearly the full 1 GBit/s.

This is not done in 0.0 seconds by a lower-end CPU based appliance!!!

I never insinuated that I would expect it to take zero time to do NAT processing. However, if my AT&T RG and a Jetway Sophos build can do it, surely it isn't unreasonable to think it is possible with the same Jetway hardware running pfSense.
I know it isn't a Porsche, I am not asking to push 1 Gb via VPN.
Also, I think you need to calm down a little. At the time I was doing my research, it did not seem completely unreasonable to think the Jetway could do 1 Gb NAT, since according to CPU benchmarks the CPU in the Jetway is over 2x more powerful than the Intel Atom CPU in the SG-2220. I saw comments in similar forum posts basically saying, "Oh yeah, an Intel Celeron and an Intel NIC will definitely get you 1 Gb NAT." At the time I didn't know about the hardware-accelerated features in the SG-2220.
I just didn't know any better and now I do. So please cut me some slack. I am barely learning about pfSense.
So back to my questions again…
So, based upon the fact that with Sophos on my Jetway I could get full gigabit NAT when testing with speedtest.net and att.com/speedtest, do you think I would be able to get full gigabit NAT with the SG-2220?
If so, would I be severely performance constrained with the Intel Atom C2338 to add packages later when I want to become more adventurous with pfSense?
Are there other hardware acceleration benefits other than RSS and AES-NI that I would get with the SG-2220?
I am not using PPPoE and I have a dynamic IP from my ISP. But it is basically static since it never changes.
-
You have narrowed your bottleneck down to pfSense. If the difference between 936/936 and 900/920 is a dealbreaker, then use Sophos. If you're nitpicking over a 3% difference, you should consider yourself lucky to have such minor problems…