10 GbE network with C2758, possible?



  • Hello

    I have a Supermicro A1SRi-2758F motherboard-based system and was wondering if I can upgrade it to a 10GbE network. Is this motherboard good enough for a 10GbE network?

    Thanks



  • You can add a PCIe card for a 10GbE network, but the CPU itself is unable to give you 10GbE routing speed.

    @trumee:

    Hello

    I have a Supermicro A1SRi-2758F motherboard-based system and was wondering if I can upgrade it to a 10GbE network. Is this motherboard good enough for a 10GbE network?

    Thanks



  • From a 10 GbE line you will normally see 2 Gbit/s - 4 Gbit/s of real throughput in practice.
    Measuring the line will of course show varying numbers, but the same is true of a
    1 Gbit/s Internet line.

    I have a Supermicro A1SRi-2758F motherboard-based system and was wondering if I can upgrade it to a 10GbE network.

    For the WAN interface I would say no, it isn't able to handle that load. For the LAN part, though,
    connecting two switches to that box (the DMZ and LAN switches, each over a 10 GbE uplink) would
    get you more speed than now, but not the full 10 GbE line speed! On top of that, I expect you will need
    a really good, well-matched NIC with driver support, such as the dual-port Chelsio from the pfSense store,
    which can fully offload some TCP/IP tasks from the CPU or SoC to an on-card ASIC/FPGA to push the
    line speed. Such tasks include VLANs, QoS, load balancing, queuing and others.

    Is this motherboard good enough for a 10GbE network?

    Perhaps not yet, together with pfSense, but in the near (or not so near) future I would expect
    the pfSense developers to get close to 10 GbE by combining one or more of the following
    techniques or features:

    • DPDK (enabled software)
      uses the API of Intel's Data Plane Development Kit to enormously speed up the Layer 3 forwarding part
    • netmap-fwd
      not from Intel, but also able to push traffic at line rate, much like DPDK
    • Intel QuickAssist
      offloads compression and decompression, squeezing better and perhaps more data over the same line
    • fast-fwd
      I really don't know anything about this one, or whether it is gone or still in the game alongside netmap-fwd

    Finally, I personally think you will have more luck on the LAN side than on the WAN side, but not
    with a "normal" or cheap 10 GbE NIC. The second factor is the switch or switches used in that
    network plan. With an SMB-class (KMU) switch such as the D-Link DGS-1510 or Cisco SG500 it will
    probably not "rock" the way you imagined! But with switches that play in a higher league you may
    get more luck on the other side, depending on the desired throughput. These are normally switches like the
    Netgear M4300 series
    Netgear M6100 series
    Netgear M7100 series
    Netgear M7300 series

    They all use different switch chips and ASICs or FPGAs, and deliver correspondingly different results and numbers.


  • Netgate

    With one exception, every sw dev at Netgate now has a pair of XG-2758s on their desk.
    Some have larger hardware as well.

    Now ask yourself, "Why?"  Why would gonzopanchojwt put a pair of these on each sw dev's desk?

    When we eliminate the kernel bottleneck, we see performance jump.  We can achieve over 12Mpps on this platform.  The smallest possible IPv4 packet size is 64 bytes (84 on the wire with headers, CRC and the IFG (which is time, not data, but still counts)).  This equates to 84*8 = 672 bits. 10,000,000,000 bits/second divided by 672 bits is 14,880,952 packets/second.
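    The arithmetic above can be reproduced in a few lines (a sketch; the 84-byte figure is the 64-byte minimum frame plus 8 bytes of preamble/SFD and 12 bytes of inter-frame gap):

    ```python
    # Line-rate packets/second for a given on-wire frame size (sketch of the
    # arithmetic in the post; wire_bytes includes headers, CRC, preamble and IFG).
    LINE_RATE_BPS = 10_000_000_000  # 10 Gbit/s

    def line_rate_pps(wire_bytes: int) -> int:
        """Maximum packets/second at line rate for a frame occupying
        `wire_bytes` on the wire."""
        return LINE_RATE_BPS // (wire_bytes * 8)

    # Minimum IPv4 frame: 64 bytes + 20 bytes of preamble/SFD and inter-frame gap
    print(line_rate_pps(84))  # 14880952
    ```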

    That's as fast as you ever need to go on 10gbps. Most of the time you don't need to go anywhere near this fast.  Most packets are (much) longer than 64/84 bytes.

    But if you can do 14.880Mpps, you can't get behind, no matter what the traffic is.  And, with our DPDK-based router, we can do over 12Mpps on an 8-core C2758, like in the XG-2758 model. If the median packet size is merely 101 bytes (remember, this includes all protocol headers, and all Ethernet overhead), we can route (with a full 560,000-entry BGP table) at line rate on an Xbox-2758.
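    As a sanity check of that 101-byte figure (my own back-of-the-envelope arithmetic, not from the post; the 12.4 Mpps input is an assumption standing in for "over 12Mpps"):

    ```python
    # Break-even on-wire packet size at which a fixed pps budget saturates 10 Gbit/s.
    LINE_RATE_BPS = 10_000_000_000  # 10 Gbit/s

    def breakeven_bytes(pps: float) -> float:
        """Smallest on-wire packet size (bytes) at which `pps` packets/second
        fills a 10 Gbit/s line."""
        return LINE_RATE_BPS / (pps * 8)

    # Assumed routing capacity of ~12.4 Mpps ("over 12Mpps" in the post)
    print(round(breakeven_bytes(12_400_000), 1))  # ~100.8, i.e. about the 101-byte median
    ```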

    Obviously, on even slightly larger hardware we're there for line-rate 10Gb, and on a Broadwell-DE we can run line rate at 40Gbps.  An E5 Xeon or a really nice 8/10-core i7 will get us to interesting numbers of 10/40G interfaces at line rate.

    The 2gbps - 4gbps numbers are accurate for kernel-based networking, and that's using very large packets. 1500 byte payloads. This translates, with all overhead, to 1538 bytes or, with a 802.1q tag, 1542 bytes. This is 12,304 or 12,336 bits. 10,000,000,000 bits/second / 12,304 bits/packet = 812,743 packets/second.
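    The large-packet numbers fall out the same way (a sketch reproducing the post's figures):

    ```python
    # Full-size Ethernet frame arithmetic from the post.
    PAYLOAD = 1500                 # bytes of payload
    OVERHEAD = 14 + 4 + 8 + 12     # Ethernet header + CRC + preamble/SFD + IFG
    wire = PAYLOAD + OVERHEAD      # 1538 bytes on the wire
    wire_dot1q = wire + 4          # 1542 bytes with an 802.1q tag

    print(wire * 8, wire_dot1q * 8)      # 12304 and 12336 bits
    print(10_000_000_000 // (wire * 8))  # 812743 packets/second at 10 Gbit/s
    ```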

    All of this is covered in detail, with numbers for several different platforms, in the papers that George Neville-Neil and I have given over the past two years at several BSD conferences.

    If you check those papers, and look at the kernel forwarding PPS rates, you'll see why the "throughput" is 2-4 gbps.

    But, by leveraging DPDK or netmap, we remove the bottleneck.  We're really seeing 20X speedups, and we're nearly ready to dog-food the router software internally.

    Oh, I didn't mention. That 12Mpps number… It had 50 ACLs configured.  :)

    Just as obviously, GigE is trivial for this application.

    We also have a userspace IPsec stack, and it runs an order of magnitude faster than the in-kernel one in FreeBSD. (We've only measured AES-CBC-128/256 + HMAC-SHA1.  A C2758 will do about 2-2.3Gbps.  AES-GCM will do about 2X this.)

    And today, we got QuickAssist on the C2758 talking to the IPsec stack. Results soon.

    Want to know how you can help?  This type of sw development is expensive. Support us.  Buy pfSense appliances from the pfSense store.  Yes, I know you can assemble a Supermicro for less. Would that still be true if you weren't getting the sw at no cost?

    At least buy pfSense Gold.



  • @jwt:

    Want to know how you can help?  This type of sw development is expensive. Support us.  Buy pfSense appliances from the pfSense store.  Yes, I know you can assemble a Supermicro for less. Would that still be true if you weren't getting the sw at no cost?

    At least buy pfSense Gold.

    Great post.  My company's contribution is running the Netgate pfSense AMI in AWS.  We needed an OpenVPN server and pfSense fit the bill; hopefully the extra cost for us provides development support.  I'm pushing to move from VMs to hardware in a couple of locations for performance reasons, and the appliances are first on the list.

    On a personal note, can I just donate?



  • @jwt:

    … we can route at line rate on an Xbox-2758.

    ;D
    I love automatic corrections



  • @jahonix:

    @jwt:

    … we can route at line rate on an Xbox-2758.

    ;D
    I love automatic corrections

    If this was an automatic correction!



  • @jwt:

    The 2gbps - 4gbps numbers are accurate for kernel-based networking, and that's using very large packets. 1500 byte payloads. This translates, with all overhead, to 1538 bytes or, with a 802.1q tag, 1542 bytes. This is 12,304 or 12,336 bits. 10,000,000,000 bits/second / 12,304 bits/packet = 812,743 packets/second.

    All of this is covered in detail, with numbers for several different platforms, in the papers that George Neville-Neil and I have given over the past two years at several BSD conferences.

    If you check those papers, and look at the kernel forwarding PPS rates, you'll see why the "throughput" is 2-4 gbps.

    But, by leveraging DPDK or netmap, we remove the bottleneck.  We're really seeing 20X speedups, and we're nearly ready to dog-food the router software internally.

    That's great to know, although 3.0 with Netmap doesn't seem to be coming anytime soon.

    Do those figures require the Chelsio adapter with customized offloading?
    I'm getting stuck with each NIC only loading a single core as it is, and the interrupt queue basically saturates the entire core at 1Gbps.



  • That's great to know, although 3.0 with Netmap doesn't seem to be coming anytime soon.

    But this will more or less depend on version 3.0 itself, because it will be fully rewritten,
    and that may take its time if it is to run smoothly after launch.

    Do those figures require the Chelsio adapter with customized offloading?

    The Chelsio adapters come in two different models, but both are backed by their own
    ASIC/FPGA chip on the card itself, so they can fully offload TCP/IP tasks such as
    VLAN and QoS workloads onto the card without touching the rest of the pfSense
    system. In my eyes it is better to have cards like this well supported with drivers
    in pfSense.

    I'm getting stuck with each NIC only loading a single core as it is and the interrupt
    queue basically saturates the entire core at 1 Gbps.

    In version 2.3.1 this could now be working as a kind of workaround, but normally
    this shouldn't really be the case, or am I wrong about this?



  • @BlueKobold:

    I'm getting stuck with each NIC only loading a single core as it is and the interrupt
    queue basically saturates the entire core at 1 Gbps.

    In version 2.3.1 this could now be working as a kind of workaround, but normally
    this shouldn't really be the case, or am I wrong about this?

    Not sure. MSI-X is working, but for what it's worth, I can't seem to get the IRQs to go beyond a single core per NIC.
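
    For what it's worth, a hypothetical loader-tunable sketch (not from this thread; the tunable names are from the FreeBSD igb(4) driver, which the C2758's onboard i354 ports use, and may differ by NIC or driver version) for spreading queues across cores:

    ```
    # /boot/loader.conf.local -- hypothetical sketch for igb(4) NICs
    hw.igb.num_queues=0     # 0 = autodetect: up to one RX/TX queue per core
    hw.igb.enable_msix=1    # use MSI-X so each queue gets its own interrupt vector
    ```

    Whether the interrupts actually spread out still depends on the NIC exposing multiple queues and on RSS hashing the traffic across them.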