50MB/s between Interfaces (physical or VLAN)



  • Hi all,

    We are on "2.1-BETA0 (i386) built on Tue Sep 18 20:20:31 EDT 2012 FreeBSD 8.3-RELEASE-p4", using 2 x onboard 'em' Intel NICs and 4 x PCIe 'igb' Intel NICs.

    Stability is good; however, we have a performance problem that is proving difficult to solve.

    If we route via a VLAN interface, or non VLAN interface we are only getting 50MB/s using netio throughput tests.

    However if we re-run the tests between two servers on the same VLAN - we get 111MB/s which is what I would expect.

    I have tried the following tips from the troubleshooting page by editing /boot/loader.conf:

    kern.ipc.nmbclusters="131072"
    hw.igb.num_queues=1

    However, this did not appear to have any effect.

    Anyone got any ideas?

    We really need to get the full GbE out of PF, otherwise backing up is going to take twice as long...

    Cheers,

    JD



  • @jdamnation:

    If we route via a VLAN interface, or non VLAN interface we are only getting 50MB/s using netio throughput tests.

    If the two physical interfaces are on the same PCI bus then you are probably bus limited.  The standard PCI bus can transfer a maximum of 133MBps (including overheads).

    @jdamnation:

    However if we re-run the tests between two servers on the same VLAN - we get 111MB/s which is what I would expect.

    This would normally bypass pfSense.



  • Hi WB…

    Bit confused by what you say here - you are saying that on a bus that's limited to 133MB/s I can not achieve more than 50MB/s ?!?!

    This server is running an Intel 4 x 1GbE card (PCIe x8) with dual Intel 82576 dual-port Gigabit Ethernet controllers. PCIe 8x is rated at 20Gb/s or 1.6GB/s in a single direction so I should easily be able to get the full 111MB/s... yes?

    JD



  • @jdamnation:

    Bit confused by what you say here - you are saying that on a bus that's limited to 133MB/s I can not achieve more than 50MB/s ?!?!

    "Standard" PCI runs at 33,000,000 cycles per second. A "standard" transfer requires two bus cycles: an address cycle of 32 bits, then a data cycle of up to 4 bytes. This gives an upper bound on standard transfers of 33,000,000 cycles/sec * 4 bytes / 2 cycles = 66MB/sec. A burst transfer consists of one address cycle followed by some number of data cycles, but not "too many" data cycles: any bus user has to relinquish the bus so other users get a chance to use it. If your data transfer through the pfSense box comes in on a PCI NIC and goes out on a PCI NIC on the same bus, then every byte through the system is involved in two bus transfers, once into main memory and once out of main memory.

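The arithmetic in the paragraph above can be sketched as follows. This is only a back-of-envelope model using the figures quoted in the post; the 8-data-cycle burst length is a hypothetical value picked for illustration, and real buses lose further bandwidth to arbitration and other overheads.

```python
# Back-of-envelope PCI throughput, using the figures quoted in the post.
PCI_CLOCK_HZ = 33_000_000    # "standard" 32-bit PCI clock
BYTES_PER_DATA_CYCLE = 4     # 32-bit data path


def pci_throughput_mb_s(data_cycles: int) -> float:
    """Peak MB/s when each transaction is 1 address cycle + data_cycles data cycles."""
    cycles_per_transaction = 1 + data_cycles
    bytes_per_transaction = BYTES_PER_DATA_CYCLE * data_cycles
    return PCI_CLOCK_HZ * bytes_per_transaction / cycles_per_transaction / 1e6


def forwarded_mb_s(bus_mb_s: float) -> float:
    """Each forwarded byte crosses a shared bus twice (NIC -> RAM, RAM -> NIC)."""
    return bus_mb_s / 2


non_burst = pci_throughput_mb_s(1)   # one data cycle per address cycle
burst = pci_throughput_mb_s(8)       # hypothetical 8-data-cycle burst

print(f"non-burst bus limit:       {non_burst:.1f} MB/s")
print(f"non-burst, forwarded:      {forwarded_mb_s(non_burst):.1f} MB/s")
print(f"8-cycle burst, forwarded:  {forwarded_mb_s(burst):.1f} MB/s")
```

Under these assumptions, forwarding between two NICs on one shared PCI bus lands well below gigabit line rate, which is the point being made; it applies only if the NICs in the test path really are on plain PCI.
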
    @jdamnation:

    This server is running an Intel 4 x 1GbE card (PCIe x8) with dual Intel 82576 dual-port Gigabit Ethernet controllers. PCIe 8x is rated at 20Gb/s or 1.6GB/s in a single direction so I should easily be able to get the full 111MB/s… yes?

    A TCP connection can't go any faster than the slowest hop. The fact that one hop can run at 1Gbps is irrelevant.

    Unfortunately you haven't provided enough information to determine how much of the preceding discussion is relevant.

    @jdamnation:

    If we route via a VLAN interface, or non VLAN interface we are only getting 50MB/s using netio throughput tests.

    That description is not specific enough about the hops in the path used by the throughput test. If the two endpoints are on the PCIe NICs, I would expect you to be able to get over 100MBps between them, but if one or both endpoints are on PCI NICs (you haven't given the bus type of the onboard NICs you mentioned) then I would expect somewhat less throughput.

    You also haven't given any indication of what CPU your system has or what else it is doing as well as forwarding packets.



  • I suggest you install a clean FreeBSD 8.3 and run your speed tests on that.



  • OK - well, here is some more info!

    PF 2.1 BOX info:

    Motherboard: Supermicro X7SPE-HF-D525
    CPU is Intel® Atom™ D525 processor - dual core with hyperthreads. So in PF Info I see CPU0, CPU1, CPU2, CPU3.
    The slot is PCIe x4, and yes, I know the card is PCIe x8 - I have been on to Supermicro about this and they've told me performance would only suffer if all four ports were saturated, which is not the case. I am getting 50MB/s on VLAN to VLAN on the same interface.
    NIC:  Intel 4 X 1GbE (8x PCIe) which has Dual Intel 82576 dual-port Gigabit Ethernet
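
As a rough check on whether an x4 slot could be the bottleneck, the per-direction PCIe bandwidth can be estimated as below. This is a sketch under stated assumptions: it assumes 2.5 GT/s signalling per lane with 8b/10b line coding, and it ignores packet and protocol overhead, so real usable throughput is lower.

```python
# Rough PCIe bandwidth estimate.
# Assumption: 2.5 GT/s per lane with 8b/10b line coding; protocol overhead ignored.
TRANSFERS_PER_S_PER_LANE = 2.5e9
ENCODING_EFFICIENCY = 8 / 10    # 8 payload bits per 10 bits on the wire


def pcie_mb_s(lanes: int) -> float:
    """Approximate usable MB/s in one direction for a link of the given width."""
    bits_per_s = lanes * TRANSFERS_PER_S_PER_LANE * ENCODING_EFFICIENCY
    return bits_per_s / 8 / 1e6


TARGET_MB_S = 111   # throughput measured between two hosts on the same VLAN

for lanes in (4, 8):
    bandwidth = pcie_mb_s(lanes)
    print(f"x{lanes}: {bandwidth:.0f} MB/s per direction, "
          f"{bandwidth / TARGET_MB_S:.1f}x the {TARGET_MB_S} MB/s target")
```

Even with the card running at x4, the estimate leaves several times the bandwidth needed for a single gigabit flow, so the slot width alone is unlikely to explain a 50MB/s ceiling.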

    The two clients are VMs on different VM hosts. Each VM host is connected via 1GbE - so network tests pass out of the VM host, over the switch, route via PF and then to the other VM.

    When both VMs are on the same VLAN - ie not routing via the PF box, but still passing via the switch from one VM host to the other - I get 111MB/s dead on for all NetIO tests.

    When I go via the PF box, I get between 30MB/s to 55MB/s if I'm lucky.

    The CPU on the PF box is not being taxed really max 20% CPU - though I'm not really sure if FreeBSD is doing a good job of multi threading…

    Although this is a 64-bit CPU, I am using i386 as I have had issues before with the amd64 version.

    Did I miss something?!

    JD



  • Thanks for the additional information.

    @jdamnation:

    I am getting 50MB/s on VLAN to VLAN on the same interface.

    I am not familiar with the details of the tests you are running.

    Suppose the test reports 50MBps between the two VMs, meaning it is BOTH sending AND receiving at 50MBps in BOTH VMs. Call the VMs VMA and VMB. Then let's count the traffic on the transmit side of the NIC. That will be 50MBps from VMA to VMB AND 50MBps from VMB to VMA. That totals 100MBps, which pretty nearly saturates NIC capacity.
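
The counting argument above can be written out as a small calculation. This is a sketch of the "suppose" scenario only: whether the test really drives both directions at once is an assumption, and the 125 MB/s figure is the raw 1Gbps line rate before Ethernet and TCP/IP framing overhead.

```python
# Transmit-side load on a single router NIC that carries both VLANs.
GBE_RAW_MB_S = 1_000_000_000 / 8 / 1e6   # 125 MB/s raw line rate
MEASURED_MAX_MB_S = 111                  # best case reported on the same VLAN


def nic_tx_load_mb_s(per_direction_mb_s: float, bidirectional: bool) -> float:
    """MB/s leaving the NIC when both VLANs share one physical port."""
    directions = 2 if bidirectional else 1
    return per_direction_mb_s * directions


load = nic_tx_load_mb_s(50, bidirectional=True)
print(f"transmit load:            {load:.0f} MB/s")
print(f"share of raw line rate:   {load / GBE_RAW_MB_S:.0%}")
print(f"share of measured max:    {load / MEASURED_MAX_MB_S:.0%}")
```

If the test only sends in one direction at a time, the transmit load is just 50MBps and this explanation does not apply, which is why the result on different physical interfaces matters.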

    Do you get a different result VLAN to VLAN on DIFFERENT physical interfaces?

    @jdamnation:

    The CPU on the PF box is not being taxed really max 20% CPU - though I'm not really sure if FreeBSD is doing a good job of multi threading…

    Please post the output of the pfSense shell command:

    top -S -H -d1



  • An Atom will max out in the neighborhood of 500 Mbps; you're likely just exhausting the capabilities of your CPU. You're basically trying to get high-end firewall throughput through a mid-range box, and an Atom is just not going to cut it if you need gigabit wire speed. It's like buying a commercial firewall box at the low end of mid-range that has a max of 500 Mbps throughput and wondering why you can't get more than its capabilities (not to mention that a commercial firewall of equivalent performance would cost a few times as much as an Atom). It's just not a fast enough proc.
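
Since the thread mixes megabits (Mbps) and megabytes (MB/s), a quick conversion helps compare the figures. This is a minimal sketch; the 500 Mbps ceiling is the estimate given in the post above, not a measured value.

```python
# Convert between the units used in this thread (1 byte = 8 bits).
def mbps_to_mb_s(megabits_per_s: float) -> float:
    return megabits_per_s / 8


def mb_s_to_mbps(megabytes_per_s: float) -> float:
    return megabytes_per_s * 8


ATOM_CEILING_MBPS = 500   # estimate from the post, not a measurement

print(f"Atom ceiling:      {mbps_to_mb_s(ATOM_CEILING_MBPS)} MB/s")
print(f"observed routed:   {mb_s_to_mbps(30)}-{mb_s_to_mbps(55)} Mbps")
print(f"same-VLAN result:  {mb_s_to_mbps(111)} Mbps")
```

The routed results of 30 to 55MB/s sit just under the estimated ceiling once converted, which is consistent with the CPU explanation but does not prove it.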



  • Just thought I would post up a follow up.

    We ended up replacing the PF firewall with a different make - set up exactly the same way (routing tagged VLANs etc).

    The result?

    95MB/s (Bytes, not bits) between interfaces!

    So I'm convinced that there is some sort of issue with Intel NICs, routing via tagged VLANs and pfSense somewhere...

    JD



  • P.S. When I get the Atom boxes back here (have been under pressure to fix this issue!) I will do some more testing and post the results up here.

    I think someone suggested just installing a clean FBSD and re-running the test - which is a very good idea as it cuts PF totally out and enables us to bench the metal.

    So that's what we'll do!

    JD



  • @jdamnation:

    I think someone suggested just installing a clean FBSD and re-running the test - which is a very good idea as it cuts PF totally out and enables us to bench the metal.

    Enabling pf on the traffic cuts down significantly on throughput. You'll be able to route a gigabit through it, but not filter w/pf. Stock FreeBSD won't be any different.

