Intervlan performance slow on my C2758 atom 8 core.



  • So I'm not sure what's up with pfsense.  I have the same hardware as the one in pfsense store, 2nd best (2758 atom 8 core). I'm running 8 total nics with 16 gb RAM.

    The uplink to my switch Cisco 2960x, is a 6 port LACP lagg.  If I'm transferring a file via FTP within same vlan, which bypasses pfsense all together, I'm getting around 95MB/sec or close to 900+mbps.  So nothing wrong with my switch.

    When doing intervlan transfer, obviously going through pfsense, it drops to about 40MB/sec so maybe around 450-500mbps.

    I don't think for this hardware and my setup I would only be transferring half a gigabit through pfsense.  Unless I'm wrong about their hardware they are selling.
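
    For reference, the MB/sec figures above can be converted to megabits per second with a quick calculation. This is only a sketch and assumes decimal units (1 MB = 10^6 bytes, 8 bits per byte); if the FTP client actually reports MiB/sec the results shift by roughly 5%.

```python
def mbytes_to_mbits(mb_per_s: float) -> float:
    """Convert megabytes per second to megabits per second (decimal units assumed)."""
    return mb_per_s * 8


# Transfer rates quoted in the post above.
for label, rate in [("same VLAN", 95), ("inter-VLAN", 40)]:
    print(f"{label}: {rate} MB/s = {mbytes_to_mbits(rate):.0f} Mbit/s")
```

    By the same arithmetic, gigabit line rate is 1000 / 8 = 125 MB/sec before any protocol overhead.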



  • Sounds like you're only using 1 link of your 6 port LACP lagg for inbound and outbound traffic.



  • Have you looked at your CPU load, like system time, interrupts, etc?



  • low cpu usage, about 1.1 load so 1 core only.



  • I tested a LAN-WAN SMB file transfer through the firewall, with Snort and the ET ruleset enabled, and saw 105 MB/sec going through. This is without lagg on the interfaces. It gave 38% load.



  • LACP?  What's your setting on your switch?  Maybe I need to use passive instead of active lacp?



  • So no one else has issues with firewall throughput?



  • You are for sure only using one member of your port channel for transfers between two nodes. Cisco switches don't round-robin across port channels. By default, they hash the source MAC address and come up with a member of the port channel to send the data across. Transfers between two MAC addresses will always go across the same member of a port channel. You can modify this to use combinations of source and destination MAC address or IP address, but the net result is the same - your data is going to use only one link.

    However, that doesn't explain the 50% drop in throughput (unless there is other fairly heavy traffic). You might try running WireShark on one of the transfer hosts and watching the actual traffic. See if anything jumps out at you.
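
    To illustrate why a single pair of hosts always lands on one member link, here is a minimal sketch of hash-based member selection. The hash below (XOR of the last MAC octets, modulo the number of links) is an assumption made for illustration; the real hash on a Cisco switch or in pfSense's lagg driver is platform specific and is not reproduced here. The MAC addresses are made up.

```python
def pick_member(src_mac: str, dst_mac: str, n_links: int) -> int:
    """Illustrative src-dst-mac hash: XOR the last octet of each MAC, modulo link count."""
    src_last = int(src_mac.split(":")[-1], 16)
    dst_last = int(dst_mac.split(":")[-1], 16)
    return (src_last ^ dst_last) % n_links


# Hypothetical addresses for a client PC and a NAS.
pc = "00:1b:21:cc:dd:2e"
nas = "00:11:32:aa:bb:01"

# Every frame of the same flow hashes to the same member of a 6-port channel.
chosen = {pick_member(pc, nas, 6) for _ in range(1000)}
print("members used for this flow:", chosen)
```

    Because the inputs never change for a given pair of hosts, the result never changes either, so adding more links to the channel helps many concurrent flows but not one transfer.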



  • Exactly, even if it was using one link, and nothing else is happening on the network, that's equivalent of full duplex 1Gbps (2 Gbps both ways).

    I'm using src dst mac for the algorithm.


  • Your expectations on switched versus routed performance on a single TCP stream might be too high.



  • So I'm not sure what's up with pfsense.  I have the same hardware as the one in pfsense store,

    Which unit exactly are you comparing your board to?

    2nd best (2758 atom 8 core). I'm running 8 total nics with 16 gb RAM.

    Four NICs are soldered onto the board; what kind of NICs are the others?

    The uplink to my switch Cisco 2960x, is a 6 port LACP lagg.

    Why a LAG (LACP) and not a 10 GbE uplink or a SFP+ uplink? Is this a Layer2 or Layer3 Switch?

    If I'm transferring a file via FTP within same vlan, which bypasses pfsense all together, I'm getting around 95MB/sec or close to 900+mbps.  So nothing wrong with my switch.

    How large is the file you are transferring? And how is pfSense bypassed in this case?
    Is this a Layer 3 switch that routes between the VLANs itself, or does it only switch inside each VLAN?

    When doing intervlan transfer, obviously going through pfsense, it drops to about 40MB/sec so maybe around 450-500mbps.

    A really good test method would be iPerf or NetIO, with one device acting as the server and another one acting
    as the client. That is protocol independent and more meaningful than your copy or FTP test.

    I don't think for this hardware and my setup I would only be transferring half a gigabit through pfsense.

    Me too, but in many cases a user thinks he owns a real pfSense powerhouse, turns many features on,
    runs many other services on it, and installs nearly every package he can find.  ;)
    OK, not really all of them, but many, and when something pushed through the pfSense box
    is then not as fast as he expected, all hell breaks loose.

    Unless I'm wrong about their hardware they are selling.

    No, and why would you be? They sell both types of hardware: the SG-8860 based on the Intel C2758 SoC and,
    on top of that, the C2758 1U, which should be the closer match to your Supermicro board. One thing I really
    doubt, though, is that any of us would get it working the way they do, and that is the pre-tuned and
    fine-tuned pfSense version! Some hints and tips on what to activate or enable would be nice for sure, but as
    I see it we could not reproduce their configuration ourselves, and that cannot be bought with money either.

    So no one else has issues with firewall throughput?

    Not really with this board, as I see it; it is really powerful in my eyes.
    If your switch is a Layer 3 device, you could try letting it do the
    routing between the VLANs; that would be much faster and have
    lower latency. But there are two different camps on this:
    one prefers it and the other does not.

    • Did you enable the PowerD (Hiadaptive) option?
      So that all available CPU frequencies are used as needed
    • Do you use an SSD or mSATA drive with TRIM support enabled?
      Not really urgent, but perhaps nice to have
    • Did you raise the mbuf limit to 1,000,000?
      With your amount of RAM that will be no problem.

    As far as I know, if you are using LACP to handle the LAG, one line (cable or port) must first be fully loaded
    before the next one comes into use! There are often many more options you should try before telling
    people that the Supermicro hardware is slow or lame. You could play around with settings
    such as active - active and active - passive, or think about using a static LAG (without LACP); then you might
    be able to set up round robin with active - active so that all pipes (the LAN ports or cables in use) are filled
    together, not just one.

    A LAG is more of a workaround for a so-called bottleneck, which mostly occurs when many
    users or clients connect to one server. It can be a nice feature in some rare cases, but often
    a real 10 GbE or SFP+ uplink is the better way around this.

    Dynamic LAG (LACP): automatic configuration via LACP
    Balancing via a hashing algorithm
    active - active

    Static LAG (no LACP): manual configuration by hand
    Balancing via round robin (the only possibility)
    active - active

    If the LAG changes often, or is often scaled up or down (ports added or removed),
    it might be better to go with a dynamic LAG (LACP); for a LAG that never changes, a static one
    could be the better choice.

    Again, if your switch is a Layer 3 device and has one or more SFP+ ports, I would move
    the VLAN routing to it, and over a 10 GbE interface you might have more success and speed.



  • @Derelict:

    Your expectations on switched versus routed performance on a single TCP stream might be too high.

    Might be, but I migrated from my old box running Zeroshell, which had a similar LAGG with LACP but weaker hardware: an AMD Athlon X2 2.8 GHz dual core with Intel PCIe NICs.  There I achieved higher throughput, not 900+ but around 800mbps or 85MB/sec through the firewall.



  • Zeroshell is based on Linux and is leaner and programmed closer to the hardware,
    so it runs more smoothly. It could well be that Linux
    performs better even on older hardware.



  • @BlueKobold:

    Which unit exactly are you comparing your board to?

    I have the exact same server/MB/cpu as this.  The only improvements are 16GB ram vs 8gb, and dual 120GB SSD for mirror.  I even have the supermicro 4 port gigabit PCIe adapter they sell.

    https://store.pfsense.org/C2758/

    Four NICs are soldered onto the board; what kind of NICs are the others?

    As stated, the supermicro 4 gigabit PCIe card.

    Why a LAG (LACP) and not a 10 GbE uplink or a SFP+ uplink? Is this a Layer2 or Layer3 Switch?

    Didn't think I need 10GbE uplink for my home network, I just need concurrent gigabit throughput, not more than that, but not half gigabit throughput either.  I have a C2960X in L2 mode.

    How large is the file you are transferring? And how is pfSense bypassed in this case?
    Is this a Layer 3 switch that routes between the VLANs itself, or does it only switch inside each VLAN?

    About 7GB.  Within same VLAN ID, the switch will handle the transfer and not send the packets to the trunk LAGG on pfSense.  Switch again is L2 and not routing, pfSense is doing routing between VLANs.

    A really good test method would be iPerf or NetIO, with one device acting as the server and another one acting
    as the client. That is protocol independent and more meaningful than your copy or FTP test.

    Probably, but the actual use case is what matters here: FTP or SMB transfers. I tried different machines, and all give the same speed through pfSense.

    Me too, but in many cases a user thinks he owns a real pfSense powerhouse, turns many features on,
    runs many other services on it, and installs nearly every package he can find.  ;)
    OK, not really all of them, but many, and when something pushed through the pfSense box
    is then not as fast as he expected, all hell breaks loose.

    I don't have anything cpu intensive turned on yet, like snort or squid etc.  It's basic setup just with multiple VLANs and rules between the vlans, that's it.

    No, and why would you be? They sell both types of hardware: the SG-8860 based on the Intel C2758 SoC and,
    on top of that, the C2758 1U, which should be the closer match to your Supermicro board. One thing I really
    doubt, though, is that any of us would get it working the way they do, and that is the pre-tuned and
    fine-tuned pfSense version! Some hints and tips on what to activate or enable would be nice for sure, but as
    I see it we could not reproduce their configuration ourselves, and that cannot be bought with money either.

    The hardware is exactly the same as I linked above with several improvements which shouldn't impact NIC performance between VLANs.  What exactly are they doing that is fine tuning that everyone else can't get?  Settings should be able to use for anyone.  This is open source right?

    Not really with this board, as I see it; it is really powerful in my eyes.
    If your switch is a Layer 3 device, you could try letting it do the
    routing between the VLANs; that would be much faster and have
    lower latency. But there are two different camps on this:
    one prefers it and the other does not.

    • Did you enable the PowerD (Hiadaptive) option?
      So that all available CPU frequencies are used as needed
    • Do you use an SSD or mSATA drive with TRIM support enabled?
      Not really urgent, but perhaps nice to have
    • Did you raise the mbuf limit to 1,000,000?
      With your amount of RAM that will be no problem.

    As far as I know, if you are using LACP to handle the LAG, one line (cable or port) must first be fully loaded
    before the next one comes into use! There are often many more options you should try before telling
    people that the Supermicro hardware is slow or lame. You could play around with settings
    such as active - active and active - passive, or think about using a static LAG (without LACP); then you might
    be able to set up round robin with active - active so that all pipes (the LAN ports or cables in use) are filled
    together, not just one.

    A LAG is more of a workaround for a so-called bottleneck, which mostly occurs when many
    users or clients connect to one server. It can be a nice feature in some rare cases, but often
    a real 10 GbE or SFP+ uplink is the better way around this.

    Dynamic LAG (LACP): automatic configuration via LACP
    Balancing via a hashing algorithm
    active - active

    Static LAG (no LACP): manual configuration by hand
    Balancing via round robin (the only possibility)
    active - active

    If the LAG changes often, or is often scaled up or down (ports added or removed),
    it might be better to go with a dynamic LAG (LACP); for a LAG that never changes, a static one
    could be the better choice.

    Again, if your switch is a Layer 3 device and has one or more SFP+ ports, I would move
    the VLAN routing to it, and over a 10 GbE interface you might have more success and speed.

    Yup everything is setup except PowerD, 1000000 mbufs and trim enabled SSD.

    I'm leaning towards the LAGG setup, maybe LACP isn't good but had no issues prior on my old setup with zeroshell.  no 10GbE on my switch, just a base catalyst 2960x model.



  • i've read posts in the past that claimed drastic performance increase when enabling powerD



  • I don't think your port channel is the problem.

    For grins, can you spin up a linux live distro on the hardware and configure it to just route between the VLANs for a comparison test? It would be interesting.

    You might also try installing HyperV on the hardware and running pfsense as a virtual. You'll most likely see worse performance, but you never know…

    I will warn you that setting up free HyperV outside a windows domain is a royal PITA. I can send you docs if you want to try.

    It may be that you just lose that much performance by virtue of all the work that routing takes when you are doing it in software. There is a reason that Cisco can charge $$$ for their layer 3 switches.



  • Yeah it shouldn't be the port channel since the file I'm transferring is hosted on my NAS which has a 2 port LACP to the switch, so even if I'm on the same vlan, it goes through a port channel.



  • About 7GB.  Within same VLAN ID, the switch will handle the transfer and not send the packets to the trunk LAGG on pfSense.  Switch again is L2 and not routing, pfSense is doing routing between VLANs.

    Why should a 7 GB file run through the firewall at all? And from that you conclude there is a performance
    problem or that the LAG is mismatched? I don't think so.

    Yup everything is setup except PowerD, 1000000 mbufs and trim enabled SSD.

    Ok that would be fine then.

    I'm leaning towards the LAGG setup, maybe LACP isn't good but had no issues prior on my old setup with zeroshell.  no 10GbE on my switch, just a base catalyst 2960x model.

    I have only a small and very cheap switch with 2 SFP ports: one is connected to the NAS and one to a
    server, and the pfSense firewall is "only" connected to a 1 GBit/s port. In return, pfSense only has
    to route the WAN - LAN traffic, while the switch does the entire LAN routing. If the firewall fails
    at some point, all LAN traffic keeps flowing without a break.

    The hardware is exactly the same as I linked above with several improvements which shouldn't
    impact NIC performance between VLANs.

    OK

    What exactly are they doing that is fine tuning that everyone else can't get?

    Because they know the hardware that ships with the pre-installed version of pfSense,
    they can apply tuning that matches exactly this hardware and unleash its full
    power. And yes, it is the same version we are both using, but with some
    tuning, because when they sell the hardware they know exactly what is being sold. For the community
    version, which is for everyone, no such tuning can be done, because the developers do not know what
    kind of hardware we are all using or will use!

    Settings should be able to use for anyone.

    Yes, for sure they are, but I really don't think we all have as much wisdom and deep knowledge
    about pfSense as the developers do! And if they know the hardware because they sell it themselves,
    they can do more than we are able to. Or how much do you know about such things?

    This is open source right?

    Yes, open source for sure, but if you offer only the software without knowing what kind of hardware
    will be in use at the customer site, what would you tune or pre-tune? If you sell
    the hardware and the software together, pre-installed, you know the hardware basis exactly
    and can pre-install and tune the very same pfSense community version that we are all
    using, but with the deeper knowledge from the developer side that the rest of us will never have. No more,
    but also no less.

    But back to your problem: what kind of settings were you using in Zeroshell?
    Are they the same as now? And again, a LAG is more for the use case where
    many clients connect to one other device, like your NAS, because they will
    be able to fill one line completely and then the next one comes into use.



  • HiAdaptive did not do anything :(

    Zeroshell was the same as what pfSense is doing, LACP with 6 ports.

    I had no issues with zeroshell, ran about 850mbps throughput.  And yes LAG is primarily for multiple users connected to my NAS to transfer files.



  • I think you'd be wise to do a wireshark capture of an FTP session to look for things like retransmissions or tcp zero windows. You might be able to tweak your systems' tcp parameters to get better throughput.

    Ordinarily, I recommend against using jumbo frames on gigabit (and even on 10gb except for iSCSI), but in your case reducing the number of packets that pfsense has to look at might boost your performance.
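
    As a rough illustration of how much jumbo frames would cut the packet count for the ~7 GB test file mentioned earlier, here is a back-of-the-envelope sketch. It assumes 7 GB = 7 x 10^9 bytes, 40 bytes of IPv4 + TCP headers per packet with no options, and every packet filled to the MTU; real traffic will differ somewhat.

```python
FILE_BYTES = 7 * 10**9   # the ~7 GB file from the thread, decimal units assumed
HEADER_BYTES = 40        # assumed: 20-byte IPv4 header + 20-byte TCP header, no options


def packets_needed(mtu: int) -> int:
    """Number of full-size TCP packets needed to carry the file at a given MTU."""
    payload = mtu - HEADER_BYTES
    return -(-FILE_BYTES // payload)  # integer ceiling division


for mtu in (1500, 9000):
    print(f"MTU {mtu}: {packets_needed(mtu):,} packets")
```

    Under these assumptions a 9000-byte MTU needs roughly one sixth as many packets as a 1500-byte MTU, which is the per-packet work the firewall would be spared.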

    Lastly, you might want to consider installing ESX or HyperV (ESX probably wouldn't have drivers for your supermicro NIC) and use pfsense for firewall, and something like zeroshell for intervlan routing.

    Or, buy a layer three switch.



  • HiAdaptive did not do anything

    Oh, that is really sad. Normally it does the following: when the machine is under load it uses the full
    2.4 GHz, and when less power is needed it saves electricity by running the CPU at something like 600 MHz
    or 800 MHz as required. If it is not enabled, the CPU frequency may stay statically
    at 600 MHz or 800 MHz, which will not deliver the performance, and with it the
    throughput, that you need from time to time!

    Zeroshell was the same as what pfSense is doing, LACP with 6 ports.

    Do you remember settings like "active - active" or anything else that you are not
    using or configuring this time with pfSense?



  • @BlueKobold:

    HiAdaptive did not do anything

    Oh, that is really sad. Normally it does the following: when the machine is under load it uses the full
    2.4 GHz, and when less power is needed it saves electricity by running the CPU at something like 600 MHz
    or 800 MHz as required. If it is not enabled, the CPU frequency may stay statically
    at 600 MHz or 800 MHz, which will not deliver the performance, and with it the
    throughput, that you need from time to time!

    Zeroshell was the same as what pfSense is doing, LACP with 6 ports.

    Do you remember settings like "active - active" or anything else that you are not
    using or configuring this time with pfSense?

    It was just configured in the interfaces file like on any Linux distro.

    auto bond0
    iface bond0 inet static
    address 192.168.1.10
    gateway 192.168.1.1
    netmask 255.255.255.0
    # mode 4 = IEEE 802.3ad (LACP)
    bond-mode 4
    # check link state every 100 ms
    bond-miimon 100
    bond-slaves none
    

    Something like that, there's nothing really to state active or passive.

    I do have a layer 3 switch, sort of: the Cisco C2960X-48TS-L is not a full L3 switch but has routing capabilities and ACLs.

    I just don't know if I can define all the same rules in the switch and also allow certain hosts/networks outbound on pfSense to different gateways (OpenVPN clients)

    From what I've read, you're supposed to use a transit network from the switch to pfsense so pfsense doesn't really know the internal vlans of the switch.  In this case I don't think I can selectively route traffic outbound to different OpenVPN gateways.



  • From what I've read, you're supposed to use a transit network from the switch to pfsense so pfsense doesn't really know the internal vlans of the switch.  In this case I don't think I can selectively route traffic outbound to different OpenVPN gateways.

    You could create, for example, a VLAN 50 whose gateway is the IP address
    of the pfSense box! Then you can set up routes to all the other VLANs and everything will be fine. That's it.

    OK, perhaps you don't want to go this way, but it is a really fine solution: all LAN traffic is routed
    at nearly wire speed, depending on the power of your switch, and the entire LAN also stays alive if the pfSense
    box is rebooted or fails.



  • Your C2960X-48TS-L is not a layer 3 switch, it runs the LANBase feature set. No routing possible.

    If your switch is a C2960XR switch, then you are in tall cotton - by all means use it for your inter-VLAN routing and inter-VLAN access control lists.

    Set up your VLANs on your switch and use it to route between them. Create a private IP network between the switch and the pfsense box and make the switch's default route the IP address of pfsense.

    I don't think the 2960XR will originate any routing protocols, so you'll have to create routes on the pfsense box that route your VLAN subnets back to the switch.

    You will be much happier with this setup.
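
    The transit-network layout described above can be sketched as two small routing decisions. All addresses below are hypothetical and chosen only for illustration; nothing here is taken from the poster's actual network, and the function names are made up for the example.

```python
import ipaddress

# Hypothetical addressing for illustration; none of these values come from the thread.
SWITCH_TRANSIT_IP = ipaddress.ip_address("10.255.0.2")   # switch side of the transit network
PFSENSE_TRANSIT_IP = ipaddress.ip_address("10.255.0.1")  # pfSense side; the switch's default route
VLAN_SUBNETS = [
    ipaddress.ip_network("192.168.10.0/24"),
    ipaddress.ip_network("192.168.20.0/24"),
]

# Static routes pfSense would need: every VLAN subnet points back at the switch.
PFSENSE_STATIC_ROUTES = {net: SWITCH_TRANSIT_IP for net in VLAN_SUBNETS}


def pfsense_next_hop(destination: str):
    """Next hop pfSense would use, or None when its own default (WAN) route applies."""
    addr = ipaddress.ip_address(destination)
    for net, hop in PFSENSE_STATIC_ROUTES.items():
        if addr in net:
            return hop
    return None


def switch_next_hop(destination: str):
    """None when the switch routes between VLANs itself, else the pfSense transit IP."""
    addr = ipaddress.ip_address(destination)
    if any(addr in net for net in VLAN_SUBNETS):
        return None
    return PFSENSE_TRANSIT_IP


print(pfsense_next_hop("192.168.20.5"))  # return traffic goes back via the switch
print(switch_next_hop("8.8.8.8"))        # Internet-bound traffic goes to pfSense
```

    In this layout inter-VLAN traffic never reaches pfSense at all, which is the reason for the speedup and also the reason pfSense rules no longer apply between VLANs.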


  • I wish the netgate guys would chime in on threads like this.



  • Sorry didn't reply back for a while on this thread.

    I think I've figured out the issue, maybe not, who knows.

    But basically the LAGG algorithm is sending/receiving the file transfer on the same port on pfSense, so it's doing full duplex transfer.  Now theoretically, the gigabit ethernet can handle 2000mbps total.  But I ran iperf between 2 machines using the simultaneous option, and the max I was able to get was about 450mbps both ways at the same time.  So not sure why?  Anyhow, when I transfer a file in the other direction, the algorithm uses 2 ports on pfSense, so then I'm getting closer to 1Gb in that direction.

    Either way, I think I will upgrade to 10Gbe with the Chelsio card, that should solve any Gb bottlenecks.



  • But basically the LAGG algorithm is sending/receiving the file transfer on the same port on pfSense, so it's doing full duplex transfer.

    I am not really sure, but it all depends on the configuration you made! You can also configure it so that one
    LAN port does the RX part and the other does the TX part! Then you would get:

    • 1 GBit/s > TX
    • 1 GBit/s > RX

    And this would then still be 1 GBit/s and not 2 GBit/s! But for sure the entire LAG (LACP) forms
    an aggregated 2 GBit/s fat pipe!

    Now theoretically, the gigabit ethernet can handle 2000mbps total.

    That is exactly the point where, in my eyes, you are making a mistake in your reasoning!
    A 1 GBit/s line (cable) can send and receive 1 GBit/s over the 4 wire pairs of the cable in each direction;
    that is 1 GBit/s in each direction, not 2 GBit/s in one direction.

    But I ran iperf between 2 machines using the simultaneous option, and the max I was able to get was about 450mbps both ways at the same time.  So not sure why?

    If the theoretical maximum throughput of a 1 GBit/s line is 125 MBit/s, then with your LAG (LACP)
    you would normally get at most 500 MBit/s (4 x 125 MBit/s). You got 450 MBit/s, and once the
    TCP/IP overhead is counted on top of that, you are getting nearly the maximum, or am I
    wrong about this?

    Anyhow  when I transfer a file the other direction, the algorithm uses 2 ports on pfSense, so then I'm getting closer to 1Gb in that direction.

    Then perhaps the network load you produced with iPerf was simply not high enough?

    Either way, I think I will upgrade to 10Gbe with the Chelsio card, that should solve any Gb bottlenecks.

    It is the best option today in my eyes!!! The Chelsio card fully offloads tasks such as VLANs
    using an ASIC/FPGA on the NIC, and it has better driver support in pfSense! So you will be able to
    offload many TCP/IP tasks from your pfSense box, and on top of that you will save ports and
    get more throughput than now.

