Slow VLAN-to-VLAN Performance on C2758 (Supermicro 5018A-FTN4)
-
When I test with pfSense acting as the iperf server and two clients sending data at the same time using this:
iperf -c 192.168.10.1 -P 64 -i 1 -p 5001 -f M -t 10
then each client gets the following result:
Client #1
[SUM] 0.0-10.2 sec  1153 MBytes  113 MBytes/sec

Client #2
[SUM] 0.0-10.3 sec  1162 MBytes  113 MBytes/sec

When I run the following instead, each client only gets 56.9 MB/s:
iperf -c 192.168.10.1 -P 1 -i 1 -p 5001 -f M -t 10
That tells me that LACP is working, because I could saturate the line with -P 64.
Why does a single stream (-P 1) not saturate the line?
-
It seems to me that LACP is working on the pfSense side.
I don't know Juniper, but with a Cisco you can only do LACP if the physical member interfaces have the same configuration: port speed, duplex, MDI-X, etc.
Perhaps you can check that?

Read this one: http://www.juniper.net/techpubs/en_US/junos15.1/topics/concept/interfaces-hashing-lag-ecmp-understanding.html

Standard hashing is on the payload, it seems; all the iperf packets might have the same payload, so they end up on one member of the link.
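As a rough illustration (this is not Juniper's actual algorithm, just a sketch of the principle), a LAG hash reduces selected header fields to one member index, so every packet of a given flow rides the same physical link. Which fields feed the hash decides how flows spread:

```python
# Illustrative LAG hashing sketch: hash chosen header fields onto one
# of num_links bundle members. CRC32 stands in for the real hardware hash.
import zlib

def lag_member(fields, num_links=4):
    """Deterministically map the given header fields to a member index."""
    key = "|".join(str(f) for f in fields).encode()
    return zlib.crc32(key) % num_links

# Layer-2 hashing: only MAC addresses feed the hash, so one client/server
# pair always lands on a single link no matter how many TCP streams it opens.
print(lag_member(["aa:bb:cc:00:00:01", "aa:bb:cc:00:00:02"]))

# Layer-3/4 hashing: IPs and ports join the key, so each iperf -P stream
# (distinct source port) can land on a different member link.
print(lag_member(["192.168.10.50", "192.168.10.1", 49152, 5001]))
print(lag_member(["192.168.10.50", "192.168.10.1", 49153, 5001]))
```

This is why a single flow can never exceed one member's speed on a hashed LAG, regardless of the hash chosen.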
Change it to use Layer 2 info; your clients have different MAC addresses.
-
I see more info on hashing here: https://forums.juniper.net/t5/Ethernet-Switching/EX2200-LACP-hashing-algorithm/td-p/107844
I don't see any issue with LACP; I expect each client to get only a single gigabit link, so four separate clients can use the four links. I think it's strange that I need more than one iperf stream to saturate the line. In my tests on the management interface, I plugged a Linux machine directly into the management port; no switch was involved.
Any idea why iperf needs -P 32 (32 streams) to saturate the line?
Information about my LACP link is included below:
root> show interfaces ae0
Physical interface: ae0, Enabled, Physical link is Up
  Interface index: 128, SNMP ifIndex: 599
  Description: pfsense
  Link-level type: Ethernet, MTU: 1514, Speed: 4Gbps, BPDU Error: None,
  MAC-REWRITE Error: None, Loopback: Disabled, Source filtering: Disabled,
  Flow control: Disabled, Minimum links needed: 1, Minimum bandwidth needed: 0
  Device flags   : Present Running
  Interface flags: SNMP-Traps Internal: XXXXXXX
  Current address: XXXXX, Hardware address: XXXXX
  Last flapped   : 2016-04-14 17:32:53 CDT (20:17:55 ago)
  Input rate     : 113307208 bps (13345 pps)
  Output rate    : 113834880 bps (13366 pps)

  Logical interface ae0.0 (Index 65) (SNMP ifIndex 603)
    Flags: SNMP-Traps 0xc0004000 Encapsulation: ENET2
    Statistics        Packets        pps         Bytes          bps
    Bundle:
        Input :         14611          0        916446            0
        Output:       2293348          0     245371565            0
    Adaptive Statistics:
        Adaptive Adjusts:          0
        Adaptive Scans  :          0
        Adaptive Updates:          0
    Protocol eth-switch, Flags: Is-Primary, Trunk-Mode
-
What OS are the client PCs running?
It might be an issue with the iperf clients or the OS version. My test from a Windows 7 PC to a FreeBSD 10.1 server was 350 MB/sec with one iperf session. It also depends on the buffer size, the packet size tested, etc., I think. This was over a 10GbE link with Intel cards.
The Wintel combination did not want to go faster than that, it seems; FreeBSD to FreeBSD was simply close to line rate with one session.
-
I don't see any issue with LACP. I'm expecting it to only give one gigabit link each to four separate clients.
Normally, 4 single lines are aggregated into one fat pipe that is then 4x (400%) the capacity of a single line, showing up here as 4 Gbit/s aggregated.
I think it's strange that I need more than one stream of iperf to saturate the line.
How much you will need to saturate one single line?
In my tests on the management interface, I plugged a linux machine directly into the management port. No switch involved.
And no LAG, VLAN, or QoS involved at all?
Any idea why iperf needs -P 32 or 32 Streams to saturate the line?
Each line has its own speed limit, but throughput also depends on other circumstances besides.
Link-level type: Ethernet, MTU: 1514, Speed: 4Gbps, BPDU Error: None, MAC-REWRITE Error:
1.- What is the MTU size on all devices in that test?
2.- How did you configure the LAG?
– (2 lines sending and 2 lines receiving, or 4 lines sending and receiving)
-- (active / active: all lines are in use, or active / passive: one line is in use and the rest is spare for failover)

Normally you shouldn't need anything special for your setup.
In my eyes you can do the following things:
1.- Set up a static (manual) LAG using the round-robin method, with 2 lines for sending and 2 lines for receiving (active / active).
2.- Use your Layer 3 switch to route between the VLANs entirely inside the switch. That will be closer to wire speed, and the capacity freed up on the pfSense box could then be used for other things, or kept as a silent reserve.
-
How much you will need to saturate one single line?
It looks like "-P 2" will saturate the line, but "-P 1" will not.
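That pattern is consistent with a per-stream TCP window limit: a single stream's throughput is capped at window size divided by round-trip time. A rough sketch with assumed (illustrative) numbers shows the ceiling lands in the right ballpark:

```python
# Why one TCP stream can stall below line rate: throughput <= window / RTT.
# Both values below are assumptions for illustration, not measurements.
window_bytes = 64 * 1024   # a common default receive window without scaling
rtt_seconds = 0.001        # assumed 1 ms round trip through the router

max_bps = window_bytes * 8 / rtt_seconds
print(f"Single-stream ceiling: {max_bps / 1e6:.0f} Mbit/s")

# The observed 56.9 MB/s is ~455 Mbit/s -- the same order of magnitude,
# which is why a second stream (-P 2) is enough to saturate the gigabit link.
print(f"Observed single stream: {56.9 * 8:.0f} Mbit/s")
```

Enlarging the window (iperf's -w option) or enabling window scaling on the clients would be another way to test this hypothesis.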
In my tests on the management interface, I plugged a linux machine directly into the management port. No switch involved.
And no LAG, VLAN, or QoS involved at all?
Correct. The management port does not have any LAGG, VLAN, or other tags; it's just one computer plugged directly into the pfSense machine.
1.- What is the MTU size on all devices in that test?
2.- How did you configure the LAG?
– (2 lines sending and 2 lines receiving, or 4 lines sending and receiving)
-- (active / active: all lines are in use, or active / passive: one line is in use and the rest is spare for failover)

To answer #1:
MTU on the Juniper switch is 1514.
MTU on the Linux clients is 1500.
MTU on the pfSense LAGG is 1500.
MTU on pfSense igb0 / igb1 / igb2 / igb3 is 1500 on each.
(The Juniper figure includes the 14-byte Ethernet header, so its 1514 matches the hosts' 1500-byte IP MTU.)
Detailed ifconfig output is below.

To answer #2:
LAGG is configured as LACP over 4 lines. Each of the 4 lines both sends and receives. If one line goes down, the Juniper ignores it and uses the remaining good lines; only one line is necessary to maintain a working connection.

ifconfig on pfSense:

THIS IS THE LAGG
lagg0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=400bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWTSO>
        ether XX:XX:XX:XX:XX:XX
        inet6 XXXXXXXXXXXXX%lagg0 prefixlen 64 scopeid 0xb
        inet 192.168.10.1 netmask 0xffffff00 broadcast 192.168.10.255
        inet 10.10.10.1 netmask 0xffffffff broadcast 10.10.10.1
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
        media: Ethernet autoselect
        status: active
        laggproto lacp lagghash l2,l3,l4
        laggport: igb0 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
        laggport: igb1 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
        laggport: igb2 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
        laggport: igb3 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>

THIS IS THE MANAGEMENT PORT
em1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=4009b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,VLAN_HWTSO>
        ether XXXXXXXXXXXX
        inet6 XXXXXXXXXXXX%em1 prefixlen 64 scopeid 0x2
        inet 192.168.5.1 netmask 0xffffff00 broadcast 192.168.5.255
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
        media: Ethernet autoselect
        status: no carrier

THIS IS ONE OF THE PORTS INCLUDED IN THE LAGG
igb0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=400bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWTSO>
        ether XXXXXXXXXXXX
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
-
PC1: Core i5-3470 3.2 GHz with 4GB of RAM, SAMSUNG SSD 830 EVO, OS Windows 10 Pro (10.0.10586)
PC2: Core i5-2400 3.1 GHz with 8GB of RAM, SAMSUNG SSD 840 EVO, OS Windows 10 Pro (10.0.10586)
A network share was configured on PC1. The test file, Spartacus Season 1 Episode 1 Past Transgressions.mkv, is 4,583,539 KB (about 4.3 GB).
Both PCs are connected to an HP ProCurve 2810-24G, and I have a 4-port LAGG (LACP) going back to a Brocade FastIron 648P. From the Brocade I have a single gigabit port going to my pfSense firewall, which uses the motherboard's built-in Intel NIC as the LAN port. The LAN port is sub-interfaced with 5 virtual ports.
            GbE x4 (LAG)                            GbE
[PfSense]----------------[Brocade FastIron 648P]----------[ProCurve]-----[PC1]
                                                                |--------[PC2]
PfSense is a Core i5-3470 running at 3.2GHz with 4GB of RAM. My current version of pfSense is 2.3 Release, 64-bit. I have 10 OpenVPN tunnels with not much traffic going across them at the moment, and my CPU is usually at 1% from what I can observe. At the time of the test the only other traffic is YouTube from a Chromecast.

Test 1:
PC1 to PC2 on same subnet
Trial 1 took 41.01 sec to transfer the test file described above, which works out to 873.17 Mbps.

Test 2:
PC1 to PC2 on Different subnets
Trial 1 took 45.28 sec to transfer the test file described above, which works out to 790.83 Mbps.

These are the fastest times for each test. I ran 3 trials for each test to get a more accurate idea of how your network might perform. I have more data that I hope to publish later today.
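For reference, the Mbps figures above can be reproduced from the file size and transfer time; the 1024 divisor reflects the binary-prefix convention the numbers imply:

```python
# Reproducing the throughput figures: file size in KB, times 8 bits,
# divided by 1024 (Kbit -> Mbit, binary prefix) and by the transfer time.
file_kb = 4_583_539  # the ~4.3 GB test file

def mbps(seconds):
    """Average transfer rate in (binary-prefixed) Mbps."""
    return file_kb * 8 / 1024 / seconds

print(f"Same subnet:      {mbps(41.01):.2f} Mbps")  # ~873.17
print(f"Routed via VLANs: {mbps(45.28):.2f} Mbps")  # ~790.83
```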
-
Thanks. That's interesting. You're not maxing out either.
-
I would say that I'm pretty close, and if you look at trial 1, I'm not routing at all and still not getting line rate. I'm pretty sure that is due to the VLAN tags and the TCP overhead.
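As a sanity check on that, the theoretical TCP goodput ceiling on gigabit Ethernet can be estimated from standard header sizes (a rough sketch assuming a 1460-byte MSS and no TCP options):

```python
# Rough ceiling for TCP goodput on gigabit Ethernet with a 1500-byte MTU.
# The constants are standard Ethernet/IPv4/TCP overhead sizes in bytes.
line_rate_mbps = 1000
payload = 1500 - 20 - 20                  # minus IPv4 and TCP headers
frame_on_wire = 1500 + 14 + 4 + 8 + 12    # Eth header, FCS, preamble, gap
vlan_frame = frame_on_wire + 4            # an 802.1Q tag adds 4 bytes

print(f"Max TCP goodput, untagged:    {line_rate_mbps * payload / frame_on_wire:.0f} Mbps")
print(f"Max TCP goodput, VLAN tagged: {line_rate_mbps * payload / vlan_frame:.0f} Mbps")
# ~949 vs ~947 Mbps: the VLAN tag itself costs only ~2 Mbps, so the gap
# down to 873 Mbps is mostly the host stacks, not the tagging.
```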
-
I would say that I'm pretty close and if you look on trial 1 I'm not routing at all and I'm still not getting line rate.
873 Mbit/s + TCP overhead + VLAN tag + QoS + all the other running services narrows down the entire throughput of your pfSense appliance.
I'm pretty sure that has to do with the VLAN tags and also the overhead with TCP.
Each OpenVPN tunnel takes one core of the CPU or SoC, and all other packets also "eat" some CPU power, as far as I know. So what other packets and services are you running on that pfSense machine?
-
Final Results:
Test 1 - No routing both machines on same subnet
Time (Seconds) Speed (Mbps)
Pass 1 41.69 858.9325603
Pass 2 80.43 445.2181827
Pass 3 41.01 873.1747973

Test 2 - PCs on different subnets, pfSense routing across VLANs
Time (Seconds) Speed (Mbps)
Pass 1 45.28 790.8325627
Pass 2 45.68 783.907584
Pass 3 55.7 642.8886614

Test 3 - Cisco 2821 router inserted, handling the routing between the two subnets
Time (Seconds) Speed (Mbps)
Pass 1 44.36 807.2339594
Pass 2 44.12 811.6250779
Pass 3 44.94 796.8157196

Summary - What I did here is take out the high and low of each test and then compare Tests 2 and 3 against Test 1 (which is switching performance).
Performance Hit
Test 2: 8.73%
Test 3: 6.02%

Summary:
Switching is faster than routing (duh!), but the ASICs in the Cisco router let it perform at nearly the same level as my pfSense firewall on higher-end hardware. From the results we can see that the Cisco router has about 2% better routing performance, which in my mind is well worth the trade-off for what pfSense gives me! I have done nothing in terms of optimizations, which could bring pfSense even closer to the Cisco router, and as others have stated, a NIC with custom silicon might close the gap further. The purpose of this test was not to prove one platform better than another; I have always wanted to see charts with numbers across various hardware so people can decide what is best for them.
Lastly, the CPU in my pfSense firewall went from 1-2% load to 10-13% when routing across VLANs. At first that scared me, because a couple of routed streams across VLANs could be a big hit, but adding simultaneous transfers did not push the CPU above the 10-13% load. (Nice!)
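The "Performance Hit" percentages can be reproduced from the pass speeds above; dropping the high and low of three runs is equivalent to taking the median:

```python
# Reproducing the Performance Hit figures: keep the median pass of each
# test (drop high and low), then compare against Test 1's switching baseline.
from statistics import median

test1 = [858.9325603, 445.2181827, 873.1747973]  # same-subnet switching
test2 = [790.8325627, 783.907584, 642.8886614]   # pfSense inter-VLAN routing
test3 = [807.2339594, 811.6250779, 796.8157196]  # Cisco 2821 routing

baseline = median(test1)
for name, runs in (("Test 2", test2), ("Test 3", test3)):
    hit = (baseline - median(runs)) / baseline * 100
    print(f"{name}: {hit:.2f}% slower than switching")  # 8.73% and 6.02%
```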