Slow VLAN-to-VLAN Performance on C2758 (Supermicro 5018A-FTN4)
-
When I test with pfSense acting as the iperf server and two clients sending data at the same time using this:
iperf -c 192.168.10.1 -P 64 -i 1 -p 5001 -f M -t 10
then each client gets the following result:
Client #1
[SUM] 0.0-10.2 sec  1153 MBytes  113 MBytes/sec

Client #2
[SUM] 0.0-10.3 sec  1162 MBytes  113 MBytes/sec

When I run the following instead, each client only gets 56.9 MB/s:
iperf -c 192.168.10.1 -P 1 -i 1 -p 5001 -f M -t 10
That tells me that LACP is working, because I could saturate the line with -P 64.
Why does a single stream (-P 1) not saturate the line?
-
It seems to me that LACP is working on the pfSense side.
I don't know Juniper, but with a Cisco you can only do LACP if the physical member interfaces have the same configuration: port speed, duplex, MDI-X, etc.
Perhaps you can check that?

Read this one: http://www.juniper.net/techpubs/en_US/junos15.1/topics/concept/interfaces-hashing-lag-ecmp-understanding.html

Standard hashing is on the payload, it seems; all the iperf packets might have the same payload, so they end up on one member of the link.
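As a rough illustration (this is not Juniper's actual algorithm, just a sketch of the principle), a LAG hash reduces selected header fields to one member index, so every packet of a given flow rides the same physical link. Which fields feed the hash decides how flows spread:

```python
# Illustrative LAG hashing sketch: hash chosen header fields onto one
# of num_links bundle members. CRC32 stands in for the real hardware hash.
import zlib

def lag_member(fields, num_links=4):
    """Deterministically map the given header fields to a member index."""
    key = "|".join(str(f) for f in fields).encode()
    return zlib.crc32(key) % num_links

# Layer-2 hashing: only MAC addresses feed the hash, so one client/server
# pair always lands on a single link no matter how many TCP streams it opens.
print(lag_member(["aa:bb:cc:00:00:01", "aa:bb:cc:00:00:02"]))

# Layer-3/4 hashing: IPs and ports join the key, so each iperf -P stream
# (distinct source port) can land on a different member link.
print(lag_member(["192.168.10.50", "192.168.10.1", 49152, 5001]))
print(lag_member(["192.168.10.50", "192.168.10.1", 49153, 5001]))
```

This is why a single flow can never exceed one member's speed on a hashed LAG, regardless of the hash chosen.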
Change it to use Layer 2 info; your clients have different MAC addresses.
-
I see more info on hashing here: https://forums.juniper.net/t5/Ethernet-Switching/EX2200-LACP-hashing-algorithm/td-p/107844
I don't see any issue with LACP; I expect each client to get only a single gigabit link, so four separate clients can use the four links. I think it's strange that I need more than one iperf stream to saturate the line. In my tests on the management interface, I plugged a Linux machine directly into the management port; no switch was involved.
Any idea why iperf needs -P 32 (32 streams) to saturate the line?
Information about my LACP link is included below:
root> show interfaces ae0
Physical interface: ae0, Enabled, Physical link is Up
  Interface index: 128, SNMP ifIndex: 599
  Description: pfsense
  Link-level type: Ethernet, MTU: 1514, Speed: 4Gbps, BPDU Error: None,
  MAC-REWRITE Error: None, Loopback: Disabled, Source filtering: Disabled,
  Flow control: Disabled, Minimum links needed: 1, Minimum bandwidth needed: 0
  Device flags   : Present Running
  Interface flags: SNMP-Traps Internal: XXXXXXX
  Current address: XXXXX, Hardware address: XXXXX
  Last flapped   : 2016-04-14 17:32:53 CDT (20:17:55 ago)
  Input rate     : 113307208 bps (13345 pps)
  Output rate    : 113834880 bps (13366 pps)

  Logical interface ae0.0 (Index 65) (SNMP ifIndex 603)
    Flags: SNMP-Traps 0xc0004000 Encapsulation: ENET2
    Statistics        Packets        pps         Bytes          bps
    Bundle:
        Input :         14611          0        916446            0
        Output:       2293348          0     245371565            0
    Adaptive Statistics:
        Adaptive Adjusts:          0
        Adaptive Scans  :          0
        Adaptive Updates:          0
    Protocol eth-switch, Flags: Is-Primary, Trunk-Mode
-
What OS are the client PCs running?
It might be an issue with the iperf clients or the OS version. My test from a Windows 7 PC to a FreeBSD 10.1 server was 350 MB/sec with one iperf session. It also depends on the buffer size, the packet size tested, etc., I think. This was over a 10GbE link with Intel cards.
The Wintel combination did not want to go faster than that, it seems; FreeBSD to FreeBSD was simply close to line rate with one session.
-
I don't see any issue with LACP. I'm expecting it to only give one gigabit link each to four separate clients.
Normally, 4 single lines are aggregated into one fat pipe that is then 4x (400%) the capacity of a single line, showing up here as 4 Gbit/s aggregated.
I think it's strange that I need more than one stream of iperf to saturate the line.
How much you will need to saturate one single line?
In my tests on the management interface, I plugged a linux machine directly into the management port. No switch involved.
And no LAG, VLAN, or QoS involved at all?
Any idea why iperf needs -P 32 or 32 Streams to saturate the line?
Each line has its own speed limit, but throughput also depends on other circumstances besides.
Link-level type: Ethernet, MTU: 1514, Speed: 4Gbps, BPDU Error: None, MAC-REWRITE Error:
1.- What is the MTU size on all devices in that test?
2.- How did you configure the LAG?
– (2 lines sending and 2 lines receiving, or 4 lines sending and receiving)
-- (active / active: all lines are in use, or active / passive: one line is in use and the rest is spare for failover)

Normally you shouldn't need anything special for your setup.
In my eyes you can do the following things:
1.- Set up a static (manual) LAG using the round-robin method, with 2 lines for sending and 2 lines for receiving (active / active).
2.- Use your Layer 3 switch to route between the VLANs entirely inside the switch. That will be closer to wire speed, and the capacity freed up on the pfSense box could then be used for other things, or kept as a silent reserve.
-
How much you will need to saturate one single line?
It looks like "-P 2" will saturate the line, but "-P 1" will not.
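That pattern is consistent with a per-stream TCP window limit: a single stream's throughput is capped at window size divided by round-trip time. A rough sketch with assumed (illustrative) numbers shows the ceiling lands in the right ballpark:

```python
# Why one TCP stream can stall below line rate: throughput <= window / RTT.
# Both values below are assumptions for illustration, not measurements.
window_bytes = 64 * 1024   # a common default receive window without scaling
rtt_seconds = 0.001        # assumed 1 ms round trip through the router

max_bps = window_bytes * 8 / rtt_seconds
print(f"Single-stream ceiling: {max_bps / 1e6:.0f} Mbit/s")

# The observed 56.9 MB/s is ~455 Mbit/s -- the same order of magnitude,
# which is why a second stream (-P 2) is enough to saturate the gigabit link.
print(f"Observed single stream: {56.9 * 8:.0f} Mbit/s")
```

Enlarging the window (iperf's -w option) or enabling window scaling on the clients would be another way to test this hypothesis.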
In my tests on the management interface, I plugged a linux machine directly into the management port. No switch involved.
And no LAG, VLAN, or QoS involved at all?
Correct. The management port does not have any LAGG, VLAN, or other tags; it's just one computer plugged directly into the pfSense machine.
1.- What is the MTU size on all devices in that test?
2.- How did you configure the LAG?
– (2 lines sending and 2 lines receiving, or 4 lines sending and receiving)
-- (active / active: all lines are in use, or active / passive: one line is in use and the rest is spare for failover)

To answer #1:
MTU on the Juniper switch is 1514.
MTU on the Linux clients is 1500.
MTU on the pfSense LAGG is 1500.
MTU on pfSense igb0 / igb1 / igb2 / igb3 is 1500 on each.
(The Juniper figure includes the 14-byte Ethernet header, so its 1514 matches the hosts' 1500-byte IP MTU.)
Detailed ifconfig output is below.

To answer #2:
LAGG is configured as LACP over 4 lines. Each of the 4 lines both sends and receives. If one line goes down, the Juniper ignores it and uses the remaining good lines; only one line is necessary to maintain a working connection.

ifconfig on pfSense:

THIS IS THE LAGG
lagg0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=400bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWTSO>
        ether XX:XX:XX:XX:XX:XX
        inet6 XXXXXXXXXXXXX%lagg0 prefixlen 64 scopeid 0xb
        inet 192.168.10.1 netmask 0xffffff00 broadcast 192.168.10.255
        inet 10.10.10.1 netmask 0xffffffff broadcast 10.10.10.1
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
        media: Ethernet autoselect
        status: active
        laggproto lacp lagghash l2,l3,l4
        laggport: igb0 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
        laggport: igb1 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
        laggport: igb2 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
        laggport: igb3 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>

THIS IS THE MANAGEMENT PORT
em1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=4009b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,VLAN_HWTSO>
        ether XXXXXXXXXXXX
        inet6 XXXXXXXXXXXX%em1 prefixlen 64 scopeid 0x2
        inet 192.168.5.1 netmask 0xffffff00 broadcast 192.168.5.255
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
        media: Ethernet autoselect
        status: no carrier

THIS IS ONE OF THE PORTS INCLUDED IN THE LAGG
igb0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=400bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWTSO>
        ether XXXXXXXXXXXX
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
-
PC1: Core i5-3470 3.2 GHz with 4GB of RAM, SAMSUNG SSD 830 EVO, OS Windows 10 Pro (10.0.10586)
PC2: Core i5-2400 3.1 GHz with 8GB of RAM, SAMSUNG SSD 840 EVO, OS Windows 10 Pro (10.0.10586)
A network share was configured on PC1. The test file, Spartacus Season 1 Episode 1 Past Transgressions.mkv, is 4,583,539 KB (about 4.3 GB).
Both PCs are connected to an HP ProCurve 2810-24G, and I have a 4-port LAGG (LACP) going back to a Brocade FastIron 648P. From the Brocade I have a single gigabit port going to my pfSense firewall, which uses the motherboard's built-in Intel NIC as the LAN port. The LAN port is sub-interfaced with 5 virtual ports.
            GbE x4 (LAG)                            GbE
[PfSense]----------------[Brocade FastIron 648P]----------[ProCurve]-----[PC1]
                                                                |--------[PC2]
PfSense is a Core i5-3470 running at 3.2GHz with 4GB of RAM. My current version of pfSense is 2.3 Release, 64-bit. I have 10 OpenVPN tunnels with not much traffic going across them at the moment, and my CPU is usually at 1% from what I can observe. At the time of the test the only other traffic is YouTube from a Chromecast.

Test 1:
PC1 to PC2 on same subnet
Trial 1 took 41.01 sec to transfer the test file described above, which works out to 873.17 Mbps.

Test 2:
PC1 to PC2 on Different subnets
Trial 1 took 45.28 sec to transfer the test file described above, which works out to 790.83 Mbps.

These are the fastest times for each test. I ran 3 trials for each test to get a more accurate idea of how your network might perform. I have more data that I hope to publish later today.
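For reference, the Mbps figures above can be reproduced from the file size and transfer time; the 1024 divisor reflects the binary-prefix convention the numbers imply:

```python
# Reproducing the throughput figures: file size in KB, times 8 bits,
# divided by 1024 (Kbit -> Mbit, binary prefix) and by the transfer time.
file_kb = 4_583_539  # the ~4.3 GB test file

def mbps(seconds):
    """Average transfer rate in (binary-prefixed) Mbps."""
    return file_kb * 8 / 1024 / seconds

print(f"Same subnet:      {mbps(41.01):.2f} Mbps")  # ~873.17
print(f"Routed via VLANs: {mbps(45.28):.2f} Mbps")  # ~790.83
```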
-
Thanks. That's interesting. You're not maxing out either.
-
I would say that I'm pretty close, and if you look at trial 1, I'm not routing at all and still not getting line rate. I'm pretty sure that is due to the VLAN tags and the TCP overhead.
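As a sanity check on that, the theoretical TCP goodput ceiling on gigabit Ethernet can be estimated from standard header sizes (a rough sketch assuming a 1460-byte MSS and no TCP options):

```python
# Rough ceiling for TCP goodput on gigabit Ethernet with a 1500-byte MTU.
# The constants are standard Ethernet/IPv4/TCP overhead sizes in bytes.
line_rate_mbps = 1000
payload = 1500 - 20 - 20                  # minus IPv4 and TCP headers
frame_on_wire = 1500 + 14 + 4 + 8 + 12    # Eth header, FCS, preamble, gap
vlan_frame = frame_on_wire + 4            # an 802.1Q tag adds 4 bytes

print(f"Max TCP goodput, untagged:    {line_rate_mbps * payload / frame_on_wire:.0f} Mbps")
print(f"Max TCP goodput, VLAN tagged: {line_rate_mbps * payload / vlan_frame:.0f} Mbps")
# ~949 vs ~947 Mbps: the VLAN tag itself costs only ~2 Mbps, so the gap
# down to 873 Mbps is mostly the host stacks, not the tagging.
```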
-
I would say that I'm pretty close and if you look on trial 1 I'm not routing at all and I'm still not getting line rate.
873 Mbit/s + TCP overhead + VLAN tag + QoS + all the other running services narrows down the entire throughput of your pfSense appliance.
I'm pretty sure that has to do with the VLAN tags and also the overhead with TCP.
Each OpenVPN tunnel takes one core of the CPU or SoC, and all other packets also "eat" some CPU power, as far as I know. So what other packets and services are you running on that pfSense machine?
-
Final Results:
Test 1 - No routing both machines on same subnet
Time (Seconds) Speed (Mbps)
Pass 1 41.69 858.9325603
Pass 2 80.43 445.2181827
Pass 3 41.01 873.1747973

Test 2 - PCs on different subnets, pfSense routing across VLANs
Time (Seconds) Speed (Mbps)
Pass 1 45.28 790.8325627
Pass 2 45.68 783.907584
Pass 3 55.7 642.8886614

Test 3 - Cisco 2821 router inserted, handling the routing between the two subnets
Time (Seconds) Speed (Mbps)
Pass 1 44.36 807.2339594
Pass 2 44.12 811.6250779
Pass 3 44.94 796.8157196

Summary - What I did here is take out the high and low of each test and then compare Tests 2 and 3 against Test 1 (which is switching performance).
Performance Hit
Test 2: 8.73%
Test 3: 6.02%

Summary:
Switching is faster than routing (duh!), but the ASICs in the Cisco router let it perform at nearly the same level as my pfSense firewall on higher-end hardware. From the results we can see that the Cisco router has about 2% better routing performance, which in my mind is well worth the trade-off for what pfSense gives me! I have done nothing in terms of optimizations, which could bring pfSense even closer to the Cisco router, and as others have stated, a NIC with custom silicon might close the gap further. The purpose of this test was not to prove one platform better than another; I have always wanted to see charts with numbers across various hardware so people can decide what is best for them.
Lastly, the CPU in my pfSense firewall went from 1-2% load to 10-13% when routing across VLANs. At first that scared me, because a couple of routed streams across VLANs could be a big hit, but adding simultaneous transfers did not push the CPU above the 10-13% load. (Nice!)
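The "Performance Hit" percentages can be reproduced from the pass speeds above; dropping the high and low of three runs is equivalent to taking the median:

```python
# Reproducing the Performance Hit figures: keep the median pass of each
# test (drop high and low), then compare against Test 1's switching baseline.
from statistics import median

test1 = [858.9325603, 445.2181827, 873.1747973]  # same-subnet switching
test2 = [790.8325627, 783.907584, 642.8886614]   # pfSense inter-VLAN routing
test3 = [807.2339594, 811.6250779, 796.8157196]  # Cisco 2821 routing

baseline = median(test1)
for name, runs in (("Test 2", test2), ("Test 3", test3)):
    hit = (baseline - median(runs)) / baseline * 100
    print(f"{name}: {hit:.2f}% slower than switching")  # 8.73% and 6.02%
```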