10gbps performance issue



  • Hi all. Here is the setup:
    linux server [bond0] = 10gb switch = [lagg1] pfsense [lagg0]= ISP
    pfsense uses Qlogic 10Gb NICs:

    $ grep bxe /var/log/dmesg.boot | grep QL
    bxe0: <QLogic NetXtreme II BCM57810 10GbE (B0) BXE v:1.78.90
    bxe1: <QLogic NetXtreme II BCM57810 10GbE (B0) BXE v:1.78.90
    bxe2: <QLogic NetXtreme II BCM57810 10GbE (B0) BXE v:1.78.90
    bxe3: <QLogic NetXtreme II BCM57810 10GbE (B0) BXE v:1.78.90
    

    ISP connection is 1Gbps.

    When LRO is off the links perform as follows:
    linux-pfsense - 2Gbps
    pfsense-ISP - 1Gbps
    linux-ISP - 1Gbps

    (Speed is measured with iperf)

    After switching LRO on (to achieve 10Gbps at the linux-pfsense link), we get:
    linux-pfsense - 10Gbps
    pfsense-ISP - 1Gbps
    linux-ISP - 2Mbps

    I tried to tune pfSense as outlined at https://forum.netgate.com/post/738428, but it didn't help at all.

    Please help me figure out how to tune pfSense to keep speeds at their highest for all connections.


  • Netgate Administrator

    Those are all download speeds I assume? How are you running iperf?

    What CPU is that box running?

    bxe may have a sysctl to set LRO per interface. I don't have access to anything running it to check.

    Steve



  • Hi @stephenw10 , thanks for a quick reply!

    iperf is launched quite simply: iperf -c on the client and iperf -s on pfSense. I tried adjusting the TCP window size (-w), which didn't help. I also tried multiple flows (-P 10) and was able to get ~4.5Gbps. Increasing the number of flows further did not affect the rate.
    pfSense has the following CPU:

    CPU Type 	Intel(R) Xeon(R) CPU E5620 @ 2.40GHz
    Current: 2394 MHz, Max: 2395 MHz
    8 CPUs: 1 package(s) x 4 core(s) x 2 hardware threads
    AES-NI CPU Crypto: Yes (active) 
    

    Enabling LRO on any interface is not an option, since it degrades performance for the linux-ISP connection (I gave it a try).
    Interestingly, running iperf (single flow) in the opposite direction (iperf -c on pfSense and iperf -s on linux) shows 10Gbps throughput. So the issue is only in the linux-to-pfSense direction.

    It's also worth noting that there are no errors or collisions on any interfaces (linux, pfSense, switch).

    Any ideas are kindly appreciated!
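
    On the window-size (-w) point above: a single TCP flow can move at most one window per round trip, so the window and the RTT together set a ceiling. The sketch below is only an illustration of that arithmetic. The RTT values are hypothetical (none were measured in this thread), and it treats the 128 KByte that the iperf banner reports as a fixed window, which it may not be if the stack auto-tunes socket buffers.

```python
def max_rate_gbps(window_bytes, rtt_seconds):
    """Upper bound on single-flow TCP throughput: one window per round trip."""
    return window_bytes * 8 / rtt_seconds / 1e9


def window_needed_bytes(rate_gbps, rtt_seconds):
    """Window (bandwidth-delay product) needed to sustain rate_gbps at this RTT."""
    return rate_gbps * 1e9 * rtt_seconds / 8


WINDOW = 128 * 1024  # 128 KByte, the default shown in the iperf banner

if __name__ == "__main__":
    # Hypothetical LAN round-trip times, for illustration only.
    for rtt_ms in (0.05, 0.1, 0.5):
        rtt = rtt_ms / 1000
        print(f"RTT {rtt_ms} ms: ceiling {max_rate_gbps(WINDOW, rtt):.2f} Gbit/s, "
              f"window for 10 Gbit/s = {window_needed_bytes(10, rtt) / 1024:.0f} KByte")
```

    If the measured RTT puts the ceiling well above the observed rate, the window is probably not the limiting factor, which would be consistent with -w making no difference here.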


  • Netgate Administrator

    I assume you enabled LRO globally in that test where you were seeing only 2Mbps from the client to the ISP?

    Where was the iperf server in the tests involving the ISP?

    You can enable/disable LRO per interface using ifconfig as a test. Since that affected the Linux-ISP connection so badly I would assume that was a download test with pfSense receiving on the WAN?

    Steve



  • Not trying to step on the discussion here, but:

    > Iperf is launched pretty simple: iperf -c at the client and iperf -s at pfsense

    If you're actually launching "iperf -s" on your pfSense box, then you're testing speeds TO pfSense not THROUGH pfSense. You probably want to run iperf on the server behind your firewall.

    just my $.02



  • @stephenw10 ,

    1. I tried enabling LRO globally as well as per interface (ifconfig lagg1 lro; ifconfig lagg0 -lro). The 2Mbps rate occurred with every variant of enabled LRO.
    2. The ISP has its own iperf server - iperf.he.net
    3. iperf -c generates the traffic, so pfSense was receiving traffic on LAN and forwarding it to WAN.

    Many docs recommend against enabling LRO on a router (which sounds reasonable), so I'd like to achieve 10Gbps on the linux-pfSense link without LRO (if that's possible).



  • @divsys , you're absolutely right. I tried running iperf -c against pfsense as well as the ISP's iperf server.


  • Netgate Administrator

    Yes testing directly to or from pfSense is not representative of throughput but it can be useful for pinning down a throttling problem.
    Here you were seeing 10Gb to pfSense and 1Gb from pfSense to the ISP but only 2Mb through both. Which is odd.

    However without LRO you're seeing the full 1Gb from the client to the ISP.

    What actual bandwidth throttling are you seeing there? Between internal interfaces perhaps?

    Steve



  • @stephenw10,

    If I understood your question correctly: I'd expect the bandwidth between linux and pfSense to be 10Gbps without enabling LRO.


  • Netgate Administrator

    You might expect that but when running as a router/firewall connections are not normally terminated on the firewall. The exception might be if you're running Squid for example.

    Since your WAN is 1Gbps the actual firewall throughput for a connection to/from the internet cannot exceed that. So if you're seeing 2Gbps to the firewall it's not throttling that.
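
    The reasoning above (an end-to-end flow can't go faster than its slowest hop) can be sketched as follows. The hop names and rates are the LRO-off figures from the first post; the function name is made up for illustration.

```python
def path_ceiling_gbps(hops):
    """Return (slowest hop name, its rate): the ceiling for a flow crossing all hops."""
    name = min(hops, key=hops.get)
    return name, hops[name]


# Rates measured with LRO off, as reported at the top of the thread (Gbit/s).
LRO_OFF = {"linux-pfsense": 2.0, "pfsense-ISP": 1.0}

if __name__ == "__main__":
    hop, rate = path_ceiling_gbps(LRO_OFF)
    print(f"linux-ISP is capped by {hop} at {rate} Gbit/s")
```

    With those numbers the ceiling is the 1Gbps WAN hop, which matches the 1Gbps linux-ISP result.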

    With LRO disabled you are seeing 1Gbps from a Linux client to your ISP. That's the maximum you can get. So where are you actually seeing less bandwidth than you expect, other than testing to the firewall itself, which never normally happens?

    Steve



  • We're in the process of setting things up for a new environment and would like to make sure that they work properly. I agree there are a limited number of tasks where such high throughput is required against pfSense itself, but they exist, and we wouldn't like to get into a situation where we have to troubleshoot things on the live production system.

    Please help me find the reason for the 2Gbps rate from a server to pfSense.


  • Netgate Administrator

    What is the CPU usage when you are running that test?

    Try running top -aSH in another console window. Are any CPU threads running at or near 100%?

    Steve
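
    For anyone who wants to scan saved top -aSH output for the condition described above, here is a minimal sketch. It assumes the column layout shown in this thread (PID first, WCPU and COMMAND last); the function name and the 95% threshold are arbitrary choices, not anything pfSense provides.

```python
import re

# A thread row from `top -aSH`: leading PID and user, then WCPU and COMMAND at the end.
ROW = re.compile(r"\s*\d+\s+\S+.*?\s(\d+(?:\.\d+)?)%\s+(.+)$")


def busy_threads(top_output, threshold=95.0):
    """Return (wcpu, command) for non-idle threads at or above threshold percent."""
    busy = []
    for line in top_output.splitlines():
        m = ROW.match(line)
        if not m:
            continue  # header lines and blanks
        wcpu, command = float(m.group(1)), m.group(2).strip()
        if command.startswith("[idle"):
            continue  # idle threads near 100% mean a free core, not a busy one
        if wcpu >= threshold:
            busy.append((wcpu, command))
    return busy


# Three rows in the format posted in this thread.
SAMPLE = """\
   12 root       -92    -     0K   880K CPU1    1   2:44  99.93% [intr{irq273: bxe2:fp01}]
   11 root       155 ki31     0K   128K RUN     7  74.4H  98.14% [idle{idle: cpu7}]
54395 root        84    0 28552K  4508K CPU7    7   0:06  84.71% iperf -s{iperf}
"""

if __name__ == "__main__":
    for wcpu, command in busy_threads(SAMPLE):
        print(f"{wcpu:6.2f}%  {command}")
```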



  • Here is the top output during the iperf test (linux is a client, pfsense is a server). I do not see an overload here.

    last pid: 54739;  load averages:  0.27,  0.15,  0.10                                                        up 3+02:29:33  05:12:44
    264 processes: 12 running, 198 sleeping, 54 waiting
    CPU:  1.4% user,  0.0% nice,  9.3% system, 12.6% interrupt, 76.8% idle
    Mem: 67M Active, 680M Inact, 574M Wired, 84M Buf, 10G Free
    Swap: 3881M Total, 3881M Free
    
      PID USERNAME   PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
       12 root       -92    -     0K   880K CPU1    1   2:44  99.93% [intr{irq273: bxe2:fp01}]
       11 root       155 ki31     0K   128K RUN     7  74.4H  98.14% [idle{idle: cpu7}]
       11 root       155 ki31     0K   128K CPU0    0  74.4H  97.33% [idle{idle: cpu0}]
       11 root       155 ki31     0K   128K CPU6    6  74.4H  95.98% [idle{idle: cpu6}]
       11 root       155 ki31     0K   128K RUN     5  74.4H  86.53% [idle{idle: cpu5}]
    54395 root        84    0 28552K  4508K CPU7    7   0:06  84.71% iperf -s{iperf}
       11 root       155 ki31     0K   128K CPU3    3  74.4H  80.39% [idle{idle: cpu3}]
       11 root       155 ki31     0K   128K RUN     2  74.4H  79.07% [idle{idle: cpu2}]
       11 root       155 ki31     0K   128K CPU4    4  74.4H  72.06% [idle{idle: cpu4}]
    54395 root        20    0 28552K  4508K nanslp  4   0:00   0.74% iperf -s{iperf}
    12120 root        40   20   683M   524M CPU2    2   0:25   0.18% /usr/local/bin/snort -R 41368 -D -q --suppress-config-log -l /var/
       12 root       -60    -     0K   880K WAIT    0   3:30   0.08% [intr{swi4: clock (0)}]
       11 root       155 ki31     0K   128K RUN     1  74.4H   0.07% [idle{idle: cpu1}]
    54739 root        20    0 22116K  4816K CPU5    5   0:00   0.07% top -aSH
    12587 root        40   20 51952K 17220K nanslp  5   0:08   0.03% /usr/local/bin/barnyard2 -r 41368 -f snort_41368_lagg0.u2 --pid-pa
       12 root       -92    -     0K   880K WAIT    0   0:09   0.02% [intr{irq267: bxe1:fp00}]
    

    iperf result:

    [2.4.3-RELEASE][admin@pfSense]/root: iperf -s
    ------------------------------------------------------------
    Server listening on TCP port 5001
    TCP window size:  128 KByte (default)
    ------------------------------------------------------------
    [  4] local 10.10.10.254 port 5001 connected with 10.10.10.20 port 53986
    [ ID] Interval       Transfer     Bandwidth
    [  4]  0.0-10.0 sec  2.52 GBytes  2.16 Gbits/sec
    

    The top output for the opposite direction (linux is a server, pfsense is a client):

    last pid: 21988;  load averages:  0.13,  0.16,  0.10                                                        up 3+02:32:16  05:15:27
    263 processes: 9 running, 199 sleeping, 55 waiting
    CPU:  0.1% user,  0.0% nice,  8.4% system,  8.4% interrupt, 83.0% idle
    Mem: 66M Active, 681M Inact, 575M Wired, 84M Buf, 10G Free
    Swap: 3881M Total, 3881M Free
    
      PID USERNAME   PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
       11 root       155 ki31     0K   128K CPU7    7  74.4H 100.00% [idle{idle: cpu7}]
       11 root       155 ki31     0K   128K RUN     1  74.4H  99.88% [idle{idle: cpu1}]
       11 root       155 ki31     0K   128K CPU2    2  74.4H  98.56% [idle{idle: cpu2}]
       11 root       155 ki31     0K   128K CPU5    5  74.4H  88.79% [idle{idle: cpu5}]
       11 root       155 ki31     0K   128K CPU3    3  74.4H  85.63% [idle{idle: cpu3}]
       11 root       155 ki31     0K   128K CPU4    4  74.4H  81.41% [idle{idle: cpu4}]
       11 root       155 ki31     0K   128K CPU6    6  74.4H  72.80% [idle{idle: cpu6}]
    21988 root        52    0 26376K  3852K sbwait  2   0:03  68.84% iperf -c 10.10.10.20{iperf}
       12 root       -92    -     0K   880K WAIT    0   2:57  62.64% [intr{irq272: bxe2:fp00}]
       11 root       155 ki31     0K   128K RUN     0  74.4H  37.34% [idle{idle: cpu0}]
        0 root       -92    -     0K   832K -       5   0:00   1.00% [kernel{bxe2_fp0_tq}]
    12120 root        40   20   683M   524M bpf     6   0:25   0.13% /usr/local/bin/snort -R 41368 -D -q --suppress-config-log -l /var/
    54739 root        20    0 22116K  4816K CPU1    1   0:00   0.10% top -aSH
    

    iperf result:

    [2.4.3-RELEASE][admin@pfSense]/root: iperf -c 10.10.10.20
    ------------------------------------------------------------
    Client connecting to 10.10.10.20, TCP port 5001
    TCP window size:  128 KByte (default)
    ------------------------------------------------------------
    [  3] local 10.10.10.254 port 15711 connected with 10.10.10.20 port 5001
    [ ID] Interval       Transfer     Bandwidth
    [  3]  0.0-10.0 sec  11.0 GBytes  9.41 Gbits/sec
    

    Just in case, here is the local iperf test:

    [2.4.3-RELEASE][admin@pfSense]/root: iperf -c localhost
    ------------------------------------------------------------
    Client connecting to localhost, TCP port 5001
    TCP window size:  144 KByte (default)
    ------------------------------------------------------------
    [  3] local 127.0.0.1 port 13072 connected with 127.0.0.1 port 5001
    [ ID] Interval       Transfer     Bandwidth
    [  3]  0.0-10.0 sec  18.0 GBytes  15.5 Gbits/sec
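
    A note on reading the iperf lines above: the Transfer and Bandwidth columns appear to use different unit bases. The sketch below assumes Transfer is in binary GBytes (2^30 bytes) and Bandwidth in decimal Gbits (10^9 bits); that assumption reproduces the figures posted here, but check the iperf documentation for your version.

```python
def iperf_gbits_per_sec(transfer_gbytes, seconds):
    """Convert an iperf Transfer figure (assumed binary GBytes) to decimal Gbit/s."""
    bits = transfer_gbytes * 2**30 * 8
    return bits / seconds / 1e9


if __name__ == "__main__":
    # Transfer figures copied from the outputs above, all over 10.0 seconds.
    for gbytes in (2.52, 18.0):
        print(f"{gbytes} GBytes in 10 s = {iperf_gbits_per_sec(gbytes, 10.0):.2f} Gbit/s")
```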
    

  • Netgate Administrator

    You have one CPU core running at 100% (~0% idle):

    11 root       155 ki31     0K   128K RUN     1  74.4H   0.07% [idle{idle: cpu1}]  
    

    You probably have (at least) 4 queues per NIC so it would be worth running that test with -P 4 at the client to spread the load better.

    Steve
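
    To illustrate why -P 4 can spread the load: multi-queue NICs generally pick a receive queue by hashing each flow's addresses and ports, so one flow stays on one queue (and one interrupt thread), while several flows may land on several queues. The sketch below is a toy model only. It uses CRC32 as a stand-in hash, which is an assumption and not what the bxe driver or hardware actually does, and the function name is hypothetical.

```python
import zlib


def queue_for_flow(src_ip, src_port, dst_ip, dst_port, num_queues=4):
    """Toy flow-to-queue mapping: hash the 4-tuple, take it modulo the queue count."""
    key = f"{src_ip}:{src_port}-{dst_ip}:{dst_port}".encode()
    return zlib.crc32(key) % num_queues


if __name__ == "__main__":
    # Client source ports as they might appear with -P 4 (hypothetical values).
    for port in (53996, 53998, 54000, 54002):
        q = queue_for_flow("10.10.10.20", port, "10.10.10.254", 5001)
        print(f"flow from port {port} -> queue {q}")
```

    Note that nothing guarantees four flows hit four different queues; with any hash, two flows can collide on the same queue, so more flows than queues usually gives a more even spread.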



  • [2.4.3-RELEASE][admin@pfSense]/root: iperf -s
    ------------------------------------------------------------
    Server listening on TCP port 5001
    TCP window size:  128 KByte (default)
    ------------------------------------------------------------
    [  4] local 10.10.10.254 port 5001 connected with 10.10.10.20 port 53996
    [  5] local 10.10.10.254 port 5001 connected with 10.10.10.20 port 53998
    [  6] local 10.10.10.254 port 5001 connected with 10.10.10.20 port 54000
    [  7] local 10.10.10.254 port 5001 connected with 10.10.10.20 port 54002
    [ ID] Interval       Transfer     Bandwidth
    [  4]  0.0-10.0 sec  1.36 GBytes  1.16 Gbits/sec
    [  5]  0.0-10.0 sec  1.34 GBytes  1.15 Gbits/sec
    [  6]  0.0-10.0 sec  1.32 GBytes  1.13 Gbits/sec
    [  7]  0.0-10.0 sec  1.32 GBytes  1.13 Gbits/sec
    [SUM]  0.0-10.0 sec  5.34 GBytes  4.58 Gbits/sec
    
    last pid: 34460;  load averages:  1.15,  0.32,  0.16                                                        up 3+03:52:37  06:35:48
    267 processes: 17 running, 199 sleeping, 51 waiting
    CPU:  5.9% user,  0.0% nice, 30.7% system, 50.0% interrupt, 13.4% idle
    Mem: 67M Active, 683M Inact, 576M Wired, 84M Buf, 10G Free
    Swap: 3881M Total, 3881M Free
    
      PID USERNAME   PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
       12 root       -92    -     0K   880K CPU0    0   3:08  99.93% [intr{irq272: bxe2:fp00}]
       12 root       -92    -     0K   880K CPU3    3   3:02  99.91% [intr{irq275: bxe2:fp03}]
       12 root       -92    -     0K   880K CPU2    2   3:02  99.88% [intr{irq274: bxe2:fp02}]
       12 root       -92    -     0K   880K CPU1    1   2:55  99.86% [intr{irq273: bxe2:fp01}]
    34352 root        83    0 35080K  6772K CPU6    6   0:06  77.43% iperf -s{iperf}
    34352 root        52    0 35080K  6772K CPU4    4   0:06  76.79% iperf -s{iperf}
    34352 root        52    0 35080K  6772K CPU7    7   0:06  76.59% iperf -s{iperf}
    34352 root        52    0 35080K  6772K CPU6    6   0:06  76.52% iperf -s{iperf}
       11 root       155 ki31     0K   128K RUN     7  75.8H  22.62% [idle{idle: cpu7}]
       11 root       155 ki31     0K   128K RUN     6  75.8H  22.55% [idle{idle: cpu6}]
       11 root       155 ki31     0K   128K RUN     5  75.8H  22.49% [idle{idle: cpu5}]
       11 root       155 ki31     0K   128K RUN     4  75.8H  22.46% [idle{idle: cpu4}]
    34352 root        20    0 35080K  6772K nanslp  6   0:00   2.14% iperf -s{iperf}
       11 root       155 ki31     0K   128K RUN     2  75.8H   0.12% [idle{idle: cpu2}]
       11 root       155 ki31     0K   128K RUN     3  75.8H   0.12% [idle{idle: cpu3}]
       11 root       155 ki31     0K   128K RUN     0  75.7H   0.12% [idle{idle: cpu0}]
       11 root       155 ki31     0K   128K RUN     1  75.8H   0.11% [idle{idle: cpu1}]
    34460 root        20    0 22116K  4820K CPU5    5   0:00   0.10% top -aSH
       12 root       -60    -     0K   880K WAIT    5   3:34   0.09% [intr{swi4: clock (0)}]
    

    Wow... This does look like a CPU limit. I wonder why the linux box's CPU (Intel(R) Xeon(R) CPU X5670 @ 2.93GHz) handles 10Gbps without issues:

    top - 06:45:18 up 4 days, 21:26,  3 users,  load average: 0.09, 0.03, 0.01
    Threads: 426 total,   2 running, 424 sleeping,   0 stopped,   0 zombie
    %Cpu0  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
    %Cpu1  :  0.0 us,  0.3 sy,  0.0 ni, 99.3 id,  0.0 wa,  0.0 hi,  0.3 si,  0.0 st
    %Cpu2  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
    %Cpu3  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
    %Cpu4  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
    %Cpu5  :  0.0 us,  1.3 sy,  0.0 ni, 97.3 id,  0.0 wa,  0.0 hi,  1.3 si,  0.0 st
    %Cpu6  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
    %Cpu7  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
    %Cpu8  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
    %Cpu9  :  0.3 us, 55.9 sy,  0.0 ni, 43.4 id,  0.0 wa,  0.0 hi,  0.3 si,  0.0 st
    %Cpu10 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
    %Cpu11 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
    %Cpu12 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
    %Cpu13 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
    %Cpu14 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
    %Cpu15 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
    %Cpu16 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
    %Cpu17 :  0.3 us,  0.0 sy,  0.0 ni, 99.3 id,  0.0 wa,  0.0 hi,  0.3 si,  0.0 st
    %Cpu18 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
    %Cpu19 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
    %Cpu20 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
    %Cpu21 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
    %Cpu22 :  0.0 us,  0.0 sy,  0.0 ni, 99.7 id,  0.0 wa,  0.0 hi,  0.3 si,  0.0 st
    %Cpu23 :  0.0 us,  2.2 sy,  0.0 ni, 97.1 id,  0.0 wa,  0.0 hi,  0.7 si,  0.0 st
    KiB Mem : 65965828 total, 63945296 free,   394004 used,  1626528 buff/cache
    KiB Swap: 67096572 total, 67096572 free,        0 used. 65039256 avail Mem
    
      PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
    20962 myuser    20   0  236696   2208   1924 R 60.6  0.0   0:01.83 iperf -s
      125 root      20   0       0      0      0 S  1.0  0.0   0:00.87 [ksoftirqd/23]
       34 root      20   0       0      0      0 S  0.7  0.0   0:00.26 [ksoftirqd/5]
       14 root      20   0       0      0      0 S  0.3  0.0   0:00.29 [ksoftirqd/1]
       55 root      20   0       0      0      0 S  0.3  0.0   0:00.36 [ksoftirqd/9]
    16776 root      20   0       0      0      0 S  0.3  0.0   0:03.47 [kworker/9:1]
    
    [2.4.3-RELEASE][admin@pfSense]/root: iperf -c 10.10.10.20
    ------------------------------------------------------------
    Client connecting to 10.10.10.20, TCP port 5001
    TCP window size:  128 KByte (default)
    ------------------------------------------------------------
    [  3] local 10.10.10.254 port 52919 connected with 10.10.10.20 port 5001
    [ ID] Interval       Transfer     Bandwidth
    [  3]  0.0-10.0 sec  10.8 GBytes  9.26 Gbits/sec
    

    Any ideas?



  • What are the NICs in the Linux box? If the NICs are the same, most likely the drivers are different.

    Also try:

    • disabling Hyperthreading on the pfSense box
    • disabling flow control everywhere, including the switch
    • increasing the maximum interrupt rate

    To be honest, it's a bit pointless to test throughput. Better to test PPS through the pfSense box. This will expose your PPS limit for your CPU/NIC/settings/firewall configuration combination.

    You can check with netstat -ihw 1 where the drop happens.
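
    To put the PPS point in numbers, here is a rough sketch of the theoretical packet rate at 10GbE for different frame sizes. It uses standard Ethernet framing overhead (8 bytes preamble/SFD plus 12 bytes inter-frame gap); the function name is made up, and real traffic mixes will fall somewhere in between.

```python
ETH_OVERHEAD_BYTES = 20  # 8 bytes preamble/SFD + 12 bytes inter-frame gap


def line_rate_pps(link_bps, frame_bytes):
    """Theoretical max packets/s; frame_bytes includes Ethernet header and FCS."""
    return link_bps / ((frame_bytes + ETH_OVERHEAD_BYTES) * 8)


if __name__ == "__main__":
    for label, frame in (("64-byte minimum frame", 64), ("1518-byte frame (1500 MTU)", 1518)):
        print(f"{label}: {line_rate_pps(10e9, frame):,.0f} pps at 10 Gbit/s")
```

    The same 10Gbps costs roughly 18 times more packets per second with minimum-size frames than with full-size ones, which is why a bulk iperf run says little about how the box copes with small-packet traffic.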


  • Netgate Administrator

    The load due to pf shows in those values in pfSense. Is the Linux box running any sort of firewall?

    Try disabling pf temporarily as a test.

    But this is still not a test of the firewall throughput. I'm still unsure what you're trying to achieve here. Your WAN is 1Gbps and you are able to see that fully from a client behind pfSense. If you want to test more than that use a 10Gbps WAN to see what it can pass.

    Steve



  • @stephenw10 with pf disabled it shows slightly better performance:

    pf disabled
    [2.4.3-RELEASE][admin@pfSense]/root:
    [2.4.3-RELEASE][admin@pfSense]/root: iperf -s
    ------------------------------------------------------------
    Server listening on TCP port 5001
    TCP window size:  128 KByte (default)
    ------------------------------------------------------------
    [  4] local 10.10.10.254 port 5001 connected with 10.10.10.20 port 54942
    [ ID] Interval       Transfer     Bandwidth
    [  4]  0.0-10.0 sec  3.91 GBytes  3.35 Gbits/sec
    

    4 flows test gives:

    [SUM]  0.0-10.0 sec  8.03 GBytes  6.88 Gbits/sec
    

    I don't think the ISP can provide us a 10Gbps link at the moment, but it may be possible later. And it wouldn't be great to face such an issue when all systems are in production.

    I'm trying to figure out why the speed is not what I expected. The next steps on the list are to disable hyperthreading and upgrade the CPU. I'll post the results here.
    Anyway, if you have any other ideas why the CPU is so slow compared to the linux box, I'd be more than happy to check them.



  • I believe the devs should remove iperf from base installs....
    These iperf threads keep popping up every month & conclusion is always the same:
    Don't run iperf on pfsense

    The only way to measure throughput is like this:
    (Iperf-server)----(pfsense)----(iperf-client)
    All other measurements are pointless and inaccurate.



  • @heper hope devs would not follow your suggestion. it's like "we've got a headache. let's cut the head out". very wise.


  • Netgate Administrator

    I don't think iperf will be removed any time soon.

    But I agree with heper, what you're testing is not anything that can ever happen in normal use.

    It can be useful to run iperf on the firewall to test a single interface at a time if you are seeing very bad throughput testing through the firewall.

    You have two 10GbE interfaces there. Just set up another device connected to another interface and run an iperf server on that. Then test to it from the client on the other interface.

    Steve


  • Rebel Alliance Global Moderator

    @heper said in 10gbps performance issue:

    I believe the devs should remove iperf from base installs…

    It's not part of the base install? If it is, what is the point of the iperf package? Are you suggesting that the package to install iperf be removed as an option?



  • I see no point in having it available on pfsense.
    Time and time again, it's used to reach the wrong conclusions anyways.


  • Rebel Alliance Global Moderator

    @heper said in 10gbps performance issue:

    Time and time again, it’s used to reach the wrong conclusions anyways.

    I won't disagree with you there. But there are valid use cases, as long as you understand that you might not see the full speed of your interface using the tool. It's there for the people who don't draw those wrong conclusions, because they understand that the point of a router is to route, not to act as an end-point device for such a tool.

    So I'm not sure I agree with removal... Removal would just have users asking how to install it from the FreeBSD ports/packages, even if it's not part of the pfSense repository.


  • Netgate Administrator

    I personally would not want to see either the package removed or iperf3 removed from our repo. I regularly use those for testing. There are many legitimate use cases.
    Often I use another pfSense box as a client/server since most of my test network is pfSense boxes for example.

    Steve


  • Netgate

    Removing access to a tool that can be misused by some while being massively-useful to others sort of reeks of the "thinking" behind 🔫 control. pkg add iperf3 please.

    (wth we still have a real gun emoji. someone's slacking.)



  • @stephenw10 we've finally replaced the CPU with a Xeon X5560, but the issue is still in place. Here are the latest measurements:
    Single flow:

    [2.4.3-RELEASE][admin@pfSense]/root: iperf3 -s
    -----------------------------------------------------------
    Server listening on 5201
    -----------------------------------------------------------
    Accepted connection from 10.10.10.20, port 40256
    [  5] local 10.10.10.254 port 5201 connected to 10.10.10.20 port 40258
    [ ID] Interval           Transfer     Bitrate
    [  5]   0.00-1.00   sec   150 MBytes  1.26 Gbits/sec
    [  5]   1.00-2.00   sec   219 MBytes  1.83 Gbits/sec
    [  5]   2.00-3.00   sec   227 MBytes  1.90 Gbits/sec
    [  5]   3.00-4.00   sec   258 MBytes  2.16 Gbits/sec
    [  5]   4.00-5.00   sec   298 MBytes  2.50 Gbits/sec
    [  5]   5.00-6.00   sec   298 MBytes  2.50 Gbits/sec
    [  5]   6.00-7.00   sec   298 MBytes  2.50 Gbits/sec
    [  5]   7.00-8.00   sec   298 MBytes  2.50 Gbits/sec
    [  5]   8.00-9.00   sec   298 MBytes  2.50 Gbits/sec
    [  5]   9.00-10.00  sec   299 MBytes  2.51 Gbits/sec
    [  5]  10.00-10.01  sec  1.99 MBytes  2.48 Gbits/sec
    - - - - - - - - - - - - - - - - - - - - - - - - -
    [ ID] Interval           Transfer     Bitrate
    [  5]   0.00-10.01  sec  2.58 GBytes  2.22 Gbits/sec                  receiver
    

    4 flows (-P 4):

    [2.4.3-RELEASE][admin@pfSense]/root: iperf3 -s
    -----------------------------------------------------------
    Server listening on 5201
    -----------------------------------------------------------
    Accepted connection from 10.10.10.20, port 40426
    [  5] local 10.10.10.254 port 5201 connected to 10.10.10.20 port 40428
    [  8] local 10.10.10.254 port 5201 connected to 10.10.10.20 port 40430
    [ 10] local 10.10.10.254 port 5201 connected to 10.10.10.20 port 40432
    [ 12] local 10.10.10.254 port 5201 connected to 10.10.10.20 port 40434
    [ ID] Interval           Transfer     Bitrate
    [  5]   0.00-1.00   sec  45.7 MBytes   383 Mbits/sec
    [  8]   0.00-1.00   sec  48.9 MBytes   410 Mbits/sec
    [ 10]   0.00-1.00   sec  40.2 MBytes   337 Mbits/sec
    [ 12]   0.00-1.00   sec  47.4 MBytes   397 Mbits/sec
    [SUM]   0.00-1.00   sec   182 MBytes  1.53 Gbits/sec
    - - - - - - - - - - - - - - - - - - - - - - - - -
    [  5]   1.00-2.00   sec  46.3 MBytes   389 Mbits/sec
    [  8]   1.00-2.00   sec   108 MBytes   909 Mbits/sec
    [ 10]   1.00-2.00   sec  49.1 MBytes   412 Mbits/sec
    [ 12]   1.00-2.00   sec  38.7 MBytes   325 Mbits/sec
    [SUM]   1.00-2.00   sec   243 MBytes  2.03 Gbits/sec
    - - - - - - - - - - - - - - - - - - - - - - - - -
    [  5]   2.00-3.00   sec  46.9 MBytes   394 Mbits/sec
    [  8]   2.00-3.00   sec   108 MBytes   907 Mbits/sec
    [ 10]   2.00-3.00   sec  36.6 MBytes   307 Mbits/sec
    [ 12]   2.00-3.00   sec  25.9 MBytes   217 Mbits/sec
    [SUM]   2.00-3.00   sec   218 MBytes  1.83 Gbits/sec
    - - - - - - - - - - - - - - - - - - - - - - - - -
    [  5]   3.00-4.00   sec  58.5 MBytes   491 Mbits/sec
    [  8]   3.00-4.00   sec  94.0 MBytes   788 Mbits/sec
    [ 10]   3.00-4.00   sec  44.5 MBytes   374 Mbits/sec
    [ 12]   3.00-4.00   sec  37.4 MBytes   314 Mbits/sec
    [SUM]   3.00-4.00   sec   234 MBytes  1.97 Gbits/sec
    - - - - - - - - - - - - - - - - - - - - - - - - -
    [  5]   4.00-5.00   sec  56.7 MBytes   475 Mbits/sec
    [  8]   4.00-5.00   sec  79.0 MBytes   663 Mbits/sec
    [ 10]   4.00-5.00   sec  44.4 MBytes   372 Mbits/sec
    [ 12]   4.00-5.00   sec  38.5 MBytes   323 Mbits/sec
    [SUM]   4.00-5.00   sec   219 MBytes  1.83 Gbits/sec
    - - - - - - - - - - - - - - - - - - - - - - - - -
    [  5]   5.00-6.00   sec  61.9 MBytes   520 Mbits/sec
    [  8]   5.00-6.00   sec  70.0 MBytes   587 Mbits/sec
    [ 10]   5.00-6.00   sec  48.5 MBytes   407 Mbits/sec
    [ 12]   5.00-6.00   sec  42.3 MBytes   354 Mbits/sec
    [SUM]   5.00-6.00   sec   223 MBytes  1.87 Gbits/sec
    - - - - - - - - - - - - - - - - - - - - - - - - -
    [  5]   6.00-7.00   sec  68.5 MBytes   575 Mbits/sec
    [  8]   6.00-7.00   sec  54.1 MBytes   454 Mbits/sec
    [ 10]   6.00-7.00   sec  54.6 MBytes   458 Mbits/sec
    [ 12]   6.00-7.00   sec  47.7 MBytes   400 Mbits/sec
    [SUM]   6.00-7.00   sec   225 MBytes  1.89 Gbits/sec
    - - - - - - - - - - - - - - - - - - - - - - - - -
    [  5]   7.00-8.00   sec  65.1 MBytes   546 Mbits/sec
    [  8]   7.00-8.00   sec  55.4 MBytes   464 Mbits/sec
    [ 10]   7.00-8.00   sec  49.2 MBytes   413 Mbits/sec
    [ 12]   7.00-8.00   sec  49.9 MBytes   419 Mbits/sec
    [SUM]   7.00-8.00   sec   220 MBytes  1.84 Gbits/sec
    - - - - - - - - - - - - - - - - - - - - - - - - -
    [  5]   8.00-9.00   sec  67.0 MBytes   562 Mbits/sec
    [  8]   8.00-9.00   sec  51.9 MBytes   435 Mbits/sec
    [ 10]   8.00-9.00   sec  48.3 MBytes   405 Mbits/sec
    [ 12]   8.00-9.00   sec  56.3 MBytes   472 Mbits/sec
    [SUM]   8.00-9.00   sec   224 MBytes  1.88 Gbits/sec
    - - - - - - - - - - - - - - - - - - - - - - - - -
    [  5]   9.00-10.00  sec  65.1 MBytes   546 Mbits/sec
    [  8]   9.00-10.00  sec  52.0 MBytes   436 Mbits/sec
    [ 10]   9.00-10.00  sec  54.7 MBytes   459 Mbits/sec
    [ 12]   9.00-10.00  sec  65.1 MBytes   546 Mbits/sec
    [SUM]   9.00-10.00  sec   237 MBytes  1.99 Gbits/sec
    - - - - - - - - - - - - - - - - - - - - - - - - -
    [  5]  10.00-10.01  sec   636 KBytes   432 Mbits/sec
    [  8]  10.00-10.01  sec   636 KBytes   432 Mbits/sec
    [ 10]  10.00-10.01  sec   663 KBytes   450 Mbits/sec
    [ 12]  10.00-10.01  sec   764 KBytes   519 Mbits/sec
    [SUM]  10.00-10.01  sec  2.64 MBytes  1.83 Gbits/sec
    - - - - - - - - - - - - - - - - - - - - - - - - -
    [ ID] Interval           Transfer     Bitrate
    [  5]   0.00-10.01  sec   582 MBytes   488 Mbits/sec                  receiver
    [  8]   0.00-10.01  sec   722 MBytes   605 Mbits/sec                  receiver
    [ 10]   0.00-10.01  sec   471 MBytes   395 Mbits/sec                  receiver
    [ 12]   0.00-10.01  sec   450 MBytes   377 Mbits/sec                  receiver
    [SUM]   0.00-10.01  sec  2.17 GBytes  1.86 Gbits/sec                  receiver
    -----------------------------------------------------------
    

    top output during the tests:

    [2.4.3-RELEASE][admin@pfSense]/root: top -aSH
    last pid: 97946;  load averages:  0.72,  0.28,  0.12                                                        up 2+12:43:41  09:10:01
    329 processes: 17 running, 241 sleeping, 71 waiting
    CPU:  0.1% user,  0.5% nice,  2.5% system,  5.4% interrupt, 91.6% idle
    Mem: 214M Active, 565M Inact, 830M Wired, 232M Buf, 30G Free
    Swap: 3712M Total, 3712M Free
    
      PID USERNAME   PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
       11 root       155 ki31     0K   256K CPU7    7  60.6H 100.00% [idle{idle: cpu7}]
       11 root       155 ki31     0K   256K CPU1    1  60.6H 100.00% [idle{idle: cpu1}]
       11 root       155 ki31     0K   256K CPU2    2  60.5H 100.00% [idle{idle: cpu2}]
       11 root       155 ki31     0K   256K CPU9    9  60.5H 100.00% [idle{idle: cpu9}]
       11 root       155 ki31     0K   256K CPU11  11  60.5H 100.00% [idle{idle: cpu11}]
       11 root       155 ki31     0K   256K CPU13  13  60.5H 100.00% [idle{idle: cpu13}]
       11 root       155 ki31     0K   256K CPU6    6  60.6H  99.99% [idle{idle: cpu6}]
       11 root       155 ki31     0K   256K CPU4    4  60.6H  99.88% [idle{idle: cpu4}]
       11 root       155 ki31     0K   256K CPU3    3  60.5H  98.83% [idle{idle: cpu3}]
       11 root       155 ki31     0K   256K RUN    12  60.5H  98.45% [idle{idle: cpu12}]
       11 root       155 ki31     0K   256K CPU10  10  60.5H  94.15% [idle{idle: cpu10}]
       11 root       155 ki31     0K   256K CPU14  14  60.5H  89.01% [idle{idle: cpu14}]
       12 root       -92    -     0K  1136K WAIT    0   2:07  84.99% [intr{irq277: bxe3:fp00}]
       11 root       155 ki31     0K   256K CPU5    5  60.6H  84.03% [idle{idle: cpu5}]
    24259 root        52    0 19752K  5628K select  9   0:20  75.33% iperf3 -s
       11 root       155 ki31     0K   256K CPU8    8  60.5H  70.66% [idle{idle: cpu8}]
       11 root       155 ki31     0K   256K CPU15  15  60.5H  52.12% [idle{idle: cpu15}]
       11 root       155 ki31     0K   256K CPU0    0  60.5H  14.83% [idle{idle: cpu0}]
      254 root        23    0   266M 44468K accept 12   0:28   1.20% php-fpm: pool nginx (php-fpm){php-fpm}
    97017 root        40   20   728M   570M bpf     8   4:14   0.57% /usr/local/bin/snort -R 41368 -D -q --suppress-config-log -l /var/
       12 root       -100    -     0K  1136K WAIT    0   0:53   0.25% [intr{irq20: hpet0 uhci3}]
       12 root       -60    -     0K  1136K WAIT    9   3:06   0.12% [intr{swi4: clock (0)}]
    82170 root        20    0 22116K  4796K CPU12  12   0:00   0.10% top -aSH
       12 root       -92    -     0K  1136K WAIT    1   1:36   0.09% [intr{irq273: bxe2:fp01}]
    10462 root        20    0 20356K  6412K select 11   0:10   0.07% /usr/local/sbin/openvpn --config /var/etc/openvpn/server1.conf
       12 root       -92    -     0K  1136K WAIT    1   1:52   0.07% [intr{irq268: bxe1:fp01}]
       12 root       -92    -     0K  1136K WAIT    0   2:19   0.06% [intr{irq267: bxe1:fp00}]
       12 root       -92    -     0K  1136K WAIT    1   1:30   0.06% [intr{irq263: bxe0:fp01}]
       12 root       -92    -     0K  1136K WAIT    2   2:15   0.06% [intr{irq264: bxe0:fp02}]
       12 root       -92    -     0K  1136K WAIT    2   1:59   0.06% [intr{irq279: bxe3:fp02}]
    60178 www         20    0 58924K 12688K kqread  9   0:01   0.05% /usr/local/sbin/haproxy -f /var/etc/haproxy/haproxy.cfg -p /var/ru
       12 root       -92    -     0K  1136K WAIT    3   2:28   0.05% [intr{irq280: bxe3:fp03}]
    60322 www         20    0 58924K 12632K kqread 11   0:01   0.04% /usr/local/sbin/haproxy -f /var/etc/haproxy/haproxy.cfg -p /var/ru
    

    Do you still see a CPU bottleneck here?



  • To me it looks like each adapter (bxeX) is using only one queue. The queues are there, but they are not in use.

       12 root       -92    -     0K  1136K WAIT    0   2:07  84.99% [intr{irq277: bxe3:fp00}]
       12 root       -92    -     0K  1136K WAIT    1   1:36   0.09% [intr{irq273: bxe2:fp01}]
       12 root       -92    -     0K  1136K WAIT    1   1:52   0.07% [intr{irq268: bxe1:fp01}]
       12 root       -92    -     0K  1136K WAIT    0   2:19   0.06% [intr{irq267: bxe1:fp00}]
       12 root       -92    -     0K  1136K WAIT    1   1:30   0.06% [intr{irq263: bxe0:fp01}]
       12 root       -92    -     0K  1136K WAIT    2   2:15   0.06% [intr{irq264: bxe0:fp02}]
       12 root       -92    -     0K  1136K WAIT    2   1:59   0.06% [intr{irq279: bxe3:fp02}]
    

    I would expect to see the load distributed between all of them.
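
    A rough way to put a number on this is to parse the interrupt-thread lines from `top -aSH` and see how much of the NIC interrupt time lands on a single queue. This is only a sketch: the sample lines are the ones quoted above, and the helper names (`queue_load`, `busiest_share`) are made up for illustration.

```python
import re
from collections import defaultdict

# Interrupt-thread lines copied from the `top -aSH` output quoted above.
TOP_LINES = """\
   12 root       -92    -     0K  1136K WAIT    0   2:07  84.99% [intr{irq277: bxe3:fp00}]
   12 root       -92    -     0K  1136K WAIT    1   1:36   0.09% [intr{irq273: bxe2:fp01}]
   12 root       -92    -     0K  1136K WAIT    1   1:52   0.07% [intr{irq268: bxe1:fp01}]
   12 root       -92    -     0K  1136K WAIT    0   2:19   0.06% [intr{irq267: bxe1:fp00}]
   12 root       -92    -     0K  1136K WAIT    1   1:30   0.06% [intr{irq263: bxe0:fp01}]
   12 root       -92    -     0K  1136K WAIT    2   2:15   0.06% [intr{irq264: bxe0:fp02}]
   12 root       -92    -     0K  1136K WAIT    2   1:59   0.06% [intr{irq279: bxe3:fp02}]
""".splitlines()

# Matches the tail of a line: "<pct>% [intr{irqNNN: bxeN:fpNN}]"
INTR_RE = re.compile(r"(\d+\.\d+)%\s+\[intr\{irq\d+:\s+(bxe\d+):fp(\d+)\}\]")


def queue_load(lines):
    """Return {nic: {queue_index: cpu_percent}} for bxe interrupt threads."""
    load = defaultdict(dict)
    for line in lines:
        match = INTR_RE.search(line)
        if match:
            pct, nic, queue = match.groups()
            load[nic][int(queue)] = float(pct)
    return dict(load)


def busiest_share(load):
    """Fraction of all listed NIC interrupt CPU time spent in the busiest queue."""
    values = [pct for queues in load.values() for pct in queues.values()]
    total = sum(values)
    return max(values) / total if total else 0.0


if __name__ == "__main__":
    load = queue_load(TOP_LINES)
    for nic in sorted(load):
        print(nic, load[nic])
    print("share of busiest queue: {:.1%}".format(busiest_share(load)))
```

    With an even spread across queues the share would be close to 1 divided by the number of active queues; here nearly all of it sits on bxe3:fp00.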


  • Netgate Administrator

    Have you tried a test through the firewall as opposed to terminating on it?

    Steve



  • Thanks, @xciter327! It does look like an answer! I'm going to check it soon.

    @stephenw10 , not yet. I have it on my checklist.


  • Netgate Administrator

    Mmm, it certainly isn't spreading the load well. However, no CPU core is at 100% either, so that in itself should not be a restriction.

    Steve
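
    To check the per-core figures, the idle threads in the `top -aSH` output can be turned into a busy percentage per CPU (100 minus idle). A minimal sketch, assuming only the idle lines visible in the last part of the output above; it is a single snapshot, so short bursts at 100% would not show up. The function name `per_cpu_busy` is hypothetical.

```python
import re

# Idle-thread lines copied from the tail of the `top -aSH` output above.
IDLE_LINES = """\
   11 root       155 ki31     0K   256K CPU6    6  60.6H  99.99% [idle{idle: cpu6}]
   11 root       155 ki31     0K   256K CPU4    4  60.6H  99.88% [idle{idle: cpu4}]
   11 root       155 ki31     0K   256K CPU3    3  60.5H  98.83% [idle{idle: cpu3}]
   11 root       155 ki31     0K   256K RUN    12  60.5H  98.45% [idle{idle: cpu12}]
   11 root       155 ki31     0K   256K CPU10  10  60.5H  94.15% [idle{idle: cpu10}]
   11 root       155 ki31     0K   256K CPU14  14  60.5H  89.01% [idle{idle: cpu14}]
   11 root       155 ki31     0K   256K CPU5    5  60.6H  84.03% [idle{idle: cpu5}]
   11 root       155 ki31     0K   256K CPU8    8  60.5H  70.66% [idle{idle: cpu8}]
   11 root       155 ki31     0K   256K CPU15  15  60.5H  52.12% [idle{idle: cpu15}]
   11 root       155 ki31     0K   256K CPU0    0  60.5H  14.83% [idle{idle: cpu0}]
""".splitlines()

# Matches the tail of a line: "<pct>% [idle{idle: cpuN}]"
IDLE_RE = re.compile(r"(\d+\.\d+)%\s+\[idle\{idle: cpu(\d+)\}\]")


def per_cpu_busy(lines):
    """Return {cpu_index: busy_percent}, where busy = 100 - idle."""
    busy = {}
    for line in lines:
        match = IDLE_RE.search(line)
        if match:
            idle_pct, cpu = match.groups()
            busy[int(cpu)] = 100.0 - float(idle_pct)
    return busy


if __name__ == "__main__":
    busy = per_cpu_busy(IDLE_LINES)
    # Print the busiest cores first.
    for cpu, pct in sorted(busy.items(), key=lambda item: item[1], reverse=True):
        print("cpu{:<3d} {:6.2f}% busy".format(cpu, pct))
```

    In this snapshot cpu0, which also services bxe3:fp00, is the busiest listed core but still has some idle time left.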



  • It appears there's a known issue with Broadcom BCM57810 adapters in FreeBSD (LACP bonding is not working well): https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=213606

    Today I tried running some tests through HAProxy on the firewall, and the server fell over after reaching ~140000 connections. The log contained:

    Aug  9 05:20:17 pfSense kernel: bxe0: ERROR: ECORE: timeout waiting for state 1
    Aug  9 05:20:17 pfSense kernel: bxe0: ERROR: Queue(3) SETUP failed (rc = -4)
    Aug  9 05:20:17 pfSense kernel: bxe0: ERROR: Queue(3) setup failed rc = -4
    Aug  9 05:20:18 pfSense rc.gateway_alarm[19058]: >>> Gateway alarm: WANGW (Addr:a.b.c.d Alarm:1 RTT:2000271ms RTTsd:3249226ms Loss:21%)
    ...
    Aug  9 05:20:28 pfSense kernel: bxe1: ERROR: TX watchdog timeout on fp[01], resetting!
    Aug  9 05:20:34 pfSense kernel: bxe1: ERROR: ECORE: timeout waiting for state 7
    Aug  9 05:21:02 pfSense kernel: bxe0: ERROR: FW failed to respond!
    Aug  9 05:21:02 pfSense kernel: bxe0: ERROR: Initialization failed, stack notified driver is NOT running!
    Aug  9 05:21:17 pfSense rc.gateway_alarm[45717]: >>> Gateway alarm: WANGW (Addr:a.b.c.d Alarm:1 RTT:0ms RTTsd:0ms Loss:100%)
    ...
    Aug  9 05:21:31 pfSense kernel: bxe2: Interface stopped DISTRIBUTING, possible flapping
    Aug  9 05:21:42 pfSense sshd[82110]: Timeout, client not responding.
    Aug  9 05:21:54 pfSense sshd[19888]: Timeout, client not responding.
    Aug  9 05:21:55 pfSense kernel: bxe0: Interface stopped DISTRIBUTING, possible flapping
    Aug  9 05:22:43 pfSense kernel: bxe1: ERROR: ECORE: timeout waiting for state 1
    Aug  9 05:22:43 pfSense kernel: bxe1: ERROR: Queue(0) SETUP failed (rc = -4)
    Aug  9 05:22:43 pfSense kernel: bxe1: ERROR: Setup leading failed! rc = -4
    Aug  9 05:23:14 pfSense kernel: bxe1: ERROR: Initialization failed, stack notified driver is NOT running!
    Aug  9 05:23:36 pfSense kernel: bxe3: Interface stopped DISTRIBUTING, possible flapping
    Aug  9 05:24:23 pfSense kernel: bxe1: Interface stopped DISTRIBUTING, possible flapping
    

    Going to change the adapters to Intel.
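
    For anyone comparing against their own box, the kernel messages above can be tallied per interface to see which ports logged driver errors and which only dropped out of the lagg. A minimal sketch, assuming the syslog line format shown in the log above; the sample contains only lines copied from it (the elided "..." parts are left out), and the `tally` helper is hypothetical.

```python
import re
from collections import Counter

# Lines copied from the log excerpt above (elided parts are not reproduced).
LOG = """\
Aug  9 05:20:17 pfSense kernel: bxe0: ERROR: ECORE: timeout waiting for state 1
Aug  9 05:20:17 pfSense kernel: bxe0: ERROR: Queue(3) SETUP failed (rc = -4)
Aug  9 05:20:17 pfSense kernel: bxe0: ERROR: Queue(3) setup failed rc = -4
Aug  9 05:20:28 pfSense kernel: bxe1: ERROR: TX watchdog timeout on fp[01], resetting!
Aug  9 05:20:34 pfSense kernel: bxe1: ERROR: ECORE: timeout waiting for state 7
Aug  9 05:21:02 pfSense kernel: bxe0: ERROR: FW failed to respond!
Aug  9 05:21:02 pfSense kernel: bxe0: ERROR: Initialization failed, stack notified driver is NOT running!
Aug  9 05:21:31 pfSense kernel: bxe2: Interface stopped DISTRIBUTING, possible flapping
Aug  9 05:21:42 pfSense sshd[82110]: Timeout, client not responding.
Aug  9 05:21:55 pfSense kernel: bxe0: Interface stopped DISTRIBUTING, possible flapping
Aug  9 05:22:43 pfSense kernel: bxe1: ERROR: ECORE: timeout waiting for state 1
Aug  9 05:22:43 pfSense kernel: bxe1: ERROR: Queue(0) SETUP failed (rc = -4)
Aug  9 05:22:43 pfSense kernel: bxe1: ERROR: Setup leading failed! rc = -4
Aug  9 05:23:14 pfSense kernel: bxe1: ERROR: Initialization failed, stack notified driver is NOT running!
Aug  9 05:23:36 pfSense kernel: bxe3: Interface stopped DISTRIBUTING, possible flapping
Aug  9 05:24:23 pfSense kernel: bxe1: Interface stopped DISTRIBUTING, possible flapping
"""

# Captures the interface name, an optional "ERROR: " tag, and the message text.
KERNEL_RE = re.compile(r"kernel: (bxe\d+): (ERROR: )?(.*)")


def tally(log_text):
    """Return (errors, flaps): per-interface counts of driver ERROR lines
    and of 'stopped DISTRIBUTING' (lagg/LACP) messages."""
    errors = Counter()
    flaps = Counter()
    for line in log_text.splitlines():
        match = KERNEL_RE.search(line)
        if not match:
            continue  # not a bxe kernel message (e.g. sshd)
        nic, is_error, message = match.groups()
        if is_error:
            errors[nic] += 1
        elif "stopped DISTRIBUTING" in message:
            flaps[nic] += 1
    return errors, flaps


if __name__ == "__main__":
    errors, flaps = tally(LOG)
    for nic in sorted(set(errors) | set(flaps)):
        print(nic, "errors:", errors[nic], "lagg flaps:", flaps[nic])
```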


 
