Performance – Throughput / Speed Issues



  • Hi All,

    FYI - This is my first time using pfSense, so hopefully I'm not wasting everyone’s time because I missed something stupid.  Apologies in advance if this turns out to be a gross oversight on my part!!!

    This is a new pfSense deployment.  The build is intended to replace a production FortiGate 100D that cannot keep up with the recent service upgrade.  The 100D only manages about 600Mbps of straight, unencrypted real-world throughput, nowhere near its advertised 2.5 Gbps.

    The issue is that my average speed loss when testing through the new pfSense server is about 15.5%, and I've had tests show as much as 27% loss.

    Hopefully the following is enough information about the hardware and topology involved.  If I am missing anything, please let me know and I will try to get the information as soon as possible.

    Speeds with the Comcast Gateway connected directly to the HP ProCurve switch
    Latency (ms)   Download (Mbps)   Upload (Mbps)
    9.561          918.2455          36.91273
    10.924         950.1666          48.54074
    9.573          949.4188          24.99929

    Speeds with the Comcast Gateway connected to the pfSense server
    Latency (ms)   Download (Mbps)   Upload (Mbps)
    9.442          787.2579          38.67714
    11.421         797.3936          50.65541
    9.565          795.5139          28.34676

    Things I've tried…

    I've tried the default kern.ipc.nmbclusters setting, as well as having it set to 1000000
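
    (For reference, whether the tunable actually took, and how close the box is to exhausting clusters, can be checked from a shell on pfSense; the netstat output quoted further down comes from the same command:)

    sysctl kern.ipc.nmbclusters
    netstat -m | grep "mbuf clusters"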

    I've tried the following combinations under System / Advanced / Networking:
    Hardware Checksum Offloading = UnChecked
    Hardware TCP Segmentation Offloading = Checked
    Hardware Large Receive Offloading = Checked

    Hardware Checksum Offloading = Checked
    Hardware TCP Segmentation Offloading = Checked
    Hardware Large Receive Offloading = Checked

    Hardware Checksum Offloading = UnChecked
    Hardware TCP Segmentation Offloading = UnChecked
    Hardware Large Receive Offloading = UnChecked
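
    (In each case, what the driver actually ended up with can be confirmed per interface from a shell; igb0 below is just a placeholder for whichever port is the WAN:)

    # show which offload flags (TXCSUM, RXCSUM, TSO4, LRO, ...) are currently active
    ifconfig igb0 | grep options
    # offloads can also be flipped per interface for a quick, non-persistent test
    ifconfig igb0 -txcsum -tso -lro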

    Connection
    Comcast Business Internet 1 Gig

    Network Equipment
    Comcast DOCSIS 3.1 Gateway
    Vendor: Technicolor
    Model: CGA4131COM

    HP ProCurve V1910-48G - JE009A Switch
    104 Gbps switching capacity, max.
    77.4 Mpps forwarding rate, max.

    pfSense Server Hardware
    Supermicro SuperServer 5018D-FN8T
    Intel® Xeon® processor D-1518 2.2GHz Quad Core
    16GB 2400MHZ DDR4 ECC Reg CL17 DIMM 1RX4
    Dual 10G SFP+ ports from D-1500 SoC
    Quad 1GbE with Intel I350-AM4
    Dual 1GbE with Intel I210

    Added
    Intel I350-T2 Server NIC (WAN & LAN Connected here)
    1 x Samsung SSD 960 EVO NVMe M.2 250GB
    1 x Samsung SSD 860 EVO 250GB (pfSense installed here for now on ZFS)

    Network Topology
    Comcast Gateway <--> pfSense <--> HP ProCurve Switch <--> Workstations

    pfSense Configuration
    (It's basically a default install, right out of the box, for testing at this point.)

    System / Advanced / Networking
    Hardware Checksum Offloading = UnChecked
    Hardware TCP Segmentation Offloading = Checked
    Hardware Large Receive Offloading = Checked

    System / Routing / Gateways
    GW_WAN (default) only

    System / Advanced / System Tunables
    kern.ipc.nmbclusters = 1000000

    Interfaces / WAN
    Block private networks and loopback addresses = Checked
    Block bogon networks = Checked

    Interfaces / LAN
    Block private networks and loopback addresses = UnChecked
    Block bogon networks = UnChecked

    Firewall / NAT / Outbound
    Auto Created mappings only

    Firewall / Rules / WAN
    1 rule added after default installation
    Action: Pass / Interface: wan / Protocol: icmp / ICMP Subtypes: any / Source: any / Destination: any

    Firewall / Rules / LAN
    Default Anti-Lockout Rule
    Default allow LAN to any rule
    Default allow LAN IPv6 to any rule

    pfSense Server Hardware Performance Details

    iperf Results - Verifying cables, switch, and Comcast Gateway connectivity
    (pfSense running iperf is more than capable of hitting ~940 Mbits/sec on both the WAN and LAN ports.)
    ------------------------------------------------------------
    External NIC (WAN Port)
    TCP window size: 64.2 KByte (default)

    [ ID] Interval      Transfer    Bandwidth
    [  3]  0.0-10.0 sec  1.10 GBytes  947 Mbits/sec

    ------------------------------------------------------------
    Internal NIC (LAN Port)
    TCP window size: 64.2 KByte (default)

    [ ID] Interval      Transfer    Bandwidth
    [  3]  0.0-10.0 sec  1.10 GBytes  948 Mbits/sec
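
    (These were plain single-stream iperf runs, roughly of the form below; the address is just a placeholder for the pfSense interface under test.  Adding -P 4 for parallel streams is an easy way to rule out a single TCP stream being the limit:)

    # on pfSense
    iperf -s
    # on a workstation attached to the port under test
    iperf -c 192.168.1.1 -t 10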

    TOP Command Results
    Three "top" samples taken on pfSense while the speed tests were running.
    ------------------------------------------------------------
    last pid: 55894;  load averages:  0.50,  0.34,  0.21   up 0+23:49:18  13:22:17
    601 processes: 10 running, 471 sleeping, 8 zombie, 112 waiting
    CPU:    % user,    % nice,    % system,    % interrupt,    % idle
    Mem: 31M Active, 129M Inact, 549M Wired, 132K Buf, 15G Free
    ARC: 155M Total, 423K MFU, 151M MRU, 156K Anon, 534K Header, 2556K Other
        41M Compressed, 113M Uncompressed, 2.74:1 Ratio
    Swap: 2048M Total, 2048M Free

    PID USERNAME  PRI NICE  SIZE    RES STATE  C  TIME    WCPU COMMAND
      11 root      155 ki31    0K  128K CPU0    0  23.8H  99.85% [idle{idle: cpu0}]
      11 root      155 ki31    0K  128K CPU1    1  23.8H  99.56% [idle{idle: cpu1}]
      11 root      155 ki31    0K  128K CPU5    5  23.8H  98.39% [idle{idle: cpu5}]
      11 root      155 ki31    0K  128K CPU2    2  23.8H  98.10% [idle{idle: cpu2}]
      11 root      155 ki31    0K  128K CPU6    6  23.8H  97.85% [idle{idle: cpu6}]
      11 root      155 ki31    0K  128K CPU7    7  23.8H  95.07% [idle{idle: cpu7}]
      11 root      155 ki31    0K  128K RUN    4  23.8H  94.09% [idle{idle: cpu4}]
      11 root      155 ki31    0K  128K CPU3    3  23.8H  82.67% [idle{idle: cpu3}]
      12 root      -92    -    0K  1808K CPU3    3  0:09  17.38% [intr{irq290: igb0:que 3}]
      12 root      -92    -    0K  1808K WAIT    4  0:04  5.76% [intr{irq300: igb1:que 4}]
      12 root      -92    -    0K  1808K WAIT    7  0:08  4.69% [intr{irq294: igb0:que 7}]
      12 root      -92    -    0K  1808K WAIT    2  0:08  4.69% [intr{irq289: igb0:que 2}]
      12 root      -92    -    0K  1808K WAIT    0  0:07  1.86% [intr{irq296: igb1:que 0}]
      12 root      -92    -    0K  1808K WAIT    5  0:14  1.27% [intr{irq292: igb0:que 5}]
      12 root      -92    -    0K  1808K WAIT    0  0:21  1.17% [intr{irq287: igb0:que 0}]
      12 root      -92    -    0K  1808K WAIT    2  0:04  0.98% [intr{irq298: igb1:que 2}]
      12 root      -92    -    0K  1808K WAIT    6  0:04  0.68% [intr{irq302: igb1:que 6}]
      12 root      -92    -    0K  1808K WAIT    6  0:16  0.59% [intr{irq293: igb0:que 6}]
    53541 root        20    0 39364K  4148K bpf    7  0:03  0.49% /usr/local/bandwidthd/bandwidthd
    53015 root        20    0 39364K  4368K bpf    0  0:03  0.49% /usr/local/bandwidthd/bandwidthd
    52709 root        20    0 39364K  4592K bpf    1  0:03  0.39% /usr/local/bandwidthd/bandwidthd
    53227 root        20    0 39364K  4352K bpf    4  0:03  0.39% /usr/local/bandwidthd/bandwidthd
    53853 root        20    0 39364K  4136K bpf    2  0:03  0.39% /usr/local/bandwidthd/bandwidthd
    52658 root        20    0 39364K  4604K bpf    6  0:03  0.29% /usr/local/bandwidthd/bandwidthd
    48506 root        21    0  268M 37828K accept  2  0:09  0.10% php-fpm: pool nginx (php-fpm){php-fpm}
    53454 root        20    0 39364K  4164K bpf    1  0:03  0.10% /usr/local/bandwidthd/bandwidthd
        0 root      -16    -    0K  5632K swapin  0  18.6H  0.00% [kernel{swapper}]
      24 root      -16    -    0K    16K pftm    4  0:20  0.00% [pf purge]
      12 root      -60    -    0K  1808K WAIT    0  0:18  0.00% [intr{swi4: clock (0)}]
      327 root        52    0  266M 35800K accept  5  0:11  0.00% php-fpm: pool nginx (php-fpm){php-fpm}
      326 root        52    0  266M 35052K accept  0  0:11  0.00% php-fpm: pool nginx (php-fpm){php-fpm}
      12 root      -92    -    0K  1808K WAIT    1  0:10  0.00% [intr{irq288: igb0:que 1}]
      22 root        -8    -    0K  128K tx->tx  1  0:10  0.00% [zfskern{txg_thread_enter}]
    47952 root        20    0 10488K  2568K select  5  0:10  0.00% /usr/sbin/syslogd -s -c -c -l /var/dhcpd/var/run/log -P /var/run/syslog.pid -f /etc/syslog.con
      25 root      -16    -    0K    16K -      0  0:09  0.00% [rand_harvestq]
      12 root      -92    -    0K  1808K WAIT    4  0:08  0.00% [intr{irq291: igb0:que 4}]
    9287 root        20    0 12736K  1904K bpf    0  0:07  0.00% /usr/local/sbin/filterlog -i pflog0 -p /var/run/filterlog.pid
        0 root      -12    -    0K  5632K -      7  0:06  0.00% [kernel{zio_write_issue_0}]
        0 root      -12    -    0K  5632K -      5  0:06  0.00% [kernel{zio_write_issue_4}]
        0 root      -12    -    0K  5632K -      1  0:06  0.00% [kernel{zio_write_issue_5}]
        0 root      -12    -    0K  5632K -      1  0:06  0.00% [kernel{zio_write_issue_2}]
        0 root      -12    -    0K  5632K -      2  0:06  0.00% [kernel{zio_write_issue_1}]
        0 root      -12    -    0K  5632K -      4  0:06  0.00% [kernel{zio_write_issue_3}]
        0 root      -16    -    0K  5632K -      4  0:05  0.00% [kernel{zio_write_intr_7}]
        0 root      -16    -    0K  5632K -      5  0:05  0.00% [kernel{zio_write_intr_4}]
        0 root      -16    -    0K  5632K -      0  0:05  0.00% [kernel{zio_write_intr_3}]
        0 root      -16    -    0K  5632K -      2  0:05  0.00% [kernel{zio_write_intr_0}]
        0 root      -16    -    0K  5632K -      1  0:05  0.00% [kernel{zio_write_intr_6}]
        0 root      -16    -    0K  5632K -      6  0:05  0.00% [kernel{zio_write_intr_5}]
        0 root      -16    -    0K  5632K -      3  0:05  0.00% [kernel{zio_write_intr_1}]
        0 root      -16    -    0K  5632K -      5  0:05  0.00% [kernel{zio_write_intr_2}]
    ------------------------------------------------------------
    last pid: 96083;  load averages:  0.22,  0.15,  0.13   up 0+23:46:11  13:19:10
    601 processes: 9 running, 471 sleeping, 8 zombie, 113 waiting
    CPU:    % user,    % nice,    % system,    % interrupt,    % idle
    Mem: 39M Active, 123M Inact, 548M Wired, 132K Buf, 15G Free
    ARC: 155M Total, 423K MFU, 151M MRU, 568K Anon, 535K Header, 2551K Other
        41M Compressed, 113M Uncompressed, 2.74:1 Ratio
    Swap: 2048M Total, 2048M Free

    PID USERNAME  PRI NICE  SIZE    RES STATE  C  TIME    WCPU COMMAND
      11 root      155 ki31    0K  128K CPU1    1  23.7H 100.00% [idle{idle: cpu1}]
      11 root      155 ki31    0K  128K CPU2    2  23.7H 100.00% [idle{idle: cpu2}]
      11 root      155 ki31    0K  128K CPU6    6  23.7H 100.00% [idle{idle: cpu6}]
      11 root      155 ki31    0K  128K CPU0    0  23.7H 100.00% [idle{idle: cpu0}]
      11 root      155 ki31    0K  128K CPU3    3  23.7H  99.07% [idle{idle: cpu3}]
      11 root      155 ki31    0K  128K CPU5    5  23.7H  97.66% [idle{idle: cpu5}]
      11 root      155 ki31    0K  128K CPU7    7  23.7H  97.27% [idle{idle: cpu7}]
      11 root      155 ki31    0K  128K RUN    4  23.7H  92.48% [idle{idle: cpu4}]
      12 root      -92    -    0K  1808K WAIT    4  0:07  6.69% [intr{irq291: igb0:que 4}]
      12 root      -92    -    0K  1808K WAIT    5  0:09  4.79% [intr{irq292: igb0:que 5}]
      12 root      -92    -    0K  1808K WAIT    7  0:05  3.96% [intr{irq294: igb0:que 7}]
      12 root      -92    -    0K  1808K WAIT    0  0:04  3.47% [intr{irq296: igb1:que 0}]
      12 root      -92    -    0K  1808K WAIT    4  0:01  2.78% [intr{irq300: igb1:que 4}]
      12 root      -92    -    0K  1808K WAIT    0  0:13  2.49% [intr{irq287: igb0:que 0}]
      12 root      -92    -    0K  1808K WAIT    3  0:04  2.20% [intr{irq290: igb0:que 3}]
      12 root      -92    -    0K  1808K WAIT    6  0:11  1.27% [intr{irq293: igb0:que 6}]
      12 root      -92    -    0K  1808K WAIT    2  0:05  1.27% [intr{irq289: igb0:que 2}]
      12 root      -92    -    0K  1808K WAIT    7  0:01  0.78% [intr{irq303: igb1:que 7}]
      12 root      -92    -    0K  1808K WAIT    3  0:02  0.68% [intr{irq299: igb1:que 3}]
      12 root      -92    -    0K  1808K WAIT    1  0:02  0.39% [intr{irq297: igb1:que 1}]
    53541 root        20    0 39364K  4148K bpf    1  0:02  0.29% /usr/local/bandwidthd/bandwidthd
    53853 root        20    0 39364K  4136K bpf    7  0:02  0.29% /usr/local/bandwidthd/bandwidthd
    52709 root        20    0 39364K  4592K bpf    4  0:02  0.20% /usr/local/bandwidthd/bandwidthd
    52658 root        20    0 39364K  4604K bpf    1  0:02  0.20% /usr/local/bandwidthd/bandwidthd
      327 root        22    0  268M 37660K accept  5  0:11  0.10% php-fpm: pool nginx (php-fpm){php-fpm}
    53227 root        20    0 39364K  4352K bpf    6  0:02  0.10% /usr/local/bandwidthd/bandwidthd
    53537 root        20    0 39364K  4152K bpf    0  0:02  0.10% /usr/local/bandwidthd/bandwidthd
        0 root      -16    -    0K  5632K swapin  2  18.6H  0.00% [kernel{swapper}]
      24 root      -16    -    0K    16K pftm    0  0:19  0.00% [pf purge]
      12 root      -60    -    0K  1808K WAIT    3  0:18  0.00% [intr{swi4: clock (0)}]
      326 root        52    0  266M 35048K accept  4  0:10  0.00% php-fpm: pool nginx (php-fpm){php-fpm}
      22 root        -8    -    0K  128K tx->tx  3  0:10  0.00% [zfskern{txg_thread_enter}]
    47952 root        20    0 10488K  2568K select  3  0:10  0.00% /usr/sbin/syslogd -s -c -c -l /var/dhcpd/var/run/log -P /var/run/syslog.pid -f /etc/syslog.con
    48506 root        52    0  268M 37828K accept  3  0:09  0.00% php-fpm: pool nginx (php-fpm){php-fpm}
      25 root      -16    -    0K    16K -      0  0:09  0.00% [rand_harvestq]
    9287 root        20    0 12736K  1904K bpf    0  0:06  0.00% /usr/local/sbin/filterlog -i pflog0 -p /var/run/filterlog.pid
        0 root      -12    -    0K  5632K -      4  0:06  0.00% [kernel{zio_write_issue_0}]
        0 root      -12    -    0K  5632K -      2  0:06  0.00% [kernel{zio_write_issue_4}]
        0 root      -12    -    0K  5632K -      3  0:06  0.00% [kernel{zio_write_issue_5}]
        0 root      -12    -    0K  5632K -      1  0:06  0.00% [kernel{zio_write_issue_2}]
        0 root      -12    -    0K  5632K -      5  0:06  0.00% [kernel{zio_write_issue_1}]
        0 root      -12    -    0K  5632K -      0  0:06  0.00% [kernel{zio_write_issue_3}]
      12 root      -92    -    0K  1808K WAIT    1  0:06  0.00% [intr{irq288: igb0:que 1}]
        0 root      -16    -    0K  5632K -      2  0:05  0.00% [kernel{zio_write_intr_7}]
        0 root      -16    -    0K  5632K -      3  0:05  0.00% [kernel{zio_write_intr_4}]
        0 root      -16    -    0K  5632K -      7  0:05  0.00% [kernel{zio_write_intr_3}]
        0 root      -16    -    0K  5632K -      5  0:05  0.00% [kernel{zio_write_intr_0}]
        0 root      -16    -    0K  5632K -      5  0:05  0.00% [kernel{zio_write_intr_6}]
        0 root      -16    -    0K  5632K -      6  0:05  0.00% [kernel{zio_write_intr_5}]
        0 root      -16    -    0K  5632K -      3  0:05  0.00% [kernel{zio_write_intr_1}]
        0 root      -16    -    0K  5632K -      4  0:05  0.00% [kernel{zio_write_intr_2}]
    ------------------------------------------------------------
    last pid:  3002;  load averages:  0.21,  0.15,  0.13   up 0+23:46:17  13:19:16
    601 processes: 9 running, 471 sleeping, 8 zombie, 113 waiting
    CPU:    % user,    % nice,    % system,    % interrupt,    % idle
    Mem: 39M Active, 123M Inact, 548M Wired, 132K Buf, 15G Free
    ARC: 155M Total, 423K MFU, 151M MRU, 152K Anon, 535K Header, 2551K Other
        41M Compressed, 113M Uncompressed, 2.74:1 Ratio
    Swap: 2048M Total, 2048M Free

    PID USERNAME  PRI NICE  SIZE    RES STATE  C  TIME    WCPU COMMAND
      11 root      155 ki31    0K  128K CPU1    1  23.7H  97.85% [idle{idle: cpu1}]
      11 root      155 ki31    0K  128K CPU6    6  23.7H  96.19% [idle{idle: cpu6}]
      11 root      155 ki31    0K  128K CPU2    2  23.7H  95.46% [idle{idle: cpu2}]
      11 root      155 ki31    0K  128K CPU0    0  23.7H  94.58% [idle{idle: cpu0}]
      11 root      155 ki31    0K  128K RUN    3  23.7H  94.09% [idle{idle: cpu3}]
      11 root      155 ki31    0K  128K CPU7    7  23.7H  92.68% [idle{idle: cpu7}]
      11 root      155 ki31    0K  128K CPU5    5  23.7H  92.19% [idle{idle: cpu5}]
      11 root      155 ki31    0K  128K CPU4    4  23.7H  87.35% [idle{idle: cpu4}]
      12 root      -92    -    0K  1808K WAIT    4  0:07  7.67% [intr{irq291: igb0:que 4}]
      12 root      -92    -    0K  1808K WAIT    5  0:09  6.69% [intr{irq292: igb0:que 5}]
      12 root      -92    -    0K  1808K WAIT    0  0:04  3.96% [intr{irq296: igb1:que 0}]
      12 root      -92    -    0K  1808K WAIT    0  0:13  3.86% [intr{irq287: igb0:que 0}]
      12 root      -92    -    0K  1808K WAIT    7  0:05  3.76% [intr{irq294: igb0:que 7}]
      12 root      -92    -    0K  1808K WAIT    4  0:01  3.27% [intr{irq300: igb1:que 4}]
      12 root      -92    -    0K  1808K WAIT    3  0:04  3.17% [intr{irq290: igb0:que 3}]
      12 root      -92    -    0K  1808K WAIT    2  0:05  3.08% [intr{irq289: igb0:que 2}]
      12 root      -92    -    0K  1808K WAIT    6  0:12  1.86% [intr{irq293: igb0:que 6}]
      12 root      -92    -    0K  1808K WAIT    7  0:01  1.56% [intr{irq303: igb1:que 7}]
      12 root      -92    -    0K  1808K WAIT    3  0:02  0.78% [intr{irq299: igb1:que 3}]
      12 root      -92    -    0K  1808K WAIT    1  0:03  0.49% [intr{irq297: igb1:que 1}]
    53541 root        20    0 39364K  4148K bpf    3  0:02  0.29% /usr/local/bandwidthd/bandwidthd
    53853 root        20    0 39364K  4136K bpf    0  0:02  0.29% /usr/local/bandwidthd/bandwidthd
      326 root        22    0  266M 35048K accept  4  0:10  0.20% php-fpm: pool nginx (php-fpm){php-fpm}
    52709 root        20    0 39364K  4592K bpf    0  0:02  0.20% /usr/local/bandwidthd/bandwidthd
    52658 root        20    0 39364K  4604K bpf    6  0:02  0.20% /usr/local/bandwidthd/bandwidthd
    53227 root        20    0 39364K  4352K bpf    3  0:02  0.10% /usr/local/bandwidthd/bandwidthd
    53015 root        20    0 39364K  4368K bpf    3  0:02  0.10% /usr/local/bandwidthd/bandwidthd
    53537 root        20    0 39364K  4152K bpf    6  0:02  0.10% /usr/local/bandwidthd/bandwidthd
        0 root      -16    -    0K  5632K swapin  6  18.6H  0.00% [kernel{swapper}]
      24 root      -16    -    0K    16K pftm    6  0:19  0.00% [pf purge]
      12 root      -60    -    0K  1808K WAIT    0  0:18  0.00% [intr{swi4: clock (0)}]
      327 root        52    0  268M 37660K accept  1  0:11  0.00% php-fpm: pool nginx (php-fpm){php-fpm}
      22 root        -8    -    0K  128K tx->tx  7  0:10  0.00% [zfskern{txg_thread_enter}]
    47952 root        20    0 10488K  2568K select  3  0:10  0.00% /usr/sbin/syslogd -s -c -c -l /var/dhcpd/var/run/log -P /var/run/syslog.pid -f /etc/syslog.con
    48506 root        52    0  268M 37828K accept  3  0:09  0.00% php-fpm: pool nginx (php-fpm){php-fpm}
      25 root      -16    -    0K    16K -      3  0:09  0.00% [rand_harvestq]
    9287 root        20    0 12736K  1904K bpf    7  0:06  0.00% /usr/local/sbin/filterlog -i pflog0 -p /var/run/filterlog.pid
        0 root      -12    -    0K  5632K -      1  0:06  0.00% [kernel{zio_write_issue_0}]
        0 root      -12    -    0K  5632K -      4  0:06  0.00% [kernel{zio_write_issue_4}]
        0 root      -12    -    0K  5632K -      1  0:06  0.00% [kernel{zio_write_issue_5}]
        0 root      -12    -    0K  5632K -      2  0:06  0.00% [kernel{zio_write_issue_2}]
        0 root      -12    -    0K  5632K -      7  0:06  0.00% [kernel{zio_write_issue_1}]
        0 root      -12    -    0K  5632K -      2  0:06  0.00% [kernel{zio_write_issue_3}]
      12 root      -92    -    0K  1808K WAIT    1  0:06  0.00% [intr{irq288: igb0:que 1}]
        0 root      -16    -    0K  5632K -      7  0:05  0.00% [kernel{zio_write_intr_7}]
        0 root      -16    -    0K  5632K -      7  0:05  0.00% [kernel{zio_write_intr_4}]
        0 root      -16    -    0K  5632K -      0  0:05  0.00% [kernel{zio_write_intr_3}]
        0 root      -16    -    0K  5632K -      1  0:05  0.00% [kernel{zio_write_intr_0}]
        0 root      -16    -    0K  5632K -      2  0:05  0.00% [kernel{zio_write_intr_6}]
        0 root      -16    -    0K  5632K -      0  0:05  0.00% [kernel{zio_write_intr_5}]
        0 root      -16    -    0K  5632K -      6  0:05  0.00% [kernel{zio_write_intr_1}]
    ------------------------------------------------------------

    NETSTAT Command Results
    Three "netstat -m" samples taken on pfSense while the speed tests were running.
    ------------------------------------------------------------
    17202/6333/23535 mbufs in use (current/cache/total)
    16790/3210/20000/1000000 mbuf clusters in use (current/cache/total/max)
    16790/3197 mbuf+clusters out of packet secondary zone in use (current/cache)
    0/837/837/504803 4k (page size) jumbo clusters in use (current/cache/total/max)
    0/0/0/149571 9k jumbo clusters in use (current/cache/total/max)
    0/0/0/84133 16k jumbo clusters in use (current/cache/total/max)
    37880K/11351K/49231K bytes allocated to network (current/cache/total)
    0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
    0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
    0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
    0/0/0 requests for jumbo clusters denied (4k/9k/16k)
    120 sendfile syscalls
    59 sendfile syscalls completed without I/O request
    86 requests for I/O initiated by sendfile
    809 pages read by sendfile as part of a request
    874 pages were valid at time of a sendfile request
    0 pages were requested for read ahead by applications
    0 pages were read ahead by sendfile
    0 times sendfile encountered an already busy page
    0 requests for sfbufs denied
    0 requests for sfbufs delayed

    17206/6329/23535 mbufs in use (current/cache/total)
    16794/3206/20000/1000000 mbuf clusters in use (current/cache/total/max)
    16794/3193 mbuf+clusters out of packet secondary zone in use (current/cache)
    0/837/837/504803 4k (page size) jumbo clusters in use (current/cache/total/max)
    0/0/0/149571 9k jumbo clusters in use (current/cache/total/max)
    0/0/0/84133 16k jumbo clusters in use (current/cache/total/max)
    37889K/11342K/49231K bytes allocated to network (current/cache/total)
    0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
    0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
    0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
    0/0/0 requests for jumbo clusters denied (4k/9k/16k)
    120 sendfile syscalls
    59 sendfile syscalls completed without I/O request
    86 requests for I/O initiated by sendfile
    809 pages read by sendfile as part of a request
    874 pages were valid at time of a sendfile request
    0 pages were requested for read ahead by applications
    0 pages were read ahead by sendfile
    0 times sendfile encountered an already busy page
    0 requests for sfbufs denied
    0 requests for sfbufs delayed

    17194/6341/23535 mbufs in use (current/cache/total)
    16782/3218/20000/1000000 mbuf clusters in use (current/cache/total/max)
    16782/3205 mbuf+clusters out of packet secondary zone in use (current/cache)
    0/837/837/504803 4k (page size) jumbo clusters in use (current/cache/total/max)
    0/0/0/149571 9k jumbo clusters in use (current/cache/total/max)
    0/0/0/84133 16k jumbo clusters in use (current/cache/total/max)
    37862K/11369K/49231K bytes allocated to network (current/cache/total)
    0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
    0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
    0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
    0/0/0 requests for jumbo clusters denied (4k/9k/16k)
    120 sendfile syscalls
    59 sendfile syscalls completed without I/O request
    86 requests for I/O initiated by sendfile
    809 pages read by sendfile as part of a request
    874 pages were valid at time of a sendfile request
    0 pages were requested for read ahead by applications
    0 pages were read ahead by sendfile
    0 times sendfile encountered an already busy page
    0 requests for sfbufs denied
    0 requests for sfbufs delayed

    Thank you very much for taking time out of your day to read my post!

    Cheers,
    Clint



  • Seems like decent hardware. I assume Hyper-Threading is enabled, given the queues are numbered 0-7. Try disabling HT. I'm not sure how much of a difference it may or may not make, but I've heard bad things about HT for certain workloads, like firewalls. And FreeBSD doesn't really like much more than 4 threads for networking anyway.
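
    If you'd rather not touch HT in the BIOS, I believe the legacy igb driver in FreeBSD 11.x also takes a loader tunable to cap the number of queues per NIC (worth confirming against igb(4) for your exact release):

    # /boot/loader.conf.local on pfSense (needs a reboot to take effect)
    hw.igb.num_queues="4"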



  • Harvy66,

    First off, thank you so very much for responding; any help at all is greatly appreciated at this point!!!

    I disabled hyper-threading and re-ran the tests.  Unfortunately, there really was no discernible difference in the results.  The throughput seems to max out just shy of 800Mbps per interface (1.6Gbps total) no matter what I try.

    Latency (ms)   Download (Mbps)   Upload (Mbps)
    10.209         769.0828          30.01639
    11.294         794.7166          42.77486
    11.709         797.9307          45.43179

    last pid: 60228;  load averages:  0.41,  0.22,  0.10   up 0+00:06:56  07:57:25
    49 processes:  1 running, 46 sleeping, 2 zombie
    CPU 0:  2.0% user,  0.0% nice,  0.8% system,  7.9% interrupt, 89.4% idle
    CPU 1:  2.0% user,  0.0% nice,  0.4% system,  7.5% interrupt, 90.2% idle
    CPU 2:  1.2% user,  0.0% nice,  0.0% system, 20.1% interrupt, 78.7% idle
    CPU 3:  1.2% user,  0.0% nice,  0.8% system, 13.0% interrupt, 85.0% idle
    Mem: 109M Active, 12M Inact, 450M Wired, 132K Buf, 15G Free
    ARC: 119M Total, 176K MFU, 117M MRU, 16K Anon, 382K Header, 1672K Other
        34M Compressed, 86M Uncompressed, 2.52:1 Ratio
    Swap: 2048M Total, 2048M Free

    last pid: 84188;  load averages:  0.13,  0.17,  0.09   up 0+00:08:24  07:58:53
    49 processes:  1 running, 46 sleeping, 2 zombie
    CPU 0:  0.5% user,  0.0% nice,  0.0% system, 15.8% interrupt, 83.8% idle
    CPU 1:  0.5% user,  0.0% nice,  0.5% system, 14.9% interrupt, 84.2% idle
    CPU 2:  0.5% user,  0.0% nice,  0.5% system,  9.5% interrupt, 89.6% idle
    CPU 3:  0.5% user,  0.0% nice,  0.5% system,  0.0% interrupt, 99.1% idle
    Mem: 109M Active, 12M Inact, 451M Wired, 132K Buf, 15G Free
    ARC: 119M Total, 176K MFU, 117M MRU, 160K Anon, 383K Header, 1673K Other
        34M Compressed, 86M Uncompressed, 2.52:1 Ratio
    Swap: 2048M Total, 2048M Free

    last pid: 83979;  load averages:  0.18,  0.18,  0.09   up 0+00:07:40  07:58:09
    49 processes:  1 running, 46 sleeping, 2 zombie
    CPU 0:  0.0% user,  0.0% nice,  0.5% system, 14.3% interrupt, 85.2% idle
    CPU 1:  1.6% user,  0.0% nice,  0.5% system,  0.0% interrupt, 97.9% idle
    CPU 2:  0.5% user,  0.0% nice,  0.0% system, 13.8% interrupt, 85.7% idle
    CPU 3:  1.1% user,  0.0% nice,  0.0% system, 11.6% interrupt, 87.3% idle
    Mem: 109M Active, 12M Inact, 451M Wired, 132K Buf, 15G Free
    ARC: 119M Total, 176K MFU, 117M MRU, 160K Anon, 383K Header, 1673K Other
        34M Compressed, 86M Uncompressed, 2.52:1 Ratio
    Swap: 2048M Total, 2048M Free
    Displaying per-CPU statistics.
      PID USERNAME    THR PRI NICE  SIZE    RES STATE  C  TIME    WCPU COMMAND
    52257 root          1  20    0 39364K  3888K bpf    2  0:00  0.72% bandwidthd
    51875 root          1  20    0 39364K  3888K bpf    1  0:00  0.71% bandwidthd
    50749 root          1  20    0 39364K  4056K bpf    0  0:00  0.70% bandwidthd
    51594 root          1  20    0 39364K  3896K bpf    0  0:00  0.68% bandwidthd
    50861 root          1  20    0 39364K  3896K bpf    1  0:00  0.67% bandwidthd
    51329 root          1  20    0 39364K  3896K bpf    0  0:00  0.65% bandwidthd
    52168 root          1  20    0 39364K  3888K bpf    1  0:00  0.65% bandwidthd
    50483 root          1  20    0 39364K  4068K bpf    2  0:00  0.62% bandwidthd
    83979 root          1  20    0 20068K  3152K CPU3    3  0:00  0.08% top
    67292 root          1  20    0 78872K  7248K select  0  0:00  0.02% sshd
    45690 root          1  20    0 10488K  2564K select  1  0:00  0.01% syslogd
    9036 root          5  52    0 13036K  1920K uwait  2  0:00  0.01% dpinger
    14638 root          2  20    0 24660K 12500K select  1  0:00  0.01% ntpd
    7102 root          1  20    0 12736K  1764K bpf    1  0:00  0.00% filterlog
      321 root          1  20    0  259M 18412K kqread  3  0:00  0.00% php-fpm
    23744 root          1  20    0 96068K 47132K select  0  0:00  0.00% bsnmpd
      323 root          1  22    0  261M 28944K accept  3  0:00  0.00% php-fpm
      322 root          1  20    0  261M 27900K accept  1  0:00  0.00% php-fpm
    26063 root          1  52  20 13096K  2420K wait    1  0:00  0.00% sh
    71280 root          1  20    0 13400K  3376K pause  3  0:00  0.00% tcsh
    93481 root          1  52    0 39440K  2704K wait    1  0:00  0.00% login
      336 root          1  40  20 19456K  2956K kqread  0  0:00  0.00% check_reload_status
    67826 root          1  52    0 13096K  2584K wait    0  0:00  0.00% sh
    94899 root          1  52    0 13096K  2704K wait    0  0:00  0.00% sh
    94787 root          2  20    0 10588K  2320K piperd  2  0:00  0.00% sshlockout_pf
    14169 root          1  20    0 12504K  1832K nanslp  3  0:00  0.00% cron
      374 root          1  20    0  9176K  4692K select  0  0:00  0.00% devd
    95106 root          1  52    0 13096K  2584K ttyin  1  0:00  0.00% sh
    94666 root          1  52    0 10396K  2132K ttyin  2  0:00  0.00% getty
    13522 root          1  20    0 53524K  4584K select  3  0:00  0.00% sshd
    94078 root          1  52    0 10396K  2132K ttyin  3  0:00  0.00% getty
    94421 root          1  52    0 10396K  2132K ttyin  1  0:00  0.00% getty
    94237 root          1  52    0 10396K  2132K ttyin  2  0:00  0.00% getty
    93518 root          1  52    0 10396K  2132K ttyin  0  0:00  0.00% getty
    94167 root          1  52    0 10396K  2132K ttyin  0  0:00  0.00% getty
    93789 root          1  52    0 10396K  2132K ttyin  3  0:00  0.00% getty
    13806 root          1  52    0 25416K  5024K kqread  3  0:00  0.00% nginx
    14093 root          1  52    0 25416K  5020K kqread  0  0:00  0.00% nginx
    13731 root          1  52    0 25416K  4552K pause  2  0:00  0.00% nginx
    83834 root          1  52  20  6180K  1936K nanslp  0  0:00  0.00% sleep
    48054 root          1  52    0  8232K  2012K wait    3  0:00  0.00% minicron
    48186 root          1  20    0  8232K  2028K nanslp  3  0:00  0.00% minicron
    49134 root          1  52    0  8232K  2012K wait    1  0:00  0.00% minicron
    48418 root          1  52    0  8232K  2012K wait    3  0:00  0.00% minicron
      338 root          1  52  20 19456K  2860K kqread  1  0:00  0.00% check_reload_status
    48856 root          1  52    0  8232K  2028K nanslp  1  0:00  0.00% minicron
    49525 root          1  52    0  8232K  2028K nanslp  1  0:00  0.00% minicron

    I’m going to install Windows 10 on the hardware so I can run an “apples to apples” speed test (current workstation vs. the new pfSense hardware).  If those results show the hardware can in fact get into the 930Mbps/950Mbps range, my next step will be a fresh install of pfSense, but this time I won’t run the WAN/LAN links on the same Intel I350-T2 Server NIC.  I doubt splitting the links will solve it, since an Intel I350-T2 should have more than enough horsepower to run two unencrypted links wide open, but I still need to rule it out just the same.

    Again, thank you very much for the reply!!!

    Cheers,
    Clint



  • @Harvy66:

    Seems like decent hardware.

    That's the same thought that came to me while reading… I wished the OP good luck as well.



  • I don't have much experience troubleshooting other hardware, but let's give this a shot while no one else is posting. Back up your current config and do a fresh install. Don't make any changes to the default settings, other than the bare minimum, and see how it performs. Weed out all the variables. Out of the box, pfSense has pretty good performance.



  • First, again, thank you to Harvy66 and also NollipfSense, I appreciate any insight you and others may have!

    Ok, so here’s what I’ve done so far:
    •  Installed Windows 10 on the pfSense hardware and ran the same speed tests I’ve been running from my workstation; the results are on par with the workstation, 920Mbps to just over 950Mbps.

    •  Installed Windows Server 2016 on the pfSense hardware and ran the same speed tests; again, the results are on par with my workstation, 920Mbps to just over 950Mbps.
    (Unfortunately, this only tests one of the two Intel I350-T2 NIC ports at a time, so while it is on par with my workstation, it isn’t a true throughput test of both NIC ports simultaneously; see the iperf sketch after this list.)

    •  Performed a fresh install of pfSense 2.4.3; the only changes made were whatever the initial setup wizard applied, plus enabling Secure Shell.
    (Unfortunately, after the fresh, unaltered install there was no difference.  The results were exactly as before: a speed loss of at least 15%, up into the mid/high 20% range.)
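
    (A way to close that gap without the Comcast speed-test servers in the picture would be to run iperf straight through the box: a server on a test host hanging off the WAN-side port and a client on a LAN workstation, so both I350-T2 ports forward at the same time.  Addresses below are just placeholders:)

    # on a test host connected to the WAN-side port
    iperf -s
    # on a LAN workstation, pushing traffic through pfSense with 4 parallel streams
    iperf -c 203.0.113.10 -P 4 -t 30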

    So, I did some more research to make sure my math was correct regarding the available PCIe bandwidth and the Intel I350-T2 Server NIC.  The sanity check appears correct: PCIe 2.x has a theoretical max bandwidth of 500MB/s per lane, so a 4-lane PCI Express 2.1 card should have 2,000MB/s, or 16,000 Mbps, available to it (in a perfect world).  Obviously that is far more than the card (or I) need, so we're good there.

    But…  From the Intel specs:
    "Intel® Ethernet Controller I350 with PCI Express* V2.1 (5 GT/s) Support enables customers to take full advantage of 1 GbE by providing maximum bi-directional throughput per port on a single adapter"

    Could this be more typical marketing speak???  They say “providing maximum bi-directional throughput per port”, but they do not specifically say a full 1 GbE per port simultaneously!  Hmmm.  Also, I noticed the “V2.1 (5 GT/s)”; I currently have the motherboard slot set to Gen 3 (8 GT/s).
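
    (Before flipping BIOS settings, the link the card actually negotiated can be read straight from FreeBSD: in the pciconf capability listing, the PCI-Express line for the add-on card's igb ports shows the negotiated link width and speed next to the maximums the card supports.)

    pciconf -lc | grep -A 6 "^igb"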

    Next Steps:
    •  Change the BIOS settings for Slot 7 on the motherboard to Gen 2 (5 GT/s) and retest.

    •  Split the WAN/LAN links like I mentioned before, but haven’t had time to get to this one yet.

    If changing the PCIe setting in the BIOS for Slot 7 does it, great.  But if splitting the WAN/LAN links fixes it, what does that tell us?  That the FreeBSD 11.1 release doesn’t have the greatest support/driver for Intel I350-T2 NICs???  If so, has no one ever caught this before???  These are just some questions to think about…

    Thanks for reading and hanging in there with me!!!

    Cheers,
    Clint



  • Hi -

    I have been using essentially the exact same hardware setup with a symmetric gigabit fiber connection for about a year now, and it has worked great.  The hardware in the 5018D-FN8T has no trouble saturating a gigabit link, even with the Snort IDS package also running on the interfaces.  The only difference in my case is that I used the onboard NIC ports (I210 and I350) for the WAN and LAN connections rather than an add-on card in the PCI Express slot.  As you have already suggested, I would first try the other onboard network ports to see if that makes a difference in speed.  If not, there are also some NIC parameters we can tune in FreeBSD to improve performance with very high speed internet connections.

    Hope this helps.



  • @Clint:

    The FreeBSD 11.1 Release doesn’t have the greatest support/driver for Intel I350-T2 NIC’s???

    I suspected this and filed a bug report; however, it was regarding the issue with Suricata in inline mode and netmap.  It would be great to share your findings with them (https://bugs.freebsd.org/bugzilla/enter_bug.cgi).



  • Hi All,

    (Sorry I don’t have more at this time, I have been slammed with other issues at work.)

    Here’s the latest update.  I’m pretty sure the issue is the add-on Intel I350-T2 Server NIC in PCIe Slot 7.  Changing the BIOS setting for Slot 7 on the motherboard to Gen 2 (5 GT/s), if anything, appears to have had a slightly negative effect, although a pretty negligible one.

    So I looked into some NIC performance tuning for FreeBSD; most of what I found was aimed at 10GbE, but I was hoping it might help.  Here’s a summary of the settings I tried (I’m not listing every combination I tested, but suffice it to say, most if not all of these changes actually made things worse by an additional 5-7% or so):

    /boot/loader.conf
    kern.ipc.nmbclusters="1000000"

    System Tunables

    # set to at least 16MB for 10GbE hosts
    kern.ipc.maxsockbuf=16777216

    # set autotuning maximum to at least 16MB too
    net.inet.tcp.sendbuf_max=16777216
    net.inet.tcp.recvbuf_max=16777216

    # enable send/recv autotuning
    net.inet.tcp.sendbuf_auto=1
    net.inet.tcp.recvbuf_auto=1

    # increase autotuning step size
    net.inet.tcp.sendbuf_inc=16384
    net.inet.tcp.recvbuf_inc=524288
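
    (For anyone repeating this: on pfSense the loader-time value belongs in /boot/loader.conf.local, since the stock loader.conf can be regenerated, while the kern.ipc.maxsockbuf and net.inet.tcp.* values can be entered under System / Advanced / System Tunables.  After a reboot, the active values can be spot-checked with:)

    sysctl kern.ipc.maxsockbuf net.inet.tcp.sendbuf_max net.inet.tcp.recvbuf_max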

    As I mentioned, I’m slammed at work right now, but as soon as I get a chance, my next, and I believe final, step is to do yet another fresh, unaltered install and test splitting the WAN/LAN links between the onboard I210, the onboard I350-AM4, and the add-on I350-T2.  All signs point to the add-on Intel I350-T2.  Whether it’s a design limitation of the NIC itself, a faulty NIC, or the FreeBSD driver, I’m not sure how to go about verifying that part.  I have to check and see if I have any additional 2-port server NICs here I can test with.

    • tman222, thank you very much for your input; it’s nice to know I picked the right hardware!  Are the NIC params you mentioned the same as the ones I tried (listed above)?

    • NollipfSense, I don’t have an ID for the site you referenced, but I would be more than happy to share anything you think may help!

    Cheers,
    Clint



  • Hi Clint,

    Here are some helpful threads and resources on network tuning to get you started.  There are some other parameters you can tweak as well, beyond the ones you have already adjusted:

    https://forum.pfsense.org/index.php?topic=113496.0
    https://forum.pfsense.org/index.php?topic=132345

    https://calomel.org/freebsd_network_tuning.html

    For further troubleshooting, please also see the "Where is the bottleneck?" section here:
    https://bsdrp.net/documentation/technical_docs/performance

    Hope this helps - please let us know if you have more questions.


 
