Preferred 'Intel QPI Bandwidth Priority' setting for pfSense



  • We’re looking at deploying pfSense in our production environment to support roughly 50 Mbps of throughput and 20k packets per second, with the aim of increasing this to 100 Mbps and 40k packets per second in the future. We will use aliases for ports, hosts and networks to condense the firewall rules to roughly 50. We will be using High Availability Sync, CARP, NAT and IPsec (up to 20 tunnels using hardware encryption).

    The hardware we’ve spec’d includes 32GB RAM, 2x Intel X5650 CPUs and 2x 143GB 15.2K SAS HDs in a mirrored volume on a PERC H700 controller.

    My question is: what would the preferred setting be for Intel QPI Bandwidth Priority? The options are either ‘Compute’ (default) or ‘I/O’ (optimised for I/O intensive workloads).

    This is what we’ve found on the supplier’s website:
    The Intel QPI Bandwidth priority has two settings, compute and I/O. It is set to “compute” by default.
    This option determines the number and priority of requests on the QPI bus. Recall that the QPI
    connects the processor sockets as well as the IOH. The “compute” setting favors computational traffic
    while the “I/O” setting is optimized for IO intensive workloads.

    The last thing we want is an I/O bottleneck, but after researching this on Intel’s website I believe the ‘I/O’ setting above is specifically aimed at prioritising traffic between the memory controller (RAM) and CPU, while ‘Compute’ prioritises computational traffic. Does anyone know what the preference should be for pfSense in this scenario?

    Thanks,
    Shaun
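
As a quick sanity check on the figures in the post above, the throughput and packet-rate targets imply an average packet size. The sketch below is only back-of-the-envelope arithmetic; the function name is illustrative, and it assumes "Mbps" means 10^6 bits per second.

```python
def avg_packet_bytes(mbps: float, pps: float) -> float:
    """Average packet size in bytes implied by a bit rate and a packet rate.

    Assumes 1 Mbps = 1,000,000 bits per second.
    """
    return mbps * 1_000_000 / 8 / pps


# Targets quoted in the post: current and planned load.
targets = [("current", 50, 20_000), ("future", 100, 40_000)]

for label, mbps, pps in targets:
    print(f"{label}: {avg_packet_bytes(mbps, pps):.1f} bytes per packet")
```

Both targets work out to the same average packet size, so the planned growth doubles the load without changing the traffic mix.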



  • With dual socket being total overkill for 100Mb/s, wouldn't going single socket alleviate your QPI worries? I'm assuming you're talking about QPI between sockets and not to your busses because that would just be silly.



  • I hear IPsec traffic can be very processor intensive (we will make use of AES-NI on the CPU for hardware encryption, but this can only help so much); roughly 20% of our traffic will be IPsec. We also need to support at least 10,000 endpoints. Some traffic will be NAT'd, but most will be routed.

    We're currently using Cisco 7200VXRs with NPE-G1 Processor cards and VPN encryption modules but they're near capacity.

    The QPI connects to the processor sockets as well as the IOH (shared I/O controller). I believe setting this to 'I/O' gives traffic with the I/O controller priority, while setting this to 'Compute' gives processor traffic priority.
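
To put the QPI question in proportion, the target throughput can be compared with the raw bandwidth of a single QPI link. This is a rough sketch under stated assumptions, not a measurement: it assumes the X5650's QPI link runs at 6.4 GT/s and carries 2 bytes of data per transfer in each direction, and it ignores protocol overhead and the fact that a packet may cross the link more than once.

```python
# Assumptions (not from the thread): 6.4 GT/s link speed, 2 data bytes
# per transfer per direction, no protocol overhead.
QPI_TRANSFERS_PER_SEC = 6.4e9
QPI_BYTES_PER_TRANSFER = 2

# Planned firewall throughput from the original post: 100 Mbps.
TARGET_MBPS = 100


def qpi_bytes_per_sec() -> float:
    """Raw one-direction QPI bandwidth in bytes per second."""
    return QPI_TRANSFERS_PER_SEC * QPI_BYTES_PER_TRANSFER


def target_bytes_per_sec(mbps: float) -> float:
    """Convert a rate in Mbps (10^6 bits/s) to bytes per second."""
    return mbps * 1_000_000 / 8


headroom = qpi_bytes_per_sec() / target_bytes_per_sec(TARGET_MBPS)
print(f"QPI link (one direction): {qpi_bytes_per_sec() / 1e9:.1f} GB/s")
print(f"Target traffic: {target_bytes_per_sec(TARGET_MBPS) / 1e6:.1f} MB/s")
print(f"Raw headroom factor: about {headroom:.0f}x")
```

Even allowing for each packet crossing the link several times, the planned traffic would be a small fraction of the raw link bandwidth under these assumptions.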



  • 100 Mbit/s of IPsec with AES-GCM can be done on a quad-core Atom CPU (maybe even a dual-core).

    @gonzopancho:

    Yes, that would be interesting, but I've not gotten to it yet.

    I am testing with iperf/iperf3 now, because things have gotten too fast for my "curl" test.

    I'm getting a consistent 310-315 Mbps with iperf from home to work, across my home 1 Gbps link.
    (C2758 on one end, FW-7551 (C2358) on the other, both running today's snapshot.)

    Earlier, I set up a pair of 1U boxes, each with an E3-1275 at 3.5 GHz. Each has a 10G Intel X520 dual-port card.
    The endpoints for load generation are two Dell R200 servers running stock FreeBSD (10.0 on one, 10.1 on the other), each with a Chelsio card.

    Quick-n-dirty single stream (iperf3) test results:
    AES-GCM, no AES-NI: ~125 Mbps
    AES-GCM with AES-NI: ~1.75 Gbps
    AES-GCM with AES-NI and pf disabled: ~2.2 Gbps
    AES-CBC: ~415-425 Mbps

    Straight-up iperf3 between the same hosts:
    pf enabled, single stream: 3.28 Gbps
    pf enabled, 10 streams: 5.15 Gbps
    pf disabled, single stream: 4.63 Gbps
    pf disabled, 10 streams: 6.91 Gbps

    That's all for now.
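
The quoted results can be turned into a few ratios that are easier to compare. The sketch below only re-derives numbers from the figures above (which were taken on E3-1275 boxes, not the X5650 hardware in the original post); the dictionary keys and helper names are illustrative.

```python
# Figures quoted above, converted to Mbps (approximate values as posted).
ipsec_mbps = {
    "gcm_no_aesni": 125,
    "gcm_aesni": 1750,
    "gcm_aesni_pf_off": 2200,
}
plain_mbps = {
    "pf_on_1_stream": 3280,
    "pf_off_1_stream": 4630,
    "pf_on_10_streams": 5150,
    "pf_off_10_streams": 6910,
}


def speedup(fast: float, slow: float) -> float:
    """How many times faster `fast` is than `slow`."""
    return fast / slow


def pf_cost_pct(with_pf: float, without_pf: float) -> float:
    """Throughput lost with pf enabled, as a percentage of the pf-off rate."""
    return (1 - with_pf / without_pf) * 100


aesni_gain = speedup(ipsec_mbps["gcm_aesni"], ipsec_mbps["gcm_no_aesni"])
print(f"AES-NI speedup for AES-GCM: {aesni_gain:.1f}x")

pairs = [
    ("IPsec, AES-GCM", ipsec_mbps["gcm_aesni"], ipsec_mbps["gcm_aesni_pf_off"]),
    ("plain, 1 stream", plain_mbps["pf_on_1_stream"], plain_mbps["pf_off_1_stream"]),
    ("plain, 10 streams", plain_mbps["pf_on_10_streams"], plain_mbps["pf_off_10_streams"]),
]
for label, on, off in pairs:
    print(f"pf cost ({label}): {pf_cost_pct(on, off):.1f}%")
```

Since these are single quick tests on different hardware, the ratios are indicative only.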



  • That looks promising for IPsec traffic. From what you've said, if we hypothetically wanted the best possible performance, we would most likely hit an I/O bottleneck before a processor one.