Netgate Discussion Forum

How pfSense utilize multicore processors and multi-CPU systems ?

General pfSense Questions
hardware multi core multi cpu setup tuning
    Sergei_Shablovsky
    Jan 16, 2020, 7:42 AM (last edited by Sergei_Shablovsky Feb 11, 2021, 4:10 AM)

    Hi, pfSense Gurus!

    Looking ahead to upgrading to multi-CPU systems, we have 2 main questions:

    1. How does pfSense utilize multicore processors in single-CPU systems?
    2. How does pfSense utilize multicore processors in multi-CPU systems?

    UPDATE - Feb 2021

    Hm. It looks like the right answer is hard to find...
    Let me clarify the opening question a little:

    Which system is better for network-related workloads (e.g. firewall, load balancing, gateway, proxy, media streaming, ...):
    a) 1 CPU with 4-10 cores, high frequency
    b) 2-4 CPUs with 4-6 cores, mid frequency

    And how do the CPU's L2 cache (2-56 MB) and L3 cache (2-57 MB) affect network-related operations (in cooperation with the NIC)?

    In general, the situation looks like this: since pfSense is based on FreeBSD, the answer probably lies somewhere between the ability of pfSense's threads to use several CPU cores effectively (the only way for an application to use more than one core is to run more than one thread) and the ability of the FreeBSD kernel to use several CPUs (the kernel is responsible for assigning threads to cores).

    For the application side (in our case, pfSense), the answer lies in using the cpuset(2) functions.

    I tried searching the archives, but without success, or the information I found was 5-10 years old.

    Some people write that it would be a big waste of CPU power unless you plan on terminating a few thousand IPsec VPN tunnels. But I agree with one poster: "It's massive overkill or not... the problem is which one has multi-core support?"

    So the first question is: is pfSense strictly single-threaded?

    As far as I know, pfSense is a layer on top of pf, and pf is a layer on top of the FreeBSD network stack. And the pfSense developers have made many significant modifications to the original FreeBSD pf.

    So, what are your answers to the questions at the top of this message?

    —
    CLOSE SKY FOR UKRAINE https://youtu.be/_tU1i8VAdCo!
    Help Ukraine resist, save civilian lives!
    (Take an active part in public protests, push your country's politicians, congressmen, mass media, and opinion leaders.)

      Gertjan @Sergei_Shablovsky
      last edited by Jan 16, 2020, 10:43 AM

      @Sergei_Shablovsky said in How pfSense utilize multicore processors and multi-CPU systems ?:

      So the first question is: is pfSense strictly single-threaded?

      The underlying FreeBSD is multicore-aware and multithreaded,
      as are most of the FreeBSD applications and tools used by pfSense.
      pfSense itself is a web interface that lets you manipulate all the settings through a GUI instead of the command line. Basically, it's a web interface and a lot of PHP script files (I over-simplify).

      The thing is: are you administering this device yourself, or are you allowing hundreds or thousands of admins to do so? ^^

      See https://www.netgate.com/products/appliances/ to see what Netgate itself uses for its devices.

      Example :
      Intel(R) Pentium(R) 4 CPU 3.20GHz
      2 CPUs: 1 package(s) x 2 hardware threads
      AES-NI CPU Crypto: No
      This ancient "PC device" (15 years old) handles Gbit connections easily.

      Btw: Netgate (pfSense) doesn't modify the original FreeBSD source a lot. It would be far too much work to carry those changes forward to newer versions. I guess there will be some patches.

      No "help me" PM's please. Use the forum, the community will thank you.
      Edit : and where are the logs ??

        stephenw10 Netgate Administrator
        last edited by Jan 16, 2020, 6:02 PM

        pfSense is not single threaded. pf is no longer single threaded, so there are certainly advantages to using multiple CPU cores.
        Some things are still single threaded. OpenVPN and PPPoE are two we most commonly see. Some NIC drivers cannot use more than one queue but most now do.
        There's no significant difference between multiple cpus and multiple cores in a single CPU as far as I know.

        Steve

          Sergei_Shablovsky @stephenw10
          last edited by Jan 16, 2020, 11:11 PM

          @stephenw10 said in How pfSense utilize multicore processors and multi-CPU systems ?:

          Some NIC drivers cannot use more than one queue but most now do.

          Where can I find a list of NICs that can use multiple threads on FreeBSD?

            Gertjan
            last edited by Jan 17, 2020, 12:42 AM

            Well ....
            Stay away from Realtek
            Prefer Intel
            and you're good.

              Sergei_Shablovsky @Gertjan
              last edited by Jan 17, 2020, 12:58 PM

              @Gertjan said in How pfSense utilize multicore processors and multi-CPU systems ?:

              Well ....
              Stay away from Realtek
              Prefer Intel
              and you're good.

              Thank you for the advice!

              Are you sure about Intel? Even in the official pfSense documentation, among all the NIC troubleshooting entries I can see at least 2 issues linked to Broadcom and 2 issues linked to Intel, and none for other NICs.
              From a statistical point of view, this may not be a good result.
              A search on this forum also suggests that many issues are linked to Intel. Of course, it may simply be that a lot of users prefer Intel NICs, and some of them have issues...

                Sergei_Shablovsky @stephenw10
                last edited by Jan 17, 2020, 1:06 PM

                @stephenw10 said in How pfSense utilize multicore processors and multi-CPU systems ?:

                pfSense is not single threaded. pf is no longer single threaded, so there are certainly advantages to using multiple CPU cores.
                Some things are still single threaded. OpenVPN and PPPoE are two we most commonly see. Some NIC drivers cannot use more than one queue but most now do.
                There's no significant difference between multiple cpus and multiple cores in a single CPU as far as I know.

                You mean "no significant difference" from the point of view of the FreeBSD kernel code that distributes application threads across CPUs?

                  NogBadTheBad @Sergei_Shablovsky
                  last edited by Jan 17, 2020, 1:26 PM

                  @Sergei_Shablovsky said in How pfSense utilize multicore processors and multi-CPU systems ?:

                  @stephenw10 said in How pfSense utilize multicore processors and multi-CPU systems ?:

                  Some NIC drivers cannot use more than one queue but most now do.

                  Where can I find a list of NICs that can use multiple threads on FreeBSD?

                  The FreeBSD web pages would be a good place to start.

                  A lot of the drivers are provided by the chip manufacturers; igb, for example, is written by Intel.

                  https://www.freebsd.org/cgi/man.cgi?query=igb&sektion=4&manpath=freebsd-release-ports

                  https://www.freebsd.org/releases/11.2R/hardware.html#ethernet

                  Andy

                  1 x Netgate SG-4860 - 3 x Linksys LGS308P - 1 x Aruba InstantOn AP22
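                  To check whether a particular NIC is actually using several queues on a running box, one rough approach (a sketch, not an official tool) is to count that interface's per-queue interrupt lines in `vmstat -i` output. The helper name `count_queues` is hypothetical, and the vector names it looks for (`igb0:que N` for the older igb driver, `igb0:rxqN` under iflib) are assumptions to verify against your own `vmstat -i` output.

```shell
#!/bin/sh
# count_queues IFACE
# Hypothetical helper: reads `vmstat -i` output on stdin and prints how many
# per-queue interrupt lines belong to IFACE. Assumed vector names:
# "igb0:que N" (older igb driver) and "igb0:rxqN" (iflib); adjust the
# pattern for other drivers.
count_queues() {
    iface="$1"
    # The ":" right after IFACE keeps igb1 from matching igb10.
    # grep -c prints 0 and exits 1 when nothing matches; "|| true" keeps the
    # function's exit status at 0 in that case.
    grep -cE "[[:space:]]${iface}:(que|rxq)" || true
}

# Usage on pfSense/FreeBSD:
#   vmstat -i | count_queues igb0
```

                  A result of 1 (or 0 with only a single shared interrupt) suggests the driver is running that NIC with one queue, so its receive processing will sit on one core.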

                    stephenw10 Netgate Administrator @Sergei_Shablovsky
                    last edited by Jan 17, 2020, 2:12 PM

                    @Sergei_Shablovsky said in How pfSense utilize multicore processors and multi-CPU systems ?:

                    Are you sure about Intel?

                    Very sure. Use Intel based NICs if you want the least likelihood of seeing issues.

                    Steve

                      Sergei_Shablovsky @stephenw10
                      last edited by Jan 17, 2020, 7:07 PM

                      @stephenw10 said in How pfSense utilize multicore processors and multi-CPU systems ?:

                      Steve

                      I appreciate your help, Steve! :)

                        Sergei_Shablovsky
                        Dec 16, 2020, 3:20 AM (last edited by Sergei_Shablovsky Dec 16, 2020, 3:31 AM)

                        Now that FreeBSD and pfSense have had several major updates, it is time to return to this question.

                        In FreeBSD 9-11, a separate interrupt thread was created for each card
                        (for example, for Intel cards with 2 Ethernet ports):
                        intr{irq273: igb1:que}
                        intr{irq292: igb3:que}
                        ...and so on...

                        Because FreeBSD (for a long time, BTW!) has not been able to spread PPPoE traffic across several threads, on FreeBSD 9-11 these threads were distributed across several cores using cpuset. This worked reasonably well until FreeBSD 12 arrived.

                        Now, on FreeBSD 12, all cards are serviced together by shared kernel threads:
                        kernel{if_io_tqg_0}
                        kernel{if_io_tqg_1}
                        kernel{if_io_tqg_2}
                        kernel{if_io_tqg_3}
                        ....and so on....

                        And there appears to be no way to assign each card to a separate core.

                        As a result, the first core is 75-80% loaded on average, and up to 100% at peak traffic.

                        Some people suggest tuning the netisr settings (sometimes in conjunction with switching off Hyper-Threading).

                        As loader tunables:
                        net.isr.maxthreads="1024" # Use at most this many CPUs for netisr processing
                        net.isr.bindthreads="1" # Bind netisr threads to CPUs.
                        As sysctls:
                        net.isr.dispatch=deferred # direct / hybrid / deferred // Interrupt handling via multiple CPUs, but with context switching

                        Or tuning the iflib settings:
                        dev.igb.0.iflib.tx_abdicate=1
                        dev.igb.0.iflib.separate_txrx=1

                        So the question remains: how do we effectively manage load on multi-core, multi-CPU systems?

                        Especially since problems with the powerd and est drivers still exist for ALL Intel CPUs (see this thread about SpeedStep and Turbo Boost working together in FreeBSD: https://forum.netgate.com/topic/112201/issue-with-intel-speedstep-settings)

                          Sergei_Shablovsky
                          last edited by Dec 16, 2020, 4:01 AM

                          Also, for your attention, this article on FreeBSD network optimization and tuning: https://calomel.org/freebsd_network_tuning.html

                            Sergei_Shablovsky
                            Dec 17, 2020, 4:09 AM (last edited by Sergei_Shablovsky Dec 17, 2020, 4:14 AM)

                            And here is another example of how to manually bind NIC queue interrupts to specific CPU cores:

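                            # Bind each NIC queue interrupt to the CPU listed in the mapping file
                            # (one "interface:queue:cpu" entry per line).
                            # basedir must be set to the directory that holds the mapping file.
                            while IFS=: read -r iface queue cpu; do
                                [ -n "$iface" ] || continue    # skip empty lines
                                # Assumes vmstat -i lines like "irqNNN: ix0:qN ..."; extracts NNN
                                irq=$(vmstat -i | grep "${iface}:q${queue}" | cut -f 1 -d ":" | sed "s/irq//g")
                                if [ -z "$irq" ]; then
                                    echo "No IRQ found for ${iface} queue #${queue}" >&2
                                    continue
                                fi
                                echo "Binding ${iface} queue #${queue} (irq ${irq}) -> CPU${cpu}"
                                cpuset -l "$cpu" -x "$irq"
                            done < "${basedir}/ix_cpu_16core_2nic"

                            Contents of ${basedir}/ix_cpu_16core_2nic (interface:queue:cpu):

                            ix0:0:1
                            ix0:1:2
                            ix0:2:3
                            ix0:3:4
                            ix0:4:5
                            ix0:5:6
                            ix0:6:7
                            ix0:7:8
                            ix1:0:9
                            ix1:1:10
                            ix1:2:11
                            ix1:3:12
                            ix1:4:13
                            ix1:5:14
                            ix1:6:15
                            ix1:7:16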

                            In total, each NIC has 8 queue interrupts, one interrupt per CPU core, and CPU0 is left for dummynet.

                            [Image: manual interrupt assignment to CPU cores]

                            Source (not in English; you will need a translator): https://local.com.ua/forum/topic/117570-freebsd-gateway-10g/
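                            The mapping file used by the loop above follows a simple pattern: consecutive queues of each interface go to consecutive cores, with CPU0 left free. A small generator such as the hypothetical `gen_irq_map` below can produce such a file instead of writing it by hand. It is a sketch that only prints `interface:queue:cpu` lines and binds nothing; the wrap-around once the CPU number reaches `NCPU` is an added design choice, not part of the original script.

```shell
#!/bin/sh
# gen_irq_map NCPU FIRST_CPU QUEUES IFACE...
# Hypothetical helper: prints "iface:queue:cpu" lines, giving each queue of
# each interface the next CPU, starting at FIRST_CPU. When the CPU number
# reaches NCPU it wraps back to FIRST_CPU, so CPUs below FIRST_CPU (e.g. CPU0,
# kept for dummynet) are never used. Nothing is bound; output only.
gen_irq_map() {
    ncpu="$1"; first="$2"; queues="$3"; shift 3
    cpu="$first"
    for iface in "$@"; do
        q=0
        while [ "$q" -lt "$queues" ]; do
            echo "${iface}:${q}:${cpu}"
            q=$((q + 1))
            cpu=$((cpu + 1))
            # Wrap around instead of emitting a CPU ID that does not exist.
            if [ "$cpu" -ge "$ncpu" ]; then
                cpu="$first"
            fi
        done
    done
}

# Usage:
#   gen_irq_map 17 1 8 ix0 ix1 > "${basedir}/ix_cpu_16core_2nic"
```

                            With these arguments it prints the same 16 lines as the map above (CPUs 1-16). Because FreeBSD numbers CPUs from 0, it is worth comparing the highest CPU ID in the map with `sysctl hw.ncpu` before feeding it to `cpuset`.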

                              Sergei_Shablovsky
                              Dec 17, 2020, 4:32 AM (last edited by Sergei_Shablovsky Dec 18, 2020, 3:19 PM)

                              And finally, another interesting post about binding igb(4) IRQs and dummynet to CPUs: https://dadv.livejournal.com/139366.html (use translate.google.com to read it).

                              In short: because of how the igb(4) driver assigns its queues (when the first queue is created, it is bound to the first core, CPU0, and the same happens for every card), the PPPoE/GRE traffic in the first queue of each Intel card is tied to CPU0. Because this traffic is heavy, CPU0 quickly becomes overloaded, packets wait longer in the NIC buffers, and latency increases dramatically.

                              Another interesting finding concerns the FreeBSD kernel thread that services dummynet: manually binding dummynet to CPU0 decreased the core load from 80% to 0.1%.

                              The article is worth reading.
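                              The dummynet binding described above can be sketched the same way. The hypothetical helper below reads `procstat -t 0` (kernel threads) on stdin and prints, rather than runs, a `cpuset -l CPU -t TID` command for the thread named `dummynet`. It assumes the thread ID is the second column and the thread name the fourth, and that the dummynet kernel thread is actually named `dummynet`; check both on your own system before running anything.

```shell
#!/bin/sh
# dummynet_pin_cmd CPU
# Hypothetical helper: reads `procstat -t 0` output on stdin and prints the
# cpuset(1) command that would pin the kernel thread named "dummynet" to CPU.
# Assumed columns: PID TID COMM TDNAME CPU PRI STATE WCHAN.
dummynet_pin_cmd() {
    awk -v cpu="$1" '$4 == "dummynet" { print "cpuset -l " cpu " -t " $2 }'
}

# Usage on pfSense/FreeBSD (review the printed command, then run it):
#   procstat -t 0 | dummynet_pin_cmd 0
```

                              Printing the command instead of executing it is deliberate: a wrong column guess then shows up as an odd-looking command rather than a thread pinned to the wrong core.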

                                Sergei_Shablovsky
                                Dec 17, 2020, 8:54 AM (last edited by Sergei_Shablovsky Dec 17, 2020, 9:03 AM)

                                Note that almost all of these links concern high-load PPPoE/PPTP/GRE traffic where 90% of packets are around 600 bytes.

                                It would be interesting to read detailed comments from the pfSense developers, even if we are talking about Netgate-branded hardware (Supermicro motherboard and case, yes?), because the Intel CPUs, FreeBSD, and all the drivers are the same on your own bare metal and on Netgate hardware.

                                And in the near future we will only see clock frequencies and core counts increase and energy consumption decrease. So making proper use of multi-core CPUs in a specialized solution like pfSense, a "network packet grinder", remains relevant.

                                  stephenw10 Netgate Administrator
                                  last edited by Dec 17, 2020, 2:04 PM

                                  What sort of increase in throughput do you see by applying that?

                                  Were you seeing very uneven CPU core loading before applying it?

                                  PPPoE is a special case in FreeBSD/pfSense. Only one Rx queue on a NIC will ever be used so only one core.

                                  Steve

                                    Sergei_Shablovsky
                                    Dec 19, 2020, 7:02 AM (last edited by Sergei_Shablovsky Dec 19, 2020, 7:13 AM)

                                    I should note that I understand dummynet was written by Luigi Rizzo as a traffic shaper for emulating low-quality links (high latency, packet drops, etc.), which were more common in 2008-2010.
                                    Nowadays, ALTQ and Netgraph work better on fast 1/10/100G links.

                                    I am not talking specifically about dummynet or PPPoE/GRE, but more generally about how to load multi-CPU systems effectively. For firewalls, a system with 2-4 Intel CPUs (E or X server series) and independent RAM banks on each CPU IS MORE EFFECTIVE THAN a system based on 1 high-frequency CPU.

                                    More effective because it allows "fine-tuning" pfSense (FreeBSD) for professional use cases, for example:

                                    • in small ISPs with a heavy PPPoE/GRE load;
                                    • in mid-sized company networks with a lot of small-packet traffic (~500-800 bytes);
                                    • in broadcasting services/platforms aimed at mobile clients (with a lot of reconnections and small packet sizes);
                                    • ...

                                    The initial question in this thread means:
                                    1. How do the processes and FreeBSD services bundled in pfSense utilize the cores and memory in multi-CPU systems? How do they behave?
                                    2. Once I understand how each process/system service behaves, I can easily tune pfSense for each use case to achieve MORE BANDWIDTH and LESS LATENCY without spending another $2-3k on a new server + NICs.

                                    From my point of view, this is sensible nowadays, when every company is trying to cut costs on a tight budget because of the economic situation on one side, while demand for online services is increasing rapidly (due to COVID-19) on the other.

                                      Sergei_Shablovsky @Sergei_Shablovsky
                                      last edited by Feb 11, 2021, 4:12 AM

                                      @sergei_shablovsky

                                      Hm. It looks like the right answer is hard to find...

                                      Let me clarify the opening question a little:

                                      Which system is better for network-related workloads (e.g. firewall, load balancing, gateway, proxy, media streaming, ...):
                                      a) 1 CPU with 4-10 cores, high frequency
                                      b) 2-4 CPUs with 4-6 cores, mid frequency

                                      And how do the CPU's L2 cache (2-56 MB) and L3 cache (2-57 MB) affect network-related operations (in cooperation with the NIC)?

                                        Sergei_Shablovsky @Sergei_Shablovsky
                                        last edited by Jun 29, 2022, 3:36 AM

                                        Has anything changed here since the FreeBSD 13-based pfSense was rolled out? Better CPU utilization? Are more cores better than higher CPU frequency? Etc...

                                          AndyRH @Sergei_Shablovsky
                                          last edited by Jun 29, 2022, 1:48 PM

                                          @sergei_shablovsky Your question from 2/2021 is slightly flawed. The CPU package count is not relevant. Cores (threads) and frequency are relevant.
                                          For single-threaded tasks, frequency is what you want; for multithreaded tasks, you want enough threads to allow the concurrency you need. The result is a balance based on your goals. If single-threaded tasks are your number one concern, you will lean toward frequency at the expense of cores. However, if you have several packages and many NICs, you will lean toward core count at the expense of frequency, because you will have many threads needing to execute at the same time, and it is more efficient for the computer to have many hardware threads than to make them share.

                                          I hope that helps.

                                          o||||o
                                          7100-1u
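                                          One way to make this balance concrete (a deliberately idealized model, not anything measured on pfSense) is Amdahl's law with an extra factor for clock speed: assume per-core performance scales linearly with frequency f relative to a reference f_0, and that a fraction p of the work can be spread over n cores. Cache size, memory layout, and NIC queue limits are ignored. The two configurations are arbitrary illustrations: (a) 8 cores at 3.5 GHz versus (b) 16 cores at 2.5 GHz, with f_0 = 2.5 GHz.

```latex
\[
  S(n, f) = \frac{f / f_0}{(1 - p) + p / n}
\]
\[
\begin{aligned}
p = 0    &: \quad S_a = \tfrac{1.4}{1} = 1.40,                    && S_b = \tfrac{1}{1} = 1.00 \\
p = 0.5  &: \quad S_a = \tfrac{1.4}{0.5 + 0.5/8} \approx 2.49,    && S_b = \tfrac{1}{0.5 + 0.5/16} \approx 1.88 \\
p = 0.95 &: \quad S_a = \tfrac{1.4}{0.05 + 0.95/8} \approx 8.30,  && S_b = \tfrac{1}{0.05 + 0.95/16} \approx 9.14
\end{aligned}
\]
```

                                          In this model a purely single-threaded task (p = 0, such as OpenVPN, mentioned earlier in the thread) gains only from frequency, while highly parallel work (p = 0.95) favors more cores, which is the trade-off described above.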

                                          Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.