Netgate Discussion Forum

    How does pfSense utilize multicore processors and multi-CPU systems?

    General pfSense Questions
    hardware multi core multi cpu setup tuning
    • Sergei_ShablovskyS
      Sergei_Shablovsky
      last edited by Sergei_Shablovsky

      Hi, pfSense Gurus!

      With a view to upgrading to multi-CPU systems, we have 2 main questions:

      1. How does pfSense utilize multicore processors in single-CPU systems?
      2. How does pfSense utilize multicore processors in multi-CPU systems?

      UPDATE - Feb 2021

      Hm. It looks like the right answer is hard to find...
      I need to explain the opening question a little:

      Which system is better for network-related workloads (e.g. firewall, load balancing, gateway, proxy, media streaming, ...):
      a) 1 CPU with 4-10 cores, high frequency
      b) 2-4 CPUs with 4-6 cores each, mid frequency

      And how do the CPU's L2 (2-56 MB) and L3 (2-57 MB) caches affect network-related operation (in cooperation with the NIC)?

      In general the situation looks like this: because pfSense is based on FreeBSD, the answer lies somewhere between the ability of pfSense's processes to use several CPU cores effectively (the only way for an application to use more than one core is to run more than one thread) and the ability of the FreeBSD kernel and drivers to use several CPUs (the kernel is responsible for binding threads to cores).

      For an application (in our case pfSense), the answer lies in using the cpuset(2) system call.
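For reference, a minimal sketch of what pinning looks like with the cpuset(1) utility, the command-line front end to the same kernel facility. The helper name `pin_pid` and the `DRY_RUN` variable are made up for illustration. By default it only prints the command, so nothing is changed by accident.

```sh
#!/bin/sh
# pin_pid PID CPU_LIST
# Print (DRY_RUN=1, the default) or run (DRY_RUN=0) the cpuset(1) command
# that restricts a process to the given CPU list, e.g. "2-3" or "0,4".
# pin_pid and DRY_RUN are hypothetical names used only in this sketch.
pin_pid() {
    pid=$1
    cpus=$2
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "cpuset -l ${cpus} -p ${pid}"
    else
        cpuset -l "$cpus" -p "$pid"
    fi
}

# Example: show what would be run to pin PID 1234 to cores 2 and 3.
pin_pid 1234 2-3
```

On a FreeBSD or pfSense shell, `cpuset -g -p <pid>` shows the CPU mask a process currently has, which is a safe way to check before and after.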

      I tried searching the archives without success; what I found was 5-10 years old.

      Some people write that it would be a big waste of CPU power unless you plan on terminating a few thousand IPsec VPNs. But I agree with one commenter: " It's massive overkill or not.. the problem is which one has multi-core support?"

      So the question starts with: is pfSense strictly single-threaded?

      As far as I know, pfSense is a layer on top of pf, and pf is a layer on top of the FreeBSD network stack. And the pfSense developers have made a lot of great modifications to the original FreeBSD pf.

      So, what is your answer to the questions at the top of this message?

      —
      CLOSE SKY FOR UKRAINE https://youtu.be/_tU1i8VAdCo !
      Help Ukraine to resist, save civilians people’s lives !
      (Take an active part in public protests, push on Your country’s politics, congressmans, mass media, leaders of opinion.)

      • GertjanG
        Gertjan @Sergei_Shablovsky
        last edited by

        @Sergei_Shablovsky said in How pfSense utilize multicore processors and multi-CPU systems ?:

        So, the question start from is pfSense a strictly single-threaded?

        The underlying FreeBSD is multicore and multithreading.
        As are most FreeBSD applications and tools used by pfSense.
        pfSense is a web interface that lets you manage all the settings through a GUI instead of the command line. Basically, it's a web interface and a lot of PHP script files (I oversimplify).

        The thing is: are you administering this device yourself, or are you allowing hundreds or thousands of admins to do so? ^^

        See https://www.netgate.com/products/appliances/ for what Netgate itself uses in its devices.

        Example :
        Intel(R) Pentium(R) 4 CPU 3.20GHz
        2 CPUs: 1 package(s) x 2 hardware threads
        AES-NI CPU Crypto: No
        This ancient "PC device" (15 years old) handles Gbit connections easily.

        Btw: Netgate (pfSense) doesn't modify the original FreeBSD source a lot. It would be far too much work to bring out newer versions. I guess there are some patches.

        No "help me" PM's please. Use the forum, the community will thank you.
        Edit : and where are the logs ??

        • stephenw10S
          stephenw10 Netgate Administrator
          last edited by

          pfSense is not single threaded. pf is no longer single threaded, so there are certainly advantages to using multiple CPU cores.
          Some things are still single threaded. OpenVPN and PPPoE are the two we most commonly see. Some NIC drivers cannot use more than one queue, but most now can.
          There's no significant difference between multiple CPUs and multiple cores in a single CPU as far as I know.
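One rough way to check how many queues a NIC driver actually set up is to count its queue interrupts in the `vmstat -i` output. A sketch under assumptions: `count_queues` is a made-up helper, and it assumes queue lines are labelled like `igb0:que 0`, `ix0:q0` or `ix0:rxq0`; the exact label differs between drivers and FreeBSD versions, so the pattern may need adjusting.

```sh
#!/bin/sh
# count_queues: read `vmstat -i`-style text on stdin and print
# "<nic> <number of queue interrupts>" per NIC, sorted by NIC name.
# Assumed line shape: "irq264: igb0:que 0   <total>   <rate>".
# Lines such as "igb0:link" or plain "re0" are not counted as queues.
count_queues() {
    awk '
        $1 ~ /^irq[0-9]+:$/ && $2 ~ /^[a-z]+[0-9]+:(que|rxq|q)/ {
            split($2, part, ":")   # part[1] is the NIC name, e.g. igb0
            n[part[1]]++
        }
        END { for (nic in n) print nic, n[nic] }
    ' | sort
}

# On a FreeBSD or pfSense shell:
#   vmstat -i | count_queues
```

A NIC that shows up with a single queue line (or not at all) is a hint that its driver is not spreading receive work over several cores.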

          Steve

          • Sergei_ShablovskyS
            Sergei_Shablovsky @stephenw10
            last edited by

            @stephenw10 said in How pfSense utilize multicore processors and multi-CPU systems ?:

            Some NIC drivers cannot use more than one queue but most now do.

            Where can I see a list of NICs that are able to use multiple threads on FreeBSD?

            • GertjanG
              Gertjan
              last edited by

              Well ....
              Stay away from Realtek
              Prefer Intel
              and you're good.

              • Sergei_ShablovskyS
                Sergei_Shablovsky @Gertjan
                last edited by

                @Gertjan said in How pfSense utilize multicore processors and multi-CPU systems ?:

                Well ....
                Stay away from Realtek
                Prefer Intel
                and your good.

                Thank you for the advice!

                Are you sure about Intel? Even in the official pfSense docs, of all the NIC troubleshooting entries I can see at least 2 issues linked to Broadcom and 2 issues linked to Intel. No other NICs.
                From a statistical point of view this may not be a good result.
                A search on this forum also shows many issues linked to Intel. Of course, it may be that a lot of users prefer Intel NICs, and some of them have issues...

                • Sergei_ShablovskyS
                  Sergei_Shablovsky @stephenw10
                  last edited by

                  @stephenw10 said in How pfSense utilize multicore processors and multi-CPU systems ?:

                  pfSense is not single threaded. pf is no longer single threaded so there are certainly advantages to use multiple CPU cores.
                  Some things are still single threaded. OpenVPN and PPPoE are two we most commonly see. Some NIC drivers cannot use more than one queue but most now do.
                  There's no significant difference between multiple cpus and multiple cores in a single CPU as far as I know.

                  Do you mean "no significant difference" from the point of view of the FreeBSD CPU-related kernel code that handles application threads?

                  • NogBadTheBadN
                    NogBadTheBad @Sergei_Shablovsky
                    last edited by

                    @Sergei_Shablovsky said in How pfSense utilize multicore processors and multi-CPU systems ?:

                    @stephenw10 said in How pfSense utilize multicore processors and multi-CPU systems ?:

                    Some NIC drivers cannot use more than one queue but most now do.

                    Where am I able to see a list of NICs that able to using multitreads on FreeBSD ?

                    The FreeBSD web pages would be a good place to start.

                    A lot of the drivers are provided by the chip manufacturers; igb, for example, is written by Intel.

                    https://www.freebsd.org/cgi/man.cgi?query=igb&sektion=4&manpath=freebsd-release-ports

                    https://www.freebsd.org/releases/11.2R/hardware.html#ethernet

                    Andy

                    1 x Netgate SG-4860 - 3 x Linksys LGS308P - 1 x Aruba InstantOn AP22

                    • stephenw10S
                      stephenw10 Netgate Administrator @Sergei_Shablovsky
                      last edited by

                      @Sergei_Shablovsky said in How pfSense utilize multicore processors and multi-CPU systems ?:

                      Are You sure about intel ?

                      Very sure. Use Intel based NICs if you want the least likelihood of seeing issues.

                      Steve

                      • Sergei_ShablovskyS
                        Sergei_Shablovsky @stephenw10
                        last edited by

                        @stephenw10 said in How pfSense utilize multicore processors and multi-CPU systems ?:

                        Steve

                        Appreciate Your help, Steve! :)

                        • Sergei_ShablovskyS
                          Sergei_Shablovsky
                          last edited by Sergei_Shablovsky

                          Now that FreeBSD and pfSense have had several major updates, it is time to return to this question.

                          In FreeBSD 9-11 a separate process was created for each card
                          (for example, for Intel cards with 2 Ethernet ports):
                          intr{irq273: igb1:que}
                          intr{irq292: igb3:que}
                          ...and so on...

                          And because FreeBSD has (for a long time!) been unable to spread PPPoE traffic across several threads, on FreeBSD 9-11 these processes were distributed across several cores using cpuset. This worked reasonably well until FreeBSD 12 arrived.

                          Now on FreeBSD 12 all the processes are grouped together:
                          kernel{if_io_tqg_0}
                          kernel{if_io_tqg_1}
                          kernel{if_io_tqg_2}
                          kernel{if_io_tqg_3}
                          ...and so on...

                          And there seems to be no way to assign each card to a separate core.

                          As a result, the first core is 75-80% loaded on average and up to 100% at peak traffic.

                          Some people suggest tuning the iflib settings (sometimes in conjunction with switching off hyper-threading):

                          In loader:
                          net.isr.maxthreads="1024" # Use at most this many CPUs for netisr processing
                          net.isr.bindthreads="1" # Bind netisr threads to CPUs.
                          In sysctl:
                          net.isr.dispatch=deferred # direct / hybrid / deferred // Interrupt handling via multiple CPUs, but with context switch

                          Or
                          dev.igb.0.iflib.tx_abdicate=1
                          dev.igb.0.iflib.separate_txrx=1
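Before changing any of these, it may help to look at what the box is currently using. A small sketch, assuming a FreeBSD or pfSense shell; the helper name `show_tunable` is made up for illustration, and on a system that does not have one of these OIDs it simply reports it as unavailable.

```sh
#!/bin/sh
# show_tunable NAME
# Print "NAME = value" using sysctl(8); if the OID cannot be read
# (wrong OS, driver not loaded, no igb0), say so instead of failing.
# show_tunable is a hypothetical helper used only in this sketch.
show_tunable() {
    name=$1
    value=$(sysctl -n "$name" 2>/dev/null) || value="(not available here)"
    printf '%s = %s\n' "$name" "$value"
}

# The settings mentioned above, plus the read-only netisr thread count.
for t in net.isr.dispatch net.isr.maxthreads net.isr.bindthreads \
         net.isr.numthreads dev.igb.0.iflib.tx_abdicate \
         dev.igb.0.iflib.separate_txrx; do
    show_tunable "$t"
done
```

As far as I know, net.isr.maxthreads and net.isr.bindthreads are boot-time tunables (on pfSense usually placed in /boot/loader.conf.local), while net.isr.dispatch can be changed at runtime. `netstat -Q` also prints the netisr configuration and per-CPU queue statistics, which helps to see whether a change actually spreads the work.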

                          So the question remains: how to manage load effectively on multi-core, multi-CPU systems?

                          Especially when problems with the powerd and est drivers still exist for ALL Intel CPUs (see this thread about SpeedStep and Turbo Boost working together in FreeBSD: https://forum.netgate.com/topic/112201/issue-with-intel-speedstep-settings)

                          • Sergei_ShablovskyS
                            Sergei_Shablovsky
                            last edited by

                            Also, for your attention, this article about FreeBSD network optimization and tuning: https://calomel.org/freebsd_network_tuning.html

                            • Sergei_ShablovskyS
                              Sergei_Shablovsky
                              last edited by Sergei_Shablovsky

                              And another example of how to manually dispatch interrupts to specific CPU cores:

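                              # Bind each NIC queue interrupt to a CPU core.
                              # Mapping file format, one entry per line: <iface>:<queue>:<cpu>
                              # basedir must point at the directory holding the mapping file.
                              while IFS=: read -r iface queue cpu; do
                                  [ -n "$iface" ] || continue
                                  # Look up the IRQ number of this queue in the vmstat -i output
                                  irq=$(vmstat -i | grep "${iface}:q${queue}" | cut -f 1 -d ":" | sed "s/irq//g")
                                  if [ -z "$irq" ]; then
                                      echo "No IRQ found for ${iface} queue #${queue}, skipping" >&2
                                      continue
                                  fi
                                  echo "Binding ${iface} queue #${queue} (irq ${irq}) -> CPU${cpu}"
                                  cpuset -l "$cpu" -x "$irq"
                              done < "${basedir}/ix_cpu_16core_2nic"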
                               
                              ix0:0:1
                              ix0:1:2
                              ix0:2:3
                              ix0:3:4
                              ix0:4:5
                              ix0:5:6
                              ix0:6:7
                              ix0:7:8
                              ix1:0:9
                              ix1:1:10
                              ix1:2:11
                              ix1:3:12
                              ix1:4:13
                              ix1:5:14
                              ix1:6:15
                              ix1:7:16
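The mapping file above can also be generated instead of typed by hand. A sketch only: `gen_map` is a hypothetical helper, and the argument order (queues per NIC, first CPU, then NIC names) is my own choice. It hands out consecutive cores starting from the given one, which reproduces the 2-NIC, 8-queue layout listed above.

```sh
#!/bin/sh
# gen_map QUEUES_PER_NIC FIRST_CPU NIC...
# Print one "<iface>:<queue>:<cpu>" line per queue, assigning consecutive
# CPU numbers starting at FIRST_CPU. gen_map is a hypothetical helper.
gen_map() {
    queues=$1
    cpu=$2
    shift 2
    for nic in "$@"; do
        q=0
        while [ "$q" -lt "$queues" ]; do
            echo "${nic}:${q}:${cpu}"
            q=$((q + 1))
            cpu=$((cpu + 1))
        done
    done
}

# Example: two ix NICs with 8 queues each, starting at CPU 1.
gen_map 8 1 ix0 ix1
```

Starting at CPU 1 leaves CPU 0 out of the mapping, as in the original list. Redirect the output to a file to use it with the binding loop.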
                              

                              In total 8 interrupts per NIC: 1 interrupt per CPU core, with CPU 0 left for dummynet

                              manual interrupts on cpu cores

                              From here (you will need Google Translate): https://local.com.ua/forum/topic/117570-freebsd-gateway-10g/

                              • Sergei_ShablovskyS
                                Sergei_Shablovsky
                                last edited by Sergei_Shablovsky

                                And finally, another interesting post about binding igb(4) IRQs and dummynet to CPUs: https://dadv.livejournal.com/139366.html (use translate.google.com to read it)

                                In short: because of how the igb(4) driver assigns queues (the first queue created is bound to the first core, CPU0, and the same is done for each card), PPPoE/GRE traffic lands in the first queue of each Intel card and is tied to CPU0. Because this traffic is heavy, CPU0 quickly becomes overloaded, packets sit longer in the NIC buffers, and latency increases dramatically.

                                Another interesting thing is the behavior of the FreeBSD system thread that services dummynet: manually binding dummynet to CPU0 decreased the core load from 80% to 0.1%.

                                The article is worth reading.

                                • Sergei_ShablovskyS
                                  Sergei_Shablovsky
                                  last edited by Sergei_Shablovsky

                                  Note that almost all of these links are about high-load PPPoE/PPTP/GRE traffic where 90% of the packets are ~600 bytes in size.

                                  It would be interesting to read detailed comments from the pfSense developers, even if we are talking about Netgate-branded hardware (Supermicro motherboard and case, yes?), because the Intel CPUs, FreeBSD, and all the drivers are the same on your own bare metal and on Netgate hardware.

                                  And in the near future we will only see frequencies and core counts increase and energy consumption decrease. So the proper use of multi-core CPUs in a specialized "network packet grinder" solution like pfSense remains relevant.

                                  • stephenw10S
                                    stephenw10 Netgate Administrator
                                    last edited by

                                    What sort of increase in throughput do you see by applying that?

                                    Were you seeing very uneven CPU core loading before applying it?

                                    PPPoE is a special case in FreeBSD/pfSense. Only one Rx queue on a NIC will ever be used so only one core.

                                    Steve

                                    • Sergei_ShablovskyS
                                      Sergei_Shablovsky
                                      last edited by Sergei_Shablovsky

                                      Note that, as I understand it, dummynet was written by Luigi Rizzo as a system shaper for emulating low-quality links (with high latency, packet drops, etc.), which were more common in 2008-2010.
                                      Nowadays ALTQ and netgraph work better on fast 1/10/100G links.

                                      I am not speaking specifically about dummynet or PPPoE/GRE, but about how to load multi-CPU systems effectively. Because for firewalls, a system with 2-4 Intel CPUs (E or X server series) and independent RAM banks on each CPU IS MORE EFFECTIVE THAN a system based on 1 high-frequency CPU.

                                      "Effective" here means the ability to fine-tune pfSense (FreeBSD) for professional use cases, for example:

                                      • at a small ISP with a heavy load of PPPoE/GRE traffic;
                                      • in mid-sized company networks with a lot of small-packet traffic (~500-800 bytes);
                                      • in broadcasting services/platforms oriented toward mobile clients (with a lot of reconnections and small packets);
                                      • ...

                                      The initial question in this thread means:
                                      1. How do the processes and FreeBSD services in the pfSense bundle utilize the cores and memory in multi-CPU systems? What is their behavior?
                                      2. Once I understand the behavior of each process / system service, I can easily tune pfSense for each use case to achieve MORE BANDWIDTH and LESS LATENCY without spending another $2-3k on a new server + NICs.

                                      From my point of view this is reasonable nowadays, when every company is trying to cut costs on a tight budget because of the economic situation on the one hand, while demand for online services is rapidly increasing (due to COVID-19) on the other.

                                      • Sergei_ShablovskyS
                                        Sergei_Shablovsky @Sergei_Shablovsky
                                        last edited by

                                        @sergei_shablovsky

                                        Hm. It looks like the right answer is hard to find...

                                        I need to explain the opening question a little:

                                        Which system is better for network-related workloads (e.g. firewall, load balancing, gateway, proxy, media streaming, ...):
                                        a) 1 CPU with 4-10 cores, high frequency
                                        b) 2-4 CPUs with 4-6 cores each, mid frequency

                                        And how do the CPU's L2 (2-56 MB) and L3 (2-57 MB) caches affect network-related operation (in cooperation with the NIC)?

                                        • Sergei_ShablovskyS
                                          Sergei_Shablovsky @Sergei_Shablovsky
                                          last edited by

                                          Has anything changed here since the FreeBSD 13-based pfSense was rolled out? Better CPU usage? Are more cores better than higher CPU frequency? Etc.

                                          • AndyRHA
                                            AndyRH @Sergei_Shablovsky
                                            last edited by

                                            @sergei_shablovsky Your question from 2/2021 is slightly flawed. The CPU package count is not relevant. Cores (threads) and frequency are relevant.
                                            For tasks that are single-threaded, frequency is what you want; for tasks that are multi-threaded, you want enough threads to allow the concurrency you need. The result is a balance based on your goals. If single-threaded tasks are your number one concern, you will lean toward frequency at the expense of cores. However, if you have several packages and many NICs, you will lean toward core count at the expense of frequency, because you will have many threads needing to execute at the same time, and it is more efficient for the computer to have many hardware threads than to make software threads share a few.

                                            I hope that helps.

                                            o||||o
                                            7100-1u

                                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.