Netgate Discussion Forum

High CPU load on single CPU core

General pfSense Questions
13 Posts 6 Posters 1.8k Views
  • Y
    yswery
    last edited by yswery Aug 16, 2022, 11:25 AM Aug 16, 2022, 10:41 AM

    Hi All

    We have been seeing a very weird issue on our pfSense box: WAN latency (1Gbit WAN) climbs from 0.7ms to over 100ms at times. While this is happening, I noticed that one single CPU core stays maxed out throughout.

    In htop no command shows up as maxing out the CPU, but under System Activity in the pfSense UI we can see this:

      PID USERNAME    PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
        0 root        -76    -     0B  1184K CPU1     1 868:11  99.76% [kernel{if_io_tqg_1}]
    

    This matches our maxed-out CPU core:

    last pid: 39007;  load averages:  2.01,  1.96,  1.99                                                                                                                                up 0+16:58:19  22:36:57
    67 processes:  3 running, 64 sleeping
    CPU 0:  8.6% user,  0.0% nice,  1.2% system,  0.0% interrupt, 90.2% idle
    CPU 1:  0.0% user,  0.0% nice,  100% system,  0.0% interrupt,  0.0% idle
    CPU 2:  6.9% user,  0.0% nice,  1.2% system,  0.0% interrupt, 91.9% idle
    CPU 3:  0.0% user,  0.0% nice,  0.8% system,  6.3% interrupt, 93.0% idle
    CPU 4:  0.4% user,  0.0% nice,  0.0% system,  0.0% interrupt, 99.6% idle
    CPU 5:  0.4% user,  0.0% nice,  0.8% system,  0.0% interrupt, 98.8% idle
    CPU 6:  2.0% user,  0.0% nice,  2.0% system,  0.0% interrupt, 96.1% idle
    CPU 7:  0.0% user,  0.0% nice, 22.7% system,  0.0% interrupt, 77.3% idle
    

    Does anyone know what if_io_tqg_1 is and what we might need to do to further diagnose what's going on?

    pfSense specs:

    CPU: Intel(R) Atom(TM) CPU C3758 @ 2.20GHz
    RAM: 32GB
    NICs: Ethernet Connection X553 1GbE
    WAN uplink: 1Gbit
    Approx. traffic via WAN: 200Mbit
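For anyone who wants to catch this from a shell rather than the GUI, here is a minimal sketch that filters `top`-style thread output for busy kernel threads. The function name `hot_threads` is hypothetical, and it assumes the column layout shown in the output above (WCPU second to last, COMMAND as a single bracketed token with no spaces).

```shell
# Print kernel threads whose WCPU is at or above a threshold (default 50%).
# Reads top-style per-thread output on stdin. Assumes WCPU is the
# second-to-last column and COMMAND is one token like [kernel{if_io_tqg_1}];
# thread names containing spaces are not matched by this sketch.
hot_threads() {
    awk -v min="${1:-50}" '
        $NF ~ /^\[.*\]$/ {
            cpu = $(NF - 1)
            sub(/%$/, "", cpu)
            if (cpu + 0 >= min) print $NF, cpu "%"
        }'
}
```

On the firewall this would probably be fed from batch-mode top, for example `top -aSHb | hot_threads 50`, though the exact top flags are an assumption and were not taken from this thread.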
    
    • B
      bmeeks
      last edited by Aug 16, 2022, 12:17 PM

      @yswery said in High CPU load on single CPU core:

      Does anyone know what if_io_tqg_1 is and what we might need to do to further diagnose what's going on?

      That represents the network queue handlers. I found some related posts for you, but no real solutions.

      https://forum.netgate.com/topic/173523/what-is-kernel-if_io_tqq_x

      https://forums.freebsd.org/threads/what-is-kernel-if_io_tqg-100-load-of-core.70642/

      • S
        stephenw10 Netgate Administrator
        last edited by Aug 16, 2022, 4:47 PM

        Mmm, that is where the load from pf itself appears. 100% of one CPU core on a C3758 is a lot for 200Mbps though. And pf loading would normally be spread across queues/cores unless the NICs are being deliberately limited to one queue.

        Steve

        • Y
          yswery @stephenw10
          last edited by Aug 16, 2022, 6:19 PM

          @stephenw10 said in High CPU load on single CPU core:

          Mmm, that is where the load from pf itself appears. 100% of one CPU core on a C3758 is a lot for 200Mbps though. And pf loading would normally be spread across queues/cores unless the NICs are being deliberately limited to one queue.

          Steve

          Do you have any idea or hint where I might be able to check whether some (accidental?) setting is limiting each NIC to one core?

          While this occurs (more and more frequently over the past 2 weeks for us), we are seeing these spikes in latency and packet loss to our (directly connected) upstream.

          [Image: Screen Shot 2022-08-16 at 7.04.26 PM (2).png]

          When this issue isn't occurring we usually see under 1ms latency.

          Is there a way to see what type of traffic is triggering this CPU use?
          The reason I think it's a "certain type of traffic" is that we sometimes see 200Mbps (or more if we run speedtests) without any issues at all.

          Here is an example of (external) monitoring: [Image: 7116f23e-1ff8-4ee0-ae22-5882a61887e4-image.png]

          • S
            stephenw10 Netgate Administrator
            last edited by Aug 16, 2022, 7:40 PM

            What pfSense version are you running?

            Easiest way to check the NIC queues is usually the boot log. You should see something like:

            ix0: <Intel(R) X553 N (SFP+)> mem 0x80400000-0x805fffff,0x80604000-0x80607fff at device 0.0 on pci9
            ix0: Using 2048 TX descriptors and 2048 RX descriptors
            ix0: Using 4 RX queues 4 TX queues
            

            Steve
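For anyone following along, a minimal sketch of that boot-log check as a reusable function. The name `nic_queues` is hypothetical, and the patterns only cover the `ix` driver message formats quoted in this thread; other drivers or versions may word these lines differently.

```shell
# Print the queue and interrupt lines for one interface from a boot log.
# $1 = interface name (e.g. ix3)
# $2 = log file (defaults to /var/log/dmesg.boot)
# The patterns match only the message formats quoted in this thread.
nic_queues() {
    grep -E "^$1: (Using [0-9]+ RX queues|allocated for [0-9]+ (rx )?queues|Using (an MSI interrupt|MSI-X interrupts))" \
        "${2:-/var/log/dmesg.boot}"
}
```

Calling `nic_queues ix0` should then show at a glance whether the NIC came up with several queues or only one.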

            • Y
              yswery
              last edited by Aug 16, 2022, 9:46 PM

              @stephenw10 said in High CPU load on single CPU core:

              Easiest way to check the NIC queues is usually the boot log. You should see something like:

              Oh interesting, this is what I see:

              $ cat /var/log/dmesg.boot | grep ix3
              ix3: <Intel(R) X553 (1GbE)> mem 0xdd200000-0xdd3fffff,0xdd600000-0xdd603fff at device 0.1 on pci7
              ix3: Using 2048 TX descriptors and 2048 RX descriptors
              ix3: Using an MSI interrupt
              ix3: allocated for 1 queues
              ix3: allocated for 1 rx queues
              ix3: Ethernet address: ac:1f:6b:b1:d8:af
              ix3: eTrack 0x8000087c
              ix3: netmap queues/slots: TX 1/2048, RX 1/2048
              

              ix3 is the WAN interface, but all 4 ixN devices show "allocated for 1 queues".

              This is pfSense 2.6 CE, but the box is a few years old, dating from the v2.4 days, and has been incrementally updated (so it might have some settings that it no longer should).

              • S
                stephenw10 Netgate Administrator
                last edited by Aug 16, 2022, 9:55 PM

                Ah, maybe you have set queues to 1 in /boot/loader.conf.local?

                That was a common tweak back in the FreeBSD 10 (or was it 8?) era when multiqueue drivers could prove unstable.

                But those NICs should have at least 4 queues.

                I would still have expected it to pass far more traffic, though, even with single-queue NICs. But that does explain why you are seeing the load on one core.

                Steve
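A rough sketch of how one might hunt for such leftovers. The thread never names the exact setting, so the tunable patterns below (`hw.ix.num_queues`, `hw.igb.num_queues`, the `enable_msix` knobs and the iflib `override_nrxqs`/`override_ntxqs` overrides) are assumptions about likely candidates, not a complete or confirmed list. The function name `find_queue_tunables` is hypothetical.

```shell
# List loader tunables that could restrict NIC queues or disable MSI/MSI-X.
# The patterns are assumed candidates only; the thread does not say which
# setting was actually present.
# Arguments: loader config files to scan (unreadable files are skipped).
find_queue_tunables() {
    grep -HnE '^[ \t]*(hw\.(ix|ixgbe|igb)\.(num_queues|enable_msix)|hw\.pci\.enable_msix?|dev\.[a-z]+\.[0-9]+\.iflib\.override_n[rt]xqs)' \
        "$@" 2>/dev/null
}
```

A typical call would be `find_queue_tunables /boot/loader.conf /boot/loader.conf.local`. Each hit is printed with its file name and line number so it can be reviewed before removing anything.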

                • Y
                  yswery @stephenw10
                  last edited by Aug 16, 2022, 11:16 PM

                  @stephenw10

                  So I just removed a bunch of older configs, and now this is what I see:

                  ix3: <Intel(R) X553 (1GbE)> mem 0xdd200000-0xdd3fffff,0xdd600000-0xdd603fff at device 0.1 on pci7
                  ix3: Using 2048 TX descriptors and 2048 RX descriptors
                  ix3: Using 8 RX queues 8 TX queues
                  ix3: Using MSI-X interrupts with 9 vectors
                  ix3: allocated for 8 queues
                  ix3: allocated for 8 rx queues
                  

                  And holy wow, things are running beautifully (for now).

                  What's even crazier is that when the network was under-utilised we WERE getting ~0.7ms to our transit provider, and now we're seeing a stable 0.3ms (and it stays at 0.3ms under our regular 200Mbit load).

                  I might be counting my chickens before they hatch, but this change alone seems to have made a dramatic improvement (better than anything we have recorded in our historic smokepings for the past 2 years, even).

                  Thanks for pointing this out!

                  • S
                    stephenw10 Netgate Administrator
                    last edited by Aug 17, 2022, 2:03 PM

                    Nice. 👍

                    • M
                      MoonKnight @stephenw10
                      last edited by MoonKnight Aug 18, 2022, 4:36 AM Aug 18, 2022, 4:26 AM

                      @stephenw10
                      Hi,
                      I was looking at my own loader.conf, since I don't have a loader.conf.local.

                      Is this normal?

                      kern.cam.boot_delay=10000
                      kern.ipc.nmbclusters="1000000"
                      kern.ipc.nmbjumbop="524288"
                      kern.ipc.nmbjumbo9="524288"
                      opensolaris_load="YES"
                      zfs_load="YES"
                      kern.geom.label.gptid.enable="0"
                      kern.geom.label.disk_ident.enable="0"
                      kern.geom.label.disk_ident.enable="0"
                      kern.geom.label.gptid.enable="0"
                      opensolaris_load="YES"
                      zfs_load="YES"
                      net.link.ifqmaxlen="128"
                      net.link.ifqmaxlen="128"
                      net.link.ifqmaxlen="128"
                      net.link.ifqmaxlen="128"
                      net.link.ifqmaxlen="128"
                      net.link.ifqmaxlen="128"
                      net.link.ifqmaxlen="128"
                      net.link.ifqmaxlen="128"
                      net.link.ifqmaxlen="128"
                      net.link.ifqmaxlen="128"
                      net.link.ifqmaxlen="128"
                      net.link.ifqmaxlen="128"
                      net.link.ifqmaxlen="128"
                      net.link.ifqmaxlen="128"
                      net.link.ifqmaxlen="128"
                      net.link.ifqmaxlen="128"
                      net.link.ifqmaxlen="128"
                      net.link.ifqmaxlen="128"
                      autoboot_delay="3"
                      hw.hn.vf_transparent="0"
                      hw.hn.use_if_start="1"
                      net.link.ifqmaxlen="128"
                      

                      Why so many net.link.ifqmaxlen="128" lines?
                      And some other duplicate lines too.

                      --- 24.11 ---
                      Intel(R) Xeon(R) CPU D-1518 @ 2.20GHz
                      Kingston DDR4 2666MHz 16GB ECC
                      2 x HyperX Fury SSD 120GB (ZFS-mirror)
                      2 x Intel i210 (ports)
                      4 x Intel i350 (ports)

                      • V
                        Vollans @MoonKnight
                        last edited by Aug 18, 2022, 9:29 AM

                        @moonknight I get those double lines as well. It’s really weird!

                        • F
                          fireodo @MoonKnight
                          last edited by Aug 18, 2022, 9:46 AM

                          @moonknight said in High CPU load on single CPU core:

                          Why so many net.link.ifqmaxlen="128" lines?

                          Your machine is stuttering ... 😁 (joke)

                          Kettop Mi4300YL CPU: i5-4300Y @ 1.60GHz RAM: 8GB Ethernet Ports: 4
                          SSD: SanDisk pSSD-S2 16GB (ZFS) WiFi: WLE200NX
                          pfsense 2.7.2 CE
                          Packages: Apcupsd Cron Iftop Iperf LCDproc Nmap pfBlockerNG RRD_Summary Shellcmd Snort Speedtest System_Patches.

                          • S
                            stephenw10 Netgate Administrator
                            last edited by Aug 18, 2022, 11:27 AM

                            It's a known issue but it's only cosmetic. The duplicate entries don't hurt anything.

                            Steve
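If you just want to see how many repeats a file has picked up, a small sketch like this lists every repeated line with its count. The function name `dup_lines` is hypothetical; it only reports and never modifies the file.

```shell
# Print each line that appears more than once in a file, with its count.
# $1 = file to check (defaults to /boot/loader.conf). Blank lines are ignored.
dup_lines() {
    grep -v '^[ \t]*$' "${1:-/boot/loader.conf}" | sort | uniq -c |
        awk '$1 > 1 { n = $1; sub(/^[ \t]*[0-9]+[ \t]+/, ""); print n "x " $0 }'
}
```

Run as `dup_lines /boot/loader.conf`, it prints one line per duplicated entry in the form `<count>x <entry>`.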

                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.