Traffic drop and kernel{if_io_tqg} spiking
-
Suddenly we are getting high latency through the pfSense. This is a Netgate 1541. You can see the ICMP latency spikes here:
64 bytes from X.X.X.X: icmp_seq=340 ttl=251 time=1.48 ms
64 bytes from X.X.X.X: icmp_seq=341 ttl=251 time=1.55 ms
64 bytes from X.X.X.X: icmp_seq=342 ttl=251 time=1.48 ms
64 bytes from X.X.X.X: icmp_seq=343 ttl=251 time=1.60 ms
64 bytes from X.X.X.X: icmp_seq=344 ttl=251 time=1.35 ms
64 bytes from X.X.X.X: icmp_seq=345 ttl=251 time=1.52 ms
64 bytes from X.X.X.X: icmp_seq=346 ttl=251 time=68.0 ms
64 bytes from X.X.X.X: icmp_seq=347 ttl=251 time=242 ms
64 bytes from X.X.X.X: icmp_seq=348 ttl=251 time=241 ms
64 bytes from X.X.X.X: icmp_seq=349 ttl=251 time=1.36 ms
64 bytes from X.X.X.X: icmp_seq=350 ttl=251 time=1.39 ms
64 bytes from X.X.X.X: icmp_seq=351 ttl=251 time=500 ms
64 bytes from X.X.X.X: icmp_seq=352 ttl=251 time=562 ms
64 bytes from X.X.X.X: icmp_seq=353 ttl=251 time=190 ms
64 bytes from X.X.X.X: icmp_seq=354 ttl=251 time=1.57 ms
64 bytes from X.X.X.X: icmp_seq=355 ttl=251 time=1.34 ms
64 bytes from X.X.X.X: icmp_seq=356 ttl=251 time=358 ms
64 bytes from X.X.X.X: icmp_seq=357 ttl=251 time=78.8 ms
64 bytes from X.X.X.X: icmp_seq=358 ttl=251 time=141 ms
64 bytes from X.X.X.X: icmp_seq=359 ttl=251 time=195 ms
64 bytes from X.X.X.X: icmp_seq=360 ttl=251 time=1.40 ms
64 bytes from X.X.X.X: icmp_seq=361 ttl=251 time=1.28 ms
64 bytes from X.X.X.X: icmp_seq=362 ttl=251 time=1.40 ms
64 bytes from X.X.X.X: icmp_seq=363 ttl=251 time=1.30 ms

I can see that this impacts traffic going through or to the firewall. When the latency spikes, I can see the {if_io_tqg} kernel threads use more CPU. There is not much traffic going through this firewall. Here is the CPU log:
last pid: 13671;  load averages: 1.61, 1.50, 1.43  up 628+01:24:49  17:40:38
378 threads:   18 running, 314 sleeping, 46 waiting
CPU:  0.5% user,  0.1% nice,  1.5% system,  0.1% interrupt, 97.8% idle
Mem: 33M Active, 606M Inact, 2063M Wired, 227M Buf, 28G Free
Swap: 4096M Total, 4096M Free

  PID USERNAME   PRI NICE   SIZE    RES STATE    C   TIME     WCPU COMMAND
   11 root       155 ki31     0B   256K CPU1     1    ???  100.00% [idle{idle: cpu1}]
   11 root       155 ki31     0B   256K CPU5     5    ???  100.00% [idle{idle: cpu5}]
   11 root       155 ki31     0B   256K CPU9     9    ???  100.00% [idle{idle: cpu9}]
   11 root       155 ki31     0B   256K CPU11   11    ???  100.00% [idle{idle: cpu11}]
   11 root       155 ki31     0B   256K CPU13   13    ???  100.00% [idle{idle: cpu13}]
   11 root       155 ki31     0B   256K CPU15   15    ???   99.17% [idle{idle: cpu15}]
   11 root       155 ki31     0B   256K CPU7     7    ???   98.39% [idle{idle: cpu7}]
   11 root       155 ki31     0B   256K CPU3     3    ???   98.19% [idle{idle: cpu3}]
   11 root       155 ki31     0B   256K RUN      4    ???   65.28% [idle{idle: cpu4}]
   11 root       155 ki31     0B   256K CPU14   14    ???   62.50% [idle{idle: cpu14}]
   11 root       155 ki31     0B   256K CPU2     2    ???   61.38% [idle{idle: cpu2}]
   11 root       155 ki31     0B   256K CPU10   10    ???   60.60% [idle{idle: cpu10}]
   11 root       155 ki31     0B   256K CPU8     8    ???   60.50% [idle{idle: cpu8}]
   11 root       155 ki31     0B   256K CPU0     0    ???   59.47% [idle{idle: cpu0}]
   11 root       155 ki31     0B   256K RUN      6    ???   59.08% [idle{idle: cpu6}]
   11 root       155 ki31     0B   256K CPU12   12    ???   55.27% [idle{idle: cpu12}]
    0 root       -76    -     0B  1568K -       12 375.5H   43.07% [kernel{if_io_tqg_12}]
    0 root       -76    -     0B  1568K -        6 419.4H   41.46% [kernel{if_io_tqg_6}]
    0 root       -76    -     0B  1568K -        0 358.8H   41.46% [kernel{if_io_tqg_0}]
    0 root       -76    -     0B  1568K CPU10   10 335.3H   40.87% [kernel{if_io_tqg_10}]
    0 root       -76    -     0B  1568K -        8 349.9H   39.99% [kernel{if_io_tqg_8}]
    0 root       -76    -     0B  1568K -       14 317.0H   39.45% [kernel{if_io_tqg_14}]
    0 root       -76    -     0B  1568K -        4 356.4H   38.67% [kernel{if_io_tqg_4}]
    0 root       -76    -     0B  1568K -        2 352.1H   37.16% [kernel{if_io_tqg_2}]
 6896 root        22    0    12M  3432K bpf     13 440.1H    3.17% /usr/local/sbin/filterlog -i pflog0 -p /var/run/filterlog.pid
38608 root        21    0    11M  2636K select   7 333.1H    2.29% /usr/sbin/syslogd -O rfc3164 -s -c -c -l /var/dhcpd/var/run/log -P /var/run/syslog.pid -f /etc/syslog.conf
63615 root        22    0   137M    68M piperd  11   0:18    0.49% php-fpm: pool nginx (php-fpm){php-fpm}
   12 root       -72    -     0B   736K WAIT     6 101.1H    0.39% [intr{swi1: pfsync}]
    0 root       -16    -     0B  1568K swapin   9 8486.2    0.00% [kernel{swapper}]
    0 root       -92    -     0B  1568K -        0  35.9H    0.00% [kernel{dummynet}]
   32 root       -16    -     0B    16K pftm     6  28.1H    0.00% [pf purge]
24000 root        20    0    19M  8432K select   7  26.8H    0.00% /usr/local/sbin/openvpn --config /var/etc/openvpn/server1/config.ovpn{openvpn}
   12 root       -60    -     0B   736K WAIT    11 605:53    0.00% [intr{swi4: clock (0)+}]
   12 root       -72    -     0B   736K WAIT     4 387:17    0.00% [intr{swi1: netisr 14}]
 8780 root        20    0    11M  2172K select   7 316:36    0.00% /usr/sbin/powerd -b hadp -a hadp -n hadp
   33 root       -16    -     0B    16K -       12 278:08    0.00% [rand_harvestq]
    0 root       -76    -     0B  1568K -        2 231:42    0.00% [kernel{if_config_tqg_0}]
   12 root       -72    -     0B   736K WAIT    13 203:50    0.00% [intr{swi1: netisr 0}]
97179 root        20    0    79M    30M uwait    9 201:15    0.00% /usr/local/libexec/ipsec/charon --use-syslog{charon}
97179 root        20    0    79M    30M uwait   11 201:06    0.00% /usr/local/libexec/ipsec/charon --use-syslog{charon}
97179 root        20    0    79M    30M uwait    1 200:50    0.00% /usr/local/libexec/ipsec/charon --use-syslog{charon}
97179 root        20    0    79M    30M uwait    0 200:41    0.00% /usr/local/libexec/ipsec/charon --use-syslog{charon}
97179 root        20    0    79M    30M uwait   13 200:36    0.00% /usr/local/libexec/ipsec/charon --use-syslog{charon}
97179 root        20    0    79M    30M uwait   13 200:22    0.00% /usr/local/libexec/ipsec/charon --use-syslog{charon}
97179 root        20    0    79M    30M uwait   11 199:46    0.00% /usr/local/libexec/ipsec/charon --use-syslog{charon}
97179 root        20    0    79M    30M uwait    5 197:10    0.00% /usr/local/libexec/ipsec/charon --use-syslog{charon}
97179 root        20    0    79M    30M uwait   10 196:43    0.00% /usr/local/libexec/ipsec/charon --use-syslog{charon}
97179 root        20    0    79M    30M uwait    2 196:37    0.00% /usr/local/libexec/ipsec/charon --use-syslog{charon}
97179 root        20    0    79M    30M uwait    3 196:36    0.00% /usr/local/libexec/ipsec/charon --use-syslog{charon}
97179 root        20    0    79M    30M uwait    5 196:34    0.00% /usr/local/libexec/ipsec/charon --use-syslog{charon}
   14 root        -8    -     0B    48K -       11 157:36    0.00% [geom{g_up}]
97128 root        20    0    19M  6508K select  13  79:36    0.00% /usr/local/sbin/ntpd -g -c /var/etc/ntpd.conf -p /var/run/ntpd.pid{ntpd}
   34 root       -16    -     0B    48K psleep   4  55:36    0.00% [pagedaemon{dom0}]
   12 root       -88    -     0B   736K WAIT    11  52:15    0.00% [intr{irq337: ahci0}]
   41 root        -8    -     0B    16K m:w1     7  49:05    0.00% [g_mirror pfSenseMir]
97179 root        20    0    79M    30M uwait   13  42:35    0.00% /usr/local/libexec/ipsec/charon --use-syslog{charon}
   14 root        -8    -     0B    48K -        1  41:38    0.00% [geom{g_down}]
64808 root        20    0    47M  5928K select   5  36:16    0.00% /usr/local/sbin/watchfrr -d -r /usr/local/etc/rc.d/frrbBrestartbB%s -s /usr/local/etc/rc.d/frrbBstartbB%s -k /usr/local/etc/rc.d/frrbBstopbB%s -b bB -t 30 zebra staticd bgpd
   36 root        20    -     0B   144K qsleep  14  36:01    0.00% [bufdaemon{bufdaemon}]
   36 root        20    -     0B   144K sdflus   9  35:00    0.00% [bufdaemon{/ worker}]
   38 root        16    -     0B    16K syncer   5  29:52    0.00% [syncer]
   12 root       -72    -     0B   736K WAIT     7  26:45    0.00% [intr{swi1: netisr 10}]
   26 root       -68    -     0B   240K -       15  24:43    0.00% [usb{usbus0}]
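For reference, a quick way to watch just those queue-group threads while the latency is spiking is shown below (a sketch using standard FreeBSD tools from an SSH shell; ixl0 is a placeholder for whichever NIC actually carries the traffic):

# -S: include system/kernel threads, -H: one line per thread, -b: batch (non-interactive) output
top -SHb 999 | grep if_io_tqg
# per-second packet rates on the suspect interface, to correlate with the latency spikes
netstat -w 1 -I ixl0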
I can see the latency spikes on the gateway monitoring as well.
I also see this error when I try to open the States Summary page.
Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 20480 bytes) in /usr/local/www/diag_states_summary.php on line 41
Crash report begins. Anonymous machine information:
amd64
12.3-STABLE
FreeBSD 12.3-STABLE plus-RELENG_22_01-n202637-97cc5c23e13 pfSense

Crash report details:
PHP Errors:
[17-Aug-2024 17:58:12 Europe/Oslo] PHP Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 20480 bytes) in /usr/local/www/diag_states_summary.php on line 53
[17-Aug-2024 17:58:20 Europe/Oslo] PHP Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 20480 bytes) in /usr/local/www/diag_states_summary.php on line 41

No FreeBSD crash data found.
Has anyone seen this before?
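For what it's worth, 134217728 bytes is 128 x 1024 x 1024, i.e. a 128M PHP memory_limit. The limit the interpreter is actually running with can be confirmed from a shell, a small sketch assuming the PHP CLI is on the path (it normally is on pfSense):

# print the effective memory_limit setting
php -r 'echo ini_get("memory_limit"), PHP_EOL;'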
-
@tsp said in Traffic drop and kernel{if_io_tqg} spiking:
There is not much traffic going through this firewall.
How much is not much? That looks like significant load from pushing traffic.
Those PHP errors are from the process exhausting its allowed memory while trying to display the state table, which implies you have a very large number of states. What does the dashboard show for state usage?
You are running pfSense Plus 22.01 which at this point is very old. You should upgrade as soon as possible.
Steve
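If the GUI page keeps running out of memory, the state table can also be inspected from a shell with the standard pfctl tool (a sketch):

# pf status counters, including current state table entries
pfctl -si | grep -A 3 'State Table'
# or simply count the states
pfctl -ss | wc -l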
-
@stephenw10 Total traffic right now is 10 Mbps. That is not much. I will check the state table.
-
Mmm, that doesn't seem like much. If the state table is huge, perhaps you are seeing a DoS attack of some sort? Are you allowing inbound traffic?
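One rough way to check for that from a shell is to count states per remote source address (a sketch; the awk fields assume the default pfctl -ss output for inbound IPv4 states, "iface proto local:port <- remote:port", and will shift for NAT or IPv6 entries):

# top remote addresses holding inbound states
pfctl -ss | awk '$4 == "<-" { sub(/:[0-9]+$/, "", $5); print $5 }' | sort | uniq -c | sort -rn | head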
-
@tsp do you have any limiters configured?
-
@michmoor Hello, sorry for the late reply. No limiters are configured on this firewall.
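Worth noting: limiters can also be double-checked from a shell. On pfSense, dummynet pipes are listed with dnctl (a sketch; empty output means no pipes exist, and the [kernel{dummynet}] thread in the top output above only shows the module is loaded, not that limiters are active):

dnctl pipe show
dnctl queue show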