Traffic drop and kernel{if_io_tqg} spiking
-
Suddenly we are getting high latency through the pfSense. This is a Netgate 1541. You can see the ICMP latency spikes here:
64 bytes from X.X.X.X: icmp_seq=340 ttl=251 time=1.48 ms
64 bytes from X.X.X.X: icmp_seq=341 ttl=251 time=1.55 ms
64 bytes from X.X.X.X: icmp_seq=342 ttl=251 time=1.48 ms
64 bytes from X.X.X.X: icmp_seq=343 ttl=251 time=1.60 ms
64 bytes from X.X.X.X: icmp_seq=344 ttl=251 time=1.35 ms
64 bytes from X.X.X.X: icmp_seq=345 ttl=251 time=1.52 ms
64 bytes from X.X.X.X: icmp_seq=346 ttl=251 time=68.0 ms
64 bytes from X.X.X.X: icmp_seq=347 ttl=251 time=242 ms
64 bytes from X.X.X.X: icmp_seq=348 ttl=251 time=241 ms
64 bytes from X.X.X.X: icmp_seq=349 ttl=251 time=1.36 ms
64 bytes from X.X.X.X: icmp_seq=350 ttl=251 time=1.39 ms
64 bytes from X.X.X.X: icmp_seq=351 ttl=251 time=500 ms
64 bytes from X.X.X.X: icmp_seq=352 ttl=251 time=562 ms
64 bytes from X.X.X.X: icmp_seq=353 ttl=251 time=190 ms
64 bytes from X.X.X.X: icmp_seq=354 ttl=251 time=1.57 ms
64 bytes from X.X.X.X: icmp_seq=355 ttl=251 time=1.34 ms
64 bytes from X.X.X.X: icmp_seq=356 ttl=251 time=358 ms
64 bytes from X.X.X.X: icmp_seq=357 ttl=251 time=78.8 ms
64 bytes from X.X.X.X: icmp_seq=358 ttl=251 time=141 ms
64 bytes from X.X.X.X: icmp_seq=359 ttl=251 time=195 ms
64 bytes from X.X.X.X: icmp_seq=360 ttl=251 time=1.40 ms
64 bytes from X.X.X.X: icmp_seq=361 ttl=251 time=1.28 ms
64 bytes from X.X.X.X: icmp_seq=362 ttl=251 time=1.40 ms
64 bytes from X.X.X.X: icmp_seq=363 ttl=251 time=1.30 ms

I can see that this impacts traffic going through or to the firewall. When the latency spikes, I can see the {if_io_tqg} kernel threads use more CPU. There is not much traffic going through this firewall. Here is the CPU log:
last pid: 13671;  load averages: 1.61, 1.50, 1.43  up 628+01:24:49  17:40:38
378 threads:   18 running, 314 sleeping, 46 waiting
CPU:  0.5% user,  0.1% nice,  1.5% system,  0.1% interrupt, 97.8% idle
Mem: 33M Active, 606M Inact, 2063M Wired, 227M Buf, 28G Free
Swap: 4096M Total, 4096M Free

  PID USERNAME   PRI NICE   SIZE    RES STATE    C   TIME     WCPU COMMAND
   11 root       155 ki31     0B   256K CPU1     1    ???  100.00% [idle{idle: cpu1}]
   11 root       155 ki31     0B   256K CPU5     5    ???  100.00% [idle{idle: cpu5}]
   11 root       155 ki31     0B   256K CPU9     9    ???  100.00% [idle{idle: cpu9}]
   11 root       155 ki31     0B   256K CPU11   11    ???  100.00% [idle{idle: cpu11}]
   11 root       155 ki31     0B   256K CPU13   13    ???  100.00% [idle{idle: cpu13}]
   11 root       155 ki31     0B   256K CPU15   15    ???   99.17% [idle{idle: cpu15}]
   11 root       155 ki31     0B   256K CPU7     7    ???   98.39% [idle{idle: cpu7}]
   11 root       155 ki31     0B   256K CPU3     3    ???   98.19% [idle{idle: cpu3}]
   11 root       155 ki31     0B   256K RUN      4    ???   65.28% [idle{idle: cpu4}]
   11 root       155 ki31     0B   256K CPU14   14    ???   62.50% [idle{idle: cpu14}]
   11 root       155 ki31     0B   256K CPU2     2    ???   61.38% [idle{idle: cpu2}]
   11 root       155 ki31     0B   256K CPU10   10    ???   60.60% [idle{idle: cpu10}]
   11 root       155 ki31     0B   256K CPU8     8    ???   60.50% [idle{idle: cpu8}]
   11 root       155 ki31     0B   256K CPU0     0    ???   59.47% [idle{idle: cpu0}]
   11 root       155 ki31     0B   256K RUN      6    ???   59.08% [idle{idle: cpu6}]
   11 root       155 ki31     0B   256K CPU12   12    ???   55.27% [idle{idle: cpu12}]
    0 root       -76    -     0B  1568K -       12 375.5H   43.07% [kernel{if_io_tqg_12}]
    0 root       -76    -     0B  1568K -        6 419.4H   41.46% [kernel{if_io_tqg_6}]
    0 root       -76    -     0B  1568K -        0 358.8H   41.46% [kernel{if_io_tqg_0}]
    0 root       -76    -     0B  1568K CPU10   10 335.3H   40.87% [kernel{if_io_tqg_10}]
    0 root       -76    -     0B  1568K -        8 349.9H   39.99% [kernel{if_io_tqg_8}]
    0 root       -76    -     0B  1568K -       14 317.0H   39.45% [kernel{if_io_tqg_14}]
    0 root       -76    -     0B  1568K -        4 356.4H   38.67% [kernel{if_io_tqg_4}]
    0 root       -76    -     0B  1568K -        2 352.1H   37.16% [kernel{if_io_tqg_2}]
 6896 root        22    0    12M  3432K bpf     13 440.1H    3.17% /usr/local/sbin/filterlog -i pflog0 -p /var/run/filterlog.pid
38608 root        21    0    11M  2636K select   7 333.1H    2.29% /usr/sbin/syslogd -O rfc3164 -s -c -c -l /var/dhcpd/var/run/log -P /var/run/syslog.pid -f /etc/syslog.conf
63615 root        22    0   137M    68M piperd  11   0:18    0.49% php-fpm: pool nginx (php-fpm){php-fpm}
   12 root       -72    -     0B   736K WAIT     6 101.1H    0.39% [intr{swi1: pfsync}]
    0 root       -16    -     0B  1568K swapin   9 8486.2    0.00% [kernel{swapper}]
    0 root       -92    -     0B  1568K -        0  35.9H    0.00% [kernel{dummynet}]
   32 root       -16    -     0B    16K pftm     6  28.1H    0.00% [pf purge]
24000 root        20    0    19M  8432K select   7  26.8H    0.00% /usr/local/sbin/openvpn --config /var/etc/openvpn/server1/config.ovpn{openvpn}
   12 root       -60    -     0B   736K WAIT    11 605:53    0.00% [intr{swi4: clock (0)+}]
   12 root       -72    -     0B   736K WAIT     4 387:17    0.00% [intr{swi1: netisr 14}]
 8780 root        20    0    11M  2172K select   7 316:36    0.00% /usr/sbin/powerd -b hadp -a hadp -n hadp
   33 root       -16    -     0B    16K -       12 278:08    0.00% [rand_harvestq]
    0 root       -76    -     0B  1568K -        2 231:42    0.00% [kernel{if_config_tqg_0}]
   12 root       -72    -     0B   736K WAIT    13 203:50    0.00% [intr{swi1: netisr 0}]
97179 root        20    0    79M    30M uwait    9 201:15    0.00% /usr/local/libexec/ipsec/charon --use-syslog{charon}
97179 root        20    0    79M    30M uwait   11 201:06    0.00% /usr/local/libexec/ipsec/charon --use-syslog{charon}
97179 root        20    0    79M    30M uwait    1 200:50    0.00% /usr/local/libexec/ipsec/charon --use-syslog{charon}
97179 root        20    0    79M    30M uwait    0 200:41    0.00% /usr/local/libexec/ipsec/charon --use-syslog{charon}
97179 root        20    0    79M    30M uwait   13 200:36    0.00% /usr/local/libexec/ipsec/charon --use-syslog{charon}
97179 root        20    0    79M    30M uwait   13 200:22    0.00% /usr/local/libexec/ipsec/charon --use-syslog{charon}
97179 root        20    0    79M    30M uwait   11 199:46    0.00% /usr/local/libexec/ipsec/charon --use-syslog{charon}
97179 root        20    0    79M    30M uwait    5 197:10    0.00% /usr/local/libexec/ipsec/charon --use-syslog{charon}
97179 root        20    0    79M    30M uwait   10 196:43    0.00% /usr/local/libexec/ipsec/charon --use-syslog{charon}
97179 root        20    0    79M    30M uwait    2 196:37    0.00% /usr/local/libexec/ipsec/charon --use-syslog{charon}
97179 root        20    0    79M    30M uwait    3 196:36    0.00% /usr/local/libexec/ipsec/charon --use-syslog{charon}
97179 root        20    0    79M    30M uwait    5 196:34    0.00% /usr/local/libexec/ipsec/charon --use-syslog{charon}
   14 root        -8    -     0B    48K -       11 157:36    0.00% [geom{g_up}]
97128 root        20    0    19M  6508K select  13  79:36    0.00% /usr/local/sbin/ntpd -g -c /var/etc/ntpd.conf -p /var/run/ntpd.pid{ntpd}
   34 root       -16    -     0B    48K psleep   4  55:36    0.00% [pagedaemon{dom0}]
   12 root       -88    -     0B   736K WAIT    11  52:15    0.00% [intr{irq337: ahci0}]
   41 root        -8    -     0B    16K m:w1     7  49:05    0.00% [g_mirror pfSenseMir]
97179 root        20    0    79M    30M uwait   13  42:35    0.00% /usr/local/libexec/ipsec/charon --use-syslog{charon}
   14 root        -8    -     0B    48K -        1  41:38    0.00% [geom{g_down}]
64808 root        20    0    47M  5928K select   5  36:16    0.00% /usr/local/sbin/watchfrr -d -r /usr/local/etc/rc.d/frrbBrestartbB%s -s /usr/local/etc/rc.d/frrbBstartbB%s -k /usr/local/etc/rc.d/frrbBstopbB%s -b bB -t 30 zebra staticd bgpd
   36 root        20    -     0B   144K qsleep  14  36:01    0.00% [bufdaemon{bufdaemon}]
   36 root        20    -     0B   144K sdflus   9  35:00    0.00% [bufdaemon{/ worker}]
   38 root        16    -     0B    16K syncer   5  29:52    0.00% [syncer]
   12 root       -72    -     0B   736K WAIT     7  26:45    0.00% [intr{swi1: netisr 10}]
   26 root       -68    -     0B   240K -       15  24:43    0.00% [usb{usbus0}]
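For reference, a quick way to watch just those queue-group threads while the latency is spiking is shown below (a sketch using standard FreeBSD tools from an SSH shell; ixl0 is a placeholder for whichever NIC actually carries the traffic):

# -S: include system/kernel threads, -H: one line per thread, -b: batch (non-interactive) output
top -SHb 999 | grep if_io_tqg
# per-second packet rates on the suspect interface, to correlate with the latency spikes
netstat -w 1 -I ixl0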
I can see the latency spikes on the gateway monitoring as well.
I also see this error when I try to open the States Summary page.
Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 20480 bytes) in /usr/local/www/diag_states_summary.php on line 41
Crash report begins. Anonymous machine information:
amd64
12.3-STABLE
FreeBSD 12.3-STABLE plus-RELENG_22_01-n202637-97cc5c23e13 pfSense

Crash report details:
PHP Errors:
[17-Aug-2024 17:58:12 Europe/Oslo] PHP Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 20480 bytes) in /usr/local/www/diag_states_summary.php on line 53
[17-Aug-2024 17:58:20 Europe/Oslo] PHP Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 20480 bytes) in /usr/local/www/diag_states_summary.php on line 41

No FreeBSD crash data found.
Has anyone seen this before?
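For what it's worth, 134217728 bytes is 128 x 1024 x 1024, i.e. a 128M PHP memory_limit. The limit the interpreter is actually running with can be confirmed from a shell, a small sketch assuming the PHP CLI is on the path (it normally is on pfSense):

# print the effective memory_limit setting
php -r 'echo ini_get("memory_limit"), PHP_EOL;'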
-
@tsp said in Traffic drop and kernel{if_io_tqg} spiking:
There is not much traffic going through this firewall.
How much is not much? That looks like significant load from pushing traffic.
Those PHP errors are from the process exhausting its allowed memory while trying to display the state table, which implies you have a very large number of states. What does the dashboard show for state usage?
You are running pfSense Plus 22.01 which at this point is very old. You should upgrade as soon as possible.
Steve
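If the GUI page keeps running out of memory, the state table can also be inspected from a shell with the standard pfctl tool (a sketch):

# pf status counters, including current state table entries
pfctl -si | grep -A 3 'State Table'
# or simply count the states
pfctl -ss | wc -l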
-
@stephenw10 Total traffic right now is 10 Mbps. That is not much. I will check the state table.
-
Mmm, that doesn't seem like much. If the state table is huge, perhaps you are seeing a DoS attack of some sort? Are you allowing inbound traffic?
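One rough way to check for that from a shell is to count states per remote source address (a sketch; the awk fields assume the default pfctl -ss output for inbound IPv4 states, "iface proto local:port <- remote:port", and will shift for NAT or IPv6 entries):

# top remote addresses holding inbound states
pfctl -ss | awk '$4 == "<-" { sub(/:[0-9]+$/, "", $5); print $5 }' | sort | uniq -c | sort -rn | head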
-
@tsp do you have any limiters configured?
-
@michmoor Hello, sorry for the late reply. No limiters are configured on this firewall.
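Worth noting: limiters can also be double-checked from a shell. On pfSense, dummynet pipes are listed with dnctl (a sketch; empty output means no pipes exist, and the [kernel{dummynet}] thread in the top output above only shows the module is loaded, not that limiters are active):

dnctl pipe show
dnctl queue show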