24.11 on SG-2100 first impression (and issues)
-
Creating a separate topic from this one.
-
One day in with 24.11 on my SG-2100, I notice high CPU levels in the dash: 100% most of the time, occasionally dropping to 50% and 25%, then back to 100%.
Navigating around the menus appears to be a little slower than with 24.03.
Tried a full reboot, no improvement.
No patches applied yet; I will apply them all as a next step.
Any advice to give at this point?
-
@stephenw10 said in pfSense Plus Software Version 24.11 is here!:
Probably the changed widget reload behaviour in 24.11. Especially if you have a lot of widgets enabled?
Yes lots: System Information, Gateways, Installed Packages, Services Status, pfBlockerNG, OpenVPN, Interfaces, Traffic Graphs (9 VLANs of which 3 actively used), NTP Status
Try checking the CPU usage at the CLI using `top -HaSP`. Check it without the webgui open at all. If it's normal there you can try reverting the widget change.
last pid: 98095; load averages: 2.76, 3.29, 2.05 up 0+00:09:08 12:25:06
250 threads: 4 running, 227 sleeping, 19 waiting
CPU 0: 52.6% user, 0.8% nice, 18.9% system, 2.4% interrupt, 25.3% idle
CPU 1: 64.2% user, 0.6% nice, 19.3% system, 2.1% interrupt, 13.7% idle
Mem: 257M Active, 117M Inact, 389M Wired, 2537M Free
ARC: 133M Total, 48M MFU, 78M MRU, 1234K Anon, 958K Header, 4848K Other
     101M Compressed, 248M Uncompressed, 2.45:1 Ratio
After the first reboot, Kea DHCPv6 didn't come back up, and I cannot start the Kea IPv6 service manually. This is a problem because the devices that use IPv6 leases, which is a lot of them, have lost theirs. I can still fall back to IPv4, but still.
System Activity will show the cpu activity and update the readings every two seconds, but if I also open the dash in a second browser window, and click back to the system activity window, it will stop updating altogether, even after a page reload. The cpu appears to be very busy with that dash.
-
@Cabledude Don’t leave the dashboard visible, is the quick fix. Or there is the patch mentioned in the other thread. 25.03 will have a fix also. https://docs.netgate.com/pfsense/en/latest/releases/25-03.html#dashboard
Does Kea log anything? There is this but I’m not sure it applies to 24.11. And is under a v4 heading.
https://docs.netgate.com/pfsense/en/latest/releases/25-03.html#dhcp-ipv4
-
After removing the "Traffic Graphs" and "Interfaces" widgets from the dash, the CPU (dash still open) drops mildly to around 85% at first. It still keeps hitting 100% occasionally. Then after a couple of minutes it goes to 25, then 50, then 63, 36, 20, 73 and so forth, so it's restless but there are no more peaks to 100%.
-
These are the warnings and errors I get from DHCP. Please note that some of them come from IPv4 as well.
Mar 26 12:46:51 dhcp6c 79068 dhcp6c Received INFO
Mar 26 12:46:51 dhcp6c 79068 Sending Renew
Mar 26 12:46:41 dhcp6c 79068 add an address 2001:xxxx:xxxx:0:ac03:be86:8d6e:96dd/128 on mvneta0
Mar 26 12:46:41 dhcp6c 79068 dhcp6c Received INFO
Mar 26 12:46:41 dhcp6c 79068 Sending Renew
Mar 26 12:46:41 dhcp6c 79068 Sending Renew
Mar 26 12:29:50 kea-dhcp6 377 ERROR [kea-dhcp6.dhcp6.0x474e46812000] DHCP6_INIT_FAIL failed to initialize Kea server: configuration error using file '/usr/local/etc/kea/kea-dhcp6.conf': subnet with the prefix of '2001::xxxx:xxxx:a000::/63' already exists (/usr/local/etc/kea/kea-dhcp6.conf:86:13)
Mar 26 12:29:50 kea-dhcp6 377 ERROR [kea-dhcp6.dhcp6.0x474e46812000] DHCP6_CONFIG_LOAD_FAIL configuration error using file: /usr/local/etc/kea/kea-dhcp6.conf, reason: subnet with the prefix of '2001::xxxx:xxxx:a000::/63' already exists (/usr/local/etc/kea/kea-dhcp6.conf:86:13)
Mar 26 12:29:50 kea-dhcp6 377 ERROR [kea-dhcp6.dhcp6.0x474e46812000] DHCP6_PARSER_FAIL failed to create or run parser for configuration element subnet6: subnet with the prefix of '2001::xxxx:xxxx:a000::/63' already exists (/usr/local/etc/kea/kea-dhcp6.conf:86:13)
Mar 26 12:29:50 kea-dhcp6 377 WARN [kea-dhcp6.dhcp6.0x474e46812000] DHCP6_RESERVATIONS_LOOKUP_FIRST_ENABLED Multi-threading is enabled and host reservations lookup is always performed first.
Mar 26 12:29:50 kea-dhcp6 377 WARN [kea-dhcp6.dhcpsrv.0x474e46812000] DHCPSRV_MT_DISABLED_QUEUE_CONTROL disabling dhcp queue control when multi-threading is enabled.
Mar 26 12:29:40 kea-dhcp4 30070 WARN [kea-dhcp4.dhcp4.0x7765c3012000] DHCP4_MULTI_THREADING_INFO enabled: yes, number of threads: 2, queue size: 64
Mar 26 12:29:39 kea-dhcp4 30070 WARN [kea-dhcp4.dhcp4.0x7765c3012000] DHCP4_RESERVATIONS_LOOKUP_FIRST_ENABLED Multi-threading is enabled and host reservations lookup is always performed first.
Mar 26 12:29:39 kea-dhcp4 30070 WARN [kea-dhcp4.dhcpsrv.0x7765c3012000] DHCPSRV_MT_DISABLED_QUEUE_CONTROL disabling dhcp queue control when multi-threading is enabled.
Mar 26 12:17:56 kea-dhcp4 40085 WARN [kea-dhcp4.dhcp4.0x1ed194012000] DHCP4_MULTI_THREADING_INFO enabled: yes, number of threads: 2, queue size: 64
Mar 26 12:17:56 kea-dhcp4 40085 WARN [kea-dhcp4.dhcp4.0x1ed194012000] DHCP4_RESERVATIONS_LOOKUP_FIRST_ENABLED Multi-threading is enabled and host reservations lookup is always performed first.
Mar 26 12:17:56 kea-dhcp4 40085 WARN [kea-dhcp4.dhcpsrv.0x1ed194012000] DHCPSRV_MT_DISABLED_QUEUE_CONTROL disabling dhcp queue control when multi-threading is enabled.
Mar 26 12:17:15 kea-dhcp6 65938 ERROR [kea-dhcp6.dhcp6.0x537dbec12000] DHCP6_INIT_FAIL failed to initialize Kea server: configuration error using file '/usr/local/etc/kea/kea-dhcp6.conf': subnet with the prefix of '2001::xxxx:xxxx:a000::/63' already exists (/usr/local/etc/kea/kea-dhcp6.conf:86:13)
Mar 26 12:17:15 kea-dhcp6 65938 ERROR [kea-dhcp6.dhcp6.0x537dbec12000] DHCP6_CONFIG_LOAD_FAIL configuration error using file: /usr/local/etc/kea/kea-dhcp6.conf, reason: subnet with the prefix of '2001::xxxx:xxxx:a000::/63' already exists (/usr/local/etc/kea/kea-dhcp6.conf:86:13)
Mar 26 12:17:15 kea-dhcp6 65938 ERROR [kea-dhcp6.dhcp6.0x537dbec12000] DHCP6_PARSER_FAIL failed to create or run parser for configuration element subnet6: subnet with the prefix of '2001::xxxx:xxxx:a000::/63' already exists (/usr/local/etc/kea/kea-dhcp6.conf:86:13)
Mar 26 12:17:15 kea-dhcp6 65938 WARN [kea-dhcp6.dhcp6.0x537dbec12000] DHCP6_RESERVATIONS_LOOKUP_FIRST_ENABLED Multi-threading is enabled and host reservations lookup is always performed first.
Mar 26 12:17:15 kea-dhcp6 65938 WARN [kea-dhcp6.dhcpsrv.0x537dbec12000] DHCPSRV_MT_DISABLED_QUEUE_CONTROL disabling dhcp queue control when multi-threading is enabled.
Mar 26 12:17:11 kea-dhcp4 40085 WARN [kea-dhcp4.dhcp4.0x1ed194012000] DHCP4_MULTI_THREADING_INFO enabled: yes, number of threads: 2, queue size: 64
Mar 26 12:17:10 kea-dhcp4 40085 WARN [kea-dhcp4.dhcp4.0x1ed194012000] DHCP4_RESERVATIONS_LOOKUP_FIRST_ENABLED Multi-threading is enabled and host reservations lookup is always performed first.
Mar 26 12:17:10 kea-dhcp4 40085 WARN [kea-dhcp4.dhcpsrv.0x1ed194012000] DHCPSRV_MT_DISABLED_QUEUE_CONTROL disabling dhcp queue control when multi-threading is enabled.
I haven't looked into these yet; for now I'm just providing the logs. These errors appeared after the 24.11 update.
-
@stephenw10 said in pfSense Plus Software Version 24.11 is here!:
Try checking the CPU usage at the CLI using `top -HaSP`. Check it without the webgui open at all. If it's normal there you can try reverting the widget change.
Re Kea,
We have only lightly tested Kea since it’s still in preview. I would imagine “already exists” means the subnet is in there twice…?
-
@SteveITS said in 24.11 on SG-2100 first impression (and issues):
I would imagine “already exists” means the subnet is in there twice…?
Yes. I have DHCPv6 enabled on LAN1 and VLAN10 and the subnets are identical. I made no changes compared to what I had with 24.03, so these issues are new as of 24.11.
As I currently don't have any IPv6 clients on LAN1, I disabled DHCPv6 for LAN1 and now my clients on VLAN10 get IPv6 leases. All I see in the DHCP logs now are some warnings about multi-threading.
So for now it's been solved but I hope these KEA issues will be looked into soon.
And Steve: thanks for your help!
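For anyone hitting the same "already exists" error, here is a rough Python sketch of how the duplicate could be spotted before Kea refuses to start. It assumes the usual Kea layout of a `Dhcp6` object containing a `subnet6` list whose entries each have a `subnet` key, and it assumes the file is plain JSON without comments. The sample fragment, the interface names and the `find_duplicate_subnets` helper are made up for illustration; the prefixes are documentation addresses, not the ones from my log.

```python
import ipaddress
import json
from collections import defaultdict


def find_duplicate_subnets(config, section="Dhcp6", key="subnet6"):
    """Return {normalised_prefix: [subnet ids]} for prefixes listed more than once."""
    seen = defaultdict(list)
    for entry in config.get(section, {}).get(key, []):
        # strict=False tolerates host bits being set in the configured prefix
        prefix = ipaddress.ip_network(entry["subnet"], strict=False)
        seen[str(prefix)].append(entry.get("id"))
    return {prefix: ids for prefix, ids in seen.items() if len(ids) > 1}


# Hypothetical config fragment (assumption: two interfaces sharing one prefix).
sample = json.loads("""
{
  "Dhcp6": {
    "subnet6": [
      {"id": 1, "subnet": "2001:db8:0:a000::/63", "interface": "mvneta1"},
      {"id": 2, "subnet": "2001:db8:0:a000::/63", "interface": "mvneta1.10"},
      {"id": 3, "subnet": "2001:db8:0:b000::/64", "interface": "mvneta1.20"}
    ]
  }
}
""")

duplicates = find_duplicate_subnets(sample)
print(duplicates)
```

To try it on a real box you would load /usr/local/etc/kea/kea-dhcp6.conf with `json.load` instead of the sample string. This only looks at the top-level `subnet6` list, so it is a quick check rather than a full validator.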
-
@Cabledude said in 24.11 on SG-2100 first impression (and issues):
on LAN1 and VLAN10 and the subnets are identical
Hm, normally that's a problem for pfSense in that it won't know where to route those packets.
-
What is using the CPU cycles as shown in the top output?
For example I expect to see something like:
last pid: 43534; load averages: 1.06, 1.19, 1.15 up 0+03:08:08 14:39:09
293 threads: 3 running, 267 sleeping, 23 waiting
CPU 0: 6.7% user, 0.0% nice, 14.5% system, 0.4% interrupt, 78.4% idle
CPU 1: 9.0% user, 0.0% nice, 11.0% system, 0.4% interrupt, 79.6% idle
Mem: 106M Active, 295M Inact, 322M Wired, 2584M Free
ARC: 133M Total, 50M MFU, 76M MRU, 544K Anon, 1035K Header, 5207K Other
     103M Compressed, 242M Uncompressed, 2.36:1 Ratio
PID USERNAME PRI NICE SIZE RES STATE C TIME WCPU COMMAND
11 root 187 ki31 0B 32K CPU1 1 166:11 80.29% [idle{idle: cpu1}]
11 root 187 ki31 0B 32K RUN 0 166:13 79.11% [idle{idle: cpu0}]
531 root 68 0 140M 59M accept 1 0:18 1.82% php-fpm: pool nginx (php-fpm)
0 root -12 - 0B 1040K - 1 0:30 0.68% [kernel{z_wr_iss}]
37128 root 20 0 14M 4496K CPU0 0 0:00 0.54% top -HaSP
61254 root 20 0 33M 11M kqread 0 0:03 0.39% nginx: worker process (nginx)
7 root -16 - 0B 16K pftm 1 0:17 0.37% [pf purge]
17 root -16 - 0B 16K mmcsd 0 0:15 0.34% [mmcsd0: mmc/sd card]
38317 root 20 0 140M 57M accept 1 0:18 0.28% php-fpm: pool nginx (php-fpm)
0 root -16 - 0B 1040K - 0 0:11 0.26% [kernel{z_wr_int}]
73138 root 68 0 107M 46M accept 1 0:00 0.22% php-fpm: pool nginx (php-fpm)
2 root -60 - 0B 32K WAIT 0 0:30 0.18% [clock{clock (0)}]
4898 root 20 0 1300M 64M uwait 0 0:08 0.16% /usr/local/bin/pfnet-controller -conf /var/etc/pfnet-controller/pfnet-cont
That's actually with the dashboard open but in 25.03-beta.
-
@stephenw10 said in 24.11 on SG-2100 first impression (and issues):
What is using the CPU cycles as shown in the top output?
Well what I showed you above is all I get to see. I go to Menu / Diagnostics / Command Prompt, type "top -HaSP" in the Execute Shell Command box and click Execute.
When I go to System Activity I see the processes like in your example.
-
This is what I see when I leave the dash open and the cpu in the dash shows close to 100%, but it's changing all the time:
last pid: 22481; load averages: 3.26, 1.96, 1.03 up 0+03:53:56 16:09:54
257 threads: 5 running, 233 sleeping, 19 waiting
CPU: 11.2% user, 0.9% nice, 7.6% system, 2.0% interrupt, 78.4% idle
Mem: 162M Active, 275M Inact, 480M Wired, 2381M Free
ARC: 197M Total, 80M MFU, 107M MRU, 1216K Anon, 1396K Header, 6983K Other
     157M Compressed, 419M Uncompressed, 2.67:1 Ratio
PID USERNAME PRI NICE SIZE RES STATE C TIME WCPU COMMAND
11 root 187 ki31 0B 32K RUN 0 185:20 52.69% [idle{idle: cpu0}]
99138 root 100 0 34M 23M CPU1 1 0:03 28.17% /usr/local/bin/python3.11
11 root 187 ki31 0B 32K RUN 1 181:12 17.19% [idle{idle: cpu1}]
27 root 68 0 112M 45M piperd 0 0:02 14.45% php-fpm: pool nginx (php-fpm)
49103 root 53 0 112M 49M accept 1 0:16 12.89% php-fpm: pool nginx (php-fpm)
26105 root 68 0 141M 63M accept 1 0:21 10.99% php-fpm: pool nginx (php-fpm)
5217 root 68 0 141M 56M accept 0 0:03 10.69% php-fpm: pool nginx (php-fpm)
7323 root 68 0 141M 62M piperd 0 0:14 9.86% php-fpm: pool nginx (php-fpm)
11530 root 68 0 145M 66M lockf 0 0:29 9.67% php-fpm: pool nginx (php-fpm){php-fpm}
35200 root 68 0 112M 49M accept 0 0:11 6.98% php-fpm: pool nginx (php-fpm)
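To see at a glance what is eating the CPU in a snapshot like this, the per-thread rows can be added up per command. The Python below is only a sketch: it assumes the eleven-column layout shown above (PID through COMMAND), skips the idle threads, and uses a shortened copy of the snapshot as sample input. The `busiest_processes` name is my own.

```python
from collections import defaultdict


def busiest_processes(top_text, limit=5):
    """Sum WCPU per command from pasted top output, ignoring idle threads."""
    totals = defaultdict(float)
    for line in top_text.splitlines():
        parts = line.split(None, 10)
        # A process row has 11 fields, a numeric PID and a WCPU ending in '%'.
        if len(parts) < 11 or not parts[0].isdigit() or not parts[9].endswith("%"):
            continue
        command = parts[10].strip()
        if command.startswith("[idle"):
            continue
        totals[command] += float(parts[9].rstrip("%"))
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return [(command, round(cpu, 2)) for command, cpu in ranked[:limit]]


# Shortened sample taken from the snapshot above.
snapshot = """\
PID USERNAME PRI NICE SIZE RES STATE C TIME WCPU COMMAND
11 root 187 ki31 0B 32K RUN 0 185:20 52.69% [idle{idle: cpu0}]
99138 root 100 0 34M 23M CPU1 1 0:03 28.17% /usr/local/bin/python3.11
27 root 68 0 112M 45M piperd 0 0:02 14.45% php-fpm: pool nginx (php-fpm)
49103 root 53 0 112M 49M accept 1 0:16 12.89% php-fpm: pool nginx (php-fpm)
"""

for command, cpu in busiest_processes(snapshot):
    print(f"{cpu:6.2f}%  {command}")
```

On the full snapshot the php-fpm workers that serve the webgui add up to far more than any single row suggests, which is why grouping by command is more telling than reading the list line by line.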
-
@Cabledude said in 24.11 on SG-2100 first impression (and issues):
I go to Menu / Diagnostics / Command Prompt, type "top -HaSP" in the Execute Shell Command box and click Execute.
Ah OK. That's not the CLI (command line interface). I meant to run that command at the real command prompt so either via SSH or using the console directly. The command prompt page in the webgui is only for commands with a static output. And in fact anything run using the webgui uses significant CPU by itself.
-
@stephenw10 Ah, I see. Never used the CLI except wired. I enabled SSH since I had to learn sometime.
This is what I get, without dashboard opened or active:
last pid: 76645; load averages: 0.56, 0.49, 0.40 up 0+06:10:24 18:26:22
258 threads: 3 running, 236 sleeping, 19 waiting
CPU 0: 18.6% user, 0.0% nice, 9.1% system, 5.9% interrupt, 66.4% idle
CPU 1: 21.4% user, 0.0% nice, 9.1% system, 5.9% interrupt, 63.6% idle
Mem: 170M Active, 251M Inact, 506M Wired, 2371M Free
ARC: 199M Total, 78M MFU, 112M MRU, 770K Anon, 1406K Header, 7176K Other
     159M Compressed, 424M Uncompressed, 2.67:1 Ratio
PID USERNAME PRI NICE SIZE RES STATE C TIME WCPU COMMAND
11 root 187 ki31 0B 32K RUN 0 306:59 64.97% [idle{idle: cpu0}]
11 root 187 ki31 0B 32K CPU1 1 302:40 64.56% [idle{idle: cpu1}]
97139 unbound 55 0 137M 113M kqread 1 0:20 32.05% /usr/local/sbin/unbound -c /var
97139 unbound 52 0 137M 113M kqread 0 0:16 23.78% /usr/local/sbin/unbound -c /var
12 root -64 - 0B 272K WAIT 1 4:25 2.43% [intr{gic0,s45: mvneta1}]
12 root -64 - 0B 272K WAIT 0 3:12 2.09% [intr{gic0,s42: mvneta0}]
12 root -60 - 0B 272K WAIT 1 2:23 1.97% [intr{swi1: netisr 1}]
12 root -60 - 0B 272K WAIT 0 3:23 1.88% [intr{swi1: netisr 0}]
2 root -60 - 0B 32K WAIT 0 4:13 1.38% [clock{clock (0)}]
0 root -12 - 0B 992K - 1 2:06 1.03% [kernel{z_wr_iss}]
94582 root 20 0 19M 9584K kqread 1 0:15 0.91% /usr/local/sbin/lighttpd_pfb -f
0 root -16 - 0B 992K - 1 0:42 0.56% [kernel{z_wr_int}]
12 root -64 - 0B 272K WAIT 1 0:25 0.49% [intr{gic0,s27: ahci0}]
36084 SPK 20 0 14M 4424K CPU0 0 0:01 0.46% top -HaSP
4 root -16 - 0B 48K - 1 0:15 0.40% [cam{doneq0}]
76679 avahi 20 0 14M 4524K select 1 1:25 0.17% avahi-daemon: running [SPK.loca
0 root -16 - 0B 992K - 1 0:03 0.16% [kernel{z_null_int}]
0 root -16 - 0B 992K - 1 0:02 0.12% [kernel{z_flush_int}]
7 root -16 - 0B 16K pftm 1 0:37 0.11% [pf purge]
20652 SPK 20 0 22M 11M select 1 0:00 0.09% sshd: SPK@pts/0 (sshd)
Looking at this output and seeing the cpu is largely idle and any significant usage is down to pfBlocker, I must assume that the high cpu I get with dashboard open is caused by the new dashboard widget build in 24.11.
I made a short video showing the "top -HaSP" command CLI output with/without dash open. It's 1.3MB in size.
Not sure how I can make the video available to you though. The takeaway is that it's mostly idle now with the dash off:
last pid: 28321; load averages: 0.30, 0.33, 0.40 up 0+06:29:57 18:45:55
258 threads: 3 running, 236 sleeping, 19 waiting
CPU 0: 0.0% user, 0.0% nice, 1.2% system, 1.6% interrupt, 97.3% idle
CPU 1: 0.0% user, 0.0% nice, 0.8% system, 0.8% interrupt, 98.4% idle
Mem: 197M Active, 233M Inact, 493M Wired, 2375M Free
ARC: 199M Total, 78M MFU, 113M MRU, 276K Anon, 1406K Header, 7113K Other
     159M Compressed, 425M Uncompressed, 2.68:1 Ratio
PID USERNAME PRI NICE SIZE RES STATE C TIME WCPU COMMAND
11 root 187 ki31 0B 32K CPU1 1 318:59 98.50% [idle{idle: cpu1}]
11 root 187 ki31 0B 32K RUN 0 323:28 97.43% [idle{idle: cpu0}]
2 root -60 - 0B 32K WAIT 0 4:27 1.15% [clock{clock (0)}]
12 root -64 - 0B 272K WAIT 1 4:39 0.83% [intr{gic0,s45: mvneta1}]
12 root -60 - 0B 272K WAIT 0 3:33 0.79% [intr{swi1: netisr 0}]
47523 SPK 20 0 14M 4848K CPU0 0 0:03 0.40% top -HaSP
12 root -64 - 0B 272K WAIT 0 3:19 0.15% [intr{gic0,s42: mvneta0}]
76679 avahi 20 0 14M 4524K select 0 1:30 0.13% avahi-daemon: running [SPK.loca
7 root -16 - 0B 16K pftm 1 0:38 0.10% [pf purge]
12 root -60 - 0B 272K WAIT 1 2:29 0.08% [intr{swi1: netisr 1}]
20652 SPK
So I suppose the unit is doing fine now with 24.11; only the dash widget refresh is problematic. It might be better on the Intel models such as the 4100, 4200, 6100 etc., but I wanted the ultimate low-power firewall...
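To put numbers on the with/without comparison, the per-core busy percentage can be read straight from the "CPU n:" summary lines as 100 minus the idle figure. A small Python sketch, assuming the summary line format shown in the outputs in this thread; the `cpu_busy` name is my own, and the two samples are the CPU lines from my first post (dash open) and from the last capture (dash closed).

```python
import re

# Matches "CPU 0: ... 25.3% idle" as well as the combined "CPU: ... idle" form.
CPU_LINE = re.compile(r"CPU(?: (\d+))?:.*?([\d.]+)% idle")


def cpu_busy(top_text):
    """Return {core: busy percent} computed as 100 minus the idle figure."""
    busy = {}
    for line in top_text.splitlines():
        match = CPU_LINE.match(line.strip())
        if match:
            core = match.group(1) if match.group(1) is not None else "all"
            busy[core] = round(100.0 - float(match.group(2)), 1)
    return busy


dash_open = """\
CPU 0: 52.6% user, 0.8% nice, 18.9% system, 2.4% interrupt, 25.3% idle
CPU 1: 64.2% user, 0.6% nice, 19.3% system, 2.1% interrupt, 13.7% idle
"""

dash_closed = """\
CPU 0: 0.0% user, 0.0% nice, 1.2% system, 1.6% interrupt, 97.3% idle
CPU 1: 0.0% user, 0.0% nice, 0.8% system, 0.8% interrupt, 98.4% idle
"""

print("dash open:  ", cpu_busy(dash_open))
print("dash closed:", cpu_busy(dash_closed))
```

These are single snapshots, so the figures only indicate the order of magnitude; averaging several captures would give a fairer picture.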
-
I mean Unbound usage there is expected. Nothing there really looks like an issue.
I would try that patch I linked to. That should help with widget refreshes on the dashboard using CPU.
-
@stephenw10 said in 24.11 on SG-2100 first impression (and issues):
I mean Unbound usage there is expected. Nothing there really looks like an issue.
I agree, I had that feeling myself too, when I wrote it's doing fine now, but I appreciate your opinion because you're the master 8-)
I would try that patch I linked to. That should help with widget refreshes on the dashboard using CPU.
I appreciate the link. I reverted the change, and now the CPU usage on the dashboard page is much lower, but the number updates only every 30 seconds, where before it was more like every 5-8 seconds.
Do you reckon more work will be done on the widget refresh engine? So that it will perform on the ARM units like 24.03 and before?
-
@Cabledude :)
@SteveITS said in 24.11 on SG-2100 first impression (and issues):
25.03 will have a fix also. https://docs.netgate.com/pfsense/en/latest/releases/25-03.html#dashboard
-
Yes it's better in 25.03