24.11 on SG-2100 first impression (and issues)

Cabledude

Creating a separate topic from this one.

Cabledude

One day in with 24.11 on my SG-2100, I notice high CPU usage on the dashboard: 100% most of the time, occasionally dropping to 50% or 25%, then back to 100%.

Navigating the menus also seems a little slower than with 24.03.

Tried a full reboot, no improvement.

No patches applied yet; I will apply all of them as a next step.

Any advice to give at this point?

Cabledude

@stephenw10 said in pfSense Plus Software Version 24.11 is here!:

Probably the changed widget reload behaviour in 24.11. Especially if you have a lot of widgets enabled?

Yes, lots: System Information, Gateways, Installed Packages, Services Status, pfBlockerNG, OpenVPN, Interfaces, Traffic Graphs (9 VLANs, of which 3 are actively used), NTP Status

Try checking the CPU usage at the CLI using `top -HaSP`. Check it without the webgui open at all. If it's normal there you can try reverting the widget change.

last pid: 98095;  load averages:    2.76,    3.29,    2.05  up 0+00:09:08    12:25:06
250 threads:   4 running, 227 sleeping, 19 waiting
CPU 0: 52.6% user,  0.8% nice, 18.9% system,  2.4% interrupt, 25.3% idle
CPU 1: 64.2% user,  0.6% nice, 19.3% system,  2.1% interrupt, 13.7% idle
Mem: 257M Active, 117M Inact, 389M Wired, 2537M Free
ARC: 133M Total, 48M MFU, 78M MRU, 1234K Anon, 958K Header, 4848K Other
     101M Compressed, 248M Uncompressed, 2.45:1 Ratio
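A quick way to compare snapshots like this one is to compute how busy each core is as 100% minus its idle figure (here roughly 75% and 86%). The following is a minimal Python sketch, assuming the per-core summary format shown above (`CPU N: ... % idle`); the helper name `cpu_busy` is hypothetical, and the aggregate `CPU:` line that `top` prints without `-P` is deliberately ignored.

```python
import re

# Matches per-core summary lines such as
# "CPU 0: 52.6% user,  0.8% nice, 18.9% system,  2.4% interrupt, 25.3% idle"
# (format assumed from the `top -HaSP` output posted in this thread).
CPU_LINE = re.compile(r"^CPU (\d+):.*?([\d.]+)% idle")


def cpu_busy(top_output: str) -> dict[int, float]:
    """Return {core number: busy percent}, where busy = 100 - idle."""
    busy: dict[int, float] = {}
    for line in top_output.splitlines():
        match = CPU_LINE.match(line.strip())
        if match:
            core, idle = int(match.group(1)), float(match.group(2))
            busy[core] = round(100.0 - idle, 1)
    return busy
```

Pasting the two `CPU 0:` / `CPU 1:` lines above into `cpu_busy` gives about 74.7% and 86.3% busy, which lines up with the near-100% reading on the dashboard.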

After the first reboot, Kea DHCPv6 didn't come back up, and I cannot start the Kea DHCPv6 service manually. This is a problem, as I've lost the IPv6 leases on all devices that use them, which is a lot of devices. I can still fall back to IPv4, but still.

System Activity shows the CPU activity and updates the readings every two seconds, but if I also open the dashboard in a second browser window and then switch back to the System Activity window, it stops updating altogether, even after a page reload. The CPU appears to be very busy with the dashboard.

SteveITS

@Cabledude The quick fix is to not leave the dashboard visible. Alternatively, there is the patch mentioned in the other thread. 25.03 will also have a fix. https://docs.netgate.com/pfsense/en/latest/releases/25-03.html#dashboard

Does Kea log anything? There is this, but I'm not sure it applies to 24.11, and it is under an IPv4 heading:
https://docs.netgate.com/pfsense/en/latest/releases/25-03.html#dhcp-ipv4

Cabledude

After removing the "Traffic Graphs" and "Interfaces" widgets from the dashboard, the CPU usage (with the dashboard still open) drops slightly, to around 85% at first, and it still hits 100% occasionally. After a couple of minutes it goes to 25%, then 50%, 63%, 36%, 20%, 73% and so forth, so it's restless but no longer peaks at 100%.

Cabledude

These are the warnings and errors I get from DHCP. Note that some of them are from IPv4 as well.

Mar 26 12:46:51	dhcp6c	79068	dhcp6c Received INFO
Mar 26 12:46:51	dhcp6c	79068	Sending Renew
Mar 26 12:46:41	dhcp6c	79068	add an address 2001:xxxx:xxxx:0:ac03:be86:8d6e:96dd/128 on mvneta0
Mar 26 12:46:41	dhcp6c	79068	dhcp6c Received INFO
Mar 26 12:46:41	dhcp6c	79068	Sending Renew
Mar 26 12:46:41	dhcp6c	79068	Sending Renew
Mar 26 12:29:50	kea-dhcp6	377	ERROR [kea-dhcp6.dhcp6.0x474e46812000] DHCP6_INIT_FAIL failed to initialize Kea server: configuration error using file '/usr/local/etc/kea/kea-dhcp6.conf': subnet with the prefix of '2001::xxxx:xxxx:a000::/63' already exists (/usr/local/etc/kea/kea-dhcp6.conf:86:13)
Mar 26 12:29:50	kea-dhcp6	377	ERROR [kea-dhcp6.dhcp6.0x474e46812000] DHCP6_CONFIG_LOAD_FAIL configuration error using file: /usr/local/etc/kea/kea-dhcp6.conf, reason: subnet with the prefix of '2001::xxxx:xxxx:a000::/63' already exists (/usr/local/etc/kea/kea-dhcp6.conf:86:13)
Mar 26 12:29:50	kea-dhcp6	377	ERROR [kea-dhcp6.dhcp6.0x474e46812000] DHCP6_PARSER_FAIL failed to create or run parser for configuration element subnet6: subnet with the prefix of '2001::xxxx:xxxx:a000::/63' already exists (/usr/local/etc/kea/kea-dhcp6.conf:86:13)
Mar 26 12:29:50	kea-dhcp6	377	WARN [kea-dhcp6.dhcp6.0x474e46812000] DHCP6_RESERVATIONS_LOOKUP_FIRST_ENABLED Multi-threading is enabled and host reservations lookup is always performed first.
Mar 26 12:29:50	kea-dhcp6	377	WARN [kea-dhcp6.dhcpsrv.0x474e46812000] DHCPSRV_MT_DISABLED_QUEUE_CONTROL disabling dhcp queue control when multi-threading is enabled.
Mar 26 12:29:40	kea-dhcp4	30070	WARN [kea-dhcp4.dhcp4.0x7765c3012000] DHCP4_MULTI_THREADING_INFO enabled: yes, number of threads: 2, queue size: 64
Mar 26 12:29:39	kea-dhcp4	30070	WARN [kea-dhcp4.dhcp4.0x7765c3012000] DHCP4_RESERVATIONS_LOOKUP_FIRST_ENABLED Multi-threading is enabled and host reservations lookup is always performed first.
Mar 26 12:29:39	kea-dhcp4	30070	WARN [kea-dhcp4.dhcpsrv.0x7765c3012000] DHCPSRV_MT_DISABLED_QUEUE_CONTROL disabling dhcp queue control when multi-threading is enabled.
Mar 26 12:17:56	kea-dhcp4	40085	WARN [kea-dhcp4.dhcp4.0x1ed194012000] DHCP4_MULTI_THREADING_INFO enabled: yes, number of threads: 2, queue size: 64
Mar 26 12:17:56	kea-dhcp4	40085	WARN [kea-dhcp4.dhcp4.0x1ed194012000] DHCP4_RESERVATIONS_LOOKUP_FIRST_ENABLED Multi-threading is enabled and host reservations lookup is always performed first.
Mar 26 12:17:56	kea-dhcp4	40085	WARN [kea-dhcp4.dhcpsrv.0x1ed194012000] DHCPSRV_MT_DISABLED_QUEUE_CONTROL disabling dhcp queue control when multi-threading is enabled.
Mar 26 12:17:15	kea-dhcp6	65938	ERROR [kea-dhcp6.dhcp6.0x537dbec12000] DHCP6_INIT_FAIL failed to initialize Kea server: configuration error using file '/usr/local/etc/kea/kea-dhcp6.conf': subnet with the prefix of '2001::xxxx:xxxx:a000::/63' already exists (/usr/local/etc/kea/kea-dhcp6.conf:86:13)
Mar 26 12:17:15	kea-dhcp6	65938	ERROR [kea-dhcp6.dhcp6.0x537dbec12000] DHCP6_CONFIG_LOAD_FAIL configuration error using file: /usr/local/etc/kea/kea-dhcp6.conf, reason: subnet with the prefix of '2001::xxxx:xxxx:a000::/63' already exists (/usr/local/etc/kea/kea-dhcp6.conf:86:13)
Mar 26 12:17:15	kea-dhcp6	65938	ERROR [kea-dhcp6.dhcp6.0x537dbec12000] DHCP6_PARSER_FAIL failed to create or run parser for configuration element subnet6: subnet with the prefix of '2001::xxxx:xxxx:a000::/63' already exists (/usr/local/etc/kea/kea-dhcp6.conf:86:13)
Mar 26 12:17:15	kea-dhcp6	65938	WARN [kea-dhcp6.dhcp6.0x537dbec12000] DHCP6_RESERVATIONS_LOOKUP_FIRST_ENABLED Multi-threading is enabled and host reservations lookup is always performed first.
Mar 26 12:17:15	kea-dhcp6	65938	WARN [kea-dhcp6.dhcpsrv.0x537dbec12000] DHCPSRV_MT_DISABLED_QUEUE_CONTROL disabling dhcp queue control when multi-threading is enabled.
Mar 26 12:17:11	kea-dhcp4	40085	WARN [kea-dhcp4.dhcp4.0x1ed194012000] DHCP4_MULTI_THREADING_INFO enabled: yes, number of threads: 2, queue size: 64
Mar 26 12:17:10	kea-dhcp4	40085	WARN [kea-dhcp4.dhcp4.0x1ed194012000] DHCP4_RESERVATIONS_LOOKUP_FIRST_ENABLED Multi-threading is enabled and host reservations lookup is always performed first.
Mar 26 12:17:10	kea-dhcp4	40085	WARN [kea-dhcp4.dhcpsrv.0x1ed194012000] DHCPSRV_MT_DISABLED_QUEUE_CONTROL disabling dhcp queue control when multi-threading is enabled.

I haven't looked into these yet; for now I'm just providing the logs. These errors started after the 24.11 update.

SteveITS

@Cabledude

@stephenw10 said in pfSense Plus Software Version 24.11 is here!:

Try checking the CPU usage at the CLI using `top -HaSP`. Check it without the webgui open at all. If it's normal there you can try reverting the widget change.

Re Kea,
We have only lightly tested Kea since it’s still in preview. I would imagine “already exists” means the subnet is in there twice…?
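One way to test that guess is to look for repeated prefixes in the generated config. The following is a minimal Python sketch, assuming `/usr/local/etc/kea/kea-dhcp6.conf` parses as plain JSON with the usual Kea layout (`Dhcp6` → `subnet6` → `subnet`). Kea itself also tolerates comments in its config files, which this sketch does not handle, and the helper name `duplicate_subnets` is hypothetical.

```python
import ipaddress
import json
from collections import Counter


def duplicate_subnets(conf_text: str) -> list[str]:
    """Return subnet6 prefixes that appear more than once in a Kea DHCPv6 config."""
    conf = json.loads(conf_text)
    subnets = conf.get("Dhcp6", {}).get("subnet6", [])
    # Normalise each prefix so different spellings of the same network compare equal.
    counts = Counter(
        str(ipaddress.ip_network(entry["subnet"], strict=False)) for entry in subnets
    )
    return sorted(net for net, count in counts.items() if count > 1)
```

To use it, read the file's contents (for example with `open(...).read()`) and pass the string in. A non-empty result would match the "subnet with the prefix ... already exists" error in the log above.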

Cabledude

@SteveITS said in 24.11 on SG-2100 first impression (and issues):

I would imagine “already exists” means the subnet is in there twice…?

Yes. I have DHCPv6 enabled on LAN1 and VLAN10, and the subnets are identical. Nothing changed on my side compared to what I had with 24.03, so these issues are new as of 24.11.
As I currently don't have any IPv6 clients on LAN1, I disabled DHCPv6 for LAN1, and now my clients on VLAN10 get IPv6 leases.

All I see in the DHCP logs now are some warnings about multi-threading.

So for now it's solved, but I hope these Kea issues will be looked into soon.

And Steve: thanks for your help!

SteveITS

@Cabledude said in 24.11 on SG-2100 first impression (and issues):

on LAN1 and VLAN10 and the subnets are identical

Hm, normally that's a problem for pfSense in that it won't know where to route those packets.
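The routing concern can be checked mechanically: if two interfaces claim overlapping prefixes, there is no single interface to send those packets to. The following is a minimal Python sketch using the standard `ipaddress` module. The interface-to-prefix mapping is a hypothetical input that you would fill in by hand from the interface settings, and `overlapping_interfaces` is a hypothetical helper name.

```python
import ipaddress
from itertools import combinations


def overlapping_interfaces(prefixes: dict[str, str]) -> list[tuple[str, str]]:
    """Return pairs of interfaces whose configured prefixes overlap."""
    nets = {
        name: ipaddress.ip_network(prefix, strict=False)
        for name, prefix in prefixes.items()
    }
    # Compare every pair once; identical or nested prefixes both count as overlaps.
    return [
        (a, b)
        for a, b in combinations(sorted(nets), 2)
        if nets[a].overlaps(nets[b])
    ]
```

With LAN1 and VLAN10 both set to the same /64 and a third VLAN on a neighbouring /64, only the `("LAN1", "VLAN10")` pair is reported.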

stephenw10

What is using the CPU cycles as shown in the top output?

For example I expect to see something like:

last pid: 43534;  load averages:    1.06,    1.19,    1.15                                                            up 0+03:08:08  14:39:09
293 threads:   3 running, 267 sleeping, 23 waiting
CPU 0:  6.7% user,  0.0% nice, 14.5% system,  0.4% interrupt, 78.4% idle
CPU 1:  9.0% user,  0.0% nice, 11.0% system,  0.4% interrupt, 79.6% idle
Mem: 106M Active, 295M Inact, 322M Wired, 2584M Free
ARC: 133M Total, 50M MFU, 76M MRU, 544K Anon, 1035K Header, 5207K Other
     103M Compressed, 242M Uncompressed, 2.36:1 Ratio

  PID USERNAME    PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
   11 root        187 ki31     0B    32K CPU1     1 166:11  80.29% [idle{idle: cpu1}]
   11 root        187 ki31     0B    32K RUN      0 166:13  79.11% [idle{idle: cpu0}]
  531 root         68    0   140M    59M accept   1   0:18   1.82% php-fpm: pool nginx (php-fpm)
    0 root        -12    -     0B  1040K -        1   0:30   0.68% [kernel{z_wr_iss}]
37128 root         20    0    14M  4496K CPU0     0   0:00   0.54% top -HaSP
61254 root         20    0    33M    11M kqread   0   0:03   0.39% nginx: worker process (nginx)
    7 root        -16    -     0B    16K pftm     1   0:17   0.37% [pf purge]
   17 root        -16    -     0B    16K mmcsd    0   0:15   0.34% [mmcsd0: mmc/sd card]
38317 root         20    0   140M    57M accept   1   0:18   0.28% php-fpm: pool nginx (php-fpm)
    0 root        -16    -     0B  1040K -        0   0:11   0.26% [kernel{z_wr_int}]
73138 root         68    0   107M    46M accept   1   0:00   0.22% php-fpm: pool nginx (php-fpm)
    2 root        -60    -     0B    32K WAIT     0   0:30   0.18% [clock{clock (0)}]
 4898 root         20    0  1300M    64M uwait    0   0:08   0.16% /usr/local/bin/pfnet-controller -conf /var/etc/pfnet-controller/pfnet-cont

That's actually with the dashboard open but in 25.03-beta.

Cabledude

@stephenw10 said in 24.11 on SG-2100 first impression (and issues):

What is using the CPU cycles as shown in the top output?

Well, what I showed you above is all I get to see. I go to Menu / Diagnostics / Command Prompt, type "top -HaSP" in the Execute Shell Command box and click Execute.

When I go to System Activity I see the processes like in your example.

Cabledude

This is what I see when I leave the dashboard open and the CPU usage on the dashboard shows close to 100%, though the value changes constantly:

last pid: 22481;  load averages:    3.26,    1.96,    1.03  up 0+03:53:56    16:09:54
257 threads:   5 running, 233 sleeping, 19 waiting
CPU: 11.2% user,  0.9% nice,  7.6% system,  2.0% interrupt, 78.4% idle
Mem: 162M Active, 275M Inact, 480M Wired, 2381M Free
ARC: 197M Total, 80M MFU, 107M MRU, 1216K Anon, 1396K Header, 6983K Other
     157M Compressed, 419M Uncompressed, 2.67:1 Ratio

  PID USERNAME    PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
   11 root        187 ki31     0B    32K RUN      0 185:20  52.69% [idle{idle: cpu0}]
99138 root        100    0    34M    23M CPU1     1   0:03  28.17% /usr/local/bin/python3.11
   11 root        187 ki31     0B    32K RUN      1 181:12  17.19% [idle{idle: cpu1}]
   27 root         68    0   112M    45M piperd   0   0:02  14.45% php-fpm: pool nginx (php-fpm)
49103 root         53    0   112M    49M accept   1   0:16  12.89% php-fpm: pool nginx (php-fpm)
26105 root         68    0   141M    63M accept   1   0:21  10.99% php-fpm: pool nginx (php-fpm)
 5217 root         68    0   141M    56M accept   0   0:03  10.69% php-fpm: pool nginx (php-fpm)
 7323 root         68    0   141M    62M piperd   0   0:14   9.86% php-fpm: pool nginx (php-fpm)
11530 root         68    0   145M    66M lockf    0   0:29   9.67% php-fpm: pool nginx (php-fpm){php-fpm}
35200 root         68    0   112M    49M accept   0   0:11   6.98% php-fpm: pool nginx (php-fpm)
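To see at a glance which service dominates a snapshot like this, you can total the WCPU column per command name. In the output above, the php-fpm workers that serve the webgui add up to roughly 75%. The following is a minimal Python sketch, assuming the column layout shown above (WCPU is the tenth whitespace-separated field and COMMAND starts at the eleventh). Grouping by the first word of COMMAND is an illustrative choice, and `wcpu_by_command` is a hypothetical helper name.

```python
from collections import defaultdict


def wcpu_by_command(top_output: str) -> dict[str, float]:
    """Sum the WCPU column of `top -HaSP` process lines per command name."""
    totals: dict[str, float] = defaultdict(float)
    for line in top_output.splitlines():
        fields = line.split()
        # Process lines start with a numeric PID and have at least 11 columns;
        # this skips the header, the summary lines and the ARC/Mem lines.
        if len(fields) < 11 or not fields[0].isdigit():
            continue
        wcpu = fields[9]
        if not wcpu.endswith("%"):
            continue
        # "php-fpm:" -> "php-fpm", "[idle{idle:" -> "idle", "[kernel{z_wr_iss}]" -> "kernel"
        command = fields[10].split("{")[0].strip("[]:")
        totals[command] += float(wcpu[:-1])
    return {cmd: round(total, 2) for cmd, total in totals.items()}
```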

stephenw10

@Cabledude said in 24.11 on SG-2100 first impression (and issues):

I go to Menu / Diagnostics / Command Prompt, type "top -HaSP" in the Execute Shell Command box and click Execute.

Ah OK. That's not the CLI (command line interface). I meant to run that command at the real command prompt so either via SSH or using the console directly. The command prompt page in the webgui is only for commands with a static output. And in fact anything run using the webgui uses significant CPU by itself.

Cabledude

@stephenw10 Ah, I see. I'd never used the CLI except over a wired console connection. I enabled SSH, since I had to learn it sometime.

This is what I get without the dashboard open or active:

last pid: 76645;  load averages:    0.56,    0.49,    0.40                 up 0+06:10:24  18:26:22
258 threads:   3 running, 236 sleeping, 19 waiting
CPU 0: 18.6% user,  0.0% nice,  9.1% system,  5.9% interrupt, 66.4% idle
CPU 1: 21.4% user,  0.0% nice,  9.1% system,  5.9% interrupt, 63.6% idle
Mem: 170M Active, 251M Inact, 506M Wired, 2371M Free
ARC: 199M Total, 78M MFU, 112M MRU, 770K Anon, 1406K Header, 7176K Other
     159M Compressed, 424M Uncompressed, 2.67:1 Ratio

  PID USERNAME    PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
   11 root        187 ki31     0B    32K RUN      0 306:59  64.97% [idle{idle: cpu0}]
   11 root        187 ki31     0B    32K CPU1     1 302:40  64.56% [idle{idle: cpu1}]
97139 unbound      55    0   137M   113M kqread   1   0:20  32.05% /usr/local/sbin/unbound -c /var
97139 unbound      52    0   137M   113M kqread   0   0:16  23.78% /usr/local/sbin/unbound -c /var
   12 root        -64    -     0B   272K WAIT     1   4:25   2.43% [intr{gic0,s45: mvneta1}]
   12 root        -64    -     0B   272K WAIT     0   3:12   2.09% [intr{gic0,s42: mvneta0}]
   12 root        -60    -     0B   272K WAIT     1   2:23   1.97% [intr{swi1: netisr 1}]
   12 root        -60    -     0B   272K WAIT     0   3:23   1.88% [intr{swi1: netisr 0}]
    2 root        -60    -     0B    32K WAIT     0   4:13   1.38% [clock{clock (0)}]
    0 root        -12    -     0B   992K -        1   2:06   1.03% [kernel{z_wr_iss}]
94582 root         20    0    19M  9584K kqread   1   0:15   0.91% /usr/local/sbin/lighttpd_pfb -f
    0 root        -16    -     0B   992K -        1   0:42   0.56% [kernel{z_wr_int}]
   12 root        -64    -     0B   272K WAIT     1   0:25   0.49% [intr{gic0,s27: ahci0}]
36084 SPK          20    0    14M  4424K CPU0     0   0:01   0.46% top -HaSP
    4 root        -16    -     0B    48K -        1   0:15   0.40% [cam{doneq0}]
76679 avahi        20    0    14M  4524K select   1   1:25   0.17% avahi-daemon: running [SPK.loca
    0 root        -16    -     0B   992K -        1   0:03   0.16% [kernel{z_null_int}]
    0 root        -16    -     0B   992K -        1   0:02   0.12% [kernel{z_flush_int}]
    7 root        -16    -     0B    16K pftm     1   0:37   0.11% [pf purge]
20652 SPK          20    0    22M    11M select   1   0:00   0.09% sshd: SPK@pts/0 (sshd)

Looking at this output, and seeing that the CPU is largely idle and that any significant usage is down to pfBlocker, I must assume the high CPU usage I get with the dashboard open is caused by the new dashboard widget build in 24.11.
I made a short video showing the `top -HaSP` CLI output with and without the dashboard open. It's 1.3 MB in size, but I'm not sure how I can make it available to you.

The takeaway is that it's mostly idle now with the dashboard closed:

last pid: 28321;  load averages:    0.30,    0.33,    0.40                 up 0+06:29:57  18:45:55
258 threads:   3 running, 236 sleeping, 19 waiting
CPU 0:  0.0% user,  0.0% nice,  1.2% system,  1.6% interrupt, 97.3% idle
CPU 1:  0.0% user,  0.0% nice,  0.8% system,  0.8% interrupt, 98.4% idle
Mem: 197M Active, 233M Inact, 493M Wired, 2375M Free
ARC: 199M Total, 78M MFU, 113M MRU, 276K Anon, 1406K Header, 7113K Other
     159M Compressed, 425M Uncompressed, 2.68:1 Ratio

  PID USERNAME    PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
   11 root        187 ki31     0B    32K CPU1     1 318:59  98.50% [idle{idle: cpu1}]
   11 root        187 ki31     0B    32K RUN      0 323:28  97.43% [idle{idle: cpu0}]
    2 root        -60    -     0B    32K WAIT     0   4:27   1.15% [clock{clock (0)}]
   12 root        -64    -     0B   272K WAIT     1   4:39   0.83% [intr{gic0,s45: mvneta1}]
   12 root        -60    -     0B   272K WAIT     0   3:33   0.79% [intr{swi1: netisr 0}]
47523 SPK          20    0    14M  4848K CPU0     0   0:03   0.40% top -HaSP
   12 root        -64    -     0B   272K WAIT     0   3:19   0.15% [intr{gic0,s42: mvneta0}]
76679 avahi        20    0    14M  4524K select   0   1:30   0.13% avahi-daemon: running [SPK.loca
    7 root        -16    -     0B    16K pftm     1   0:38   0.10% [pf purge]
   12 root        -60    -     0B   272K WAIT     1   2:29   0.08% [intr{swi1: netisr 1}]
20652 SPK

So I suppose the unit is doing fine with 24.11; only the dashboard widget refresh is problematic. It might be better on the Intel models such as the 4100, 4200 and 6100, but I wanted the ultimate low-power firewall...

stephenw10

I mean Unbound usage there is expected. Nothing there really looks like an issue.

I would try that patch I linked to. That should help with widget refreshes on the dashboard using CPU.

Cabledude

@stephenw10 said in 24.11 on SG-2100 first impression (and issues):

I mean Unbound usage there is expected. Nothing there really looks like an issue.

I agree; I had that feeling myself when I wrote that it's doing fine now, but I appreciate your opinion because you're the master 8-)

I would try that patch I linked to. That should help with widget refreshes on the dashboard using CPU.

Thanks for the link. I reverted the change, and now the CPU usage on the dashboard page is much lower, but the number only updates every 30 seconds, where before it was more like every 5 to 8 seconds.

Do you reckon more work will be done on the widget refresh engine, so that it performs on the ARM units as well as it did in 24.03 and before?

SteveITS

@Cabledude :)

@SteveITS said in 24.11 on SG-2100 first impression (and issues):

25.03 will have a fix also. https://docs.netgate.com/pfsense/en/latest/releases/25-03.html#dashboard

stephenw10

Yes it's better in 25.03

Cabledude

@SteveITS said in 24.11 on SG-2100 first impression (and issues):

@Cabledude said in 24.11 on SG-2100 first impression (and issues):

on LAN1 and VLAN10 and the subnets are identical

Hm, normally that's a problem for pfSense in that it won't know where to route those packets.

Apologies for the massive delay. By "identical" I meant unchanged before and after moving from 24.03 to 24.11.

Cabledude

@stephenw10 said in 24.11 on SG-2100 first impression (and issues):

Yes it's better in 25.03

Hi Steve,
I am still on 24.11 on my 2100 Max, and I can't escape the feeling that the UI is substantially more sluggish than on 24.03. I can't list any specific tasks right now, but the experience is bad. Many tasks take so long (10 seconds or more to set a static IP for a DHCP client) that I start wondering if the system has hung, but every time it completes normally; it just takes too long for comfort. For the first time I am starting to regret choosing ARM, wishing I'd gone with one of the higher-priced Intel models.

Can you confirm this happens with 24.11 on the ARM models? And is the trouble over after moving to 25.03?
Thanks,
Pete