Observations, 25.03.b.20250414.1838

pst

Here are a few observations after installing the April 14 beta, 25.03.b.20250414.1838.

During the slow boot (see #3 below) I noticed bandwidthd instability:

pid 71451 (bandwidthd), jid 0, uid 0: exited on signal 8 (core dumped)
pid 6914 (bandwidthd), jid 0, uid 0: exited on signal 8 (core dumped)
pid 43753 (bandwidthd), jid 0, uid 0: exited on signal 8 (core dumped)

I could not find any core dumps though.
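For readers triaging similar console messages, here is a minimal Python sketch that pulls the fields out of such a line and maps the signal number to its name. The regular expression and the `parse_exit_line` helper are illustrative assumptions, not part of pfSense. The number-to-name mapping comes from the platform running the script; on FreeBSD and Linux, signal 8 is SIGFPE (an arithmetic exception such as an integer divide by zero).

```python
import re
import signal

# Pattern for kernel messages of the form seen above (assumed layout).
LOG_LINE = re.compile(
    r"pid (?P<pid>\d+) \((?P<name>[^)]+)\), jid (?P<jid>\d+), uid (?P<uid>\d+): "
    r"exited on signal (?P<signo>\d+)(?P<core> \(core dumped\))?"
)


def parse_exit_line(line):
    """Return the pid, process name, signal name and core flag, or None."""
    match = LOG_LINE.search(line)
    if match is None:
        return None
    signo = int(match.group("signo"))
    return {
        "pid": int(match.group("pid")),
        "name": match.group("name"),
        # Signal names are resolved by the local platform's signal table.
        "signal": signal.Signals(signo).name,
        "core_dumped": match.group("core") is not None,
    }


line = "pid 71451 (bandwidthd), jid 0, uid 0: exited on signal 8 (core dumped)"
print(parse_exit_line(line))
```

The "(core dumped)" suffix only says the kernel attempted a dump; where the file lands depends on the system's core dump settings.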

"Alarm-bell" messages on dashboard

"Failed to lint kea-dhcp6 with custom configurations. See Status > System Logs for more information about this failure"

I checked Status / System Logs / DHCP and there are a few like this, which might be the cause for alarm?

Apr 15 19:24:25 	kea-dhcp6 	82464 	ERROR [kea-dhcp6.dhcp6.0x2a6aece12000] DHCP6_INIT_FAIL failed to initialize Kea server: configuration error using file '/usr/local/etc/kea/kea-dhcp6.conf': specified reservation '::151' is not within the IPv6 subnet '::10.10.10.1/128'

I don't think this is a real error. The VIP ::10.10.10.1/128 is set up when pfBlockerNG is configured. The ::151 will be part of the subnet that is set up when the WAN receives the IPv6 config, which it hasn't yet. LAN is tracking WAN in my IPv6 config.

[same behaviour as 24.11-REL] Boot still takes a very long time with a configuration with many (14+) wireguard tunnels, which causes the boot to fail when a BSD boot supervision timer expires (~15 mins) and triggers a reboot:

wg6: changing name to 'tun_wg7'
route: route has not been found
tun_wg7: link state changed to UP
pid 71451 (bandwidthd), jid 0, uid 0: exited on signal 8 (core dumped)
pid 6914 (bandwidthd), jid 0, uid 0: exited on signal 8 (core dumped)
pid 43753 (bandwidthd), jid 0, uid 0: exited on signal 8 (core dumped)
Shutdown NOW!
shutdown: [pid 67649]
2025-04-15T19:09pflog0: promiscuous mode disabled
Waiting (max 60 seconds) for system process `vnlru' to stop... done
Waiting (max 60 seconds) for system process `syncer' to stop...
Syncing disks, vnodes remaining... 0 0 done
All buffers synced.
Uptime: 15m53s
ukbd0: detached
uhid0: detached
uhub0: detached

Work-around to fix boot issue: detach WAN cable before booting.

This non-optimal handling of wireguard tunnels during boot has previously been reported on 24.05, and a bug raised.

Apart from these three observations, everything seems to be running smoothly.

netblues

@pst
[Unbelievable
Without limiters or traffic shaper.
That was never the case up to now.
Will also try with mpd5 again.]

I stand corrected
Actually this can only be achieved with load balancing.
Latency still increases slightly under load, and you do need limiters.
Can't say there is any difference with the new kernel ppp driver. See post below

netblues

@netblues
this is with heavy limiting, for the sake of testing

marcosm

Please share the core dump files here:
https://nc.netgate.com/nextcloud/s/zJfaxPfrfAAniwF

pst

@marcosm done.

pst

@pst said in Observations, 25.03.b.20250414.1838:

This non-optimal handling of wireguard tunnels during boot has previously been reported on 24.05, and a bug raised.

The bug in question is https://redmine.pfsense.org/issues/15435

While the root cause is still to be determined, a possible cause could be that some of my WG peers are configured with endpoint addresses using FQDNs, and resolving those during boot is causing the slow boot (TBC).

pst

@pst said in Observations, 25.03.b.20250414.1838:

a possible cause could be that some of my WG peers are configured with endpoint addresses using FQDNs, and resolving those during boot is causing the slow boot

I have just confirmed that the root cause is the use of FQDN in the peer end-point address. While a temporary work-around is to use IP addresses instead of FQDNs, FQDNs must be allowed as one of my providers specifies FQDNs for a pool of addresses.
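To audit which peers depend on DNS at boot, a small Python sketch like the following can separate IP-literal endpoints from FQDN endpoints. The peer names and addresses are hypothetical placeholders (documentation ranges and example.net), and `needs_dns` is an illustrative helper, not something from the WireGuard package.

```python
import ipaddress


def needs_dns(endpoint_host):
    """Return True if the endpoint host is not an IPv4/IPv6 literal."""
    try:
        ipaddress.ip_address(endpoint_host)
    except ValueError:
        # Not parseable as an address, so it must be resolved by name.
        return True
    return False


# Hypothetical peer list; real values would come from the tunnel config.
peers = {
    "peer-a": "198.51.100.7",
    "peer-b": "vpn.example.net",
    "peer-c": "2001:db8::1",
}

fqdn_peers = sorted(name for name, host in peers.items() if needs_dns(host))
print(fqdn_peers)
```

Any peer listed by this check needs a working resolver before its tunnel can come up, which is the dependency described above.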

pst

@pst I have found a solution that works in my scenario, where Wireguard requires Unbound to be up and running before the service starts. By disabling the early shell command that starts wireguardd and letting wireguardd start at the end of the boot sequence (no change required here), my slow/failed boot is no more.

Update: a local patch is required to stop wireguard from automatically reinstalling the early shell command. See redmine for details.

marcosm

@pst said in Observations, 25.03.b.20250414.1838:

I don't think this is a real error. The VIP ::10.10.10.1/128 is set up when pfBlockerNG is configured. The ::151 will be part of the subnet that is set up when the WAN receives the IPv6 config, which it hasn't yet. LAN is tracking WAN in my IPv6 config.

Presumably ::10.10.10.1/128 is on Localhost and ::151 is a static mapping on LAN - correct? The error itself seems correct - the address ::151 is not in the subnet ::10.10.10.1/128. Would you share the full /usr/local/etc/kea/kea-dhcp6.conf while the issue exists? You may upload it to:
https://nc.netgate.com/nextcloud/s/zJfaxPfrfAAniwF
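The containment check behind the log message can be reproduced with a short Python sketch. It uses the standard `ipaddress` module rather than Kea's own validation code, so it only illustrates the arithmetic; `reservation_in_subnet` is a hypothetical helper, and the 2001:db8 prefix is a documentation range standing in for a delegated prefix.

```python
import ipaddress


def reservation_in_subnet(reservation, subnet):
    """Return True if the reserved address lies inside the given subnet."""
    return ipaddress.ip_address(reservation) in ipaddress.ip_network(subnet)


# The pair from the log: a /128 contains exactly one address, and it is not ::151.
print(reservation_in_subnet("::151", "::10.10.10.1/128"))

# Illustrative case after a /64 has been assigned (documentation prefix).
print(reservation_in_subnet("2001:db8:0:1::151", "2001:db8:0:1::/64"))
```

A /128 subnet holds a single address, so any reservation other than that address fails the check until a wider subnet is configured on the interface.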

pst

@marcosm yes, ::10.10.10.1/128 is on the LAN

and ::151 is a static mapping on the LAN

I am uploading a zip that contains

kea-dhcp6.conf.ATERROR which is the contents after I release the WAN lease (triggers the ERROR)
kea-dhcp6.conf.AFTERSUBNETASSIGNED which is the working file after reception of the subnet config
kea-dhcp6.conf.diff is the diff between the two
dhcpd.log are the log entries for the renewal of the WAN lease

To me it looks like the error situation is present until we receive the IPv6 subnet configuration for the WAN which is then propagated to the LANs. I also saw this in 24.11 but paid less attention to it as it didn't trigger an alarm bell on the dashboard.

pst

@marcosm I did some additional testing and managed to reduce the number of occurrences of the error.

First I changed the WAN config and unchecked the "Do not wait for RA" option (which was previously checked for some reason).

I then released the WAN lease, which triggered one error as seen before. When I renewed the WAN lease there were no errors, progress!

marcosm

@pst DM'd with request for additional info.

marcosm

Details and fix for the Kea error:
https://redmine.pfsense.org/issues/16154