Netgate Discussion Forum

Observations, 25.03.b.20250414.1838

Plus 25.03 Development Snapshots
  • P
    pst
    last edited by 23 days ago

    Here are a few observations after installing the April 14 beta, 25.03.b.20250414.1838.

    1. During the slow boot (see #3 below) I noticed bandwidthd instability:
    pid 71451 (bandwidthd), jid 0, uid 0: exited on signal 8 (core dumped)
    pid 6914 (bandwidthd), jid 0, uid 0: exited on signal 8 (core dumped)
    pid 43753 (bandwidthd), jid 0, uid 0: exited on signal 8 (core dumped)
    

    I could not find any core dumps though.
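For reference, signal 8 on FreeBSD is SIGFPE, an arithmetic exception such as an integer divide by zero. Below is a minimal Python sketch, assuming the console lines keep the format shown above, that pulls the PID, process name, and signal name out of such a line. The helper name `parse_exit_line` is made up for illustration, and the signal number is looked up on the machine running the script (8 maps to SIGFPE on both FreeBSD and Linux).

```python
import re
import signal

# Matches kernel messages such as:
# pid 71451 (bandwidthd), jid 0, uid 0: exited on signal 8 (core dumped)
EXIT_RE = re.compile(
    r"pid (?P<pid>\d+) \((?P<name>[^)]+)\), jid \d+, uid \d+: "
    r"exited on signal (?P<sig>\d+)(?P<core> \(core dumped\))?"
)

def parse_exit_line(line):
    """Return a dict describing the crash, or None if the line does not match."""
    m = EXIT_RE.search(line)
    if m is None:
        return None
    return {
        "pid": int(m.group("pid")),
        "name": m.group("name"),
        # Translate the numeric signal into its symbolic name on this host.
        "signal": signal.Signals(int(m.group("sig"))).name,
        "core_dumped": m.group("core") is not None,
    }

sample = "pid 71451 (bandwidthd), jid 0, uid 0: exited on signal 8 (core dumped)"
result = parse_exit_line(sample)
print(result)
```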

    2. "Alarm-bell" messages on the dashboard:
    "Failed to lint kea-dhcp6 with custom configurations. See Status > System Logs for more information about this failure"
    

    I checked Status / System Logs / DHCP and there are a few like this, which might be the cause for alarm?

    Apr 15 19:24:25 	kea-dhcp6 	82464 	ERROR [kea-dhcp6.dhcp6.0x2a6aece12000] DHCP6_INIT_FAIL failed to initialize Kea server: configuration error using file '/usr/local/etc/kea/kea-dhcp6.conf': specified reservation '::151' is not within the IPv6 subnet '::10.10.10.1/128'
    

    I don't think this is a real error. The VIP ::10.10.10.1/128 is set up when pfBlockerNG is configured. The ::151 will be part of the subnet that is set up when the WAN receives the IPv6 config, which it hasn't yet. LAN is tracking WAN in my IPv6 config.

    3. [same behaviour as 24.11-REL] Boot still takes a very long time with a configuration that has many (14+) wireguard tunnels, which causes the boot to fail when a BSD boot supervision timer expires (~15 mins) and a reboot is triggered:
    wg6: changing name to 'tun_wg7'
    route: route has not been found
    tun_wg7: link state changed to UP
    pid 71451 (bandwidthd), jid 0, uid 0: exited on signal 8 (core dumped)
    pid 6914 (bandwidthd), jid 0, uid 0: exited on signal 8 (core dumped)
    pid 43753 (bandwidthd), jid 0, uid 0: exited on signal 8 (core dumped)
    Shutdown NOW!
    shutdown: [pid 67649]
    2025-04-15T19:09pflog0: promiscuous mode disabled
    Waiting (max 60 seconds) for system process `vnlru' to stop... done
    Waiting (max 60 seconds) for system process `syncer' to stop...
    Syncing disks, vnodes remaining... 0 0 done
    All buffers synced.
    Uptime: 15m53s
    ukbd0: detached
    uhid0: detached
    uhub0: detached
    

    Work-around to fix boot issue: detach WAN cable before booting.

    This non-optimal handling of wireguard tunnels during boot has previously been reported on 24.05, and a bug raised.

    Apart from these three observations, everything seems to be running smoothly 😀

    • N
      netblues @pst
      last edited by netblues 23 days ago

      @pst
      [Unbelievable
      Without limiters or traffic shaper.
      That was never the case up to now.
      Will also try with mpd5 again.]

      I stand corrected
      Actually this can only be achieved with load balancing.
      Latency still increases slightly under load, and you do need limiters.
      Can't say there is any difference with the new kernel ppp driver. See post below


      • N
        netblues @netblues
        last edited by 23 days ago

        @netblues
        This is with heavy limiting, for the sake of testing.


        • M
          marcosm Netgate
          last edited by 22 days ago

          Please share the core dump files here:
          https://nc.netgate.com/nextcloud/s/zJfaxPfrfAAniwF

          • P
            pst @marcosm
            last edited by 22 days ago

            @marcosm done.

            • P
              pst @pst
              last edited by 22 days ago

              @pst said in Observations, 25.03.b.20250414.1838:

              This non-optimal handling of wireguard tunnels during boot has previously been reported on 24.05, and a bug raised.

              The bug in question is https://redmine.pfsense.org/issues/15435

              While the root cause is still to be determined, a possible cause could be that some of my WG peers are configured with endpoint addresses using FQDNs, and resolving those during boot is causing the slow boot (TBC).

              • P
                pst @pst
                last edited by 22 days ago

                @pst said in Observations, 25.03.b.20250414.1838:

                a possible cause could be that some of my WG peers are configured with endpoint addresses using FQDNs, and resolving those during boot is causing the slow boot

                I have just confirmed that the root cause is the use of FQDN in the peer end-point address. While a temporary work-around is to use IP addresses instead of FQDNs, FQDNs must be allowed as one of my providers specifies FQDNs for a pool of addresses.
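To see which peers are affected before applying the IP-address work-around, the endpoints can be classified as IP literals or hostnames. The following is a rough Python sketch, not part of pfSense or the WireGuard package; the peer names and addresses are hypothetical placeholders (documentation ranges), and `needs_dns` is an illustrative helper name.

```python
import ipaddress

def needs_dns(endpoint_host):
    """Return True if the endpoint is a hostname rather than an IP literal."""
    try:
        ipaddress.ip_address(endpoint_host)
        return False  # Parsed as IPv4 or IPv6, so no DNS lookup is needed.
    except ValueError:
        return True   # Not an IP literal, so it must be resolved via DNS.

# Hypothetical peer list; substitute the endpoints from your own tunnels.
peers = {
    "peer-a": "198.51.100.7",
    "peer-b": "vpn.example.net",
    "peer-c": "2001:db8::1",
}

fqdn_peers = sorted(name for name, host in peers.items() if needs_dns(host))
print(fqdn_peers)
```

Only the peers listed in `fqdn_peers` depend on a working resolver at tunnel start-up.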

                • P
                  pst @pst
                  last edited by pst 21 days ago

                  @pst I have found a solution that works in my scenario, where Wireguard requires Unbound to be up and running before the service starts. By disabling the early shell command that starts wireguardd and letting wireguardd start at the end of the boot sequence (no change required there), my slow/failed boot is no more.

                  Update: a local patch is required to stop wireguard from automatically reinstalling the early shell command. See redmine for details.

                  • M
                    marcosm Netgate @pst
                    last edited by 20 days ago

                    @pst said in Observations, 25.03.b.20250414.1838:

                    I don't think this is a real error. The VIP ::10.10.10.1/128 is set up when pfBlockerNG is configured. The ::151 will be part of the subnet that is set up when the WAN receives the IPv6 config, which it hasn't yet. LAN is tracking WAN in my IPv6 config.

                    Presumably ::10.10.10.1/128 is on Localhost and ::151 is a static mapping on LAN - correct? The error itself seems correct - the address ::151 is not in the subnet ::10.10.10.1/128. Would you share the full /usr/local/etc/kea/kea-dhcp6.conf while the issue exists? You may upload it to:
                    https://nc.netgate.com/nextcloud/s/zJfaxPfrfAAniwF
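The containment check behind the Kea message can be reproduced with Python's standard `ipaddress` module. This is only an illustrative sketch, not how Kea implements it; the delegated prefix `2001:db8:1234:5600::/64` is a hypothetical stand-in for whatever the WAN eventually receives.

```python
import ipaddress

# Values taken from the log message.
subnet = ipaddress.ip_network("::10.10.10.1/128")
reservation = ipaddress.ip_address("::151")

# A /128 holds exactly one address, so ::151 cannot be inside it.
in_vip_subnet = reservation in subnet
print(in_vip_subnet)          # False
print(subnet.num_addresses)   # 1

# Hypothetical delegated prefix, assumed for illustration only.
delegated = ipaddress.ip_network("2001:db8:1234:5600::/64")

# Combine the prefix with the ::151 interface suffix.
full = ipaddress.IPv6Address(int(delegated.network_address) | int(reservation))
in_delegated = full in delegated
print(full, in_delegated)     # 2001:db8:1234:5600::151 True
```

Once a real prefix is available, the same suffix falls inside the LAN subnet, which matches the observation that the error disappears after the WAN receives its IPv6 configuration.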

                    • P
                      pst @marcosm
                      last edited by 20 days ago

                      @marcosm yes, ::10.10.10.1/128 is on the LAN


                      and ::151 is a static mapping on the LAN


                      I am uploading a zip that contains

                      • kea-dhcp6.conf.ATERROR which is the contents after I release the WAN lease (triggers the ERROR)
                      • kea-dhcp6.conf.AFTERSUBNETASSIGNED which is the working file after reception of the subnet config
                      • kea-dhcp6.conf.diff is the diff between the two
                      • dhcpd.log are the log entries for the renewal of the WAN lease

                      To me it looks like the error situation is present until we receive the IPv6 subnet configuration for the WAN which is then propagated to the LANs. I also saw this in 24.11 but paid less attention to it as it didn't trigger an alarm bell on the dashboard.

                      • P
                        pst @pst
                        last edited by 20 days ago

                        @marcosm I did some additional testing and managed to reduce the number of occurrences of the error.

                        First I changed the WAN config, unchecking the "Do not wait for RA" option (which was previously checked for some reason).

                        I then released the WAN lease, which triggered one error as seen before. When I renewed the WAN lease there were no errors, progress!

                        • M
                          marcosm Netgate @pst
                          last edited by 17 days ago

                          @pst DM'd with request for additional info.

                          • M
                            marcosm Netgate
                            last edited by 17 days ago

                            Details and fix for the Kea error:
                            https://redmine.pfsense.org/issues/16154

                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.