Netgate Discussion Forum

    [solved] 25.03.b.20250507.1611 Dashboard Alarm Bell after the upgrade reboot

    Plus 25.03 Development Snapshots
    18 Posts 4 Posters 501 Views
      pst @stephenw10

      @stephenw10 said in 25.03.b.20250507.1611 Dashboard Alarm Bell after the upgrade reboot:

      What's in that interface group?

      My nine VLANs, which are all configured on igb1 (lan)

      If I find some time I can go back and perform the upgrade again. Is there anything in particular I should check for? I'll keep a copy of the initial /tmp/rules.debug so I can check for differences after a second reboot.

        stephenw10 Netgate Administrator

        I would expect some error to be logged when the system tries to generate the alias. If there isn't one, it might not be trying at all, which would point to some ordering issue.

        Do you have Nexus/MIM enabled?

          pst @stephenw10

          @stephenw10 said in 25.03.b.20250507.1611 Dashboard Alarm Bell after the upgrade reboot:

          Do you have Nexus/MIM enabled?

          no, never had

            stephenw10 Netgate Administrator

            What did you upgrade from?

              pst @stephenw10

              @stephenw10 said in 25.03.b.20250507.1611 Dashboard Alarm Bell after the upgrade reboot:

              What did you upgrade from?

              the previous beta version, 25.03.b.20250429.1329

                stephenw10 Netgate Administrator

                Hmm, I can't replicate that. An interface group table loads fine here at all points.

                Probably going to need some debug info from your install if you can get it.

                  pst @stephenw10

                  @stephenw10 I went back to the previous beta and completed another upgrade. The result was exactly the same, an error on the same variable (ALL_VLANS__NETWORK):

                  Filter Reload
                  
                      There were error(s) loading the rules: /tmp/rules.debug:684: macro 'ALL_VLANS__NETWORK' not defined - The line in question reads [684]: pass in quick on $LAN inet from $admin_devices to $ALL_VLANS__NETWORK ridentifier 1746201666 keep state label "USER_RULE: Allow admin access to every VLAN" label "id:1746201666"
                      @ 2025-05-09 17:11:36
                  

                  The error doesn't appear in the system log because it is triggered before syslog is started, but I managed to find it in the console output. It seems to be triggered by the WAN configuration (DHCP) arriving very early during boot: rc.newwanip is run, which seems to trigger the filter reload.

                  <snip>
                  igb0: link state changed to DOWN
                  overwrite!
                  Loading package configuration... done.
                  Configuring package components...
                  Loading package instructions...
                  Custom commands...
                  Executing custom_php_install_command()...
                  Rebuilding GeoIP tabs...2025-05-09T17:11:29.258105+02:00 - php-fpm 601 - - /rc.linkup: Ignoring link event during boot sequence.
                  2025-05-09T17:11:29.512131+02:00 - php-fpm 601 - - /rc.linkup: DHCP Client not running on wan (igb0), reconfiguring dhclient.
                  2025-05-09T17:11:29.521576+02:00 - php-fpm 601 - - /rc.linkup: The command '/sbin/dhclient -c /var/etc/dhclient_wan.conf -p /var/run/dhclient.igb0.pid igb0 > /tmp/igb0_output 2> /tmp/igb0_error_output' returned exit code '1', the output was ''
                  igb0: link state changed to UP
                  2025-05-09T17:11:33.961711+02:00 - php-fpm 602 - - /rc.newwanip: rc.newwanip: Info: starting on igb0.
                  2025-05-09T17:11:33.961994+02:00 - php-fpm 602 - - /rc.newwanip: rc.newwanip: on (IP address: X.X.X.X) (interface: 0WAN[wan]) (real interface: igb0).
                  gif0: link state changed to UP
                  2025-05-09T17:11:34.336889+02:00 - php-fpm 602 - - /rc.newwanip: Gateway, switch to: WAN_DHCP
                  2025-05-09T17:11:34.341573+02:00 - php-fpm 602 - - /rc.newwanip: Default gateway setting Interface WAN_DHCP Gateway as default.
                  2025-05-09T17:11:34.354551+02:00 - php-fpm 602 - - /rc.newwanip: Gateway, switch to: WANV6_TUNNELV6
                  2025-05-09T17:11:34.359146+02:00 - php-fpm 602 - - /rc.newwanip: Default gateway setting Interface WANV6_TUNNELV6 Gateway as default.
                  pflog0: promiscuous mode enabled
                  load_dn_sched dn_sched FIFO loaded
                  load_dn_sched dn_sched QFQ loaded
                  load_dn_sched dn_sched RR loaded
                  load_dn_sched dn_sched WF2Q+ loaded
                  load_dn_sched dn_sched PRIO loaded
                  load_dn_sched dn_sched FQ_CODEL loaded
                  load_dn_sched dn_sched FQ_PIE loaded
                  load_dn_aqm dn_aqm CODEL loaded
                  load_dn_aqm dn_aqm PIE loaded
                  2025-05-09T17:11:36.226321+02:00 - php-fpm 602 - - /rc.newwanip: New alert found: There were error(s) loading the rules: /tmp/rules.debug:684: macro 'ALL_VLANS__NETWORK' not defined - The line in question reads [684]: pass  in  quick  on $LAN inet from $admin_devices to $ALL_VLANS__NETWORK ridentifier 1746201666 keep state label "USER_RULE: Allow admin access to every VLAN" label "id:1746201666"
                  2025-05-09T17:11:36.226456+02:00 - php-fpm 602 - -
                   done.
                  Adding pfBlockerNG Widget to the Dashboard... done.
                  Creating Firewall filter service... done.
                  Renew Firewall filter executables... done.
                  Starting Firewall filter Service... done.
                  <snip>
                  

                  I checked the rules.debug immediately after the upgrade and the content is fine at that point:

                  [25.03-BETA][root@pfsense.local.lan]/root: grep ALL_VLANS rules.debug-after-upgrade
                  ALL_VLANS = "{ ALL_VLANS }"
                  table <ALL_VLANS__NETWORK> persist { 192.168.10.254/24 192... }
                  ALL_VLANS__NETWORK = "<ALL_VLANS__NETWORK>"
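A check like the grep above can be generalized to list every macro that a saved ruleset references but never defines. The sketch below is a rough text match, not how pf parses rules. It assumes definitions look like `NAME = ...` at the start of a line and references look like `$NAME`, and it ignores whether the definition comes before the use. The function name `undefined_macros` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print macros referenced as $NAME but never defined
# as `NAME = ...` in a pf ruleset file (rough text match, not a pf parser).
undefined_macros() {
    rules_file="$1"
    defined=$(mktemp) || return 1
    referenced=$(mktemp) || return 1

    # Definitions: an identifier at the start of a line followed by '='.
    sed -n 's/^\([A-Za-z_][A-Za-z0-9_]*\)[[:space:]]*=.*/\1/p' "$rules_file" \
        | sort -u > "$defined"

    # References: every $identifier anywhere in the file.
    grep -o '\$[A-Za-z_][A-Za-z0-9_]*' "$rules_file" \
        | tr -d '$' | sort -u > "$referenced"

    # comm -13 keeps lines found only in the second file:
    # names that are referenced but not defined.
    comm -13 "$defined" "$referenced"
    rm -f "$defined" "$referenced"
}
```

Running `undefined_macros /tmp/rules.debug` (or against a saved copy such as rules.debug-after-upgrade) should print nothing when every referenced macro has a definition somewhere in the file.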
                  

                  As I mentioned, ALL_VLANS is a list of nine VLANs. They all have static IPv4/v6 configuration, apart from one which is tracking the WAN DHCPv6. Not sure if that could be causing issues? Perhaps ALL_VLANS isn't created until all member addresses are available? [speculation]

                  Another reason why this is only seen at the initial reboot, and not on subsequent ones, might be that the upgrade reboot takes longer because of the upgrade tasks, which means the WAN configuration is received much earlier in the boot process.

                  /etc/rc.newwanip seems to be able to detect and act based on is_platform_booting(), so perhaps there is just a bit of logic missing to prevent calling filter_configure_sync() in this scenario?
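The guard suggested above can be illustrated outside of pfSense with a small shell sketch. This is not the code in /etc/rc.newwanip, and it is not the fix that was eventually committed. The marker file path, the function name `reload_unless_booting` and the reload command are all placeholders chosen for illustration.

```shell
#!/bin/sh
# Illustration only: skip a reload while a "booting" marker file exists.
# Usage: reload_unless_booting MARKER_FILE COMMAND [ARGS...]
# MARKER_FILE and COMMAND are hypothetical; neither is a real pfSense path or tool.
reload_unless_booting() {
    marker="$1"
    shift
    if [ -e "$marker" ]; then
        # Still booting: leave the reload to the normal end-of-boot sequence.
        echo "boot in progress, reload deferred"
        return 0
    fi
    # Not booting: run the reload command that was passed in.
    "$@"
}
```

For example, `reload_unless_booting /tmp/example_booting echo "reloading filter"` runs the `echo` only when /tmp/example_booting does not exist.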

                    stephenw10 Netgate Administrator

                    Hmm, I imagine something in your config takes longer to load than on my test box.

                    Are you able to upload that config for us to test with? If not I'll try to create something.

                      pst @stephenw10

                      @stephenw10 yes, I can upload it somewhere private. It should be directly loadable into a box with two igb interfaces, where igb0 is WAN and igb1 is LAN. If you want the box to complete the boot process you probably need the patch in https://redmine.pfsense.org/issues/15435#note-11. Without it, WireGuard will lock up because some of the wg peers are configured with an FQDN. I'll also add my console output so you can see the timing of the error in my bootup.

                        marcosm Netgate

                        See https://redmine.pfsense.org/issues/16182

                        If you're using ZFS and want to test the fix, try the following:

                        • revert back to the snapshot before the upgrade
                        • enable "Defer Automatic Reboot" under System > Update > Update Settings
                        • run the system upgrade and wait until it completes, but do not reboot once it's done
                        • go to Diagnostics > Command Prompt and run the command bectl mount default (if needed, replace default with the name of the boot environment that's being upgraded); this will output a path in /tmp, make a note of it
                        • create a new patch using commit a8e5ba643026ee11001dbeff48246ec9fbd07cc9 and set the patch's base directory to the noted /tmp path.
                        • save, fetch, and apply the patch
                        • run the command bectl unmount default (again, replace default if needed)
                        • reboot the system to continue the upgrade as normal
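The two bectl steps above can be sketched as a pair of small shell helpers. This covers only the shell side; the patch itself is still created, fetched and applied through the GUI as described. The `BE_NAME` variable and both function names are assumptions for illustration, and `default` has to be replaced if the boot environment being upgraded has a different name.

```shell
#!/bin/sh
# Sketch of the mount/unmount steps only. BE_NAME is an assumption:
# set it to the boot environment being upgraded (`bectl list` shows the names).
BE_NAME="${BE_NAME:-default}"

mount_upgrade_be() {
    # With no mount point given, bectl mounts the boot environment on a
    # temporary directory and prints its path. Note that path: it becomes
    # the patch's base directory.
    bectl mount "$BE_NAME"
}

unmount_upgrade_be() {
    # Run this only after the patch has been applied to the mounted path.
    bectl unmount "$BE_NAME"
}
```

A typical sequence would be `mount_upgrade_be`, then applying the patch against the printed path, then `unmount_upgrade_be` before rebooting.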
                          pst @marcosm

                          @marcosm thanks, I'll give it a go :)

                            pst @pst

                            @marcosm I have tested the fix, and it works (not that I ever doubted it would)

                            Thanks guys!

                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.