[solved] 25.03.b.20250507.1611 Dashboard Alarm Bell after the upgrade reboot
-
@stephenw10 said in 25.03.b.20250507.1611 Dashboard Alarm Bell after the upgrade reboot:
What's in that interface group?
My nine VLANs, which are all configured on igb1 (lan)
If I find some time I can go back and perform the upgrade again. Is there anything in particular I should check for? I'll keep a copy of the initial /tmp/rules.debug to check for differences after a second reboot.
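For comparing the saved copy with the file after a second reboot, a small Python sketch along these lines might help. The file labels and the sample contents are assumptions made up for illustration, not real rule sets; on the box itself a plain diff of the two files would do the same job.

```python
import difflib


def diff_rules(before: str, after: str) -> list[str]:
    """Return unified-diff lines between two rules.debug snapshots."""
    return list(
        difflib.unified_diff(
            before.splitlines(),
            after.splitlines(),
            fromfile="rules.debug-after-upgrade",  # label only, hypothetical name
            tofile="rules.debug-after-reboot",     # label only, hypothetical name
            lineterm="",
        )
    )


# Illustrative snippets only: the second snapshot lacks the macro definition.
before = 'ALL_VLANS__NETWORK = "<ALL_VLANS__NETWORK>"\npass in quick on $LAN'
after = "pass in quick on $LAN"

for line in diff_rules(before, after):
    print(line)
```

Lines starting with "-" exist only in the first snapshot, lines starting with "+" only in the second.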
-
I would expect there to be some error logged when the system tries to generate the alias. If there isn't one, then it might not be trying at all, which would point to some ordering issue.
Do you have Nexus/MIM enabled?
-
@stephenw10 said in 25.03.b.20250507.1611 Dashboard Alarm Bell after the upgrade reboot:
Do you have Nexus/MIM enabled?
no, never had
-
What did you upgrade from?
-
@stephenw10 said in 25.03.b.20250507.1611 Dashboard Alarm Bell after the upgrade reboot:
What did you upgrade from?
the previous beta version, 25.03.b.20250429.1329
-
Hmm, I can't replicate that. An interface group table loads fine here at all points.
Probably going to need some debug info from your install if you can get it.
-
@stephenw10 I went back to the previous beta and completed another upgrade. The result was exactly the same, an error on the same variable ALL_VLANS:
Filter Reload There were error(s) loading the rules: /tmp/rules.debug:684: macro 'ALL_VLANS__NETWORK' not defined - The line in question reads [684]: pass in quick on $LAN inet from $admin_devices to $ALL_VLANS__NETWORK ridentifier 1746201666 keep state label "USER_RULE: Allow admin access to every VLAN" label "id:1746201666" @ 2025-05-09 17:11:36
The error doesn't appear in the system log because it is triggered before syslog is started, but I managed to find it in the console output. It seems to be triggered by the WAN configuration (DHCP) arriving very early during boot: rc.newwanip is run, which seems to trigger the filter reload.
<snip>
igb0: link state changed to DOWN
overwrite!
Loading package configuration... done.
Configuring package components...
Loading package instructions...
Custom commands...
Executing custom_php_install_command()...
Rebuilding GeoIP tabs...2025-05-09T17:11:29.258105+02:00 - php-fpm 601 - - /rc.linkup: Ignoring link event during boot sequence.
2025-05-09T17:11:29.512131+02:00 - php-fpm 601 - - /rc.linkup: DHCP Client not running on wan (igb0), reconfiguring dhclient.
2025-05-09T17:11:29.521576+02:00 - php-fpm 601 - - /rc.linkup: The command '/sbin/dhclient -c /var/etc/dhclient_wan.conf -p /var/run/dhclient.igb0.pid igb0 > /tmp/igb0_output 2> /tmp/igb0_error_output' returned exit code '1', the output was ''
igb0: link state changed to UP
2025-05-09T17:11:33.961711+02:00 - php-fpm 602 - - /rc.newwanip: rc.newwanip: Info: starting on igb0.
2025-05-09T17:11:33.961994+02:00 - php-fpm 602 - - /rc.newwanip: rc.newwanip: on (IP address: X.X.X.X) (interface: 0WAN[wan]) (real interface: igb0).
gif0: link state changed to UP
2025-05-09T17:11:34.336889+02:00 - php-fpm 602 - - /rc.newwanip: Gateway, switch to: WAN_DHCP
2025-05-09T17:11:34.341573+02:00 - php-fpm 602 - - /rc.newwanip: Default gateway setting Interface WAN_DHCP Gateway as default.
2025-05-09T17:11:34.354551+02:00 - php-fpm 602 - - /rc.newwanip: Gateway, switch to: WANV6_TUNNELV6
2025-05-09T17:11:34.359146+02:00 - php-fpm 602 - - /rc.newwanip: Default gateway setting Interface WANV6_TUNNELV6 Gateway as default.
pflog0: promiscuous mode enabled
load_dn_sched dn_sched FIFO loaded
load_dn_sched dn_sched QFQ loaded
load_dn_sched dn_sched RR loaded
load_dn_sched dn_sched WF2Q+ loaded
load_dn_sched dn_sched PRIO loaded
load_dn_sched dn_sched FQ_CODEL loaded
load_dn_sched dn_sched FQ_PIE loaded
load_dn_aqm dn_aqm CODEL loaded
load_dn_aqm dn_aqm PIE loaded
2025-05-09T17:11:36.226321+02:00 - php-fpm 602 - - /rc.newwanip: New alert found: There were error(s) loading the rules: /tmp/rules.debug:684: macro 'ALL_VLANS__NETWORK' not defined - The line in question reads [684]: pass in quick on $LAN inet from $admin_devices to $ALL_VLANS__NETWORK ridentifier 1746201666 keep state label "USER_RULE: Allow admin access to every VLAN" label "id:1746201666"
2025-05-09T17:11:36.226456+02:00 - php-fpm 602 - - done.
Adding pfBlockerNG Widget to the Dashboard... done.
Creating Firewall filter service... done.
Renew Firewall filter executables... done.
Starting Firewall filter Service... done.
<snip>
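Since the alert only shows up in the console output, a rough Python sketch for pulling it out of a saved console capture could look like this. The regular expression is an assumption based only on the wording of the single alert quoted above, and find_macro_errors is a made-up helper name, not anything that ships with pfSense.

```python
import re

# Matches text such as: /tmp/rules.debug:684: macro 'NAME' not defined
# The pattern is derived from the one alert seen in this thread.
ALERT = re.compile(
    r"(?P<path>/\S+?):(?P<line>\d+): macro '(?P<macro>[^']+)' not defined"
)


def find_macro_errors(console_text: str) -> list[tuple[str, int, str]]:
    """Return (file, line number, macro name) for each matching alert."""
    return [
        (m["path"], int(m["line"]), m["macro"])
        for m in ALERT.finditer(console_text)
    ]


sample = (
    "/rc.newwanip: New alert found: There were error(s) loading the rules: "
    "/tmp/rules.debug:684: macro 'ALL_VLANS__NETWORK' not defined"
)
print(find_macro_errors(sample))
```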
I checked the rules.debug immediately after the upgrade and the content is fine at that point:
[25.03-BETA][root@pfsense.local.lan]/root: grep ALL_VLANS rules.debug-after-upgrade
ALL_VLANS = "{ ALL_VLANS }"
table <ALL_VLANS__NETWORK> persist { 192.168.10.254/24 192... }
ALL_VLANS__NETWORK = "<ALL_VLANS__NETWORK>"
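Beyond grepping for a single name, a short Python sketch like the one below could flag every macro that a rules file references before defining it, which is the condition pf complains about. It is a simplified check under stated assumptions: it treats any `NAME =` line as a definition and any `$NAME` as a reference, skips comment lines, and does not handle a `$` inside quoted labels. The sample rules are invented for illustration.

```python
import re

# Assumed shape of a macro definition: NAME = "value"
DEFINITION = re.compile(r"^\s*([A-Za-z_][A-Za-z0-9_]*)\s*=")
# Assumed shape of a macro reference: $NAME
REFERENCE = re.compile(r"\$([A-Za-z_][A-Za-z0-9_]*)")


def undefined_macros(rules_text: str) -> list[tuple[int, str]]:
    """Return (line number, macro) for references with no earlier definition."""
    defined: set[str] = set()
    problems: list[tuple[int, str]] = []
    for lineno, line in enumerate(rules_text.splitlines(), start=1):
        if line.lstrip().startswith("#"):
            continue  # ignore comment lines
        # Check references first so a macro cannot satisfy its own definition.
        for name in REFERENCE.findall(line):
            if name not in defined:
                problems.append((lineno, name))
        match = DEFINITION.match(line)
        if match:
            defined.add(match.group(1))
    return problems


# Invented sample: the third line uses a macro that is never defined.
sample = '''LAN = "{ igb1 }"
admin_devices = "<admin_devices>"
pass in quick on $LAN inet from $admin_devices to $ALL_VLANS__NETWORK
'''
print(undefined_macros(sample))  # [(3, 'ALL_VLANS__NETWORK')]
```

Running it against the rules.debug captured at the moment of the failed reload, if one can be saved, would show whether the definition was missing entirely or just placed after the rule.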
ALL_VLANS is, as I mentioned, a list of nine VLANs. They all have static IPv4/v6 configuration, apart from one which is tracking the WAN DHCPv6. Not sure if that could be causing issues? Perhaps ALL_VLANS isn't created until all member addresses are available? [speculation]
Another reason why this is only seen at the initial reboot, and not subsequent, might be that the upgrade reboot takes longer, because of upgrade tasks, which means the WAN configuration is received much earlier in the boot process.
/etc/rc.newwanip seems to be able to detect and act based on is_platform_booting(), so perhaps there is just a bit of logic missing to prevent calling filter_configure_sync() in this scenario?
-
Hmm, I imagine something in your config takes longer to load than my test box.
Are you able to upload that config for us to test with? If not I'll try to create something.
-
@stephenw10 yes, I can upload it somewhere private. It should be directly loadable into a box with two igb interfaces, where igb0 is WAN and igb1 is LAN. If you want the box to complete the boot process you probably need the patch in https://redmine.pfsense.org/issues/15435#note-11. Without it wireguard will lock up, as some of the wg peers are configured with an FQDN. I'll also add my console output so you can see the timing of the error in my bootup.
-
See https://redmine.pfsense.org/issues/16182
If you're using ZFS and want to test the fix, try the following:
- revert back to the snapshot before the upgrade
- enable "Defer Automatic Reboot" under System > Update > Update Settings
- run the system upgrade - do not reboot after it's done - and wait until it completes
- go to Diagnostics > Command Prompt and run the command bectl mount default - if needed replace default with the name of the boot environment that's being upgraded; this will output a path in /tmp, make a note of it
- create a new patch using commit a8e5ba643026ee11001dbeff48246ec9fbd07cc9 and set the patch's base directory to the noted /tmp path
- save, fetch, and apply the patch
- run the command bectl unmount default - again, replace default if needed
- reboot the system to continue the upgrade as normal
-
@marcosm thanks, I'll give it a go :)
-
@marcosm I have tested the fix, and it works (not that I ever doubted it would)
Thanks guys!