Bootup configuration not loading from USB
-
Netgate XG-7100
pfSense 21.05.1-RELEASE
I am trying to use the "Restore using the External Configuration Locator (ECL)" feature.
- I have a valid config in /config/config.xml on the USB device.
- When the system reboots, it does not load the new config from the USB.
Here are some boot log snippets related to USB that I saw:
Jan 11 18:41:16 fw1 kernel: xhci0: <Intel Denverton USB 3.0 controller> mem 0xdfb60000-0xdfb6ffff irq 19 at device 21.0 on pci0
Jan 11 18:41:16 fw1 kernel: xhci0: 32 bytes context size, 64-bit DMA
Jan 11 18:41:16 fw1 kernel: usbus0 on xhci0
Jan 11 18:41:16 fw1 kernel: usbus0: 5.0Gbps Super Speed USB v3.0
[...]
Jan 11 18:41:16 fw1 kernel: umass0 on uhub0
Jan 11 18:41:16 fw1 kernel: umass0: <PNY USB 2.0 FD, class 0/0, rev 2.00/1.00, addr 3> on usbus0
[...]
Jan 11 18:41:16 fw1 kernel: da0 at umass-sim0 bus 0 scbus5 target 0 lun 0
Jan 11 18:41:16 fw1 kernel: da0: <PNY USB 2.0 FD PMAP> Removable Direct Access SPC-2 SCSI device
Jan 11 18:41:16 fw1 kernel: da0: Serial Number 071917E1598E6375
Jan 11 18:41:16 fw1 kernel: da0: 40.000MB/s transfers
Jan 11 18:41:16 fw1 kernel: da0: 29604MB (60628992 512 byte sectors)
Jan 11 18:41:16 fw1 kernel: da0: quirks=0x2<NO_6_BYTE>
But after that, nothing jumped out regarding config loading (or errors regarding the USB).
There are no log entries that I can see in the system log for "External config loader" -- error, info, or otherwise.
At the shell, I can mount the drive and inspect the config:
$ /sbin/mount_msdosfs /dev/da0s1 /mnt
$ ls -l /mnt/config/config.xml
-rwxr-xr-x 1 root wheel 55927 Jan 7 17:22 /mnt/config/config.xml
Filesystem on the device is msdosfs, which I am pretty sure is just FAT:
$ fstyp /dev/da0s1
msdosfs
-
When pfSense boots, the main /etc/rc is started. At this point pfSense is basically a FreeBSD OS; /etc/rc transforms the system into what pfSense is.
The first thing /etc/rc does is run /etc/pfSense-rc.
/etc/pfSense-rc will run (or should run), at line 394:
# Launch external configuration loader
/usr/local/sbin/fcgicli -f /etc/ecl.php
/etc/ecl.php looks for a USB device "that isn't the boot partition and isn't the swap partition".
Look at the find_config_xml() function: it tries to mount the USB drive and locate the config.xml file.
I'm telling you all this because you can see the lines that are logged, i.e. shown on the console during boot.
My question is: what is the console showing you during boot?
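For illustration only, the selection rule described above can be sketched in shell. This is not the code in /etc/ecl.php; the function names candidate_devices and has_config are hypothetical, and which partition is boot and which is swap is an assumption in the usage note.

```shell
# Hypothetical sketch of the ECL selection rule, not the actual ecl.php logic.

# Read device names on stdin, one per line, and print those that are
# neither the boot partition ($1) nor the swap partition ($2).
candidate_devices() {
    boot_dev=$1
    swap_dev=$2
    while read -r dev; do
        [ -z "$dev" ] && continue
        [ "$dev" = "$boot_dev" ] && continue
        [ "$dev" = "$swap_dev" ] && continue
        printf '%s\n' "$dev"
    done
}

# Succeed if the filesystem mounted at $1 carries config/config.xml.
has_config() {
    [ -f "$1/config/config.xml" ]
}
```

Assuming mmcsd0s1a is the boot partition and mmcsd0s1b is swap, feeding the three names mmcsd0s1a, mmcsd0s1b and da0s1 to candidate_devices mmcsd0s1a mmcsd0s1b leaves only da0s1, which would then be mounted and checked with has_config.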
-
@gertjan Thanks.
Accessing the serial console was the key here. However I'm still not quite sure what's going on:
Welcome to Netgate pfSense Plus 21.05.1-RELEASE...
No core dumps found.
...ELF ldconfig path: /lib /usr/lib /usr/lib/compat /usr/local/lib
ugen0.2: <Prolific Technology Inc. USB-Serial Controller D> at usbus0
compatibility ldconfig path: done.
ugen0.3: <Prolific Technology Inc. USB-Serial Controller D> at usbus0
>>> Removing vituplcom1 on uhub0
uplcom1: <Prolific Technology Inc. USB-Serial Controller D, class 0/0, rev 1.10/4.00, addr 2> on usbus0
al flag from php74... done.
External config loader 1.0 is now starting... mmcsd0s1 mmcsd0s1a mmcsd0s1b
Launching the init system...Updating CPU Microcode...
f<XSAVEOPT,XSAVEC,XINUSE,XSAVES>
IA32_ARCH_CAPS=0x69<RDCL_NO,SKIP_L1DFL_VME>
VT-x: PAT,HLT,MTF,PAUSE,EPT,UG,VPID,VID,PostIntr
TSC: P-state invariant, performance statistics
Done.
ugen0.4: <PNY USB 2.0 FD> at usbus0
umass0 on uhub0
umass0: <PNY USB 2.0 FD, class 0/0, rev 2.00/1.00, addr 3> on usbus0
...Updating configuration...done.
Checking config backups consistency.............................done.
Setting up extended sysctls...done.
coretemp0: <CPU On-Die Thermal Sensors> on cpu0
Setting timezone...done.
Configuring looplo0: link state changed to UP
back interface...done.
da0 at umass-sim0 bus 0 scbus5 target 0 lun 0
da0: <PNY USB 2.0 FD PMAP> Removable Direct Access SPC-2 SCSI device
da0: Serial Number 071917E1598E6375
da0: 40.000MB/s transfers
da0: 29604MB (60628992 512 byte sectors)
da0: quirks=0x2<NO_6_BYTE>
Starting syslog...done.
Starting Secure Shell Services...done.
Configuring switch...done.
Setting up interfaces microcode...done.
Starting PC/SC Smart Card Services...done.
Configuring loopback interface...done.
Creating wireless clone interfaces...done.
Configuring LAGG interfaces...done.
Configuring VLAN interfaces...done.
Configuring QinQ interfaces...done.
Configuring WAN interface...done.
Configuring LAN interface...done.
Configuring LOOPBACK interface...done.
Configuring ETH6 interface...done.
Configuring ETH8 interface...done.
Configuring IPsec VTI interfaces...done.
Configuring CARP settings...done.
Syncing OpenVPN settings...done.
...done.
Starting PFLOG...done.
Setting up gateway monitors...done.
Setting up static routes...done.
Synchronizing user settings...done.
Starting webConfigurator...done.
Configuring CRON...done.
...done.
Configuring IPsec VPN...
route: route has not been found
route: route has not been found
done
Generating RRD graphs...done.
done.
Starting CRON... done.
Starting package AWS VPC Wizard...done.
Starting package IPsec Profile Wizard...done.
Starting package FRR...done.
Starting package node_exporter...done.
Starting package lldpd...done.
From here it looks like it loaded configs and stopped on routes in the IPsec section? The change I was expecting to see from the new config is regarding DNS servers and package settings (updated in the XML) yet they are not present on the fw after a reboot. Is the entire reconfig aborted/rolled back if it fails on one part?
I appreciate your help!
-
It didn't load a config via the ECL at that boot:
External config loader 1.0 is now starting... mmcsd0s1 mmcsd0s1a mmcsd0s1b
If it had, that line would list the USB drive as da0sX and show that a config was found and loaded.
And it looks like that's because it didn't find the USB device until after the ECL had run. Later in the boot we see:
da0 at umass-sim0 bus 0 scbus5 target 0 lun 0
da0: <PNY USB 2.0 FD PMAP> Removable Direct Access SPC-2 SCSI device
da0: Serial Number 071917E1598E6375
da0: 40.000MB/s transfers
da0: 29604MB (60628992 512 byte sectors)
da0: quirks=0x2<NO_6_BYTE>
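If you capture the serial console to a file, the ordering check above can be roughly automated. This is only a sketch: the function name usb_seen_before_ecl is hypothetical, and it assumes the capture has one console message per line and that the ECL banner and the "da0 at" attach message look like the ones quoted in this thread.

```shell
# Hypothetical helper: succeed only if device $2 attached before the
# "External config loader" banner appears in the console capture $1.
usb_seen_before_ecl() {
    log=$1
    dev=$2
    # Line number of the first ECL banner and of the first attach message.
    ecl_line=$(grep -n 'External config loader' "$log" | head -n 1 | cut -d: -f1)
    dev_line=$(grep -n "${dev} at " "$log" | head -n 1 | cut -d: -f1)
    [ -n "$ecl_line" ] && [ -n "$dev_line" ] && [ "$dev_line" -lt "$ecl_line" ]
}
```

For the boot shown earlier, usb_seen_before_ecl console.log da0 would fail, because the da0 attach message comes after the ECL banner.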
Was it inserted after boot? Was this the first boot?
When you restore a config file it loads the complete config or nothing. It would only not load it if it's invalid.
Does it boot completely without the USB drive?
Try entering ctl+t at the console at that point. It should respond with whatever it's waiting for.
Steve
-
@stephenw10 Thank you
The USB drive is permanently installed in the device. It's never removed. This is not the first boot, the device is operational, I'm just trying to make minor changes to the config (dns servers and add some fw rules).
Getting physical access to the device would require me booking a flight :) In an emergency I would create a ticket with on-site COLO personnel to access the physical device if anything was needed like pulling the drive or patching cables/etc. I am currently accessing it via serial console cable from another system that I'm remotely connected to.
A little background:
Our company has dozens of these Netgate devices all over the world, so the goal is to program them remotely and in batches using configuration management, with no need for GUI clicks, since there are so many of them and we need to keep their configs in sync. Using ECL seems like the best way to do that, short of writing a whole SDK from scratch around the php-shell (there is no API), which isn't feasible at this time.
PS - I did a diff on my config.xml and the current running config and it looks good and valid.
Try entering ctl+t at the console at that point. It should respond with whatever it's waiting for.
At what point?
Thanks again
-
Ok, well it will try to pull in a config via the ECL at every boot so be aware of that.
It isn't pulling in config there because the USB device is not detected early enough. It must be a very slow USB device as the 7100 has a loader line to wait for it and works with most devices.
One way around that is to re-root instead of rebooting. That doesn't disconnect the USB so will almost always work.
Entering ctl+t when the console appears unresponsive can often show why.
Steve
-
it will try to pull in a config via the ECL at every boot so be aware of that.
Yep the idea is to use the USB as "startup config" -- as the source of truth for all boot ups.
It must be a very slow USB device
It's a USB 3.0 brand new PNY stick. Purchased in 2021. About as fast as you can get unfortunately. Not sure what else to buy.
re-root instead of rebooting
I'll look into this, thanks.
-
It's not the transfer speed of the interface it's the time it takes to initialise the drive. Which seems random!
But you might check that /boot/loader.conf contains the kern.cam.boot_delay line. It should be present by default.
It could have been overridden by a value in /boot/loader.conf.local if you're using that.
Steve
-
Interestingly, there are two identical lines in /boot/loader.conf:
$ grep cam.boot_delay /boot/loader.conf
kern.cam.boot_delay=10000
kern.cam.boot_delay=10000
/boot/loader.conf.local doesn't exist.
So are you saying I could create a /boot/loader.conf.local and add the line kern.cam.boot_delay=30000 to make it wait longer for the device?
-
Yes, though I wouldn't expect 30000 to have much effect there over 10000. It's worth trying.
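If you want to script that change, something along these lines should work. It is a sketch only: set_loader_tunable is a hypothetical helper name, and it writes the value unquoted simply because that is how the existing line appears in the grep output above.

```shell
# Hypothetical helper: set tunable $2 to value $3 in loader file $1,
# dropping any existing lines for that tunable so it appears exactly once.
set_loader_tunable() {
    file=$1
    name=$2
    value=$3
    tmp="${file}.tmp.$$"
    if [ -f "$file" ]; then
        # Keep every line that does not start with "name=" (literal match).
        awk -v n="$name" 'index($0, n "=") != 1' "$file" > "$tmp"
    else
        : > "$tmp"
    fi
    printf '%s=%s\n' "$name" "$value" >> "$tmp"
    mv "$tmp" "$file"
}
```

For example, set_loader_tunable /boot/loader.conf.local kern.cam.boot_delay 30000 would create the file if it is missing and leave a single kern.cam.boot_delay line in it. The new value only takes effect on a full boot, since the loader is what reads these files.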
-
@stephenw10 Re-root worked! I'll go with this route moving forward.
Appreciate the help.
-
@stephenw10 said in Bootup configuration not loading from USB:
One way around that is to re-root instead of rebooting. That doesn't disconnect the USB so will almost always work.
Any quick hint as to what a re-root is and how to do it?
/Bingo
-
Reroot is one of the options presented on selecting reboot from the GUI or the console.
"Performs a “reroot” style reboot, which is faster than a traditional reboot but does not restart the entire operating system. All running processes are killed, all filesystems are remounted, and then the system startup sequence is run again. This type of restart is much faster as it does not reset the hardware, reload the kernel, or need to go through the hardware detection process."
-
Reroot is covered in the docs (link) -- although it says it's faster than a reboot, it still took a few minutes.
-
To perform a reroot, choose option r when triggering a reboot from the terminal menu.
-
Since everything in my stack is automated (no human touch where we can help it), I made a simple PHP script that triggers the reroot like this; it's called through our automation engine:
require_once("functions.inc");
system_reboot_sync(true);
The true in system_reboot_sync(true) tells it to do a reroot instead of a reboot. It's not documented, but I found it in the source code.
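Since a reroot still takes a while, one option is to have the automation skip it when nothing has changed. The following is a rough sketch, not what I actually run: needs_reroot is a hypothetical helper, /cf/conf/config.xml as the location of the running config is an assumption, and reroot.php is a made-up name for the script above.

```shell
# Hypothetical helper: succeed only if the staged config $1 exists and
# differs byte-for-byte from the running config $2.
needs_reroot() {
    new=$1
    running=$2
    [ -f "$new" ] || return 1
    ! cmp -s "$new" "$running"
}

# Example use (paths and script name are assumptions, see above):
#   if needs_reroot /mnt/config/config.xml /cf/conf/config.xml; then
#       /usr/local/bin/php /root/reroot.php
#   fi
```

A byte comparison is deliberately crude; it treats whitespace-only differences as a change, which errs on the side of rerooting.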
-