long boot times when using wireguard VPN (was: Syslogs starts late in the boot process leading to misrepresentation of events (incorrect time stamps))
-
I am trying to understand why a reboot takes over five minutes on my system. Looking at the pfSense system log I can see that there is a large gap in log entries between the syslogd stopping and restarting.
Apr 19 22:29:33 kernel pflog0: promiscuous mode disabled Apr 19 22:29:33 syslogd kernel boot file is /boot/kernel/kernel Apr 19 22:25:04 syslogd exiting on signal 15 Apr 19 22:25:04 reboot 54300 rebooted by root
That's a four minute gap!
I hooked up a monitor to the system and can see a lot of logging on the console which doesn't get logged in the syslog, but unfortunately I can't get the serial console to work to log everything permanently.This screenshot shows the majority of the time being taken up by wireguard establishing tunnels, one at the time, and I have quite a few...
I can see the log entries in the screenshot later in the syslog when they are all dumped in there, together with other boot messages but they all get the same timestamp:
Apr 19 22:29:33 kernel wg5: changing name to 'tun_wg6' Apr 19 22:29:33 kernel tun_wg5: link state changed to UP Apr 19 22:29:33 kernel wg4: changing name to 'tun_wg5' Apr 19 22:29:33 kernel [fib_algo] inet.0 (bsearch4#39) rebuild_fd_flm: switching algo to radix4_lockless Apr 19 22:29:33 kernel tun_wg4: link state changed to UP Apr 19 22:29:33 kernel wg3: changing name to 'tun_wg4' Apr 19 22:29:33 kernel tun_wg2: link state changed to UP Apr 19 22:29:33 kernel wg2: changing name to 'tun_wg2' Apr 19 22:29:33 kernel pflog0: promiscuous mode enabled Apr 19 22:29:33 kernel igb2: link state changed to UP
I would like syslogd to start earlier to capture these events, with correct time stamps. Is there a trick/hack to force syslogd to start earlier in the boot process? What could be the reason why ir's not starting earlier?
-
What are you running that on? That gap seems unexpectedly large. I'm not aware of any way to get syslogd to start earlier though.
-
@stephenw10 Thanks, it's a non-Netgate box with an "Intel(R) Core(TM) i7-8565U CPU @ 1.80GHz". Nothing wrong with the processing power. The screenshot was taken from a five minute video I took at the same time of the monitor. There's nothing going on but the slow establishments of wireguard tunnels.
I'll put on my dev hat and go hacking into the logging system, there must be a way of temporarily saving the events with timestamps without using syslog.
-
@stephenw10 An intermediate report. I have implemented a basic PHP tracing system which gave me a first timing from /etc/rc.bootup, the time stamps are prior to the execution of the function next to it:
Logging /etc/rc.bootup_with_trace 24/04/2024 11:09:16 L75: led_normalize(); 24/04/2024 11:09:16 L122: notices_setup(); 24/04/2024 11:09:16 L138: system_dmesg_save(); 24/04/2024 11:09:16 L141: system_check_reset_button(); 24/04/2024 11:09:16 L167: mute_kernel_msgs(); 24/04/2024 11:09:16 L168: start_devd(); 24/04/2024 11:09:18 L169: set_device_perms(); 24/04/2024 11:09:18 L170: unmute_kernel_msgs(); 24/04/2024 11:09:18 L174: parse_config_bootup(); 24/04/2024 11:09:18 L177: mwexec("/usr/sbin/gnid > {$g['vardb_path']}/uniqueid 2>/dev/null"); 24/04/2024 11:09:18 L194: convert_config(); 24/04/2024 11:09:19 L230: load_crypto(); 24/04/2024 11:09:19 L235: system_setup_sysctl(); 24/04/2024 11:09:19 L248: system_do_shell_commands(1); 24/04/2024 11:12:48 L252: load_thermal_hardware(); 24/04/2024 11:12:48 L255: system_timezone_configure(); 24/04/2024 11:12:48 L258: system_hostname_configure(); 24/04/2024 11:12:50 L261: system_hosts_generate(); 24/04/2024 11:12:55 L264: enable_carp(false); 24/04/2024 11:12:55 L267: interfaces_loopback_configure(); 24/04/2024 11:12:55 L270: ca_setup_trust_store(); 24/04/2024 11:12:55 L277: exec("/sbin/zfs get -H compression /var/log | /usr/bin/awk '{print $3;}'", $varlog_compression_state); 24/04/2024 11:12:55 L285: system_syslogd_start(); 24/04/2024 11:12:57 L288: set_pam_auth(); 24/04/2024 11:12:57 L296: switches_configure(); 24/04/2024 11:12:57 L300: setup_microcode(); 24/04/2024 11:12:58 L321: interfaces_configure(); 24/04/2024 11:13:07 L322: interfaces_sync_setup(); 24/04/2024 11:13:07 L328: system_hosts_generate(); 24/04/2024 11:13:07 L334: send_event("service reload sshd"); 24/04/2024 11:13:07 L340: openvpn_resync_all(); 24/04/2024 11:13:07 L344: system_resolvconf_generate(); 24/04/2024 11:13:07 L347: filter_pflow_configure(); 24/04/2024 11:13:07 L350: filter_configure_sync(); 24/04/2024 11:13:09 L354: filter_pflog_start(); 24/04/2024 11:13:09 L358: setup_gateways_monitor(); 24/04/2024 11:13:11 L362: system_routing_configure(); 24/04/2024 11:13:12 L371: services_dnsmasq_configure(); 24/04/2024 11:13:12 L374: services_unbound_configure(); 24/04/2024 11:13:22 L378: local_reset_accounts(); 24/04/2024 11:13:22 L383: configure_cron(); 24/04/2024 11:13:22 L387: system_routing_enable(); 24/04/2024 11:13:22 L424: mwexec("/usr/bin/timeout -k 45 30 /usr/local/sbin/ntpd -g -q -c /dev/null {$ntp_boot_time_servers}"); 24/04/2024 11:13:29 L429: system_ntp_configure(); 24/04/2024 11:13:29 L437: system_webgui_start(); 24/04/2024 11:13:31 L441: console_configure(); 24/04/2024 11:13:31 L444: services_dhcpd_configure(); 24/04/2024 11:13:31 L447: system_dhcpleases_configure(); 24/04/2024 11:13:31 L450: services_dhcrelay_configure(); 24/04/2024 11:13:31 L453: services_dhcrelay6_configure(); 24/04/2024 11:13:31 L456: send_event("service reload dyndnsall"); 24/04/2024 11:13:31 L459: filter_configure_sync(); 24/04/2024 11:13:32 L462: vpn_setup(); 24/04/2024 11:13:32 L465: captiveportal_configure(); 24/04/2024 11:13:32 L468: voucher_configure(); 24/04/2024 11:13:32 L471: system_do_shell_commands(); 24/04/2024 11:13:32 L474: $ipsec_dynamic_hosts = ipsec_configure(); 24/04/2024 11:13:34 L477: services_snmpd_configure(); 24/04/2024 11:13:36 L480: system_set_harddisk_standby(); 24/04/2024 11:13:36 L483: reload_ttys(); 24/04/2024 11:13:36 L486: enable_rrd_graphing(); 24/04/2024 11:13:37 L489: enable_watchdog(); 24/04/2024 11:13:37 L519: services_igmpproxy_configure(); 24/04/2024 11:13:37 L522: upnp_start(); 24/04/2024 11:13:37 L525: activate_powerd(); 24/04/2024 11:13:37 L528: prefer_ipv4_or_ipv6(); 24/04/2024 11:13:37 L536: register_all_installed_packages(); 24/04/2024 11:13:37 L541: system_syslogd_start(); 24/04/2024 11:13:39 L548: enable_carp(); 24/04/2024 11:13:39 L572: filter_configure(); 24/04/2024 11:13:39 L576: led_normalize(); 24/04/2024 11:13:39 L578: notify_all_remote("Bootup complete");
The execution of system_do_shell_commands() is taking up most of the boot time, 3.5 minutes
24/04/2024 11:09:19 L248: system_do_shell_commands(1); 24/04/2024 11:12:48 L252: load_thermal_hardware();
I will continue digging.
-
@pst said in Syslogs starts late in the boot process leading to misrepresentation of events (incorrect time stamps):
Apr 19 22:29:33 syslogd kernel boot file is /boot/kernel/kernel
Apr 19 22:25:04 syslogd exiting on signal 15Yep, over 4 minutes. A bit long indeed.
But like any other OS there : how would it be able to "log" ?
Its the chicken and egg problem. How can something get written to a disk, if the device doesn't even know it has a 'disk' to write to ? Better : when a system boot, some BIOS code starts, and it doesn't even know that it is going to boot Windows, or Debian, or pfSense .... or your coffee machine (most coffee machines have no disk ^^).
So, no way, to access disks, file systems, file acls etc ...
Only when the system detected that it has a 'drive', and this drive has a bootable partition, and this bootable partition is loaded, and the OS kick started, and the OS (kernel) start to make an inventory of the entire system, and it will detect 'user drives', starts up multi processing, kicks of the first process, then launches all system processes, like "syslogd", only then, at that moment, that process can start collecting 'logs' and write them to a disk.But ... don't worry, there are solutions.
The first one : for devices with a graphics card : hook up that screen, and power up the system. You will "see" boot info. You have probably a VGA plug : use it.
If there is no graphics adapter - my SG4100 has none - then use the serial "RS232" access. This is even better as now you can log, if needed, what the boot process produced.The kernel itself, before it can actually 'log to a disk' will keep the log into its own memory before writing it out way latter : see here : Status > System Logs > System > OS Boot
@pst said in Syslogs starts late in the boot process leading to misrepresentation of events (incorrect time stamps):
24/04/2024 11:09:19 L248: system_do_shell_commands(1);
Normally, there are none.
If there are any, ask the admin, as he put them there.
This package exists :and go here : Services>Shellcmd Settings
so you can see them (before, we were editing the config.xml ourselves)Btw : in my config.xml :
so the patches packages also installs 'hook' so it can patch the system as soon as possible, if needed.
Right now, on my 24.03 Release, the patches package is pretty empty. -
@Gertjan Thanks, I don't use the Shellcmd package so any commands originate from pfSense itself.
But your suggestion to look in the config.xml was useful. It confirms my suspicion of wireguuard being the time thief. The wireguard daemon is started early:
<earlyshellcmd>service wireguardd start</earlyshellcmd>
-
@stephenw10 I have now got detailed tracing on wireguardd service start during bootup. I will raise a redmine on the current behaviour which is extending the boot time by several minutes, preventing access to the pfSense.
-
Great! Yeah open a ticket and we can review it.
-
@stephenw10 I have opened https://redmine.pfsense.org/issues/15435 Sorry, it turned out a bit of an essay :)
-
No worries, more info is almost always better.
-