pfSense 2.6 and 2.7 crash on Zotac Mini PC
Most significant errors are shown on the console whether or not you're logged in. If it lost access to the boot drive, for example, you would see a series of disk errors there.
If it is still responsive and shows no errors, try connecting out from the command line to see what, if anything, still works and what errors that produces.
@stephenw10 OK, thanks.
In /var/log, based on the symptoms above, what should I be looking for? Could it be a DHCP problem?
No, you wouldn't lose access from the LAN side if it were a DHCP problem.
It could be the LAN-side NIC hitting some error. Is it a Realtek NIC?
I would expect the driver to log something, though, which you would see after rebooting. Some Zotac boxes had issues with a conflicting driver for the SD/MMC device, which had to be disabled. That prevented them from booting at all, though; I haven't seen it cause a failure after boot.
If you find the console is still responsive, run
and cat /var/log/system.log
and see what errors are shown.
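Rather than reading the whole file, a short filter can pull out the lines most likely to matter (a plain case-insensitive grep does much the same job). This is a minimal sketch, assuming Python 3 is available wherever you read the log, for example on a copy of the file; the script name and the keyword list are illustrative assumptions, not an official pfSense list of error messages.

```python
import re
import sys
from typing import Iterable, List

# Assumed patterns (not an official list): generic error words plus NIC link changes.
SUSPECT = re.compile(r"error|fail|timed? ?out|link state", re.IGNORECASE)


def suspect_lines(lines: Iterable[str]) -> List[str]:
    """Return the lines that match any of the assumed error patterns."""
    return [line.rstrip("\n") for line in lines if SUSPECT.search(line)]


def main(argv: List[str]) -> None:
    if not argv:
        print("usage: suspect_lines.py /var/log/system.log")
        return
    # errors="replace" keeps going if the log contains stray non-UTF-8 bytes.
    with open(argv[0], errors="replace") as log:
        for line in suspect_lines(log):
            print(line)


if __name__ == "__main__":
    main(sys.argv[1:])
```

Usage: python3 suspect_lines.py /var/log/system.log. Widen or narrow the pattern to suit; an empty result only means none of these assumed keywords appeared.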
@stephenw10 Here's system.log from the time of the crash. Note the gap in the log: nothing between 18:46:53 and the reboot at 20:21:39. It looks like logging stops, even though pfSense kept working well enough that I didn't notice the failure until almost two hours later, when I rebooted.
Aug 24 18:44:36 sshd[15774]: Disconnected from authenticating user root port 43266 [preauth]
Aug 24 18:44:36 sshguard[25997]: Attack from "" on service SSH with danger 10.
Aug 24 18:45:02 sshd[87435]: Invalid user network from port 56558
Aug 24 18:45:02 sshguard[25997]: Attack from "" on service SSH with danger 10.
Aug 24 18:45:02 sshd[87435]: Received disconnect from port 56558:11: Bye Bye [preauth]
Aug 24 18:45:02 sshd[87435]: Disconnected from invalid user network port 56558 [preauth]
Aug 24 18:45:02 sshguard[25997]: Attack from "" on service SSH with danger 10.
Aug 24 18:45:39 sshguard[25997]: unblocking after 969 secs
Aug 24 18:46:41 sshd[10619]: Received disconnect from port 60080:11: Bye Bye [preauth]
Aug 24 18:46:41 sshd[10619]: Disconnected from authenticating user root port 60080 [preauth]
Aug 24 18:46:41 sshguard[25997]: Attack from "" on service SSH with danger 10.
Aug 24 18:46:53 sshguard[25997]: unblocking after 995 secs
Aug 24 20:21:39 syslogd: kernel boot file is /boot/kernel/kernel
Aug 24 20:21:39 kernel: ---<<BOOT>>---
Aug 24 20:21:39 kernel: Copyright (c) 1992-2023 The FreeBSD Project.
Aug 24 20:21:39 kernel: Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
Aug 24 20:21:39 kernel: The Regents of the University of California. All rights reserved.
Aug 24 20:21:39 kernel: FreeBSD is a registered trademark of The FreeBSD Foundation.
Aug 24 20:21:39 kernel: FreeBSD 14.0-CURRENT #1 RELENG_2_7_0-n255866-686c8d3c1f0: Wed Jun 28 04:21:19 UTC 2023
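Silent stretches like the one above are easy to miss in a long log. The sketch below scans BSD-syslog-style lines and reports any gap longer than a threshold. It is a hypothetical helper, assuming each line starts with a "Mon DD HH:MM:SS" stamp as in the excerpt, that the log does not cross a year boundary, and that 30 minutes is a useful threshold (all three are assumptions to adjust).

```python
import sys
from datetime import datetime, timedelta
from typing import Iterable, List, Optional, Tuple

# Assumed threshold: report any stretch longer than 30 minutes with no entries.
MIN_GAP = timedelta(minutes=30)


def parse_stamp(line: str, year: int) -> datetime:
    """Parse the leading 'Mon DD HH:MM:SS' stamp; raises ValueError if absent."""
    # BSD syslog stamps omit the year, so one is supplied explicitly.
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")


def find_gaps(lines: Iterable[str], min_gap: timedelta = MIN_GAP,
              year: int = datetime.now().year) -> List[Tuple[datetime, datetime]]:
    """Return (last entry before the gap, first entry after it) pairs."""
    gaps: List[Tuple[datetime, datetime]] = []
    previous: Optional[datetime] = None
    for line in lines:
        try:
            stamp = parse_stamp(line, year)
        except ValueError:
            continue  # skip lines that do not start with a timestamp
        if previous is not None and stamp - previous >= min_gap:
            gaps.append((previous, stamp))
        previous = stamp
    return gaps


def main(argv: List[str]) -> None:
    if not argv:
        print("usage: find_gaps.py /var/log/system.log")
        return
    with open(argv[0], errors="replace") as log:
        for start, end in find_gaps(log):
            print(f"no entries from {start:%b %d %H:%M:%S} "
                  f"to {end:%b %d %H:%M:%S} ({end - start})")


if __name__ == "__main__":
    main(sys.argv[1:])
```

Run against the excerpt above, it should report a single gap from Aug 24 18:46:53 to 20:21:39 (1:34:46). It only shows when logging stopped, not why.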
It looks like you have SSH open to the world, which is generally not a good idea.
Nothing being logged at all, like that, could point to a failing drive. After the driver goes AWOL, it can take a while for all the firewall services to fail.
@stephenw10 I'm not seeing anything that indicates a failing drive. Am I missing something?
>smartctl --test long /dev/ada0
smartctl 7.3 2022-02-28 r5338 [FreeBSD 14.0-CURRENT amd64] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke,

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Extended self-test routine immediately in off-line mode".
Drive command "Execute SMART Extended self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 2 minutes for test to complete.
Test will complete after Fri Aug 25 09:11:00 2023 EDT
Use smartctl -X to abort test.

>smartctl -a /dev/ada0
smartctl 7.3 2022-02-28 r5338 [FreeBSD 14.0-CURRENT amd64] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke,

=== START OF INFORMATION SECTION ===
Device Model:     SC2 MSATA SSD
Serial Number:    39DD07471ED000000340
Firmware Version: S9FM01.9
User Capacity:    60,022,480,896 bytes [60.0 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
TRIM Command:     Available
Device is:        Not in smartctl database 7.3/5319
ATA Version is:   ACS-3 (minor revision not indicated)
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Fri Aug 25 09:14:31 2023 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x00) Offline data collection activity was never started. Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed without error or no self-test has ever been run.
Total time to complete Offline data collection: ( 30) seconds.
Offline data collection capabilities: (0x7b) SMART execute Offline immediate. Auto Offline data collection on/off support. Suspend Offline collection upon new command. Offline surface scan supported. Self-test supported. Conveyance Self-test supported. Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode. Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported. General Purpose Logging supported.
Short self-test routine recommended polling time: ( 1) minutes.
Extended self-test routine recommended polling time: ( 2) minutes.
Conveyance self-test routine recommended polling time: ( 2) minutes.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000a   100   100   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0012   100   100   000    Old_age   Always       -       23637
 12 Power_Cycle_Count       0x0012   100   100   000    Old_age   Always       -       3175
168 Unknown_Attribute       0x0012   100   100   000    Old_age   Always       -       0
170 Unknown_Attribute       0x0013   100   100   010    Pre-fail  Always       -       21
173 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -       674825059
192 Power-Off_Retract_Count 0x0012   100   100   000    Old_age   Always       -       53
194 Temperature_Celsius     0x0023   070   070   000    Pre-fail  Always       -       30
218 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -       0
241 Total_LBAs_Written      0x0012   100   100   000    Old_age   Always       -       16018186

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%        12         -

SMART Selective self-test log data structure revision number 0
Note: revision number not 1 implies that no selective self-test has ever been run
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
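For a quick yes/no read of output like this, the sketch below pulls out only the generic indicators every ATA drive reports: the overall-health result, whether the error log is empty, and any self-test row that did not complete cleanly. It is a hypothetical helper (script and function names are illustrative), it expects smartctl's normal multi-line output saved to a file, and it deliberately does not try to interpret vendor-specific rows such as the Unknown_Attribute entries.

```python
import re
import sys
from typing import Dict, List, Optional


def smart_summary(text: str) -> Dict[str, object]:
    """Extract the generic health indicators from saved smartctl -a text."""
    match = re.search(r"self-assessment test result:\s*(\S+)", text)
    health: Optional[str] = match.group(1) if match else None
    # Self-test log rows look like "# 1  Extended offline  Completed without error ..."
    unclean: List[str] = [
        line.strip()
        for line in text.splitlines()
        if re.match(r"\s*#\s*\d+\s", line) and "Completed without error" not in line
    ]
    return {
        "health": health,                               # e.g. "PASSED"
        "error_log_empty": "No Errors Logged" in text,  # ATA error log
        "unclean_self_tests": unclean,                  # rows not completed cleanly
    }


def main(argv: List[str]) -> None:
    if not argv:
        print("usage: smart_summary.py smart.txt  (saved smartctl -a output)")
        return
    with open(argv[0], errors="replace") as f:
        print(smart_summary(f.read()))


if __name__ == "__main__":
    main(sys.argv[1:])
```

For the output above this should give health "PASSED", an empty error log and no unclean self-tests, which matches the conclusion that nothing there points to the drive. A clean summary does not rule out a drive or controller that drops off the bus intermittently.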
Nope, I don't see anything there either.
@stephenw10 Hmm, OK. So if there's nothing in /var/crash, the system log shows nothing at the time of the failure, and the disk looks OK, is there anything else to try?
Wait for it to fail again and check the local console. If it's responsive, try connecting out.
@stephenw10 Just to provide more information, here are my installed packages:
I have not applied any system patches; I only installed the package.
@stephenw10 When you say "connect out", do you mean ping, curl, etc., or something else?
Any of those. I would start by pinging a local IP, then a remote one.
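The point of going local first, then remote, is that the first failure tells you roughly where things broke: if a LAN host answers but a remote IP does not, the LAN side is still alive and the problem is further out. The sketch below automates that ladder. It is a hypothetical helper, assuming Python 3 is available on the box and that the system ping accepts -c COUNT (true of FreeBSD and Linux); the 15-second timeout is an arbitrary choice, and typing the two ping commands by hand at the console works just as well.

```python
import subprocess
import sys
from typing import Callable, List, Optional, Sequence


def first_unreachable(targets: Sequence[str],
                      run: Callable = subprocess.run,
                      timeout: float = 15.0) -> Optional[str]:
    """Ping each target once, in order; return the first that fails, or None."""
    for host in targets:
        try:
            result = run(["ping", "-c", "1", host],
                         capture_output=True, timeout=timeout)
        except subprocess.TimeoutExpired:
            return host  # treat a hung ping as unreachable
        if result.returncode != 0:
            return host
    return None


def main(argv: List[str]) -> None:
    if not argv:
        print("usage: ladder.py LOCAL_IP REMOTE_IP [...]")
        return
    failed = first_unreachable(argv)
    print("all targets answered" if failed is None else f"first failure: {failed}")


if __name__ == "__main__":
    main(sys.argv[1:])
```

Usage: python3 ladder.py 192.0.2.10 198.51.100.20, where both addresses are documentation-range placeholders to replace with a real LAN host and a real remote IP. The run parameter exists only so the logic can be exercised without a network.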