Capture a log on a lockup?
-
When pfSense locks up and doesn't respond to anything (telnet/ssh/ping/http/dns), I force a reboot via the power/reset button.
Are there any log files (in /var/log) that I can go and look at to see why it locked up?
I looked at the system log, but it doesn't show anything.
2.4 Beta seems to do this every week or so.
thanks
2.4.0-BETA (amd64)
built on Mon Mar 27 13:20:05 CDT 2017
FreeBSD 11.0-RELEASE-p8
Intel(R) Atom(TM) CPU D2550 @ 1.86GHz
4 GB RAM, 120 GB SATA SSD.
PPPoE, darkstat, OpenVPN server (no clients when crashed), ntopng (missing the whole month of March)
-
Maybe with an external syslog server?
-
Hi Gentle Joe,
I agree with Hugovsky, remote logging would be ideal. You can define the remote syslog server where you'd like the logs to go under Status > System Logs > Settings.
Thank you,
-James
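If there is no syslog server on the LAN yet, a throwaway receiver is enough to catch the last messages before a lockup. The following is only a minimal sketch in Python, not anything pfSense ships: the file name `pfsense-remote.log`, the port 5514, and the names `format_record`, `SyslogHandler` and `serve` are all assumptions made up for illustration. It assumes the firewall is sending plain UDP syslog to this machine's address and port.

```python
import socketserver
from datetime import datetime, timezone

LOG_PATH = "pfsense-remote.log"  # hypothetical output file, pick your own


def format_record(peer_ip: str, payload: bytes) -> str:
    """Turn one syslog datagram into a single line with a receive timestamp."""
    text = payload.decode("utf-8", errors="replace").rstrip("\r\n\x00")
    received = datetime.now(timezone.utc).isoformat(timespec="seconds")
    return f"{received} {peer_ip} {text}\n"


class SyslogHandler(socketserver.BaseRequestHandler):
    def handle(self):
        # For UDP servers, self.request is a (data, socket) pair.
        data, _sock = self.request
        # Open, append and close per message so each line is handed to the OS
        # right away instead of sitting in a buffer.
        with open(LOG_PATH, "a", encoding="utf-8") as fh:
            fh.write(format_record(self.client_address[0], data))


def serve(host: str = "0.0.0.0", port: int = 5514) -> None:
    # 514 is the standard syslog port but usually needs root;
    # 5514 is an arbitrary unprivileged choice (assumption).
    with socketserver.UDPServer((host, port), SyslogHandler) as server:
        server.serve_forever()
```

Calling `serve()` starts the listener and blocks until interrupted. Because the log lives on a different machine, it survives even when the firewall's own disk stops accepting writes.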
-
Hi!
> When pfSense locks up and doesn't respond to anything (telnet/ssh/ping/http/dns), I force a reboot via the power/reset button.
What do you see on the monitor when this happens?
I had my pfSense 2.3.something do something like this. What had happened was that it had rebooted but failed to come back up correctly. I would reboot it again and it would work for a while (initially about 7-10 days, I think, but toward the end it failed many times per day).
> I looked at the system log, but it doesn't show anything.
My log didn't contain anything either…
In my case the cause was a dying SSD (SMART didn't report anything and the logs didn't indicate any problems with the disk either). You would have to be pretty unlucky to have your hardware start to fail while you are testing a beta version, but it's not impossible...
Good luck and have a nice day!
Nick
-
I enabled a remote syslog server today, I shall see if that captures anything.
I don't have a monitor connected, I will do that too.
Thanks for the suggestions.
-
I believe I captured the issue on the remote syslog server.
This file shows the last entries written to the log before I rebooted.
The errors point to the SSD. The filesystem is ZFS.
I ran tests on the new SSD and no errors were reported. It has about 2000 hours of use so far.
This drive replaced a hard disk (not an SSD) before the 2.4 beta release.
<2>1 2017-04-06T22:25:51-07:00 router.madeupdomain.com kernel - - - kernel: ahcich2: Timeout on slot 31 port 0
<2>1 2017-04-06T22:25:51-07:00 router.madeupdomain.com kernel - - - kernel: ahcich2: is 00000000 cs 80000000 ss 00000000 rs 80000000 tfd 80 serr 00000000 cmd 0004df17
<2>1 2017-04-06T22:25:51-07:00 router.madeupdomain.com kernel - - - kernel: (aprobe0:ahcich2:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
<2>1 2017-04-06T22:25:51-07:00 router.madeupdomain.com kernel - - - kernel: (aprobe0:ahcich2:0:0:0): CAM status: Command timeout
<2>1 2017-04-06T22:25:51-07:00 router.madeupdomain.com kernel - - - kernel: (aprobe0:ahcich2:0:0:0): Error 5, Retry was blocked
<2>1 2017-04-06T22:25:51-07:00 router.madeupdomain.com kernel - - - kernel: ada0 at ahcich2 bus 0 scbus2 target 0 lun 0
<2>1 2017-04-06T22:25:51-07:00 router.madeupdomain.com kernel - - - kernel: ada0: <Corsair Force LS SSD S9FM02.0> s/n 154181170FF10312345C detached
And then this:
<2>1 2017-04-06T22:28:48-07:00 router.madeupdomain.com kernel - - - kernel: ahcich2: Poll timeout on slot 3 port 0
<2>1 2017-04-06T22:28:48-07:00 router.madeupdomain.com kernel - - - kernel: ahcich2: is 00000000 cs 00000008 ss 00000000 rs 00000008 tfd 80 serr 00000000 cmd 0004c317
<2>1 2017-04-06T22:28:48-07:00 router.madeupdomain.com kernel - - - kernel: (aprobe0:ahcich2:0:0:0): NOP FLUSHQUEUE. ACB: 00 00 00 00 00 00 00 00 00 00 00 00
<2>1 2017-04-06T22:28:48-07:00 router.madeupdomain.com kernel - - - kernel: (aprobe0:ahcich2:0:0:0): CAM status: Command timeout
<2>1 2017-04-06T22:28:48-07:00 router.madeupdomain.com kernel - - - kernel: (aprobe0:ahcich2:0:0:0): Error 5, Retries exhausted
and
<2>1 2017-04-06T22:32:02-07:00 router.madeupdomain.com kernel - - - kernel: ahcich2: Timeout on slot 10 port 0
<2>1 2017-04-06T22:32:02-07:00 router.madeupdomain.com kernel - - - kernel: ahcich2: is 00000000 cs 00000400 ss 00000000 rs 00000400 tfd 80 serr 00000000 cmd 0004ca17
<2>1 2017-04-06T22:32:02-07:00 router.madeupdomain.com kernel - - - kernel: (ada0:ahcich2:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 08 e8 18 71 40 00 00 00 00 00 00
pflog.txt
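A capture like this gets long quickly, so it can help to pull out just the AHCI timeouts and see which channel they cluster on. Below is a rough Python sketch under a few assumptions: it expects lines with the same prefix layout as the ones above (priority, version, timestamp, host, app, then `- - -`), and the names `LINE_RE`, `TIMEOUT_RE`, `ahci_timeouts` and `timeouts_per_channel` are made up for this example.

```python
import re
from collections import Counter

# Prefix layout seen in the capture:
# <PRI>VERSION TIMESTAMP HOST APP - - - MESSAGE
LINE_RE = re.compile(
    r"^<(?P<pri>\d+)>(?P<ver>\d+) (?P<ts>\S+) (?P<host>\S+) (?P<app>\S+)"
    r" - - - (?P<msg>.*)$"
)
# Channel-level timeouts, e.g. "ahcich2: Timeout on slot 31 port 0"
# or "ahcich2: Poll timeout on slot 3 port 0".
TIMEOUT_RE = re.compile(
    r"(?P<chan>ahcich\d+): (?:Poll t|T)imeout on slot (?P<slot>\d+)"
)


def ahci_timeouts(lines):
    """Return (timestamp, channel, slot) for every AHCI timeout line."""
    events = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip anything that is not in the expected layout
        t = TIMEOUT_RE.search(m.group("msg"))
        if t:
            events.append((m.group("ts"), t.group("chan"), int(t.group("slot"))))
    return events


def timeouts_per_channel(lines):
    """Count timeout events per AHCI channel."""
    return Counter(chan for _ts, chan, _slot in ahci_timeouts(lines))
```

Fed the lines above, this picks out the three timeouts (slots 31, 3 and 10), all on ahcich2, which fits a single drive, cable or port being the problem rather than the whole controller.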
-
It seems you have a problem with your SSD. Have you checked the SMART status of the disk? Check cables and connections, and test the disk if possible.
-
There is definitely some issue with the boot drive, even if it is not actually failing. It could potentially be the SATA controller or, as suggested above, simply a loose cable, which can show up in odd ways.
Steve
-
I swapped the SATA data and power cables.
The error hasn't come back since, but I'm keeping it logging to the external syslog server. If it shows up again, I'll swap out the drive. Thanks.