Netgate Discussion Forum

Capture a log on a lockup?

2.4 Development Snapshots
  • GentleJoe
    last edited by Apr 1, 2017, 2:27 AM

    When pfsense locks up, doesn't respond to telnet/ssh/ping/http/dns anything, I force a reboot via the power/reset button.

    Are there any log files (in /var/log) that I can go and look at to see why it locked up?

    I look at the system log, it doesn't show anything.

    2.4 Beta seems to do this every week or so.

    thanks

    2.4.0-BETA (amd64)
    built on Mon Mar 27 13:20:05 CDT 2017
    FreeBSD 11.0-RELEASE-p8
    Intel(R) Atom(TM) CPU D2550 @ 1.86GHz
    4 GB ram, 120 GB SATA SSD.
    PPPoE, darkstat, OpenVPN server (no clients when crashed), ntopng (missing the whole month of March)

    • Hugovsky
      last edited by Apr 1, 2017, 8:15 PM

      Maybe with an external syslog server?

      • A Former User
        last edited by Apr 1, 2017, 10:13 PM

        Hi Gentle Joe,

        I agree with Hugovsky, remote logging would be ideal. You can define the remote syslog server where you'd like the logs to go under Status > System Log > Settings.

        Thank you,

        -James
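
For anyone without a syslog server to point the firewall at, here is a minimal sketch of a receiver that could run on another machine. It assumes the firewall sends plain UDP syslog to port 514 (the conventional syslog port); the function names and the log path are made up for illustration, and this is not a replacement for a real syslog daemon.

```python
import socket


def open_listener(host="0.0.0.0", port=514):
    """Bind a UDP socket for incoming syslog datagrams.

    Port 514 is the conventional syslog port; binding to it usually
    needs elevated privileges. Any other port works if the sender is
    configured to match (assumption: the sender allows that).
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def append_datagrams(sock, log_path, limit=None):
    """Append each received datagram to log_path, one message per line.

    limit=None keeps running until interrupted; a number stops after
    that many messages. Returns the number of messages written.
    """
    received = 0
    with open(log_path, "a", encoding="utf-8") as log:
        while limit is None or received < limit:
            data, _sender = sock.recvfrom(8192)
            text = data.decode("utf-8", errors="replace").rstrip("\n")
            log.write(text + "\n")
            # Flush after every message so the file is current even if
            # the sending box locks up right afterwards.
            log.flush()
            received += 1
    return received
```

Calling `append_datagrams(open_listener(), "pfsense-remote.log")` would then collect messages until interrupted. Because the file lives on a different machine, the last lines sent before a lockup survive the forced reboot.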

        • Knight
          last edited by Apr 1, 2017, 10:46 PM

          Hi!

          @GentleJoe:

          When pfsense locks up, doesn't respond to telnet/ssh/ping/http/dns anything, I force a reboot via the power/reset button.

          What do you see on the monitor when this happens?

          I had my pfSense 2.3.something do something like this and what had happened was that it had rebooted but failed to reboot correctly… I would reboot it again and it would work for a while (initially about 7-10 days I think but at the end many times per day).

          @GentleJoe:

          I look at the system log, it doesn't show anything.

          My log didn't contain anything either…

          Now in my case the cause was a dying SSD (SMART didn't report anything and the logs didn't indicate any problems with the disk either). You would have to be pretty unlucky to have your hardware start to fail while you are testing a beta version, but it's not impossible...

          Good luck and have a nice day!

          Nick

          • GentleJoe
            last edited by Apr 2, 2017, 2:47 AM

            I enabled a remote syslog server today, I shall see if that captures anything.

            I don't have a monitor connected, I will do that too.

            Thanks for the suggestions.

            • GentleJoe
              last edited by Apr 7, 2017, 4:42 PM Apr 7, 2017, 4:37 PM

              I believe I captured the issue on the remote syslog server.

              This file shows the last things written to the log before I rebooted.

              The errors point to the SSD. The filesystem is ZFS.

              I ran the tests on the new SSD and no errors were reported. It has about 2000 hours of use so far.
              This drive replaced a non-SSD hard disk before the 2.4 beta release.

              <2>1 2017-04-06T22:25:51-07:00 router.madeupdomain.com kernel - - - kernel: ahcich2: Timeout on slot 31 port 0
              <2>1 2017-04-06T22:25:51-07:00 router.madeupdomain.com kernel - - - kernel: ahcich2: is 00000000 cs 80000000 ss 00000000 rs 80000000 tfd 80 serr 00000000 cmd 0004df17
              <2>1 2017-04-06T22:25:51-07:00 router.madeupdomain.com kernel - - - kernel: (aprobe0:ahcich2:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
              <2>1 2017-04-06T22:25:51-07:00 router.madeupdomain.com kernel - - - kernel: (aprobe0:ahcich2:0:0:0): CAM status: Command timeout
              <2>1 2017-04-06T22:25:51-07:00 router.madeupdomain.com kernel - - - kernel: (aprobe0:ahcich2:0:0:0): Error 5, Retry was blocked
              <2>1 2017-04-06T22:25:51-07:00 router.madeupdomain.com kernel - - - kernel: ada0 at ahcich2 bus 0 scbus2 target 0 lun 0
              <2>1 2017-04-06T22:25:51-07:00 router.madeupdomain.com kernel - - - kernel: ada0: <Corsair Force LS SSD S9FM02.0> s/n 154181170FF10312345C detached

              And then this..

              <2>1 2017-04-06T22:28:48-07:00 router.madeupdomain.com kernel - - - kernel: ahcich2: Poll timeout on slot 3 port 0
              <2>1 2017-04-06T22:28:48-07:00 router.madeupdomain.com kernel - - - kernel: ahcich2: is 00000000 cs 00000008 ss 00000000 rs 00000008 tfd 80 serr 00000000 cmd 0004c317
              <2>1 2017-04-06T22:28:48-07:00 router.madeupdomain.com kernel - - - kernel: (aprobe0:ahcich2:0:0:0): NOP FLUSHQUEUE. ACB: 00 00 00 00 00 00 00 00 00 00 00 00
              <2>1 2017-04-06T22:28:48-07:00 router.madeupdomain.com kernel - - - kernel: (aprobe0:ahcich2:0:0:0): CAM status: Command timeout
              <2>1 2017-04-06T22:28:48-07:00 router.madeupdomain.com kernel - - - kernel: (aprobe0:ahcich2:0:0:0): Error 5, Retries exhausted

              and

              <2>1 2017-04-06T22:32:02-07:00 router.madeupdomain.com kernel - - - kernel: ahcich2: Timeout on slot 10 port 0
              <2>1 2017-04-06T22:32:02-07:00 router.madeupdomain.com kernel - - - kernel: ahcich2: is 00000000 cs 00000400 ss 00000000 rs 00000400 tfd 80 serr 00000000 cmd 0004ca17
              <2>1 2017-04-06T22:32:02-07:00 router.madeupdomain.com kernel - - - kernel: (ada0:ahcich2:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 08 e8 18 71 40 00 00 00 00 00 00

              pflog.txt
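
Lines like the ones above can be picked out of a larger remote log automatically. The following is a rough sketch that parses the RFC 5424-style header seen in this capture and flags disk-related kernel messages. The pattern names and the patterns themselves are assumptions taken only from the messages in this thread, not a complete list of what the AHCI/CAM drivers can print.

```python
import re
from datetime import datetime

# Header layout as seen in the captured lines:
# <PRI>VERSION TIMESTAMP HOSTNAME APP-NAME PROCID MSGID STRUCTURED-DATA MSG
LINE_RE = re.compile(
    r"^<(?P<pri>\d{1,3})>(?P<version>\d+) "
    r"(?P<timestamp>\S+) (?P<host>\S+) (?P<app>\S+) "
    r"(?P<procid>\S+) (?P<msgid>\S+) (?P<sd>-|\[.*?\]) "
    r"(?P<msg>.*)$"
)

# Hypothetical patterns, based only on the messages posted in this thread.
DISK_PATTERNS = {
    "ahci_timeout": re.compile(r"ahcich\d+: (?:Poll t|T)imeout on slot \d+ port \d+"),
    "cam_timeout": re.compile(r"CAM status: Command timeout"),
    "io_error": re.compile(r"Error 5, "),
    "detached": re.compile(r"\bada\d+: .* detached$"),
}


def parse_line(line):
    """Split one syslog line into fields; return None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    pri = int(m.group("pri"))
    return {
        "facility": pri // 8,   # PRI = facility * 8 + severity
        "severity": pri % 8,
        "time": datetime.fromisoformat(m.group("timestamp")),
        "host": m.group("host"),
        "msg": m.group("msg"),
    }


def disk_events(lines):
    """Return (time, pattern_name, message) for each disk-related line."""
    events = []
    for line in lines:
        rec = parse_line(line)
        if rec is None:
            continue
        for name, pattern in DISK_PATTERNS.items():
            if pattern.search(rec["msg"]):
                events.append((rec["time"], name, rec["msg"]))
                break  # report each line once, under the first match
    return events


SAMPLE = [
    "<2>1 2017-04-06T22:25:51-07:00 router.madeupdomain.com kernel - - - kernel: ahcich2: Timeout on slot 31 port 0",
    "<2>1 2017-04-06T22:25:51-07:00 router.madeupdomain.com kernel - - - kernel: (aprobe0:ahcich2:0:0:0): CAM status: Command timeout",
    "<2>1 2017-04-06T22:28:48-07:00 router.madeupdomain.com kernel - - - kernel: ahcich2: Poll timeout on slot 3 port 0",
]

for when, kind, msg in disk_events(SAMPLE):
    print(when.isoformat(), kind, msg)
```

The `<2>` at the start of each line is the syslog priority: facility 0 (kernel) and severity 2 (critical), which is why filtering on the message text is more useful here than filtering on priority alone.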

              • Hugovsky
                last edited by Apr 9, 2017, 4:56 PM

                Seems you have a problem with your SSD. Have you checked the SMART status of your disk? Check cables and connections, and test the disk if possible.

                • stephenw10 Netgate Administrator
                  last edited by Apr 10, 2017, 12:55 AM

                  Definitely some issue with the boot drive, even if it isn't actually failing. It could potentially be the SATA controller or, as suggested above, simply a loose cable, which can show up in odd ways.

                  Steve

                  • GentleJoe
                    last edited by Apr 14, 2017, 11:32 PM

                    I swapped the SATA data and power cables.
                    It hasn't had this error since, but I'm keeping it logging to the external syslog. If it shows up again, I'll swap out the drive. Thanks.

                    Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.