Netgate Discussion Forum

    Capture a log on a lockup?

    2.4 Development Snapshots
    9 Posts 5 Posters 1.3k Views
    • GentleJoe

      When pfSense locks up and stops responding to telnet/ssh/ping/http/dns or anything else, I force a reboot via the power/reset button.

      Are there any log files (in /var/log) that I can go and look at to see why it locked up?

      I looked at the system log, but it doesn't show anything.

      2.4 Beta seems to do this every week or so.

      thanks

      2.4.0-BETA (amd64)
      built on Mon Mar 27 13:20:05 CDT 2017
      FreeBSD 11.0-RELEASE-p8
      Intel(R) Atom(TM) CPU D2550 @ 1.86GHz
      4 GB RAM, 120 GB SATA SSD.
      PPPoE, darkstat, OpenVPN server (no clients connected when it crashed), ntopng (missing the whole month of March)

      • Hugovsky

        Maybe with an external syslog server?

        • A Former User

          Hi Gentle Joe,

          I agree with Hugovsky; remote logging would be ideal. You can define the remote syslog server that should receive the logs under Status > System Log > Settings.

          Thank you,

          -James
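For anyone without a syslog server at hand, the receiving side can be sketched in a few lines of Python. This is only a minimal illustration, not a replacement for a real syslog daemon. The file name `LOG_PATH`, the port `5514`, and the helper names `format_record` and `serve` are assumptions made up for this example. It also assumes the firewall can be pointed at that port; otherwise the standard syslog port 514/UDP applies, which usually needs elevated privileges to bind.

```python
import socketserver

LOG_PATH = "pfsense-remote.log"  # hypothetical output file on the receiving host


def format_record(sender_ip: str, payload: bytes) -> str:
    """Turn one UDP datagram into a single log line tagged with the sender."""
    text = payload.decode("utf-8", errors="replace").rstrip("\r\n\x00")
    return f"{sender_ip} {text}"


class SyslogHandler(socketserver.BaseRequestHandler):
    def handle(self):
        # For UDP servers, self.request is a (data, socket) pair.
        data, _sock = self.request
        line = format_record(self.client_address[0], data)
        # Open and close the file for every message so that the last lines
        # sent before a lockup are already on disk.
        with open(LOG_PATH, "a", encoding="utf-8") as fh:
            fh.write(line + "\n")


def serve(host: str = "0.0.0.0", port: int = 5514) -> None:
    """Listen for syslog datagrams until interrupted (port is an assumption)."""
    with socketserver.UDPServer((host, port), SyslogHandler) as server:
        server.serve_forever()
```

Calling `serve()` on the logging machine starts the listener. The point of logging remotely is that whatever the kernel reports just before the lockup ends up on a machine that keeps running.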

          • Knight

            Hi!

            @Gentle:

            When pfsense locks up, doesn't respond to telnet/ssh/ping/http/dns anything, I force a reboot via the power/reset button.

            What do you see on the monitor when this happens?

            My pfSense 2.3.something did something like this: it had rebooted but failed to come back up correctly. I would reboot it again and it would work for a while (initially about 7-10 days, I think, but towards the end it failed many times per day).

            @Gentle:

            I look at the system log, it doesn't show anything.

            My log didn't contain anything either…

            In my case the cause turned out to be a dying SSD, even though SMART didn't report anything and the logs didn't indicate any problems with the disk. You would have to be pretty unlucky to have your hardware start to fail while you are testing a beta version, but it's not impossible...

            Good luck and have a nice day!

            Nick

            • GentleJoe

              I enabled a remote syslog server today, so I shall see if that captures anything.

              I don't have a monitor connected, but I will connect one too.

              Thanks for the suggestions.

              • GentleJoe

                I believe I captured the issue on the remote syslog server.

                The excerpts below show the last things written to the log file before I rebooted.

                The errors point to the SSD. The filesystem is ZFS.

                I ran tests on the new SSD and no errors were reported. It has about 2000 hours of use so far.
                This drive replaced a hard disk (not an SSD) before the 2.4 beta release.

                <2>1 2017-04-06T22:25:51-07:00 router.madeupdomain.com kernel - - - kernel: ahcich2: Timeout on slot 31 port 0
                <2>1 2017-04-06T22:25:51-07:00 router.madeupdomain.com kernel - - - kernel: ahcich2: is 00000000 cs 80000000 ss 00000000 rs 80000000 tfd 80 serr 00000000 cmd 0004df17
                <2>1 2017-04-06T22:25:51-07:00 router.madeupdomain.com kernel - - - kernel: (aprobe0:ahcich2:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
                <2>1 2017-04-06T22:25:51-07:00 router.madeupdomain.com kernel - - - kernel: (aprobe0:ahcich2:0:0:0): CAM status: Command timeout
                <2>1 2017-04-06T22:25:51-07:00 router.madeupdomain.com kernel - - - kernel: (aprobe0:ahcich2:0:0:0): Error 5, Retry was blocked
                <2>1 2017-04-06T22:25:51-07:00 router.madeupdomain.com kernel - - - kernel: ada0 at ahcich2 bus 0 scbus2 target 0 lun 0
                <2>1 2017-04-06T22:25:51-07:00 router.madeupdomain.com kernel - - - kernel: ada0: <Corsair Force LS SSD S9FM02.0> s/n 154181170FF10312345C detached

                And then this..

                <2>1 2017-04-06T22:28:48-07:00 router.madeupdomain.com kernel - - - kernel: ahcich2: Poll timeout on slot 3 port 0
                <2>1 2017-04-06T22:28:48-07:00 router.madeupdomain.com kernel - - - kernel: ahcich2: is 00000000 cs 00000008 ss 00000000 rs 00000008 tfd 80 serr 00000000 cmd 0004c317
                <2>1 2017-04-06T22:28:48-07:00 router.madeupdomain.com kernel - - - kernel: (aprobe0:ahcich2:0:0:0): NOP FLUSHQUEUE. ACB: 00 00 00 00 00 00 00 00 00 00 00 00
                <2>1 2017-04-06T22:28:48-07:00 router.madeupdomain.com kernel - - - kernel: (aprobe0:ahcich2:0:0:0): CAM status: Command timeout
                <2>1 2017-04-06T22:28:48-07:00 router.madeupdomain.com kernel - - - kernel: (aprobe0:ahcich2:0:0:0): Error 5, Retries exhausted

                and

                <2>1 2017-04-06T22:32:02-07:00 router.madeupdomain.com kernel - - - kernel: ahcich2: Timeout on slot 10 port 0
                <2>1 2017-04-06T22:32:02-07:00 router.madeupdomain.com kernel - - - kernel: ahcich2: is 00000000 cs 00000400 ss 00000000 rs 00000400 tfd 80 serr 00000000 cmd 0004ca17
                <2>1 2017-04-06T22:32:02-07:00 router.madeupdomain.com kernel - - - kernel: (ada0:ahcich2:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 08 e8 18 71 40 00 00 00 00 00 00

                pflog.txt
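To watch for a recurrence in the remote log, the timeout lines can be picked out with a short script. The following is a rough sketch that assumes the line layout shown in the excerpts above (priority, version, timestamp, host, then the kernel message). The names `LINE_RE`, `TIMEOUT_RE`, `SAMPLE`, and `find_ahci_timeouts` are made up for this example, and the patterns only cover the two timeout messages seen in this thread.

```python
import re

# Matches the syslog framing seen in the excerpts above and captures the
# timestamp, host, and kernel message text.
LINE_RE = re.compile(
    r"^<(?P<pri>\d+)>\d+\s+(?P<ts>\S+)\s+(?P<host>\S+)\s+"
    r"kernel\s+-\s+-\s+-\s+kernel:\s+(?P<msg>.*)$"
)

# Matches "ahcichN: Timeout on slot X port Y" and the "Poll timeout" variant.
TIMEOUT_RE = re.compile(
    r"^(?P<chan>ahcich\d+): (?:Poll timeout|Timeout) "
    r"on slot (?P<slot>\d+) port (?P<port>\d+)"
)

# Three lines copied from the log excerpts in this post.
SAMPLE = [
    "<2>1 2017-04-06T22:25:51-07:00 router.madeupdomain.com kernel - - - kernel: ahcich2: Timeout on slot 31 port 0",
    "<2>1 2017-04-06T22:28:48-07:00 router.madeupdomain.com kernel - - - kernel: ahcich2: Poll timeout on slot 3 port 0",
    "<2>1 2017-04-06T22:28:48-07:00 router.madeupdomain.com kernel - - - kernel: (aprobe0:ahcich2:0:0:0): CAM status: Command timeout",
]


def find_ahci_timeouts(lines):
    """Return (timestamp, channel, slot) for every AHCI timeout line."""
    events = []
    for raw in lines:
        framed = LINE_RE.match(raw.strip())
        if framed is None:
            continue  # not in the expected syslog layout
        hit = TIMEOUT_RE.match(framed.group("msg"))
        if hit:
            events.append((framed.group("ts"), hit.group("chan"), int(hit.group("slot"))))
    return events


for ts, chan, slot in find_ahci_timeouts(SAMPLE):
    print(f"{ts}: {chan} timed out on slot {slot}")
```

Run against the whole remote log file instead of `SAMPLE`, this gives a quick list of when the channel stopped answering, which helps to tell whether the timeouts come back after a hardware change.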

                • Hugovsky

                  It seems you have a problem with your SSD. Have you checked the SMART status of your disk? Check the cables and connections, and test the disk if possible.

                  • stephenw10 (Netgate Administrator)

                    There is definitely some issue with the boot drive, even if it is not actually failing. It could potentially be the SATA controller or, as suggested above, simply a loose cable, which can show up in odd ways.

                    Steve

                    • GentleJoe

                      I swapped the SATA data and power cables.
                      It hasn't had this error since, but I'm keeping it logging to the external syslog server. If the error shows up again, I'll swap out the drive. Thanks.

                      Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.