pfSense 2.4.2 crashing on PC Engines apu2 at random times
-
I have a similar problem with mine. It seems to lock up about once a month. What temperature is yours running at?
-
@acascianelli, it runs at around 50-55 °C, and the crash happens every 10 days or so.
@jimp, thanks for the hint, I'll look for another power supply and give it a try. I also think it's hardware related; I hope it's the hard drive or the power supply, which I can easily replace, and not the mainboard.
-
I finally managed to capture some error messages from the console; nothing was written to the system logs, though.
ahcich0: Timeout on slot 10 port 0
ahcich0: is 00000008 cs 00000000 ss 00000000 rs ffffe7ff tfd 40 serr 00000000 cmd 00406a17
(ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 01 e8 c7 00 40 00 00 00 00 00 00
(ada0:ahcich0:0:0:0): CAM status: Command timeout
(ada0:ahcich0:0:0:0): Retrying command
ahcich0: Timeout on slot 11 port 0
ahcich0: is 00000002 cs 00000000 ss 00000000 rs 00000800 tfd 50 serr 00000000 cmd 00406b17
(aprobe0:ahcich0:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
(aprobe0:ahcich0:0:0:0): CAM status: Command timeout
(aprobe0:ahcich0:0:0:0): Retrying command
ahcich0: Timeout on slot 12 port 0
ahcich0: is 00000002 cs 00000000 ss 00000000 rs 00001000 tfd 50 serr 00000000 cmd 00406c17
(aprobe0:ahcich0:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
(aprobe0:ahcich0:0:0:0): CAM status: Command timeout
(aprobe0:ahcich0:0:0:0): Error 5, Retries exhausted
ahcich0: Timeout on slot 13 port 0
ahcich0: is 00000002 cs 00000000 ss 00000000 rs 00002000 tfd 50 serr 00000000 cmd 00406d17
(aprobe0:ahcich0:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
(aprobe0:ahcich0:0:0:0): CAM status: Command timeout
(aprobe0:ahcich0:0:0:0): Error 5, Retry was blocked
ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
ada0: <TS16GMSA310 20120703> s/n 20121222A55XXXXXX detached
...
db:0:kdb.enter.default> textdump set
...
db:0:kdb.enter.default> capture on
...
db:0:kdb.enter.default> run lockinfo
...
db:0:kdb.enter.default> show pcpu
...
db:0:kdb.enter.default> bt
...
db:0:kdb.enter.default> ps
...
db:0:kdb.enter.default> alltrace
...
db:0:kdb.enter.default> capture off
...
db:0:kdb.enter.default> textdump dump
...
...
Tracing command kernel pid 0 tid 100099 td 0xfffff800144fe000
sched_switch() at sched_switch+0x4aa/frame 0xfffffe011fdb9960
mi_switch() at mi_switch+0xe5/frame 0xfffffe011fdb9990
sleepq_wait() at sleepq_wait+0x3a/frame 0xfffffe011fdb99c0
_sleep() at _sleep+0x255/frame 0xfffffe011fdb9a40
taskqueue_thread_loop() at taskqueue_thread_loop+0x121/frame 0xfffffe011fdb9a70
fork_exit() at fork_exit+0x85/frame 0xfffffe011fdb9ab0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe011fdb9ab0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
db:0:kdb.enter.default> capture off
db:0:kdb.enter.default> textdump dump
textdump_writeblock: offset 801111552, error 6
Textdump: Error 6 writing dump
db:0:kdb.enter.default> reset
cpu_reset: Restarting BSP
cpu_reset_proxy: Stopped CPU 2
PC Engines apu2
coreboot build 07/24/2017
BIOS version v4.6.0
This is where it gets stuck and doesn't boot until I manually reset it.
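The `rs` field in these controller messages looks like a bitmask of the command slots the driver still considers outstanding: in every log in this thread, the bit for the slot that timed out is set. That reading is an inference from the logs, not something taken from the ahci(4) documentation, so treat it as an assumption. The sketch below pairs each "Timeout on slot N" line with the status line after it and lists the busy slots; the helper names `set_bits` and `pair_timeouts` are hypothetical.

```python
import re

# The first two timeouts from the console log above (CAM lines omitted).
LOG = """\
ahcich0: Timeout on slot 10 port 0
ahcich0: is 00000008 cs 00000000 ss 00000000 rs ffffe7ff tfd 40 serr 00000000 cmd 00406a17
ahcich0: Timeout on slot 11 port 0
ahcich0: is 00000002 cs 00000000 ss 00000000 rs 00000800 tfd 50 serr 00000000 cmd 00406b17
"""

SLOT_RE = re.compile(r"Timeout on slot (\d+) port \d+")
RS_RE = re.compile(r"\brs ([0-9a-f]{8})\b")


def set_bits(mask):
    """Indices of the bits set in a 32-bit mask (assumed: one bit per command slot)."""
    return [bit for bit in range(32) if mask & (1 << bit)]


def pair_timeouts(text):
    """Pair each 'Timeout on slot N' line with the rs mask on the status line after it."""
    pairs, slot = [], None
    for line in text.splitlines():
        if (m := SLOT_RE.search(line)):
            slot = int(m.group(1))
        elif slot is not None and (m := RS_RE.search(line)):
            pairs.append((slot, set_bits(int(m.group(1), 16))))
            slot = None
    return pairs


for slot, busy in pair_timeouts(LOG):
    print(f"slot {slot}: {len(busy)} busy slot(s), timed-out slot among them: {slot in busy}")
```

For this excerpt, the first timeout (the queued write) happened with 30 of 32 slots busy, i.e. a deep NCQ queue, while the following timeout has a single busy slot: the lone IDENTIFY probe.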
-
Could a moderator move this topic under "Hardware" please?
-
Hi,
Same issue on an APU3. A few days ago I noticed the same behaviour:
ahcich0: Timeout on slot 10 port 0
ahcich0: is 00000008 cs 00000000 ss 00000000 rs ffffe7ff tfd 40 serr 00000000 cmd 00406a17
(ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 01 e8 c7 00 40 00 00 00 00 00 00
(ada0:ahcich0:0:0:0): CAM status: Command timeout
(ada0:ahcich0:0:0:0): Retrying command
ahcich0: Timeout on slot 11 port 0
The temperature is around 53-57 °C.
Running coreboot v4.6.0
-
That's interesting. Let me also highlight that the crashes do NOT happen when the system is under load.
PC Engines apu2
Coreboot: build 07/24/2017
BIOS: version v4.6.0
pfSense: 2.4.2-RELEASE-p1 (amd64)
OS: FreeBSD 11.1-RELEASE-p6
mSATA SSD: Transcend TS16GMSA310 16 GB - https://www.amazon.co.uk/Transcend-TS16GMSA310-16-GB-Internal/dp/B007DIS8Y2
@software, what kind of storage do you use?
-
I run a number of APU2 boards on coreboot 4.0.7 and have not seen this issue.
-
No problems:
System PC Engines APU2B2
BIOS Vendor: coreboot
Version: 88a4f96
Release Date: Mon Mar 7 2016
Version 2.4.2-RELEASE-p1 (amd64)
built on Tue Dec 12 13:45:26 CST 2017
FreeBSD 11.1-RELEASE-p6
and an mSATA SSD: Transcend TS16GMSA370 16 GB
-
@/CS:
That's interesting. Let me also highlight that the crashes do NOT happen when the system is under load.
…
@software, what kind of storage do you use?
It's a no-name Chinese mSATA SSD. I think the cheap SSD is now biting me in the ass.
Same here, it happened during the night, with almost no load except some OpenVPN ping traffic and some home automation.
Jan 23 23:33:46 kernel (aprobe0:ahcich0:0:0:0): Error 5, Retries exhausted
Jan 23 23:33:46 kernel (aprobe0:ahcich0:0:0:0): CAM status: Command timeout
Jan 23 23:33:46 kernel (aprobe0:ahcich0:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
Jan 23 23:33:46 kernel ahcich0: is 00000002 cs 00000000 ss 00000000 rs 00008000 tfd 50 serr 00000000 cmd 00406f17
Jan 23 23:33:46 kernel ahcich0: Timeout on slot 15 port 0
Jan 23 23:33:16 kernel (aprobe0:ahcich0:0:0:0): Retrying command
Jan 23 23:33:16 kernel (aprobe0:ahcich0:0:0:0): CAM status: Command timeout
Jan 23 23:33:16 kernel (aprobe0:ahcich0:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
Jan 23 23:33:16 kernel ahcich0: is 00000002 cs 00000000 ss 00000000 rs 00004000 tfd 50 serr 00000000 cmd 00406e17
Jan 23 23:33:16 kernel ahcich0: Timeout on slot 14 port 0
Jan 23 23:32:46 kernel (aprobe0:ahcich0:0:0:0): Error 5, Retries exhausted
Jan 23 23:32:46 kernel (aprobe0:ahcich0:0:0:0): CAM status: Command timeout
Jan 23 23:32:46 kernel (aprobe0:ahcich0:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
Jan 23 23:32:46 kernel ahcich0: is 00000002 cs 00000000 ss 00000000 rs 00000800 tfd 50 serr 00000000 cmd 00406b17
Jan 23 23:32:46 kernel ahcich0: Timeout on slot 11 port 0
Jan 23 23:32:16 kernel (aprobe0:ahcich0:0:0:0): Retrying command
Jan 23 23:32:16 kernel (aprobe0:ahcich0:0:0:0): CAM status: Command timeout
Jan 23 23:32:16 kernel (aprobe0:ahcich0:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
Jan 23 23:32:16 kernel ahcich0: is 00000002 cs 00000000 ss 00000000 rs 00000400 tfd 50 serr 00000000 cmd 00406a17
Jan 23 23:32:16 kernel ahcich0: Timeout on slot 10 port 0
Jan 23 23:31:46 kernel (aprobe0:ahcich0:0:0:0): Error 5, Retries exhausted
Jan 23 23:31:46 kernel (aprobe0:ahcich0:0:0:0): CAM status: Command timeout
Jan 23 23:31:46 kernel (aprobe0:ahcich0:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
Jan 23 23:31:46 kernel ahcich0: is 00000002 cs 00000000 ss 00000000 rs 00000020 tfd 50 serr 00000000 cmd 00406517
Jan 23 23:31:46 kernel ahcich0: Timeout on slot 5 port 0
Jan 23 23:31:16 kernel (aprobe0:ahcich0:0:0:0): Retrying command
Jan 23 23:31:16 kernel (aprobe0:ahcich0:0:0:0): CAM status: Command timeout
Jan 23 23:31:16 kernel (aprobe0:ahcich0:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
Jan 23 23:31:16 kernel ahcich0: is 00000002 cs 00000000 ss 00000000 rs 00000010 tfd 50 serr 00000000 cmd 00406417
Jan 23 23:31:16 kernel ahcich0: Timeout on slot 4 port 0
Jan 23 23:30:46 kernel (aprobe0:ahcich0:0:0:0): Error 5, Retries exhausted
Jan 23 23:30:46 kernel (aprobe0:ahcich0:0:0:0): CAM status: Command timeout
Jan 23 23:30:46 kernel (aprobe0:ahcich0:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
Jan 23 23:30:46 kernel ahcich0: is 00000002 cs 00000000 ss 00000000 rs 00000002 tfd 50 serr 00000000 cmd 00406117
Jan 23 23:30:46 kernel ahcich0: Timeout on slot 1 port 0
Jan 23 23:30:16 kernel (aprobe0:ahcich0:0:0:0): Retrying command
Jan 23 23:30:16 kernel (aprobe0:ahcich0:0:0:0): CAM status: Command timeout
Jan 23 23:30:16 kernel (aprobe0:ahcich0:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
Jan 23 23:30:16 kernel ahcich0: is 00000002 cs 00000000 ss 00000000 rs 00000001 tfd 50 serr 00000000 cmd 00406017
Jan 23 23:30:16 kernel ahcich0: Timeout on slot 0 port 0
Jan 23 23:29:46 kernel (aprobe0:ahcich0:0:0:0): Error 5, Retries exhausted
Jan 23 23:29:46 kernel (aprobe0:ahcich0:0:0:0): CAM status: Command timeout
Jan 23 23:29:46 kernel (aprobe0:ahcich0:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
Jan 23 23:29:46 kernel ahcich0: is 00000002 cs 00000000 ss 00000000 rs 08000000 tfd 50 serr 00000000 cmd 00407b17
Jan 23 23:29:46 kernel ahcich0: Timeout on slot 27 port 0
Jan 23 23:29:16 kernel (aprobe0:ahcich0:0:0:0): Retrying command
Jan 23 23:29:16 kernel (aprobe0:ahcich0:0:0:0): CAM status: Command timeout
Jan 23 23:29:16 kernel (aprobe0:ahcich0:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
Jan 23 23:29:16 kernel ahcich0: is 00000002 cs 00000000 ss 00000000 rs 04000000 tfd 50 serr 00000000 cmd 00407a17
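In this log the controller timeouts arrive exactly 30 seconds apart, which suggests that each IDENTIFY probe waits out a full 30-second command timeout before CAM retries it or gives up. The sketch below sorts the timestamps (the log above is listed newest first) and prints the gaps between consecutive timeouts. Syslog lines carry no year, so it prepends a placeholder year; that is an assumption that does not affect the differences. `timeout_gaps` is a hypothetical helper.

```python
import re
from datetime import datetime

# "Timeout on slot" lines from the log above, newest first as posted.
LOG = """\
Jan 23 23:33:46 kernel ahcich0: Timeout on slot 15 port 0
Jan 23 23:33:16 kernel ahcich0: Timeout on slot 14 port 0
Jan 23 23:32:46 kernel ahcich0: Timeout on slot 11 port 0
Jan 23 23:32:16 kernel ahcich0: Timeout on slot 10 port 0
Jan 23 23:31:46 kernel ahcich0: Timeout on slot 5 port 0
Jan 23 23:31:16 kernel ahcich0: Timeout on slot 4 port 0
Jan 23 23:30:46 kernel ahcich0: Timeout on slot 1 port 0
Jan 23 23:30:16 kernel ahcich0: Timeout on slot 0 port 0
Jan 23 23:29:46 kernel ahcich0: Timeout on slot 27 port 0
"""

LINE_RE = re.compile(r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d) kernel ahcich\d+: Timeout on slot (\d+)")

PLACEHOLDER_YEAR = "2000"  # assumption: syslog omits the year; only differences matter here


def timeout_gaps(text):
    """Return the sorted timeout timestamps and the gaps (in seconds) between them."""
    stamps = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            stamps.append(datetime.strptime(f"{PLACEHOLDER_YEAR} {m.group(1)}", "%Y %b %d %H:%M:%S"))
    stamps.sort()
    gaps = [(later - earlier).total_seconds() for earlier, later in zip(stamps, stamps[1:])]
    return stamps, gaps


stamps, gaps = timeout_gaps(LOG)
print(f"{len(stamps)} timeouts, distinct gaps: {sorted(set(gaps))} s")
```

Every gap in this excerpt is 30 seconds, so the disk never answered a single probe during those four minutes.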
-
Looks like crappy disks to me.
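The console log at the top of the thread shows the pattern behind this diagnosis: a queued write to `ada0` times out first, after which every IDENTIFY that CAM sends while re-probing the device (the `aprobe0` lines) also times out, until the retries run out and the disk is detached. In other words, the drive stops answering altogether rather than failing on one bad sector. A minimal sketch of a hypothetical `summarize` helper that pulls that sequence out of a log excerpt:

```python
import re

# Abbreviated from the console log earlier in the thread (controller status lines omitted).
LOG = """\
ahcich0: Timeout on slot 10 port 0
(ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 01 e8 c7 00 40 00 00 00 00 00 00
(ada0:ahcich0:0:0:0): CAM status: Command timeout
(ada0:ahcich0:0:0:0): Retrying command
ahcich0: Timeout on slot 11 port 0
(aprobe0:ahcich0:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
(aprobe0:ahcich0:0:0:0): CAM status: Command timeout
(aprobe0:ahcich0:0:0:0): Retrying command
ahcich0: Timeout on slot 12 port 0
(aprobe0:ahcich0:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
(aprobe0:ahcich0:0:0:0): CAM status: Command timeout
(aprobe0:ahcich0:0:0:0): Error 5, Retries exhausted
ahcich0: Timeout on slot 13 port 0
(aprobe0:ahcich0:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
(aprobe0:ahcich0:0:0:0): CAM status: Command timeout
(aprobe0:ahcich0:0:0:0): Error 5, Retry was blocked
ada0: <TS16GMSA310 20120703> s/n 20121222A55XXXXXX detached
"""

# CAM path prefix "(periph:sim:bus:target:lun): COMMAND. ACB: ..."
CMD_RE = re.compile(r"\((\w+):ahcich\d+:[\d:]+\): (\w+)\. ACB:")


def summarize(text):
    """Hypothetical helper: first failed command, number of IDENTIFY probes, final outcome."""
    commands = [m.groups() for m in map(CMD_RE.search, text.splitlines()) if m]
    return {
        "first_failure": " ".join(commands[0]) if commands else None,
        "identify_probes": sum(
            1 for periph, cmd in commands
            if periph.startswith("aprobe") and cmd == "ATA_IDENTIFY"
        ),
        "retries_exhausted": "Retries exhausted" in text,
        "detached": re.search(r"^ada\d+: .* detached$", text, re.MULTILINE) is not None,
    }


print(summarize(LOG))
```

For this excerpt the summary is a failed `ada0 WRITE_FPDMA_QUEUED`, three IDENTIFY probes, exhausted retries and a detached disk.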
-
@KOM:
Looks like crappy disks to me.
I think so too.
I have ordered a new "Transcend 32GB mSATA SSD (TS32GMSA370)" and I'll keep you posted.
-
Did your system crash before you upgraded the BIOS to 4.6.0? Which BIOS version were you running then? 4.0.x or 4.5.x?
I ask because the version you are currently running is not recommended for use with pfSense or FreeBSD. PC Engines has a warning on their BIOS/Howto page:
For FreeBSD based OS like OPNSense and pfSense please use the legacy versions.
Note: "legacy" versions are 4.0.x
See: http://pcengines.ch/howto.htm#bios
There have been several reports of issues with the newer coreboot releases (AFAIK not this particular issue, though), so I wouldn't be surprised if this were caused by the firmware and not the disk itself.
That being said, if you have also seen this on the older 4.0.x firmware, then it's likely a disk issue.
-
Did your system crash before you upgraded the BIOS to 4.6.0? Which BIOS version were you running then? 4.0.x or 4.5.x?
…
That being said, if you have also seen this on the older 4.0.x firmware, then it's likely a disk issue.
It actually happened when I was on the older firmware; I don't recall the version. I just thought it was a good opportunity to upgrade to the latest one, hoping that it might help.
I'll find out soon.
-
I bought a brand new SSD (Transcend 32GB SATA III 6Gb/s MSA370 mSATA SSD - TS32GMSA370) and tried to reinstall pfSense by booting the installer from a USB drive, but I'm getting similar errors during the installation and it eventually fails. I tried multiple times, and it always fails while extracting the distribution files. Any ideas what's going on? Could it be the board itself? I hope not…
pfSense Installer, "Archive Extraction" dialog: Extracting distribution files... Overall Progress: 23%
ahcich0: Timeout on slot 15 port 0
ahcich0: is 00000008 cs 00000000 ss 00000000 rs 0000ffc0 tfd 40 serr 00000000 cmd 00406f17
(ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 40 a8 60 62 40 03 00 00 00 00 00
(ada0:ahcich0:0:0:0): CAM status: Command timeout
(ada0:ahcich0:0:0:0): Retrying command
ahcich0: Timeout on slot 16 port 0
ahcich0: is 00000002 cs 00000000 ss 00000000 rs 00010000 tfd 50 serr 00000000 cmd 00407017
(aprobe0:ahcich0:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
(aprobe0:ahcich0:0:0:0): CAM status: Command timeout
(aprobe0:ahcich0:0:0:0): Retrying command
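The 12-byte `ACB:` dump can be decoded to see which command was in flight and at which address. The opcodes are standard ATA (0x61 is WRITE FPDMA QUEUED, 0xEC is IDENTIFY DEVICE), and for the queued (NCQ) commands the block count travels in the features fields. The byte order the sketch assumes (command, features, LBA low/mid/high, device, LBA low/mid/high extended, features extended, sector count, sector count extended) is our reading of how FreeBSD's CAM layer prints ATA commands, and `decode_acb` is a hypothetical helper.

```python
# Hypothetical helper. ASSUMED byte order of the CAM "ACB:" dump:
# command, features, lba_low, lba_mid, lba_high, device,
# lba_low_exp, lba_mid_exp, lba_high_exp, features_exp, sector_count, sector_count_exp

ATA_COMMANDS = {
    0x60: "READ FPDMA QUEUED",
    0x61: "WRITE FPDMA QUEUED",
    0xEC: "IDENTIFY DEVICE",
}
FPDMA_COMMANDS = {0x60, 0x61}  # NCQ commands carry the block count in the features fields


def decode_acb(acb):
    b = [int(x, 16) for x in acb.split()]
    if len(b) != 12:
        raise ValueError(f"expected 12 bytes, got {len(b)}")
    (cmd, feat, lba0, lba1, lba2, _device,
     lba3, lba4, lba5, feat_exp, count, count_exp) = b
    # 48-bit LBA assembled from the six address bytes
    lba = lba0 | lba1 << 8 | lba2 << 16 | lba3 << 24 | lba4 << 32 | lba5 << 40
    blocks = (feat | feat_exp << 8) if cmd in FPDMA_COMMANDS else (count | count_exp << 8)
    return {"command": ATA_COMMANDS.get(cmd, f"0x{cmd:02x}"), "lba": lba, "blocks": blocks}


for acb in ("61 40 a8 60 62 40 03 00 00 00 00 00",   # installer failure above
            "61 01 e8 c7 00 40 00 00 00 00 00 00",   # first console log in the thread
            "ec 00 00 00 00 40 00 00 00 00 00 00"):  # IDENTIFY probe
    print(decode_acb(acb))
```

Under that assumption, the installer failure was a 64-block queued write at LBA 56778920, while the first console log in the thread failed on a single-block write at LBA 51176. Comparing addresses across failures can hint whether the errors cluster in one region of the flash or hit arbitrary addresses.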
-
For FreeBSD based OS like OPNSense and pfSense please use the legacy versions.
Reference: http://pcengines.ch/howto.htm#bios
The comment above is why I went to https://github.com/pcengines/apu2-documentation#binary-releases and downloaded http://pcengines.ch/file/apu2_v4.0.12.rom.tar.gz, the latest "legacy" version. After downgrading to 4.0.12, the installation of pfSense on my new SSD completed smoothly and without any errors.
Let's now wait a week or so to see whether the system crashes again.
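PC Engines calls the 4.0.x builds "legacy". When picking the newest legacy image from a list of archive names, compare versions numerically: a plain string sort ranks 4.0.7 above 4.0.12. The sketch below assumes archive names of the form `apu2_v<version>.rom.tar.gz`, following the `apu2_v4.0.12.rom.tar.gz` file linked above; the other names in the list are hypothetical. It only selects a file name and does not download or flash anything.

```python
import re

# Hypothetical release list; only apu2_v4.0.12.rom.tar.gz comes from the post above.
RELEASES = [
    "apu2_v4.0.7.rom.tar.gz",
    "apu2_v4.0.12.rom.tar.gz",
    "apu2_v4.6.0.rom.tar.gz",
]

NAME_RE = re.compile(r"^apu2_v(\d+(?:\.\d+)*)\.rom\.tar\.gz$")


def version_tuple(name):
    """Extract the version as a tuple of ints, e.g. (4, 0, 12); None if the name doesn't match."""
    m = NAME_RE.match(name)
    return tuple(int(part) for part in m.group(1).split(".")) if m else None


def latest_legacy(names):
    """Newest 4.0.x ("legacy") build, compared numerically rather than as strings."""
    legacy = [n for n in names if (v := version_tuple(n)) and v[:2] == (4, 0)]
    return max(legacy, key=version_tuple, default=None)


print(latest_legacy(RELEASES))
```

With the list above this selects `apu2_v4.0.12.rom.tar.gz`, whereas `max()` on the raw strings would have returned the older 4.0.7 archive.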
-
Uptime: 22 days
I haven't had a crash since I downgraded to the stable firmware and replaced my SSD. ;)
-
@/CS:
Uptime: 22 days
I haven't had a crash since I downgraded to the stable firmware and replaced my SSD. ;)
Thanks! I have a similar problem and I will try the downgrade and report back.
Apr 8 11:12:10 pfsense kernel: ahcich0: Timeout on slot 14 port 0
Apr 8 11:12:10 pfsense kernel: ahcich0: is 00000008 cs 00000000 ss 00000000 rs 00007800 tfd 40 serr 00000000 cmd 00406e17
Apr 8 11:12:10 pfsense kernel: (ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 01 0f c3 00 40 00 00 00 00 00 00
Apr 8 11:12:10 pfsense kernel: (ada0:ahcich0:0:0:0): CAM status: Command timeout
Apr 8 11:12:10 pfsense kernel: (ada0:ahcich0:0:0:0): Retrying command
Apr 8 11:12:40 pfsense kernel: ahcich0: Timeout on slot 15 port 0
Apr 8 11:12:40 pfsense kernel: ahcich0: is 00000002 cs 00000000 ss 00000000 rs 00008000 tfd 50 serr 00000000 cmd 00406f17
Apr 8 11:12:40 pfsense kernel: (aprobe0:ahcich0:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
Apr 8 11:12:40 pfsense kernel: (aprobe0:ahcich0:0:0:0): CAM status: Command timeout
Apr 8 11:12:40 pfsense kernel: (aprobe0:ahcich0:0:0:0): Retrying command
Apr 8 11:13:10 pfsense kernel: ahcich0: Timeout on slot 16 port 0
Apr 8 11:13:10 pfsense kernel: ahcich0: is 00000002 cs 00000000 ss 00000000 rs 00010000 tfd 50 serr 00000000 cmd 00407017
Apr 8 11:13:10 pfsense kernel: (aprobe0:ahcich0:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
Apr 8 11:13:10 pfsense kernel: (aprobe0:ahcich0:0:0:0): CAM status: Command timeout
Apr 8 11:13:10 pfsense kernel: (aprobe0:ahcich0:0:0:0): Error 5, Retries exhausted
-
Everything works absolutely fine after the BIOS downgrade. I am running coreboot 4.0.7.1 now. Uptime: 1 day and over 6 hours.