Pfsense keeps crashing
-
My pfSense has started to crash sporadically. This time it happened at night, and I can't connect to it at all. The only information I have is a photo of the TV that the router is connected to.
I will try to post that picture here.
I can't really tell what's causing this crash.
-
ahcich0: AHCI reset: device not
ahcich0: Poll timeout on slot 26 port 0
ahcich0: is aaaaaaaa cs a4aaaea0 ss paaoaaoa rs a40gaaaa tfd 80 serr apgpgpep cmd apgade
(aprobe0:ahcich0:0:0:0): NOP FLUSHQUEUE. ACB: 00 80 aa aa 00 00 a0 00 a0 00 00 0a
(aprobe0:ahcich0:0:0:0): CAM status: Command timeout
(aprobe0:ahcich0:0:0:0): Error 5, Retries exhausted
ahcich0: Timeout on slot 27 port 0
ahcich0: is aaeaaaaa cs a8oaboa0 ss paaoaaoa rs a8oaaaaa tfd 80 serr apapgpae cmd appgdb
(ada0:ahcich0:0:0:0): SETFEATURES ENABLE WCACHE. ACB: ef 02 00 00 00 40 80 00 0a 00 00 0
(ada0:ahcich0:0:0:0): CAM status: Command timeout
(ada0:ahcich0:0:0:0): Error 5, Periph was invalidated
ahcich0: AHCI reset: device not ready after 31000ms (tfd = 00000080)
ahcich0: Poll timeout on slot 29 port 0
ahcich0: is aaaaa0a0 cs 2aaaa0a0 ss aaaaaaaa rs 2a0aaaa0 tfd 80 serr 80000a0g cmd apgddd
(aprobe0:ahcich0:0:0:0): NOP FLUSHQUEUE. ACB: 00 00 a0 aa aa aa aa aa aa 00 a0 00
(aprobe0:ahcich0:0:0:0): CAM status: Command timeout
(aprobe0:ahcich0:0:0:0): Error 5, Retries exhausted
ahcich0: Timeout on slot 30 port 0
ahcich0: is a00aa80a cs c800aaaa ss a00a0a0a rs caaaaaaa tfd 80 serr gaaaaaaa cmd aaagde
(ada0:ahcich0:0:0:0): DSM TRIM. ACB: 06 01 00 00 00 40 00 00 80 a 01 0a
(ada0:ahcich0:0:0:0): CAM status: Command timeout
(ada0:ahcich0:0:0:0): Error 5, Periph was invalidated
(ada0:ahcich0:0:0:0): WRITE_DMA48. ACB: 35 00 30 6d a9 40 17 00 00 0 08 08
(ada0:ahcich0:0:0:0): CAM status: Unconditionally Re-queue Request
(ada0:ahcich0:0:0:0): Error 5, Periph was invalidated
(ada0:ahcich0:0:0:0): Periph destroyed
ahcich0: AHCI reset: device not ready after 31000ms (tfd = 08agaa80)
ahcich0: Poll timeout on slot 1 port 0
ahcich0: is aaqaaaaa cs aaaaaa02 ss aagaaaa0 rs 00a00002 tfd 80 serr 8aaaagag cmd a00ac1
(aprobe0:ahcich0:0:0:0): NOP FLUSHQUEUE. ACB: 00 a0 0a aa aa aa a0 aa 00 00 00 a0
(aprobe0:ahcich0:0:0:0): CAM status: Command timeout
(aprobe0:ahcich0:0:0:0): Error 5, Retries exhausted
-
@tjabas This is just a copy of the photo I took of the log on my screen. I really can't tell what's wrong.
-
-
When you see 'CAM' and the device is 'ada...', you know that the disk, or the communication with the disk, has issues.
To rule out the disk or confirm the issue, use another disk.
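If you can get the console or system log as text rather than a photo, the 'CAM plus ada' rule of thumb above can be checked mechanically. Here is a minimal Python sketch, assuming the lines follow the usual FreeBSD layout "(periph:bus:x:y:z): CAM status: ...". The function name summarize_cam_errors is made up for illustration and is not part of pfSense.

```python
import re
from collections import Counter

# Assumed line layout (FreeBSD CAM messages), for example:
#   (ada0:ahcich0:0:0:0): CAM status: Command timeout
CAM_LINE = re.compile(
    r"\((?P<periph>[a-z]+\d+):(?P<bus>[a-z]+\d+):\d+:\d+:\d+\):\s*"
    r"CAM status:\s*(?P<status>.+)"
)


def summarize_cam_errors(log_text):
    """Count CAM status messages per (peripheral, status) pair.

    Hypothetical helper: many hits on an 'adaN' or 'aprobeN' peripheral
    point at the disk or at the link to the disk.
    """
    counts = Counter()
    for line in log_text.splitlines():
        match = CAM_LINE.search(line)
        if match:
            key = (match.group("periph"), match.group("status").strip())
            counts[key] += 1
    return counts


if __name__ == "__main__":
    sample = (
        "(ada0:ahcich0:0:0:0): CAM status: Command timeout\n"
        "(ada0:ahcich0:0:0:0): Error 5, Periph was invalidated\n"
        "(aprobe0:ahcich0:0:0:0): CAM status: Command timeout\n"
    )
    for (periph, status), n in summarize_cam_errors(sample).items():
        print(periph, status, n)
```

This only tells you that the disk path is unhappy. It cannot tell a bad drive from a bad controller or cable, so swapping the disk is still the real test.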
-
I managed to get a photo of my screen; I have no clue what's happening.
I have deactivated the ICAP interface, Clam antivirus, Squid and SquidGuard, just to start somewhere.
-
@tjabas So you mean the SSD inside the router?
The router is brand new, even though it's a China box. I guess the quality can vary a lot.
-
The disk I have mounted in the router now is a 256GB mSATA. I found another mSATA at home, but it is only 16GB. I also found a 250GB 2.5" SATA SSD. What happens when I remove the faulty disk and replace it with one of the others? Will I also lose the boot menu settings, or are they stored on the motherboard?
Will 16GB be enough?
-
I found an old 256GB SSD. It is now installed and the system is up and running. It will be interesting to see if the problem is still there.
-
16GB is probably fine. The boot settings will be stored in flash, not on the disk. You will have to reinstall pfSense obviously, but you must have done that already.
That error is definitely a drive or drive controller problem, almost certainly a bad drive.
Steve
-
@tjabas said in Pfsense keeps crashing:
will i also loose the boot
No, of course not.
That's why you make regular, daily backups via Diagnostics > Backup & Restore.
And that's why you've set up Services > Auto Configuration Backup > Settings.
A new disk means, of course, that you need install media: a USB drive with the 'latest' version.
After the install, import the config. Reboot. Done.
@tjabas said in Pfsense keeps crashing:
icap interface,clam antivirus,squid and squidguard
These can produce a lot of disk activity.
If the drive is 'small', a lot of sectors will be rewritten often. That's what kills even the best SSD.
-
Thanks for your help. Maybe I'll have to stop using these programs in pfSense. The last pfSense hardware I used lasted for 12 years without any problems at all, and this brand-new device lasted 2 weeks.
-
Even the worst current SSD should last waaaaay longer than that. There were some early SSDs that had bad firmware or controllers in which the wear leveling was completely broken. But if that was 256GB it probably wasn't one of them. Just a bad device.
Steve
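To put rough numbers on that, here is a back-of-the-envelope Python sketch. The figures are assumptions for illustration only and do not come from this thread: a hypothetical endurance rating of 150 TBW (terabytes written) and a hypothetical 20 GB written per day. Check the datasheet of your own drive for its real rating.

```python
def years_until_tbw(tbw_terabytes, gb_written_per_day):
    """Estimate the years needed to reach a drive's rated write endurance.

    Both inputs are assumptions supplied by the caller; this ignores
    write amplification and is only an order-of-magnitude estimate.
    """
    if tbw_terabytes <= 0 or gb_written_per_day <= 0:
        raise ValueError("both values must be positive")
    total_gb = tbw_terabytes * 1000  # decimal TB to GB, as vendors rate it
    days = total_gb / gb_written_per_day
    return days / 365


if __name__ == "__main__":
    # Hypothetical drive: 150 TBW, 20 GB written per day
    print(round(years_until_tbw(150, 20), 1))  # about 20.5 years
```

Even with much heavier logging than that, normal wear does not explain a two-week lifetime, which is why a defective unit is the more likely explanation.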
-
@stephenw10 Yes, that sucks. Someone in this thread mentioned Auto Config Backup. I usually make backups and save them on my laptop, but what does Auto Config Backup do, and where does pfSense store that backup?
-
ACB encrypts the config file and sends it to our servers. The key remains in the firewall, so we are unable to see those configs.
It can be configured to back up periodically or at every config change.
https://docs.netgate.com/pfsense/en/latest/backup/autoconfigbackup.html
Steve
-
-
I was running into the same issue with my SSD. I decided to run pfSense with RAM disk enabled with hourly writes of system logs, RRD, and DHCP leases. The only package that I have running that requires an SSD/HDD is pfBlockerNG. It's been stable for over a week now (8 Days 06 Hours). Whereas before, I would have to hard reset every day.
It might be worth trying the same just to see if it works for you. It might save you some time.
I'm hesitant to buy another SSD (I live where returning items isn't so easy) because I found another person on reddit ("ahcich1 timeout, CAM command timeout") who experienced the same issue on the same device running pfSense that I have (HP T620 Plus). That person experienced the same issue with a new SSD in two separate devices of the same model (HP T620 Plus). Before enabling the RAM disk, I tried various settings in the BIOS and sysctl, to no avail.
-
@mentisdominus said in Pfsense keeps crashing:
I would have to hard reset every day
That's the perfect way to break your drive 'logically': the file system becomes a mess and the drive partition becomes read-only, which triggers run-time failures, as the OS can't write files anymore.
Easy to understand: take your own PC, remove the power or the battery, repeat several times, and your system goes into eternal blue screen mode. It's reinstall time, the 'from the ground up' version, which means: partition the hard drive, destructive reformat, the whole lot.
A normal (== good?) pfSense system never reboots or fails.
Netgate devices tend to deliver on that criterion.
-
@Gertjan The system was suspended, showing an error message along the lines of "Solaris: zpool 'pfSense' has encountered an uncorrectable I/O error the system is suspended". There was no way to log in or SSH into it to restart it, or to enter a keyboard command to restart.
A hard reboot was the only way I knew to restart it.
Edit: I previously wrote "hard reset" instead of "hard reboot" in my earlier comments. I mean physically holding down the power button to turn it off and then pressing it again to power it back on. I went back to correct that part.
-
Yes, from there you can really only reset.
That sounds like a possible drive controller issue though if it happens repeatedly even after reinstalling. Or across multiple drives.