pfsense 2.4.4 hangs without logging errors



  • Hi all,

    Sorry for my lack of imagination on the title front =P

    I'm after some ideas or guidance on how to catch this thing.

    Since I upgraded to 2.4.4 (the upgrade itself went fine, with no crash or reinstall needed), I have experienced a couple of hangs so far. Plugging HDMI in wouldn't turn the display on, so I couldn't look at possible dumps or logs on the console, and the logs themselves didn't show anything at all around the time it happened.

    So I was wondering, in preparation for the next time, if I could do anything to make sure the box logs SOMETHING that helps me understand what the root cause is.

    This is my hardware

    Intel(R) Core(TM) i5-4200U CPU @ 1.60GHz
    Current: 2300 MHz, Max: 2301 MHz
    4 CPUs: 1 package(s) x 2 core(s) x 2 hardware threads
    AES-NI CPU Crypto: Yes (inactive)

    It also has a 4-port Intel card: one port is the WAN and another connects to the AP.

    I'm basically running pfBlockerNG and OpenVPN.

    I've attached the 'dmesg' output: 0_1541624080990_fw_dmesg.txt

    This is an extract from the system log:

    At 22:27 I realised it was long dead, so I rebooted the box. As you can see, the only entries before that are from pfBlockerNG.

    Nov 4 22:27:15 	kernel: Copyright (c) 1992-2018 The FreeBSD Project.
    Nov 4 22:27:15 	syslogd: kernel boot file is /boot/kernel/kernel
    Nov 4 22:00:00 	php: [pfBlockerNG] Starting cron process.
    Nov 4 21:04:28 	php: [pfBlockerNG] No changes to Firewall rules, skipping Filter Reload
    Nov 4 21:00:00 	php: [pfBlockerNG] Starting cron process.
    Nov 4 20:04:30 	php: [pfBlockerNG] No changes to Firewall rules, skipping Filter Reload
    Nov 4 20:00:00 	php: [pfBlockerNG] Starting cron process.
    Nov 4 19:04:33 	php: [pfBlockerNG] No changes to Firewall rules, skipping Filter Reloa
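
    To put a number on the silent window in the extract above, here is a minimal Python sketch. It is only an illustration: the year (2018) is an assumption, since syslog timestamps omit it, and `parse_stamp` and `silent_window` are hypothetical helper names, not anything pfSense ships.

```python
from datetime import datetime

# Lines copied from the extract above, newest first as the GUI shows them.
LOG = """\
Nov 4 22:27:15 kernel: Copyright (c) 1992-2018 The FreeBSD Project.
Nov 4 22:27:15 syslogd: kernel boot file is /boot/kernel/kernel
Nov 4 22:00:00 php: [pfBlockerNG] Starting cron process.
Nov 4 21:04:28 php: [pfBlockerNG] No changes to Firewall rules, skipping Filter Reload
Nov 4 21:00:00 php: [pfBlockerNG] Starting cron process.
Nov 4 20:04:30 php: [pfBlockerNG] No changes to Firewall rules, skipping Filter Reload
Nov 4 20:00:00 php: [pfBlockerNG] Starting cron process.
"""


def parse_stamp(line, year=2018):
    """Parse the leading 'Mon D HH:MM:SS' stamp. The year is an assumption."""
    month, day, clock = line.split()[:3]
    return datetime.strptime(f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S")


def silent_window(log_text, boot_marker="kernel boot file", year=2018):
    """Return (last entry before boot, boot time).

    Raises ValueError if no line contains the boot marker.
    """
    lines = [l for l in log_text.splitlines() if l.strip()]
    boot = min(parse_stamp(l, year) for l in lines if boot_marker in l)
    earlier = [parse_stamp(l, year) for l in lines if parse_stamp(l, year) < boot]
    last = max(earlier) if earlier else None
    return last, boot


last, boot = silent_window(LOG)
print("last entry before the hang:", last)
print("first entry after reboot:  ", boot)
print("unlogged gap:              ", boot - last)
```

    On the lines above, the last entry is the 22:00:00 cron start and the boot is at 22:27:15. The earlier cron runs each logged a follow-up line about four minutes after the hour and the 22:00 run did not, which may mean the box stopped logging shortly after 22:00, but that is only a guess from this short extract.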
    

    I'm running this version
    pfBlockerNG-devel net 2.2.5_19

    I haven't tried reinstalling anything yet; I'd like to first identify what the issue is, if possible.

    Any ideas to help catch this thing would be highly appreciated.

    Thanks!


  • Rebel Alliance Developer Netgate

    Generally speaking, a lockup without any logs is going to be hardware. It could be a disk issue (hence no logs saved) or it could be power/heat or another hardware problem.

    Leave a connection open to the console, either with a monitor for a video console or a serial client set to log for a serial console. If it locks up and there is no output even on the console, then it's almost certainly hardware. If there is output on the console, it would hopefully point toward a more accurate assessment of what's wrong.
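
    For the serial option, any terminal program with a logging feature will do. If none is handy, the idea can be sketched in a few lines of Python running on a second machine. This is a sketch under assumptions: the device path `/dev/ttyUSB0` is hypothetical, the port is assumed to be already configured (speed, parity) by other means, and `log_console` and `main` are illustrative names.

```python
import os
from datetime import datetime


def log_console(stream, log_file, now=datetime.now):
    """Copy each line from `stream` to `log_file`, prefixed with a timestamp.

    Each line is flushed and fsync'ed right away, so the last thing the
    console printed before a lockup is already on the logging machine's disk.
    Returns the number of lines written.
    """
    count = 0
    for raw in stream:
        text = raw.decode("utf-8", errors="replace").rstrip("\r\n")
        log_file.write(f"{now().isoformat(timespec='seconds')} {text}\n")
        log_file.flush()
        os.fsync(log_file.fileno())
        count += 1
    return count


def main(device="/dev/ttyUSB0", log_path="console.log"):
    """Read the (hypothetical) serial device until interrupted."""
    # Assumption: the port settings were applied beforehand; this code
    # only reads bytes and never changes the line configuration.
    with open(device, "rb", buffering=0) as port, open(log_path, "a") as out:
        log_console(port, out)
```

    Calling `main()` on the logging machine and leaving it running overnight would leave a timestamped `console.log` to read the next morning. The timestamps come from the logging machine's clock, so they stay meaningful even if the firewall's own clock or disk is part of the problem.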



  • It took me a while to be able to run any sort of offline test.

    So I downloaded the memtest86 ISO and let it run for 3 hours; it didn't find any errors related to CPU and/or RAM.

    Now I'm going to run the stresslinux tools, focusing on the disk since, as you mentioned, it might be the faulty bit. Any special recommendations on this? I'm a bit concerned about getting clean results, hehe; I wouldn't know what else to look at.

    Just keeping people posted, I'm sure they were all looking forward to updates (?)



  • @pucho Maybe you are running into Unbound memory problems?
    8GB supports around 1MB DNSBL db. At some point unbound-checkconf may suck up all the system memory, and the system becomes very slow for many minutes.

    Have a look at Status / Monitoring System Memory and Processor.

    Did you inspect /var/log/pfblockerng/pfblockerNG.log? And the Status / System Logs / System / DNS Resolver log?
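
    If the GUI graphs are too coarse, memory per process can also be sampled from the shell. A minimal sketch, assuming a Python 3.7+ interpreter is available on the box and that `ps` accepts `-ax -o rss= -o comm=` as FreeBSD's does; `rss_by_command` and `sample` are hypothetical names, and the process names in the usage note are made-up sample data.

```python
import subprocess
from collections import defaultdict


def rss_by_command(ps_output):
    """Sum resident memory (in ps units, KiB on FreeBSD) per command name.

    Expects lines of the form '<rss> <command>' with no header line.
    Lines that do not start with a number are ignored.
    """
    totals = defaultdict(int)
    for line in ps_output.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[0].isdigit():
            totals[parts[1].strip()] += int(parts[0])
    return dict(totals)


def sample():
    """Run ps once and return the per-command totals.

    Assumption: the empty '=' headers suppress the header line, as
    documented for FreeBSD's ps.
    """
    result = subprocess.run(
        ["ps", "-ax", "-o", "rss=", "-o", "comm="],
        capture_output=True, text=True, check=True,
    )
    return rss_by_command(result.stdout)
```

    For example, `rss_by_command("1500000 unbound\n20480 php-fpm\n10240 php-fpm\n")` adds the two `php-fpm` lines together and reports `unbound` separately. Running `sample()` from cron every few minutes and appending the `unbound` figure to a file would show whether memory climbs while the pfBlockerNG update is running.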



  • @ronpfs said in pfsense 2.4.4 hangs without logging errors:

    @pucho Maybe you are running into Unbound memory problems?
    8GB supports around 1MB DNSBL db. At some point unbound-checkconf may suck up all the system memory, and the system becomes very slow for many minutes.

    Have a look at Status / Monitoring System Memory and Processor.

    Did you inspect /var/log/pfblockerng/pfblockerNG.log? And the Status / System Logs / System / DNS Resolver log?

    Hi ronpfs, yes I did, and I couldn't find any clues at the time. Maybe I didn't look closely enough; is there any particular keyword to search for?

    I did notice an increase in memory usage, but a very small one I'd say: it was around 5% and is now around 9%.

    Now that you mention it, I'll take a deep look into Unbound and pfBlockerNG. The thing is that I normally find it hung in the morning (not every time, but most of the time), as if it had hung at some point during the night. That's why I don't think it's related to temperature, for example. Where I live is not particularly hot, and of course the temperature drops during the night.

    @RonpfS is there any workaround for that memory leak you mentioned?

    Thanks, all ideas are welcome.



  • @pucho Remove big DNSBL tables to see if things improve.
    Run Diagnostics / System Activity while the Cron Update is running.
    Monitor Status / Monitoring over a day.

    On reboot, Unbound doesn't always log to the Status / System Logs / System / DNS Resolver tab.
    Restarting Unbound will get it to log and output lines like this about every 12 hours:

    Dec 10 18:46:57 	unbound 	27300:0 	info: generate keytag query _ta-4b5c-4f86. NULL IN
    


  • @pucho said in pfsense 2.4.4 hangs without logging errors:

    I've attached the 'dmesg' output: 0_1541624080990_fw_dmesg.txt

    You saw this one :

    WARNING: / was not properly dismounted
    WARNING: /: mount pending error: blocks 192376 files 8
    

    A non-writable disk would explain a lot.
    It's fsck time.



  • Hi @Gertjan ,

    dmesg doesn't really show you what pfSense did as a consequence, which was to run an fsck.

    Also, that's the consequence, not the cause. I'd expect these messages every time the box hangs.
    Every time it hung, I plugged in the monitor, rebooted it and saw pfSense detect that the filesystem had not been properly dismounted and run fsck. I also logged in as admin and rebooted with the option to run fsck, and booted from a live CD and ran several tests, with no negative results.

    I ran the smartctl tests and they all went well. I also ran fsck and no errors were found.

    I'll probably follow @RonpfS's recommendations, since I've put the box back online now to keep an eye on those things. Right after boot, Unbound was holding 1.5G of RAM. We'll see how it evolves.

    The next step would be a complete reinstall.



  • Well, I finally kidnapped my partner's screen and left it plugged into the box. I woke up this morning and found it waiting on the BIOS screen. The disk is dead.

    A bit annoying when you run all the tests you can and it still turns out to be that bloody thing... I'm glad I don't work in an ER. Patient is gone, we are so sorry.

    I wonder if it was temperature that screwed it; this is one of those fanless boxes.

    Thanks for the answers, anyway.