Issues after upgrade to 2.3.4 - random crashes



  • About two months ago I upgraded to 2.3.4, and since then pfSense has been intermittently crashing.

    The router's IP address becomes unresponsive, but the IPsec VPN still accepts connections. However, VPN clients cannot reach a single address. A simple power cycle gets it back up and running.

    This happens once a day; after a restart the device usually works for a good 12 hours. I cannot find any pattern that triggers it.

    As a troubleshooting step, the device was factory reset and all configuration was re-entered manually to ensure it was not config related. All packages have been removed.

    I set up a cron task to restart the device once a day at 3am, but this did not seem to make a difference.

    There is nothing in any of the log files that indicates an error, so I am at a complete loss as to the cause.

    Hardware: Netgate RCC-DFF
    Running 2.3.4-RELEASE (amd64)  - FreeBSD 10.3-RELEASE-p19

    Any suggestions?



  • Just remembered, I added the group 'wheel' back in. Could this be related?

    https://forum.pfsense.org/index.php?topic=131527.msg723835#msg723835



  • Having the same problem. I'm running 2.3.4 and it just failed again. I upgraded around the middle of May and it had been rock solid, but over the last 2-3 weeks it has been acting flaky. I do use OpenVPN and know about the vulnerabilities, so I'm wondering whether someone is trying to hack my pfSense box via the OpenVPN port and that is causing the system instability.

    I'll reboot, check whether I get errors similar to yours, and probably turn off OpenVPN to see if it stabilizes at all.

    HokieMan



  • I believe my issue was related to the OpenVPN vulnerabilities. I think attackers were probing port 1194 and kept bringing pfSense down when they did; I have an OpenVPN server set up and configured for remote access. Since I installed 2.3.4 patch 1 it has been rock solid.



  • I'm afraid this did not resolve the issue I am experiencing. I'm also not using OpenVPN, so I am not surprised the patch did not address it.

    Glad it works for you.



  • I'm having the same issues with 2.3.4 and 2.3.4-p1 on a Shuttle DS68U. It used to work fine when I was using an old Dell laptop.

    What hardware are you using?

    In my case it always happens between 24 and 28 hours of uptime, and it seems related to storage. For whatever reason the HDD becomes detached, and the system crashes and reboots:

    ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
    ada0: <ST9160310AS DE06> s/n 5GV6HR23D detached
    (ada0:ahcich0:0:0:0): Periph destroyed
    /: got error 6 while accessing filesystem

    I tried two different HDDs with the same results. As a last resort, last night I installed the latest 2.4 on an NVMe drive in EFI mode. It is running well so far and just passed the 12-hour mark. I'll see how long it lasts.
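
    Once the drive detaches, the root filesystem becomes unreachable (the `got error 6` line above), so these messages probably never reach the on-disk logs; a serial console capture taken from another machine may be the only record. As a minimal sketch, assuming you have saved such a capture to a text file, the Python script below flags lines matching the detach signature. The regular expressions and the file name `find_detach.py` are my own assumptions modeled on the lines quoted above; other disk controllers may word these messages differently.

```python
import re
import sys
from typing import Iterable, List

# Assumed signatures, modeled on the console lines quoted in this post.
# Other drivers/controllers may phrase their messages differently.
DETACH_PATTERNS = [
    # e.g. "ada0: <ST9160310AS DE06> s/n 5GV6HR23D detached"
    re.compile(r"^(ada|nvd|da)\d+: .* detached$"),
    # e.g. "(ada0:ahcich0:0:0:0): Periph destroyed"
    re.compile(r"\(\w+:\w+:\d+:\d+:\d+\): Periph destroyed"),
    # e.g. "/: got error 6 while accessing filesystem"
    re.compile(r"got error \d+ while accessing filesystem"),
]


def find_detach_events(lines: Iterable[str]) -> List[str]:
    """Return the stripped lines that match any storage-detach signature."""
    hits = []
    for line in lines:
        text = line.strip()
        if any(p.search(text) for p in DETACH_PATTERNS):
            hits.append(text)
    return hits


if __name__ == "__main__":
    # Usage: python find_detach.py console-capture.txt
    # (console-capture.txt is a hypothetical saved serial console log)
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
            for hit in find_detach_events(fh):
                print(hit)
```

    A plain `grep` with the same patterns would do the same job; the point is only to have something that separates a storage dropout from the other kinds of crash described in this thread.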



  • We're running 2.3.4-P1 and are experiencing the same crashing issue.

    There doesn't seem to be a pattern and the point (read: status on screen) at which it crashes differs.

    We're now running reboots via Cron three times a day, with some very unhappy people barking my way… :o

    Up until P1, our pf was rock-solid.

    I have submitted crash reports whenever possible.

    We do run several Packages.

    Has anyone found a resolution to this issue?

    With 2.4 seemingly imminent, I'm hesitant to downgrade.



  • @the3rdrock:

    What are the specs of the hardware you are using? Unfortunately, things have not improved for me. I tried two standard HDDs, then switched to 2.4 and tried the NVMe drive in EFI mode, which the installer now supports. I also replaced the RAM with a specific brand and model from the DS68U compatibility list, even though the existing modules passed memtest86 with flying colors.

    Nothing! Every day, after between 24 and 31 hours (a new record) of uptime, the box spontaneously reboots. It can happen in the middle of the night or at any time during the day, but never before reaching at least 24 hours of uptime. When it runs, it works great: performance is good, load is low, temperatures are stable around 34°C, and SMART reports all good. I am at a loss…

    At this point I will have to schedule a nightly reboot with cron; I don't know what else to do.
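
    To check whether the reboots really never come before the 24-hour mark, one option is to record each boot time and compute the gap between consecutive boots. Here is a minimal Python sketch, assuming boot timestamps you have collected yourself (for example from `last reboot`, which on FreeBSD should list the reboot pseudo-user entries); the function names and sample dates are hypothetical. Each gap is an upper bound on uptime, since it also includes any downtime between the crash and the next boot.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def uptimes_between_boots(boot_times: List[datetime]) -> List[timedelta]:
    """Return the gap between each boot and the next one, in boot order.

    Each gap is an upper bound on uptime: it also includes any time the
    box spent down between the crash and the following boot.
    """
    ordered = sorted(boot_times)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def summarize_hours(boot_times: List[datetime]) -> Tuple[float, float]:
    """Return (shortest, longest) gap between boots, in hours."""
    hours = [gap.total_seconds() / 3600 for gap in uptimes_between_boots(boot_times)]
    return min(hours), max(hours)


if __name__ == "__main__":
    # Hypothetical boot timestamps; replace with your own records.
    boots = [
        datetime(2017, 8, 1, 9, 0),
        datetime(2017, 8, 2, 11, 30),
        datetime(2017, 8, 3, 18, 45),
    ]
    low, high = summarize_hours(boots)
    print(f"gap between boots: {low:.1f}h to {high:.1f}h")
```

    If the shortest gap stays consistently above 24 hours over a couple of weeks, that regularity points more toward something that accumulates over time (a counter, a leak, a timer) than toward random hardware faults.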



  • @MaxPF:

    I'll have to check for you, but offhand I can tell you that it's a relatively new (about a year old) dedicated server box (Intel Xeon, etc.) that far exceeds pfSense's requirements.

    Today was a tragedy. After a week and a bit of strife, the box crashed hard. I've spent the entire day calming nerves and, under immense pressure, rebuilding my client's gateway box on another HDD (just in case).

    For the record, I ran a full SMART test on the old HDD about 2 weeks back. No issues.

    We're now on 2.3.4 (NOT P1) and it seems OK, but it's only been live for an hour or so.

    Once the box was up, Suricata took us down again, so it has been disabled for now.

    Cron reboots did absolutely SFA for us.



  • Do you get any errors on the console or in the logs? In my case I was getting the storage-detached error when I was using SATA HDDs, but on 2.4 with the NVMe drive I get nothing at all. Also, are you by any chance running pfBlocker?



  • We were in crisis mode. Our f/o did not come online as expected and we had an entire company down. Sorry, but beyond telling you that the boot took a long time and scrolled through several errors, I can't give you any more info.

    I don't recall ever seeing a storage detached error though.

    I installed pfBlockerNG a couple of weeks ago, but had it disabled.

    I've now installed all of the packages I had previously, except pfBlockerNG, as it's not needed right now. I did do a package restore, though, so I know the settings are already on the box…

    Sidebar: Suricata won't start on 2.3.4, but it did work on 2.3.4-P1.



  • AFAIK, a FreeBSD or pfSense crash without a kernel panic and dump is possible under certain conditions, and is mostly caused by hardware faults or badly broken drivers/BIOS settings.
    The question remains: did you try downgrading to 2.3.4 (non-P1)?



  • @the3rdrock:

    Ouch! That sounds painful. One last question… Do you use the traffic shaper? That's one thing I haven't tried disabling yet. If that fails as well, I will try a different approach: installing VMware ESXi on the same hardware and setting up pfSense as a VM with my current config. That should rule out or confirm broken hardware.



  • I am using Netgate hardware, so there is no physical drive to cause issues.
    Once the device goes into this state I cannot access it by any method, serial terminal included. There are also no errors in the log files.

    I'm going to rebuild the device from the ground up (reimage) in case this is related to a previous plugin module or an older version.

    From the comments here it appears we have all migrated from a previous version, so maybe that is something to consider.

    Will report back in a day or two once it is done.



  • I had a similar issue with my Qotom-based pfSense box. It started crashing nearly every day, but not in the sense of a normal crash. Some functionality would keep working, but it would slowly degrade to the point where traffic stopped passing or the admin interface became inaccessible. I noticed the HDD light would be solidly on when this happened, which is unusual. I attributed it to the M.2 SSD going bad (even though SMART showed it OK). It was a SanDisk 120GB that came with the Qotom box from China, and who knows if it was even authentic. Long story short, I replaced it with a 250GB 2.5" 5400RPM laptop HDD, reinstalled the same version of pfSense, and restored the backup. It has been running solid ever since, about 12 days now. Just some food for thought as you continue troubleshooting.



  • It would seem that some people might be experiencing the same issue as a result of hardware defects. Without much in the way of error messages, it is hard to investigate.

    Because of the issues we stopped using the IPsec VPN, and for 6 days the box stayed up without crashing.

    I have just rebuilt the device from the ground up and will report back on the outcome.



  • Happy to report that the VPN router has been stable since the update.

    So if you are experiencing issues on a device that has seen multiple updates over the years, consider a bare-metal rebuild. It does not take long, and you can export and import all of the settings easily.

