Netgate Discussion Forum

    Pfsense system crash

    Category: General pfSense Questions
    24 posts, 4 posters
    • vcr58 @stephenw10

      @stephenw10 Thanks.

      I went ahead and reinstalled pfSense using the default ZFS layout, except that I increased swap to 4 GB. Everything is configured as before. The only change in system resources so far is that memory usage went up from 8% with UFS (GPT) to 18% with ZFS.

      So now I'll wait and see. The monitor and keyboard are still connected.

      • vcr58

        With the latest pfSense Plus installed on ZFS, I ran into a system freeze again. This time I noticed that Remote Desktop was sluggish, so I tried to log in to the pfSense web interface, but the browser said the site could not be reached. However, I was still able to use the console and performed a reboot, after which all was well.

        From a shell I ran the "top" command and found that ntopng normally used less than 1% CPU, but occasionally its CPU usage would jump to 65%.

        I now suspect that ntopng has a memory leak, or that it sometimes needs more horsepower than my CPU can provide and that causes a crash.

        So I disabled the ntopng service and will see whether that fixes the freezes.

        • stephenw10 Netgate Administrator

          That's a good test. You should be able to see a memory leak in ntop-ng in the top output though.

          It could also be ntop struggling due to traffic from some other issue.

          Steve
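A leak would show up in top as the RES (resident memory) column for the process growing steadily over time, rather than as CPU spikes. The sketch below is a hypothetical helper, not a pfSense or ntopng tool: it assumes you have noted the RES value for ntopng from top at regular intervals (the readings used here are made up) and applies a simple heuristic, flagging a leak when RES never shrinks and grows by at least 20% overall.

```python
"""Flag a possible memory leak from periodic `top` RES readings.

Assumption: RES values were copied from `top` at regular intervals;
the sample readings below are hypothetical, not from this thread.
"""

# top prints sizes with a K/M/G suffix (binary multiples)
SUFFIXES = {"K": 1024, "M": 1024**2, "G": 1024**3}


def parse_size(text: str) -> int:
    """Convert a top-style size such as '512M' or '2708K' to bytes."""
    text = text.strip().upper()
    if text and text[-1] in SUFFIXES:
        return int(float(text[:-1]) * SUFFIXES[text[-1]])
    return int(text)  # plain byte count, no suffix


def looks_like_leak(samples: list[str], min_growth: float = 0.2) -> bool:
    """Heuristic: RES never shrinks between samples and grows by at
    least `min_growth` (20% by default) from first to last sample."""
    sizes = [parse_size(s) for s in samples]
    if len(sizes) < 3 or sizes[0] == 0:
        return False  # too few (or unusable) samples to judge
    never_shrinks = all(b >= a for a, b in zip(sizes, sizes[1:]))
    growth = (sizes[-1] - sizes[0]) / sizes[0]
    return never_shrinks and growth >= min_growth


if __name__ == "__main__":
    steady = ["310M", "312M", "309M", "311M"]   # hypothetical readings
    growing = ["310M", "420M", "530M", "690M"]  # hypothetical readings
    print(looks_like_leak(steady))   # False
    print(looks_like_leak(growing))  # True
```

Requiring monotonic growth keeps normal fluctuation from being flagged; a real process can legitimately grow and plateau, so treat a positive result only as a reason to keep watching.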

          • fim @vcr58

            @vcr58 Hi there. I have the same problem.
            I'm also using a J4005 NUC and also tried everything on a J5005 NUC.
            I can reproduce this behaviour by downloading a large file on one VLAN or by moving large files from one VLAN to another. I have 6 VLANs configured.
            I've tried traffic shaping for bufferbloat, disabled hardware offloading, reinstalled and RECONFIGURED the firewall from scratch (without any services running), and even bought a new NUC just to be sure. Same problem.
            This problem started occurring 3-5 weeks ago.

            • stephenw10 Netgate Administrator @fim

              @fim said in Pfsense system crash:

              This problem started occurring 3-5 weeks ago

              Like spontaneously or after an upgrade? Some other change?

              • fim @stephenw10

                @stephenw10
                Thinking back, the firewall did start behaving less reliably overall after the 2.5.0 upgrade. I thought the hardware was too weak, so I bought the J5005 NUC. Same problem.

                The turning point was adding 3 more OpenVPN servers. After that, the problem I described occurred after every stress test.

                • stephenw10 Netgate Administrator

                  So you are also only seeing latency issues when running ZFS?

                  I assume you're running 2.6 now?

                  • fim @stephenw10

                    @stephenw10 I did some more testing with a clean configuration for the last 2 hours.

                    Hardware: Intel NUC J5005
                    configuration:

                    • 3 VLANs (1: WAN PPPoE; 2: LAN; 3: OPT1)
                    • PPPoE is bridged
                    • I also tried to use the ISP modem (zyxel xmg3927-b50a) as a router

                    It doesn't matter whether ZFS is used or not. Transferring a large file (30 GB) from LAN to OPT1 makes the firewall crash after a few seconds. It takes longer to crash if WAN is not configured. There is nothing in the logs, as it is a full system crash that requires a hard reboot.

                    Yes, I'm running 2.6

                    • stephenw10 Netgate Administrator

                      So it just appears to lock up? No response at the console? Not even to Ctrl+T?

                      And there is no crash report shown when it reboots?

                      • vcr58 @stephenw10

                        @stephenw10 said in Pfsense system crash:

                        That's a good test. You should be able to see a memory leak in ntop-ng in the top output though.

                        It could also be ntop struggling due to traffic from some other issue.

                        Steve

                        With ntopng disabled I have not had any issues so far. A YouTube video I watched recommended not running ntopng all the time anyway, since it is something of a resource hog.

                        Thanks.

                        • vcr58 @stephenw10

                          @stephenw10 I had been running pfSense for 22 days, but now I am not being served any IP addresses and cannot log in. My cable modem says everything is fine, so it's pfSense that isn't working. Ctrl+T does respond with:

                           "load: 0.00 cmd: login 53946 [tx->tx_sync_done_cv] 1881545.20r 0.00u 0.00s 0% 2708k"
                          

                          Anything else I should do? I know a reboot will fix it for a while.
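pfSense runs on FreeBSD, where Ctrl+T sends SIGINFO and the kernel prints a one-line status for the foreground process. The field layout assumed below (load average, command, PID, wait channel in brackets, real/user/system time, CPU %, resident size) is my reading of that format, so treat this as a sketch. Decoding the line above shows the console `login` process has existed for about 21.8 days (matching the 22 days of uptime) and is blocked on `tx->tx_sync_done_cv`, which, as far as I can tell from the OpenZFS source, is where a thread waits for a ZFS transaction group to finish syncing to disk. That points at storage rather than networking.

```python
import re

# Assumed layout of the FreeBSD Ctrl+T (SIGINFO) status line:
# load: <avg> cmd: <name> <pid> [<wait channel>] <real>r <user>u <sys>s <cpu>% <rss>k
STATUS_RE = re.compile(
    r"load:\s+(?P<load>[\d.]+)\s+cmd:\s+(?P<cmd>\S+)\s+(?P<pid>\d+)\s+"
    r"\[(?P<wchan>[^\]]*)\]\s+(?P<real>[\d.]+)r\s+(?P<user>[\d.]+)u\s+"
    r"(?P<sys>[\d.]+)s\s+(?P<cpu>\d+)%\s+(?P<rss>\d+)k"
)


def parse_status(line: str) -> dict:
    """Split a SIGINFO status line into named fields."""
    m = STATUS_RE.search(line)
    if m is None:
        raise ValueError("unrecognised status line")
    return {
        "load": float(m["load"]),
        "cmd": m["cmd"],
        "pid": int(m["pid"]),
        "wchan": m["wchan"],                       # what the process is sleeping on
        "real_days": float(m["real"]) / 86400,     # elapsed seconds -> days
        "cpu_pct": int(m["cpu"]),
        "rss_kib": int(m["rss"]),
    }


line = "load: 0.00 cmd: login 53946 [tx->tx_sync_done_cv] 1881545.20r 0.00u 0.00s 0% 2708k"
info = parse_status(line)
print(info["cmd"], info["pid"], info["wchan"])  # login 53946 tx->tx_sync_done_cv
print(f"{info['real_days']:.1f} days")           # 21.8 days
```

The arithmetic is simply 1881545.20 s / 86400 s per day ≈ 21.8 days.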

                          • vcr58

                            Here is a photo of the connected monitor showing the output before the crash.
                            [Image: 20220509_190455-2.jpg]

                            • keyser Rebel Alliance @vcr58

                              @vcr58 said in Pfsense system crash:

                              Here is a pic of the monitor connected and the output before the crash.
                              20220509_190455-2.jpg

                              Your issue is related to write I/O to the system disk. It seems your disk goes missing or stops responding. That would also explain why you get better stability without ntopng, as that package in particular does A LOT of write I/O.

                              It's strange that it only happens with ZFS and not UFS, but ZFS uses a very different write strategy and is quite write-intensive in bursts compared to UFS. So it would seem your SSD/eMMC/HDD is the culprit. Please remember that eMMC in particular is not a good match for ntopng, as its write endurance could be worn out within a year or two.

                              Gotta love the no-fuss experience of the official appliances :-)
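The "year or two" wear-out estimate is back-of-the-envelope arithmetic: rated endurance divided by daily write volume. Neither the drive's rating nor ntopng's actual write rate is given in this thread, so the numbers below (40 TB written endurance, 50 GB of writes per day) are purely hypothetical placeholders; substitute the figures from your own drive's datasheet and SMART data.

```python
def years_until_worn(endurance_tbw: float, writes_gb_per_day: float) -> float:
    """Rough lifetime estimate: rated endurance (TB written) / daily writes.

    Back-of-the-envelope only; real wear also depends on write
    amplification and on how the workload compares to the rating conditions.
    """
    if writes_gb_per_day <= 0:
        raise ValueError("daily write volume must be positive")
    days = endurance_tbw * 1000 / writes_gb_per_day  # TB -> GB (decimal units)
    return days / 365


# Hypothetical: small eMMC rated for 40 TBW, 50 GB/day written.
print(f"{years_until_worn(40, 50):.1f} years")  # 2.2 years
```

With these placeholder figures the drive lasts 40,000 GB / 50 GB per day = 800 days, a little over two years.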

                              • stephenw10 Netgate Administrator

                                Mmm, that looks like a bad/failing disk. It should never stop responding like that.

                                I would replace it and retest when you can.

                                Steve

                                • vcr58 @stephenw10

                                  @stephenw10 I suppose it could be the SSD going bad, although I never get any errors when running a scan on it. What @keyser said makes sense to me as well.

                                  I do have an older WD Green SSD, so I will try that one and see what happens.

                                  Thanks

                                  • stephenw10 Netgate Administrator

                                    I have an SSD here that continually throws CAM errors like that when pfSense is running from it. It has never actually failed during use but there's no way I would use it in any sort of critical role. It has failed to install before and I consider it dead.

                                    Steve

                                    • vcr58 @stephenw10

                                      @stephenw10 I now have the same pfSense setup on a different SSD. It is probably a better drive even though it's older (I think).

                                      When the first drive stopped working, the only place the CAM errors appeared was on the monitor connected to the pfSense PC. After rebooting, I could not find the same information anywhere in the pfSense logs. Would CAM errors show up in "Status/System Logs/System/General" in the web GUI if the system were still running?

                                      • stephenw10 Netgate Administrator

                                        It's common to see drive errors like that only on the console, because often logs can no longer be written by that point.
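Because the failing disk may be unable to store its own error messages, the only record is whatever reaches the console. If that text can be captured somewhere else (for example a serial console logged on another machine, or `dmesg` output copied while the box still responds), a small script can tally the errors per device. This is a hypothetical helper, not part of pfSense; the line shape it matches, `(ada0:ahcich0:0:0:0): CAM status: Command timeout`, is assumed from typical FreeBSD CAM messages, and the sample lines are illustrative, not transcribed from the photo above.

```python
import re
from collections import Counter

# Assumed shape of a FreeBSD CAM error line, e.g.
# "(ada0:ahcich0:0:0:0): CAM status: Command timeout"
CAM_RE = re.compile(r"^\((?P<dev>[a-z]+\d+):[^)]*\):\s+CAM status:\s+(?P<status>.+)$")


def cam_errors(log_text: str) -> Counter:
    """Count CAM status messages per (device, status) pair."""
    counts = Counter()
    for line in log_text.splitlines():
        m = CAM_RE.match(line.strip())
        if m:
            counts[(m["dev"], m["status"])] += 1
    return counts


# Illustrative sample; other CAM lines (e.g. "Retrying command") are ignored.
sample = """\
(ada0:ahcich0:0:0:0): CAM status: Command timeout
(ada0:ahcich0:0:0:0): Retrying command
(ada0:ahcich0:0:0:0): CAM status: Command timeout
"""
for (dev, status), n in cam_errors(sample).items():
    print(dev, status, n)  # ada0 Command timeout 2
```

Repeated timeouts on the boot device are the pattern described in this thread; a count that keeps rising is a strong hint to replace the drive.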

                                        • vcr58 @stephenw10

                                          @stephenw10 After replacing the SSD I have not seen any errors in 4 days of uptime, even with ntopng running, so the problem was indeed the bad SSD.

                                          Thank you so much for your help in troubleshooting my issue!

                                          Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.