Netgate Discussion Forum

    pfSense became unresponsive, then no DNS resolution after reboot

    General pfSense Questions
    19 Posts 3 Posters 548 Views
    • stephenw10S
      stephenw10 Netgate Administrator
      last edited by

      Do you see anything logged in pfSense in the run up to the outage?

      Are local clients set up to use the pihole directly?

      I'd expect that if pfSense stopped responding, the other devices behind it might lose their DHCP leases, for example, or perhaps lose a route. It's hard to say at this point, but I'd check the logs on those hosts for that time too.

      • S
        Sherwatt @stephenw10
        last edited by

        @stephenw10 I can't see anything unusual in the logs, but I might be missing something. Maybe I should dig into the logs via the command line and not just look at the GUI.

        However, I found something that might be the cause. I am using HAProxy and I recently enabled stats logging. Using ps aux I could see that HAProxy was consuming a lot of memory. When I checked again some time later, its memory consumption had increased. So at this point my theory is that it caused an out-of-memory error. I disabled stats and since then the memory usage has stayed at the same level. Fingers crossed.
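
One way to confirm whether HAProxy's memory really keeps growing is to sample its resident memory at intervals. The sketch below is only an illustration: the helper name sum_rss_kb is made up, and it assumes that `ps -A -o rss= -o comm=` prints the resident size in KiB followed by the command name, one process per line.

```shell
# Hypothetical helper (not part of pfSense): sum the resident memory, in KiB,
# of every process whose command name equals $1.
# Expects "RSS COMMAND" pairs on stdin, e.g. from: ps -A -o rss= -o comm=
sum_rss_kb() {
    awk -v name="$1" '$2 == name { total += $1 } END { print total + 0 }'
}
```

Running `ps -A -o rss= -o comm= | sum_rss_kb haproxy` every few minutes and comparing the numbers would show whether usage keeps climbing.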

        BTW, all (known) devices on the network have static IPs. pfSense hands out pihole's IP to clients and I can confirm they are using pihole directly. Now that I think about it, the DNS resolution issue is even more mysterious, as pfSense should not be involved in DNS lookups.

        • stephenw10S
          stephenw10 Netgate Administrator
          last edited by

          Check the graphs in Status > Monitoring. If there was memory exhaustion it should have been recorded there.

          • GertjanG
            Gertjan @Sherwatt
            last edited by

            @Sherwatt said in pfSense became unresponsive, then no DNS resolution after reboot:

            was the first time we experienced a power outage

            Next time you boot pfSense, watch the boot process from the console, using the serial access with the small wire. You'll know right away if there is a problem.
            Also: when pfSense doesn't seem to react, connect to the serial (console) interface first.
            Resetting or ripping out the power is like a Russian roulette "head shot".
            SSH access is the next best option, but it needs working 'interfaces'. Not being able to SSH in is already a 'bad' sign by itself. See it like this: nearly every device on the planet depends on SSH, and it's pretty rock solid. SSH not working is a big red flag. It could be as simple as "Login protection" having locked you out after several login (password) errors, so make sure right away by trying to log in from another device.
            pfSense normally handles DHCP, so a working lease is also a good sign that some parts are still running. But if all your devices use static IP settings, you 'miss' this check, which is to run ipconfig /all on your PC, or to see whether your device re-obtained a DHCP lease after disconnecting it for a short time.

            When you install pfSense packages like HAProxy, it becomes important to check the system resources regularly. After all, when RAM fills up, pfSense can start swapping, and that's something you really do not want to happen, as the system might pick a process (typically the one using the most RAM) and kill it. This will most probably have an impact, as every process is essential. This "killing" will be recorded in the system log.
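
To look for such kills after the fact, the system log can be searched from the shell. A rough sketch follows. It assumes the kernel marks these events with the phrase "was killed:" (an assumption about FreeBSD's message wording, which may differ between versions) and that the log is /var/log/system.log. The function name find_killed is made up for this example.

```shell
# Hypothetical helper: print log lines that report a killed process.
# $1 = log file to search (defaults to /var/log/system.log).
# Returns grep's exit status, so it is non-zero when nothing matches.
find_killed() {
    grep -F 'was killed:' "${1:-/var/log/system.log}"
}
```

An empty result from `find_killed` only means that nothing was logged; it does not rule out memory pressure.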

            And yeah, a UPS can pay for itself without you knowing about it ;)

            No "help me" PM's please. Use the forum, the community will thank you.
            Edit : and where are the logs ??

            • S
              Sherwatt
              last edited by Sherwatt

              I'm just checking the Monitoring graph and memory consumption seems normal, nothing stands out there.
              However, the States started increasing 10 days ago. I am not even sure I understand what States are, but I guess I need to see what I changed 10 days ago and whether it is related.

              EDIT: that big spike is NOT when the issue happened, that spike is actually 24 hours before that. So maybe it is not even States that caused it.

              df14dc55-e582-46f1-8c56-8fadb6ee1f8e-image.png

              Thanks for the tips @Gertjan, I will try to remember to use the serial access first. But it also depends on how quickly the household needs internet, as two people are working from home here.
              And yes, after this outage I definitely want to buy a UPS, I just need to do some research, because I have never used one.

              • stephenw10S
                stephenw10 Netgate Administrator
                last edited by

                Yeah, I doubt it's a states problem. 4000 states really isn't that many. Odd that it spiked like that though. Do you have any sort of content sharing applications running? BitTorrent creates a lot of states, for example.
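
For a live number to compare with the graph, the state table size can also be read from the shell. A small sketch, assuming `pfctl -si` prints a "current entries" line with the count in the third column; current_states is a name invented for this example.

```shell
# Hypothetical helper: extract the state count from `pfctl -si` output on stdin.
current_states() {
    awk '$1 == "current" && $2 == "entries" { print $3; exit }'
}
```

It would be used as `pfctl -si | current_states`, for example before and after starting the torrent client.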

                • S
                  Sherwatt @stephenw10
                  last edited by

                  @stephenw10 Yes, I am running qBittorrent in a container.

                  Can I run some kind of error checking and fixing command on pfSense to look for potentially corrupted files on the disk? Maybe the outage caused corruption in a part of the filesystem that is rarely accessed, but when it fails, the whole system crashes.

                  1 Reply Last reply Reply Quote 0
                  • stephenw10S
                    stephenw10 Netgate Administrator
                    last edited by

                    Is it UFS or ZFS?

                    • S
                      Sherwatt @stephenw10
                      last edited by

                      @stephenw10 I asked ChatGPT the same question as in my previous post and after a short chat it turns out it is ZFS:

                      $ zpool status -v
                        pool: pfSense
                       state: ONLINE
                      config:
                      
                      	NAME        STATE     READ WRITE CKSUM
                      	pfSense     ONLINE       0     0     0
                      	  mmcsd0p4  ONLINE       0     0     0
                      
                      errors: No known data errors
                      

                      Is there anything else I could use to retroactively diagnose the problem? I already fed the boot log to ChatGPT to look for errors, but it didn't find anything scary. Should I share it with you and if yes, is pasting it in a post acceptable?

                      • stephenw10S
                        stephenw10 Netgate Administrator
                        last edited by

                        Then you can run a ZFS pool scrub: zpool scrub pfSense
                        https://docs.netgate.com/pfsense/en/latest/troubleshooting/filesystem-check.html

                        You can upload the logs here and I can look at them:
                        https://nc.netgate.com/nextcloud/s/zgpTGfKio3Fa5eb

                        • S
                          Sherwatt @stephenw10
                          last edited by

                          @stephenw10 Thank you. I uploaded boot.txt.

                          [2.7.2-RELEASE][admin@pfSense.lan.mydomain.com]/root: zpool scrub pfSense
                          [2.7.2-RELEASE][admin@pfSense.lan.mydomain.com]/root: zpool status
                            pool: pfSense
                           state: ONLINE
                            scan: scrub repaired 0B in 00:00:10 with 0 errors on Wed Mar 19 15:11:17 2025
                          config:
                          
                                  NAME        STATE     READ WRITE CKSUM
                                  pfSense     ONLINE       0     0     0
                                    mmcsd0p4  ONLINE       0     0     0
                          
                          errors: No known data errors
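
The same check can be scripted, for instance to run after every unclean shutdown. A minimal sketch, assuming the `zpool status` output keeps the "errors: No known data errors" line shown above; pool_is_clean is a made-up name.

```shell
# Hypothetical helper: succeed only if the `zpool status` output on stdin
# contains the "No known data errors" summary line.
pool_is_clean() {
    grep -q '^[[:space:]]*errors: No known data errors'
}
```

It would be called for one pool at a time, for example `zpool status pfSense | pool_is_clean && echo clean`, because with several pools a single clean summary line is enough to make it succeed.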
                          
                          • stephenw10S
                            stephenw10 Netgate Administrator
                            last edited by stephenw10

                            That's just the boot log from after the outage happened.

                            We need to see the system log covering the event, so from at least some hours before it up to and including the reboot.
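
One way to cut such a window out of a log before uploading it is a small awk filter. This is only a sketch: log_window is a made-up name, it assumes the classic syslog timestamp format (for example "Mar 18 17:44:00") at the start of each line, and it cannot span midnight.

```shell
# Hypothetical helper: print the syslog lines on stdin that fall on one day
# between two times. $1 = month abbreviation, $2 = day of month,
# $3 = start time, $4 = end time (both HH:MM:SS, inclusive).
log_window() {
    awk -v mon="$1" -v day="$2" -v from="$3" -v to="$4" \
        '$1 == mon && $2 + 0 == day + 0 && $3 >= from && $3 <= to'
}
```

For example, `log_window Mar 18 12:00:00 18:00:00 < /var/log/system.log > event.txt` would keep that afternoon, including the boot messages if the reboot happened before 18:00.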

                            You should disable the on-board audio device though. It just uses resources and does nothing in pfSense.

                            hdacc0: <Intel Jasper Lake HDA CODEC> at cad 2 on hdac0
                            hdaa0: <Intel Jasper Lake Audio Function Group> at nid 1 on hdacc0
                            
                            • S
                              Sherwatt @stephenw10
                              last edited by

                              @stephenw10 Thank you for looking into my issue. I uploaded system.log twice, because I messed up the first one. I guess this is what I should be looking at, right? (from /var/log).
                              I think the issue happened around 17:45 (March 18). I left my computer around 17:40 and when I came back pfSense was dead.

                              I should disable the audio device in UEFI, right?

                              • stephenw10S
                                stephenw10 Netgate Administrator @Sherwatt
                                last edited by

                                @Sherwatt said in pfSense became unresponsive, then no DNS resolution after reboot:

                                I should disable the audio device in UEFI, right?

                                Yup somewhere in the EFI/BIOS setup you should be able to disable it completely.

                                • stephenw10S
                                  stephenw10 Netgate Administrator
                                  last edited by

                                  Mmm, nothing really shown in the logs at all:

                                  Mar 18 17:17:00 pfSense sshguard[62427]: Now monitoring attacks.
                                  Mar 18 17:26:00 pfSense sshguard[62427]: Exiting on signal.
                                  Mar 18 17:26:00 pfSense sshguard[44994]: Now monitoring attacks.
                                  Mar 18 17:35:00 pfSense sshguard[44994]: Exiting on signal.
                                  Mar 18 17:35:00 pfSense sshguard[31294]: Now monitoring attacks.
                                  Mar 18 17:44:00 pfSense sshguard[31294]: Exiting on signal.
                                  Mar 18 17:44:00 pfSense sshguard[11995]: Now monitoring attacks.
                                  Mar 18 17:47:09 pfSense syslogd: exiting on signal 15
                                  Mar 18 17:48:38 pfSense syslogd: kernel boot file is /boot/kernel/kernel
                                  Mar 18 17:48:38 pfSense kernel: ---<<BOOT>>---
                                  Mar 18 17:48:38 pfSense kernel: Copyright (c) 1992-2023 The FreeBSD Project.
                                  Mar 18 17:48:38 pfSense kernel: Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
                                  Mar 18 17:48:38 pfSense kernel:         The Regents of the University of California. All rights reserved.
                                  Mar 18 17:48:38 pfSense kernel: FreeBSD is a registered trademark of The FreeBSD Foundation.
                                  Mar 18 17:48:38 pfSense kernel: FreeBSD 14.0-CURRENT amd64 1400094 #1 RELENG_2_7_2-n255948-8d2b56da39c: Wed Dec  6 20:45:47 UTC 2023
                                  Mar 18 17:48:38 pfSense kernel:     root@freebsd:/var/jenkins/workspace/pfSense-CE-snapshots-2_7_2-main/obj/amd64/StdASW5b/var/jenkins/workspace/pfSense-CE-snapshots-2_7_2-main/sources/FreeBSD-src-RELENG_2_7_2/amd64.amd64/sys/pfSense amd64
                                  Mar 18 17:48:38 pfSense kernel: FreeBSD clang version 16.0.6 (https://github.com/llvm/llvm-project.git llvmorg-16.0.6-0-g7cbf1a259152)
                                  

                                  If nothing is logged before a reboot like that, it can be a hardware issue.

                                  I assume you didn't see a crash report after rebooting? It doesn't look like you have swap configured, so you wouldn't see one if it panicked.

                                  • S
                                    Sherwatt @stephenw10
                                    last edited by

                                    @stephenw10 Thank you for your time looking into the logs. I did not see any crash reports. Do you think I should configure swap in pfSense in case this happens again?

                                    • stephenw10S
                                      stephenw10 Netgate Administrator
                                      last edited by

                                      You would need to re-install to do so. But that would then give you a crash report if it was the result of a kernel panic.

                                      • S
                                        Sherwatt @stephenw10
                                        last edited by

                                        @stephenw10 Then I'm just going to stick with my current setup and see if there is anything on the console the next time this happens, if it happens.
                                        Thank you for your help, much appreciated!

                                        Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.