Netgate Discussion Forum

    pfSense became unresponsive, then no DNS resolution after reboot

    General pfSense Questions
    • Sherwatt

      I left my computer for 5 minutes and when I came back, there was no internet connection. pfSense did not respond to ping, I could not SSH into it, and the web interface timed out.
      I had to press the power button on the Protectli hardware. I waited for it to shut down, then waited 30 seconds before pressing it again.
      Eventually pfSense came back; it responds to pings, and SSH and the web UI work.
      I immediately started searching the web with Brave Search; it resolved Brave and Reddit, but nothing else.

      In pfSense there is only one DNS server configured (besides 127.0.0.1): a locally hosted Pi-hole on a separate machine, running in a Docker container. This Pi-hole uses unbound (also in a Docker container) as its upstream DNS server. In Pi-hole's logs I could see that it answered the domain queries with SERVFAIL. I did not have much time to troubleshoot, as we needed internet, so I restarted both the Pi-hole and unbound containers at the same time, and that solved the DNS issue. But I find it strange that after pfSense reboots, Pi-hole/unbound on another machine stop serving DNS.
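To see which layer is returning SERVFAIL, one approach is to query the Pi-hole and the unbound container directly and compare the response codes. A minimal POSIX sh sketch, assuming `drill` or `dig` is available on the machine you test from; the function name `rcode_of`, the PIHOLE_IP/UNBOUND_IP variables and the port are hypothetical placeholders. If your login shell is tcsh, start `sh` before pasting it.

```sh
# Hypothetical helper: print the DNS response code (NOERROR, SERVFAIL,
# NXDOMAIN, ...) from `drill` or `dig` output read on stdin.
# drill labels it "rcode:" and dig labels it "status:" in the HEADER line.
rcode_of() {
    awk 'index($0, "->>HEADER<<-") {
        for (i = 1; i < NF; i++)
            if ($i == "rcode:" || $i == "status:") {
                code = $(i + 1)
                sub(/,$/, "", code)   # drop the trailing comma
                print code
                exit
            }
    }'
}

# Example usage (placeholders; -p selects a non-standard port in drill):
#   drill reddit.com @"$PIHOLE_IP" | rcode_of                   # ask the Pi-hole
#   drill -p "$UNBOUND_PORT" reddit.com @"$UNBOUND_IP" | rcode_of  # ask unbound directly
```

If the Pi-hole answers SERVFAIL while unbound answers NOERROR for the same name, the problem sits in the Pi-hole; if both fail, look at unbound first.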

      Which logs should I look at in pfSense, and how? My top priority is to figure out why it stopped working in the first place. (There is still plenty of free disk space on it.)

      It may be worth mentioning that this is the first time it has done this, and it is a relatively fresh install, only 2 months old. Also, yesterday we experienced our first power outage, so all hardware stopped abruptly, but everything worked again after the power came back.

      Thank you.

      • stephenw10 Netgate Administrator

        Do you see anything logged in pfSense in the run up to the outage?

        Are local clients set up to use the Pi-hole directly?

        If pfSense stopped responding, I'd expect the other devices behind it might lose DHCP leases, for example, or perhaps lose a route. Hard to say at this point, but I'd check the logs on those hosts for that time too.

        • Sherwatt @stephenw10

          @stephenw10 I can't see anything unusual in the logs, but I might be missing something. Maybe I should dig into the logs from the command line rather than just looking at the GUI.

          However, I found something that might be the cause. I am using HAProxy and I recently enabled stats logging. Using ps aux I could see that HAProxy was consuming a lot of memory, and when I checked again some time later, its memory consumption had increased. So at this point my theory is that it caused an out-of-memory condition. I disabled stats, and since then the memory usage has stayed at the same level. Fingers crossed.
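To confirm or rule out the HAProxy memory-growth theory, it helps to record its resident memory over time instead of eyeballing ps aux. A minimal POSIX sh sketch; `rss_kib_for` and the output file path are hypothetical, and it assumes the HAProxy process appears in ps with the command name `haproxy`.

```sh
# Hypothetical helper: sum the resident set size (RSS, in KiB) of all processes
# whose command name equals $1. Expects `ps -ax -o rss=,comm=` output on stdin
# (the "=" suffixes suppress the header line).
rss_kib_for() {
    awk -v name="$1" '$2 == name { total += $1 } END { print total + 0 }'
}

# Example: append a timestamped sample every 5 minutes (stop with Ctrl-C).
#   while :; do
#       printf '%s %s\n' "$(date '+%F %T')" "$(ps -ax -o rss=,comm= | rss_kib_for haproxy)"
#       sleep 300
#   done >> /root/haproxy-rss.log
```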

          BTW, all (known) devices on the network have static IPs. pfSense hands out the Pi-hole's IP to clients, and I can confirm they are using the Pi-hole directly. Now that I think about it, the DNS resolution issue is even more mysterious, as pfSense should not be involved in DNS lookups.

          • stephenw10 Netgate Administrator

            Check the graphs in Status > Monitoring. If there was memory exhaustion, it should have been recorded there.

            • Gertjan @Sherwatt

              @Sherwatt said in pfSense became unresponsive, then no DNS resolution after reboot:

              was the first time we experienced a power outage

              The next time you boot pfSense, watch the boot process on the console, i.e. the serial access with the small cable. You'll know right away if there is a problem.
              Also: when pfSense doesn't seem to respond, connect to the serial (console) interface first.
              Resetting or ripping out the power is like a Russian roulette "head shot".
              SSH access is the next best, but it needs the interfaces to work. Not being able to SSH in is already a bad sign by itself. See it like this: nearly every device on the planet depends on SSH, and it's pretty rock solid, so SSH not working is a big red flag. It could be as simple as "Login protection" having locked you out after several failed login (password) attempts, but you'd better check right away: try logging in from another device.
              Since pfSense normally handles DHCP, a working DHCP service is also a sign that some parts are still working. But if all your devices use static IP settings, you miss this check. Run ipconfig /all on your PC, or check whether your device re-obtained a DHCP lease after disconnecting it for a short time.

              When you install pfSense packages like HAProxy, it becomes important to check the system resources regularly. When RAM fills up, pfSense can start swapping, which you really do not want, and once memory is exhausted the system picks a process (typically the one using the most RAM) and kills it. This will most probably have an impact, as every process is essential. Such a kill is recorded in the system log.
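If the kernel did kill a process for lack of memory, the system log should contain a line for it. A small POSIX sh filter for that, assuming the message wording used by recent FreeBSD kernels ("pid N (name), jid N, uid N, was killed: reason"); the function name is hypothetical, and the reason text varies by version, for example "out of swap space" or "failed to reclaim memory".

```sh
# Hypothetical helper: print log lines (stdin) that look like FreeBSD kernel
# out-of-memory kills, e.g.
#   kernel: pid <pid> (<name>), jid <jid>, uid <uid>, was killed: <reason>
oom_kills() {
    awk '/was killed:/ && /pid [0-9]+ \(/'
}

# Example usage on the firewall:
#   oom_kills < /var/log/system.log
```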

              And yeah, a UPS can pay for itself without you even knowing it ;)

              Edit: and where are the logs?

              • Sherwatt

                I'm just checking the Monitoring graph, and memory consumption seems normal, nothing unusual there.
                However, the state count started increasing 10 days ago. I am not even sure I understand what states are, but I guess I need to see what I changed 10 days ago and whether it is related.

                EDIT: that big spike is NOT when the issue happened; the spike is actually 24 hours before that. So maybe states are not even what caused it.

                [Screenshot: Status > Monitoring graph of the firewall state count]

                Thanks for the tips, @Gertjan, I will try to remember to use the serial access first. But it also depends on how quickly the household needs internet, as two people are working from home here.
                And yes, after this outage I definitely want to buy a UPS; I just need to do some research, because I have never used one.

                • stephenw10 Netgate Administrator

                  Yeah, I doubt it's a states problem. 4000 states really isn't that many. Odd that it spiked like that, though. Do you have any sort of content-sharing applications running? BitTorrent creates a lot of states, for example.
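One way to check whether a single host (for example one running a BitTorrent client) accounts for the state count is to count its entries in the pf state table. A POSIX sh sketch that reads `pfctl -ss` output; the function name and the example address are hypothetical, and it only handles IPv4 `address:port` fields, including an address shown in parentheses on NAT states. `pfctl -si` reports the overall count as "current entries".

```sh
# Hypothetical helper: count pf states (lines of `pfctl -ss` output on stdin)
# in which the IPv4 address $1 appears on either side.
count_states_for_ip() {
    awk -v ip="$1" '
        {
            for (i = 3; i <= NF; i++) {        # $1 = interface, $2 = protocol
                addr = $i
                gsub(/[()]/, "", addr)         # NAT states show one address in parentheses
                sub(/:[0-9]+$/, "", addr)      # strip the :port suffix
                if (addr == ip) { n++; break } # count each state line once
            }
        }
        END { print n + 0 }'
}

# Example usage on the firewall (192.0.2.10 is a placeholder LAN address):
#   pfctl -ss | count_states_for_ip 192.0.2.10
#   pfctl -si | grep 'current entries'
```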

                  • Sherwatt @stephenw10

                    @stephenw10 Yes, I am running qBittorrent in a container.

                    Can I run some kind of error-checking and repair command on pfSense to look for potentially corrupted files on the disk? Maybe the outage caused corruption in a rarely accessed part of the filesystem, and when that part is read, the whole system crashes.

                    • stephenw10 Netgate Administrator

                      Is it UFS or ZFS?

                      • Sherwatt @stephenw10

                        @stephenw10 I asked ChatGPT the same question as in my previous post and after a short chat it turns out it is ZFS:

                        $ zpool status -v
                          pool: pfSense
                         state: ONLINE
                        config:
                        
                        	NAME        STATE     READ WRITE CKSUM
                        	pfSense     ONLINE       0     0     0
                        	  mmcsd0p4  ONLINE       0     0     0
                        
                        errors: No known data errors
                        

                        Is there anything else I could use to diagnose the problem retroactively? I already fed the boot log to ChatGPT to look for errors, but it didn't find anything scary. Should I share it with you, and if so, is pasting it into a post acceptable?

                        • stephenw10 Netgate Administrator

                          Then you can run a ZFS pool scrub: zpool scrub pfSense
                          https://docs.netgate.com/pfsense/en/latest/troubleshooting/filesystem-check.html

                          You can upload the logs here and I can look at them:
                          https://nc.netgate.com/nextcloud/s/zgpTGfKio3Fa5eb

                          • Sherwatt @stephenw10

                            @stephenw10 Thank you. I uploaded boot.txt.

                            [2.7.2-RELEASE][admin@pfSense.lan.mydomain.com]/root: zpool scrub pfSense
                            [2.7.2-RELEASE][admin@pfSense.lan.mydomain.com]/root: zpool status
                              pool: pfSense
                             state: ONLINE
                              scan: scrub repaired 0B in 00:00:10 with 0 errors on Wed Mar 19 15:11:17 2025
                            config:
                            
                                    NAME        STATE     READ WRITE CKSUM
                                    pfSense     ONLINE       0     0     0
                                      mmcsd0p4  ONLINE       0     0     0
                            
                            errors: No known data errors
                            
                            • stephenw10 Netgate Administrator

                              That's just the boot log from after the outage happened.

                              We need to see the system log covering the event, so from at least a few hours before it up to and including the reboot.
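On the firewall, that slice can be cut out of /var/log/system.log before uploading. A POSIX sh sketch, assuming the BSD syslog line format seen in this thread ("Mar 18 17:47:09 pfSense ..."); the function name is hypothetical, both times must be full HH:MM:SS, and a window that crosses midnight needs two calls. Older entries may already have been rotated out of system.log.

```sh
# Hypothetical helper: print syslog lines from stdin for month $1, day $2,
# with a timestamp between $3 and $4 inclusive (both HH:MM:SS).
# Zero-padded HH:MM:SS strings compare correctly as plain strings.
log_window() {
    awk -v m="$1" -v d="$2" -v from="$3" -v to="$4" \
        '$1 == m && $2 + 0 == d + 0 && $3 >= from && $3 <= to'
}

# Example usage: from a few hours before the outage up to the reboot.
#   log_window Mar 18 14:00:00 17:50:00 < /var/log/system.log > /root/system-window.txt
```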

                              You should disable the on-board audio device though. It just uses resources and does nothing in pfSense.

                              hdacc0: <Intel Jasper Lake HDA CODEC> at cad 2 on hdac0
                              hdaa0: <Intel Jasper Lake Audio Function Group> at nid 1 on hdacc0
                              
                              • Sherwatt @stephenw10

                                @stephenw10 Thank you for looking into my issue. I uploaded system.log (from /var/log) twice, because I messed up the first one. I guess this is the log I should be looking at, right?
                                I think the issue happened around 17:45 (March 18). I left my computer around 17:40, and when I came back pfSense was dead.

                                I should disable the audio device in UEFI, right?

                                • stephenw10 Netgate Administrator @Sherwatt

                                  @Sherwatt said in pfSense became unresponsive, then no DNS resolution after reboot:

                                  I should disable the audio device in UEFI, right?

                                  Yup, somewhere in the EFI/BIOS setup you should be able to disable it completely.

                                  • stephenw10 Netgate Administrator

                                    Mmm, nothing really shown in the logs at all:

                                    Mar 18 17:17:00 pfSense sshguard[62427]: Now monitoring attacks.
                                    Mar 18 17:26:00 pfSense sshguard[62427]: Exiting on signal.
                                    Mar 18 17:26:00 pfSense sshguard[44994]: Now monitoring attacks.
                                    Mar 18 17:35:00 pfSense sshguard[44994]: Exiting on signal.
                                    Mar 18 17:35:00 pfSense sshguard[31294]: Now monitoring attacks.
                                    Mar 18 17:44:00 pfSense sshguard[31294]: Exiting on signal.
                                    Mar 18 17:44:00 pfSense sshguard[11995]: Now monitoring attacks.
                                    Mar 18 17:47:09 pfSense syslogd: exiting on signal 15
                                    Mar 18 17:48:38 pfSense syslogd: kernel boot file is /boot/kernel/kernel
                                    Mar 18 17:48:38 pfSense kernel: ---<<BOOT>>---
                                    Mar 18 17:48:38 pfSense kernel: Copyright (c) 1992-2023 The FreeBSD Project.
                                    Mar 18 17:48:38 pfSense kernel: Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
                                    Mar 18 17:48:38 pfSense kernel:         The Regents of the University of California. All rights reserved.
                                    Mar 18 17:48:38 pfSense kernel: FreeBSD is a registered trademark of The FreeBSD Foundation.
                                    Mar 18 17:48:38 pfSense kernel: FreeBSD 14.0-CURRENT amd64 1400094 #1 RELENG_2_7_2-n255948-8d2b56da39c: Wed Dec  6 20:45:47 UTC 2023
                                    Mar 18 17:48:38 pfSense kernel:     root@freebsd:/var/jenkins/workspace/pfSense-CE-snapshots-2_7_2-main/obj/amd64/StdASW5b/var/jenkins/workspace/pfSense-CE-snapshots-2_7_2-main/sources/FreeBSD-src-RELENG_2_7_2/amd64.amd64/sys/pfSense amd64
                                    Mar 18 17:48:38 pfSense kernel: FreeBSD clang version 16.0.6 (https://github.com/llvm/llvm-project.git llvmorg-16.0.6-0-g7cbf1a259152)
                                    

                                    If nothing is logged before a reboot like that, it can be a hardware issue.
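A quick way to do that check on any saved log is to print the last few lines before each boot marker. A POSIX sh sketch; the function name is hypothetical and it assumes the kernel's ---<<BOOT>>--- line appears in the log as in the excerpt above. With the default of 5 lines, that excerpt would show the last sshguard entries, the syslogd exit at 17:47:09 and the new syslogd's "kernel boot file" line before the 17:48:38 boot marker.

```sh
# Hypothetical helper: for every "---<<BOOT>>---" line in a syslog stream on
# stdin, print the N lines logged just before it (N = $1, default 5, must be >= 1),
# then the boot line itself and a "--" separator.
before_boot() {
    awk -v n="${1:-5}" '
        index($0, "---<<BOOT>>---") {
            for (i = NR - n; i < NR; i++)
                if (i >= 1) print buf[i % n]
            print
            print "--"
        }
        { buf[NR % n] = $0 }   # ring buffer holding the last n lines
    '
}

# Example usage:
#   before_boot 20 < /var/log/system.log
```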

                                    I assume you didn't see a crash report after rebooting? It doesn't look like you have swap configured, so you wouldn't see one if it panicked.

                                    • Sherwatt @stephenw10

                                      @stephenw10 Thank you for your time looking into the logs. I did not see any crash reports. Do you think I should configure swap in pfSense in case this happens again?

                                      • stephenw10 Netgate Administrator

                                        You would need to re-install to do so. But that would then give you a crash report if it was the result of a kernel panic.

                                        • Sherwatt @stephenw10

                                          @stephenw10 Then I'm just going to stick with my current setup and see if there is anything on the console the next time this happens, if it happens.
                                          Thank you for your help, much appreciated!

                                          Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.