Netgate Discussion Forum

    Unbound failure after power failure - how to prevent? [solved enough]

    DHCP and DNS
    7 Posts 5 Posters 837 Views
    • M
      mkernalcon
      last edited by mkernalcon

      This morning when I came into work, I found a complete network-wide DNS outage. We are set up with pfSense providing DNS Resolver (Unbound) for the network.

      Looking through my cameras/logs/etc., it seems we had a power outage which exhausted the UPS system for the router. Unfortunately, I do not yet have UPS monitoring properly set up for the router. As a result, the system shut down hard.

      When everything came back up, unbound did not. When I arrived (hours later), I could not start unbound manually (nothing in the logs, just an attempt to start, after which it was still not running). I immediately switched over to the Forwarder, then changed Unbound's listen port and set its log level to debug. At this point unbound started properly. I was then able to switch back to using unbound as normal.

      My theory, based on similar posts in the past, is that the system was shut down in the middle of a file write related to unbound, and that resetting the config forced a clean rewrite that overcame this.

      As a stopgap, I added google DNS to the DHCP options for my network, so that we can bypass the router if an issue like this arises. However, for local resolving to keep working, I'd like to make sure this issue doesn't happen again. Short of properly setting up the UPS to shut down the system (which is in the works), is there a way to have the config automatically sanity-check and repair itself if unbound fails to start?
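A minimal sketch of the kind of check being asked about, not a pfSense feature. It only classifies the resolver's state and leaves the recovery action to whoever runs it. It assumes `pgrep` and `unbound-checkconf` are on the PATH and that the resolver config lives at `/var/unbound/unbound.conf`; the function names are hypothetical. Note that in the case described above the logs showed nothing, so a config check may well pass even when unbound refuses to start.

```python
import subprocess
from typing import Callable, Sequence

# Assumption: path of the generated resolver config; verify it on your own system.
UNBOUND_CONF = "/var/unbound/unbound.conf"

# A runner takes a command line and returns its exit code.
Runner = Callable[[Sequence[str]], int]


def run(cmd: Sequence[str]) -> int:
    """Run a command quietly and return its exit code (127 if the binary is missing)."""
    try:
        return subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        ).returncode
    except FileNotFoundError:
        return 127


def check_unbound(runner: Runner = run) -> str:
    """Classify the resolver state as 'running', 'bad-config', or 'stopped'."""
    # pgrep exits 0 when at least one process with exactly this name exists.
    if runner(["pgrep", "-x", "unbound"]) == 0:
        return "running"
    # unbound-checkconf exits non-zero when it cannot parse the config file.
    if runner(["unbound-checkconf", UNBOUND_CONF]) != 0:
        return "bad-config"
    # Config parses but the daemon is not running.
    return "stopped"
```

Run from cron, a result of `bad-config` would point at a damaged config file, while `stopped` would mean a plain restart is worth trying. The runner is injectable so the logic can be exercised without touching a live system.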

      • johnpozJ
        johnpoz LAYER 8 Global Moderator
        last edited by johnpoz

        @mkernalcon said in Unbound failure after power failure - how to prevent?:

        I added google DNS to the DHCP options for my network

        Doesn't work that way - if you hand clients more than 1 dns, you have no idea which one they will use. So you could start seeing clients not able to resolve local resources. Even when your local dns is working fine.

        An intelligent man is sometimes forced to be drunk to spend time with his fools
        If you get confused: Listen to the Music Play
        Please don't Chat/PM me for help, unless mod related
        SG-4860 24.11 | Lab VMs 2.8, 24.11

        • M
          mkernalcon

          @johnpoz said in Unbound failure after power failure - how to prevent?:

          Doesn't work that way - if you hand clients more than 1 dns, you have no idea which one they will use. So you could start seeing clients not able to resolve local resources. Even when your local dns is working fine.

          My thought was that even if they check and fail to resolve on public DNS, they'd go through the rest of their list until one resolves. Is this not accurate? (And theoretically they should try the first server first, but I know that's not a guarantee)

          Obviously this isn't my favorite solution, so if I can solve the unbound problem, I will revert to exclusively using the router as intended.

          • GertjanG
            Gertjan @mkernalcon
            last edited by Gertjan

            @mkernalcon said in Unbound failure after power failure - how to prevent?:

            it seems we had a power outage which exhausted the UPS system for the router

            The primary purpose of a UPS is to trigger an automated, controlled shutdown of all logically connected devices when the power goes down and stays down for more than X minutes.
            The UPS will 'bridge' any very short power outages by using its battery. That's a nice advantage, but not the main goal of a UPS.

            A UPS should "communicate by any means" with the critical devices it delivers power to.

            What happened is that, UPS or not, you had a power outage. This can cause** a file system failure.
            When the device starts up again, you must check the file system. (The pfSense manual and this forum often discuss this procedure.)

            Also: this "UPS + devices" setup should be tested about once a month. Just pull the mains plug and see what happens. The protected devices should proceed with an ordinary shutdown after the X minutes. If they don't, review and retest your setup.

            So, the solution: I advise you to finish your UPS setup. There is a little more involved than simply using the UPS as a multi-outlet power strip ^^
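The "shut down after X minutes on battery" rule described above can be sketched as a single decision function. This is only an illustration of the logic; in a real setup a UPS monitoring daemon supplies the on-battery time and performs the shutdown, and the names and the 300-second grace period below are hypothetical.

```python
from typing import Optional


def should_shut_down(on_battery_seconds: Optional[float], grace_seconds: float) -> bool:
    """Return True once mains power has been out for at least the grace period.

    on_battery_seconds is None while mains power is present.
    """
    if on_battery_seconds is None:
        # Mains power is fine, nothing to do.
        return False
    # Short outages are bridged by the battery; longer ones trigger a clean shutdown.
    return on_battery_seconds >= grace_seconds


# Hypothetical grace period of five minutes.
GRACE = 300.0
print(should_shut_down(None, GRACE))   # mains present
print(should_shut_down(30.0, GRACE))   # short outage, keep running on battery
print(should_shut_down(600.0, GRACE))  # long outage, shut down cleanly
```

The grace period has to be shorter than the battery runtime under load, otherwise the battery runs out before the shutdown starts, which is the situation in the original post.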

            ** Try it out for yourself: take an ordinary Windows PC, pull the wall power plug, and restart your PC.
            I'll bet you have major boot problems within 10 tries.
            So, please, just believe me, and do not try this @home.

            No "help me" PM's please. Use the forum, the community will thank you.
            Edit: and where are the logs?

            • bmeeksB
              bmeeks @mkernalcon
              last edited by bmeeks

              @mkernalcon said in Unbound failure after power failure - how to prevent?:

              @johnpoz said in Unbound failure after power failure - how to prevent?:

              Doesn't work that way - if you hand clients more than 1 dns, you have no idea which one they will use. So you could start seeing clients not able to resolve local resources. Even when your local dns is working fine.

              My thought was that even if they check and fail to resolve on public DNS, they'd go through the rest of their list until one resolves. Is this not accurate? (And theoretically they should try the first server first, but I know that's not a guarantee)

              No, that's not correct. The clients would only try an alternate DNS server if the first one they attempted to contact did not reply at all. If the DNS server they ask first returns an "NXDOMAIN" response (indicating the requested domain name does not exist), the client will stop asking any other servers, since it got a response, and that response was "there is no such domain on the Internet by that name".

              In the case of a local domain defined only for your internal LAN, Google's DNS and everyone else on the Internet will have no record of that domain, and thus those public DNS servers will return an NXDOMAIN response when your clients ask them for the IP address of an internal host.
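The fallback behaviour described above can be illustrated with a toy model. This is a simplified sketch, not a real DNS client: each server is a function that returns an address, the string "NXDOMAIN", or None for "no reply at all". The host name and addresses are made up for the example (the public address is from a documentation-only range).

```python
from typing import Callable, Dict, List, Optional

# A server maps a name to an IP string, "NXDOMAIN", or None (no reply / timeout).
Server = Callable[[str], Optional[str]]


def make_server(records: Dict[str, str], reachable: bool = True) -> Server:
    """Build a toy DNS server holding the given records."""
    def answer(name: str) -> Optional[str]:
        if not reachable:
            return None  # server is down: the client sees a timeout
        return records.get(name, "NXDOMAIN")
    return answer


def resolve(name: str, servers: List[Server]) -> Optional[str]:
    """Ask servers in order and stop at the first one that replies at all."""
    for server in servers:
        reply = server(name)
        if reply is not None:
            return reply  # an NXDOMAIN reply is still a reply, so no fallback
    return None


# Hypothetical records: one internal host, one public name.
local = make_server({"nas.home.arpa": "192.168.1.10"})
public = make_server({"example.com": "192.0.2.10"})
dead_local = make_server({}, reachable=False)

print(resolve("nas.home.arpa", [public, local]))      # NXDOMAIN
print(resolve("nas.home.arpa", [local, public]))      # 192.168.1.10
print(resolve("example.com", [dead_local, public]))   # 192.0.2.10
```

The first call shows the failure mode from this thread: a client that happens to ask the public server first gets NXDOMAIN for an internal host and never consults the local resolver, even though it is working.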

              • M
                mkernalcon

                Well, so much for that idea.

                I've reverted to local-only resolving, and I have a better UPS system coming, which I will actually take the time to set up properly. Thanks for the input all around.

                • JKnottJ
                  JKnott @Gertjan

                  @Gertjan said in Unbound failure after power failure - how to prevent? [solved enough]:

                  Try it out for yourself: take an ordinary Windows PC, pull the wall power plug, and restart your PC.
                  I'll bet you have major boot problems within 10 tries.
                  So, please, just believe me, and do not try this @home.

                  Several years ago, I worked at IBM. One day I got a call from someone whose computer wouldn't boot. Her disk was full of garbage. It turned out that at the end of each day she'd just turn off the power bar instead of doing a proper shutdown.

                  PfSense running on Qotom mini PC
                  i5 CPU, 4 GB memory, 32 GB SSD & 4 Intel Gb Ethernet ports.
                  UniFi AC-Lite access point

                  I haven't lost my mind. It's around here...somewhere...

                  Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.