Netgate Discussion Forum

pfSense resolver stops working

DHCP and DNS
  • M
    maverickws @johnpoz
    last edited by maverickws Jul 26, 2022, 3:20 PM Jul 26, 2022, 3:18 PM

    @johnpoz alright, I have to say I never paid much attention to these interface options. I noticed that, despite listening on all interfaces, on the external side only devices from the access list would actually get a reply, so I never changed that...

    I'm going to change that config now.

    On the listening interfaces I'll leave:
    LAN IPv6 Link-Local
    PUB IPv6 Link-Local
    CARP LAN VIP
    CARP PUB VIP
    CARP WAN VIP
    DMZ VIP

    And on the outgoing interfaces that'll be only "localhost".

    (if localhost has to choose between the interface IP, or the VIP which is the gateway for the interface, which will it prefer? VIP or IP?)

    EDIT:

    I updated the settings selecting the mentioned interfaces, when I hit save I get an error:

    The following input errors were detected:
    This system is configured to use the DNS Resolver as its DNS server, so Localhost or All must be selected in Network Interfaces.
    

    So I've also selected localhost on the listening interfaces. You also had that in your selection, so my bad!

    • J
      johnpoz LAYER 8 Global Moderator @maverickws
      last edited by Jul 26, 2022, 3:32 PM

      @maverickws yeah, you have to listen on localhost if you want pfSense to be able to use 127.0.0.1, which is the default ;)

      An intelligent man is sometimes forced to be drunk to spend time with his fools
      If you get confused: Listen to the Music Play
      Please don't Chat/PM me for help, unless mod related
      SG-4860 24.11 | Lab VMs 2.7.2, 24.11

      • M
        maverickws @johnpoz
        last edited by maverickws Jul 26, 2022, 3:37 PM Jul 26, 2022, 3:37 PM

        @johnpoz
        yeah it makes sense!!

        Ok so anyway I've changed these configs as reported on the thread here, I did not disable the DHCP Leases option yet as I'm waiting to see if it fails again so I can do some more checks in regard to those points you mentioned above.

        Will stay on top of this and if the issue keeps occurring or when it occurs I'll get back!

        thanks a lot for all the feedback so far! have a great one

        • I
          ik13
          last edited by Jul 27, 2022, 1:55 AM

          In about 3 months I had the DNS Resolver stop working 3 times. Definitely an issue!

          Very basic setup. pfSense 2.6.0 on metal.
          I do not have "Register DHCP leases in the DNS Resolver" checked. Service restart fixes it until the next time.
          Options that are "on":

          • Respond to incoming SSL/TLS queries from local clients
          • Enable DNSSEC Support
          • Enable Forwarding Mode
          • Use SSL/TLS for outgoing DNS Queries to Forwarding Servers
          • J
            johnpoz LAYER 8 Global Moderator @ik13
            last edited by Jul 27, 2022, 3:16 AM

            @ik13 said in pfSense resolver stops working:

            Enable DNSSEC Support
            Enable Forwarding Mode

            This is by no means a good setup; we have been over it here multiple times. If you're going to forward, there is zero reason to have DNSSEC checked: where you forward to either does it or it doesn't. Having that checked does nothing but cause extra queries and problems.

            • G
              Gertjan @maverickws
              last edited by Gertjan Jul 27, 2022, 7:17 AM Jul 27, 2022, 7:07 AM

              @maverickws said in pfSense resolver stops working:

              I did not disable the DHCP Leases option yet as I'm waiting to see if it fails again

              Just keep in mind what will happen with this option checked:
              Every time a device on any of your LANs requests or renews a lease, unbound will get restarted.

              Even if you have just one LAN and a couple of devices, this will happen halfway through every lease, or whenever a device gets disconnected and reconnects (think of phones and other Wi-Fi devices).

              Run this on the console to see unbound stopping :

              grep -E -i '(start|stopped)' /var/log/resolver.log
              

              Again, it's not only a DHCP event that restarts unbound, it can also be pfblockerng-devel, or an interface down+up event.

              These unbound stop+starts are not bad, but, if they take some time, they will leave your network without DNS for a couple of moments.

              And what I don't like at all: unbound receives a stop command. It will not just stop; it will tear down all its memory structures, caches, etc., and this takes time. Same thing when it starts: it has a lot to do.
              What happens if another stop event comes in? And another one right at that moment? At best you have a lot of race conditions. And that is ... well, that's a situation I don't like at all. I was a programmer in my previous life (C, C++, C#, etc.).

              I have this for the last month or so :

              [22.05-RELEASE][admin@pfSEnse.my-site.net]/root: grep -E -i '(start|stopped)' /var/log/resolver.log
              <30>1 2022-06-19T00:18:12.638505+02:00 pfSEnse.my-site.net unbound 61884 - - [61884:0] info: service stopped (unbound 1.13.2).
              <30>1 2022-06-19T00:18:15.294608+02:00 pfSEnse.my-site.net unbound 86881 - - [86881:0] info: start of service (unbound 1.13.2).
              <30>1 2022-06-19T11:10:42.240792+02:00 pfSEnse.my-site.net unbound 86881 - - [86881:0] info: service stopped (unbound 1.13.2).
              <30>1 2022-06-21T02:29:21.972768+02:00 pfSEnse.my-site.net unbound 22445 - - [22445:0] info: start of service (unbound 1.13.2).
              <30>1 2022-06-21T02:29:42.732938+02:00 pfSEnse.my-site.net unbound 22445 - - [22445:0] info: service stopped (unbound 1.13.2).
              <30>1 2022-06-21T02:30:35.984183+02:00 pfSEnse.my-site.net unbound 43631 - - [43631:0] info: start of service (unbound 1.13.2).
              <30>1 2022-06-21T02:31:25.988265+02:00 pfSEnse.my-site.net unbound 43631 - - [43631:0] info: service stopped (unbound 1.13.2).
              <30>1 2022-06-21T02:32:01.804532+02:00 pfSEnse.my-site.net unbound 39746 - - [39746:0] info: start of service (unbound 1.13.2).
              <30>1 2022-06-21T02:32:18.435160+02:00 pfSEnse.my-site.net unbound 39746 - - [39746:0] info: service stopped (unbound 1.13.2).
              <30>1 2022-06-21T02:32:20.879029+02:00 pfSEnse.my-site.net unbound 23048 - - [23048:0] info: start of service (unbound 1.13.2).
              <30>1 2022-06-21T02:32:27.600193+02:00 pfSEnse.my-site.net unbound 23048 - - [23048:0] info: service stopped (unbound 1.13.2).
              <30>1 2022-06-21T02:32:30.041500+02:00 pfSEnse.my-site.net unbound 18498 - - [18498:0] info: start of service (unbound 1.13.2).
              <30>1 2022-06-21T02:32:46.722051+02:00 pfSEnse.my-site.net unbound 18498 - - [18498:0] info: service stopped (unbound 1.13.2).
              <30>1 2022-06-21T02:32:49.996440+02:00 pfSEnse.my-site.net unbound 82950 - - [82950:0] info: start of service (unbound 1.13.2).
              <30>1 2022-06-21T02:32:52.506637+02:00 pfSEnse.my-site.net unbound 82950 - - [82950:0] info: service stopped (unbound 1.13.2).
              <30>1 2022-06-21T02:32:56.071159+02:00 pfSEnse.my-site.net unbound 58776 - - [58776:0] info: start of service (unbound 1.13.2).
              <30>1 2022-06-21T02:33:02.527865+02:00 pfSEnse.my-site.net unbound 58776 - - [58776:0] info: service stopped (unbound 1.13.2).
              <30>1 2022-06-21T02:33:16.489983+02:00 pfSEnse.my-site.net unbound 42004 - - [42004:0] info: start of service (unbound 1.13.2).
              <30>1 2022-06-21T02:33:25.849070+02:00 pfSEnse.my-site.net unbound 42004 - - [42004:0] info: service stopped (unbound 1.13.2).
              <30>1 2022-06-21T02:33:29.721989+02:00 pfSEnse.my-site.net unbound 88115 - - [88115:0] info: start of service (unbound 1.13.2).
              <30>1 2022-06-21T02:33:34.115088+02:00 pfSEnse.my-site.net unbound 88115 - - [88115:0] info: service stopped (unbound 1.13.2).
              <30>1 2022-06-21T02:33:35.895850+02:00 pfSEnse.my-site.net unbound 7800 - - [7800:0] info: start of service (unbound 1.13.2).
              <30>1 2022-06-21T02:38:33.501144+02:00 pfSEnse.my-site.net unbound 7800 - - [7800:0] info: service stopped (unbound 1.13.2).
              <30>1 2022-06-21T02:38:36.071832+02:00 pfSEnse.my-site.net unbound 390 - - [390:0] info: start of service (unbound 1.13.2).
              <30>1 2022-06-21T02:38:41.846193+02:00 pfSEnse.my-site.net unbound 390 - - [390:0] info: service stopped (unbound 1.13.2).
              <30>1 2022-06-21T02:38:45.098570+02:00 pfSEnse.my-site.net unbound 9309 - - [9309:0] info: start of service (unbound 1.13.2).
              <30>1 2022-06-21T02:38:52.371103+02:00 pfSEnse.my-site.net unbound 9309 - - [9309:0] info: service stopped (unbound 1.13.2).
              <30>1 2022-06-22T16:46:33.383773+02:00 pfSEnse.my-site.net unbound 53212 - - [53212:0] info: start of service (unbound 1.13.2).
              <30>1 2022-06-22T16:46:52.584831+02:00 pfSEnse.my-site.net unbound 53212 - - [53212:0] info: service stopped (unbound 1.13.2).
              <30>1 2022-06-22T16:46:55.806446+02:00 pfSEnse.my-site.net unbound 31030 - - [31030:0] info: start of service (unbound 1.13.2).
              <30>1 2022-06-22T16:50:30.755244+02:00 pfSEnse.my-site.net unbound 31030 - - [31030:0] info: service stopped (unbound 1.13.2).
              <30>1 2022-06-22T16:50:32.673715+02:00 pfSEnse.my-site.net unbound 86962 - - [86962:0] info: start of service (unbound 1.13.2).
              <30>1 2022-06-22T16:52:26.575190+02:00 pfSEnse.my-site.net unbound 86962 - - [86962:0] info: service stopped (unbound 1.13.2).
              <30>1 2022-06-22T16:52:28.480685+02:00 pfSEnse.my-site.net unbound 83093 - - [83093:0] info: start of service (unbound 1.13.2).
              <30>1 2022-06-22T16:54:13.110429+02:00 pfSEnse.my-site.net unbound 83093 - - [83093:0] info: service stopped (unbound 1.13.2).
              <30>1 2022-06-22T16:54:15.008855+02:00 pfSEnse.my-site.net unbound 41447 - - [41447:0] info: start of service (unbound 1.13.2).
              <30>1 2022-06-22T16:59:55.007531+02:00 pfSEnse.my-site.net unbound 41447 - - [41447:0] info: service stopped (unbound 1.13.2).
              <30>1 2022-06-22T16:59:56.932470+02:00 pfSEnse.my-site.net unbound 2988 - - [2988:0] info: start of service (unbound 1.13.2).
              <30>1 2022-06-23T17:15:38.524324+02:00 pfSEnse.my-site.net unbound 2988 - - [2988:0] info: service stopped (unbound 1.13.2).
              <30>1 2022-06-23T17:15:40.408052+02:00 pfSEnse.my-site.net unbound 49970 - - [49970:0] info: start of service (unbound 1.13.2).
              <30>1 2022-06-27T00:00:42.045699+02:00 pfSEnse.my-site.net unbound 49970 - - [49970:0] info: service stopped (unbound 1.13.2).
              <30>1 2022-07-02T00:00:48.060874+02:00 pfSEnse.my-site.net unbound 97180 - - [97180:0] info: start of service (unbound 1.15.0).
              <30>1 2022-07-05T00:00:33.888195+02:00 pfSEnse.my-site.net unbound 97180 - - [97180:0] info: service stopped (unbound 1.15.0).
              <30>1 2022-07-05T00:00:35.778522+02:00 pfSEnse.my-site.net unbound 3931 - - [3931:0] info: start of service (unbound 1.15.0).
              <30>1 2022-07-08T10:22:30.888264+02:00 pfSEnse.my-site.net unbound 3931 - - [3931:0] info: service stopped (unbound 1.15.0).
              <30>1 2022-07-08T10:22:34.399240+02:00 pfSEnse.my-site.net unbound 685 - - [685:0] info: start of service (unbound 1.15.0).
              <30>1 2022-07-14T00:00:50.029350+02:00 pfSEnse.my-site.net unbound 83848 - - [83848:0] info: start of service (unbound 1.15.0).
              <30>1 2022-07-17T00:00:54.309024+02:00 pfSEnse.my-site.net unbound 83848 - - [83848:0] info: service stopped (unbound 1.15.0).
              <30>1 2022-07-17T00:00:56.408997+02:00 pfSEnse.my-site.net unbound 54222 - - [54222:0] info: start of service (unbound 1.15.0).
              <30>1 2022-07-18T00:00:58.241048+02:00 pfSEnse.my-site.net unbound 54222 - - [54222:0] info: service stopped (unbound 1.15.0).
              <30>1 2022-07-18T00:01:00.444239+02:00 pfSEnse.my-site.net unbound 22032 - - [22032:0] info: start of service (unbound 1.15.0).
              <30>1 2022-07-18T11:32:19.600367+02:00 pfSEnse.my-site.net unbound 22032 - - [22032:0] info: service stopped (unbound 1.15.0).
              <30>1 2022-07-18T11:32:26.992179+02:00 pfSEnse.my-site.net unbound 8790 - - [8790:0] info: start of service (unbound 1.15.0).
              <30>1 2022-07-19T13:00:20.983512+02:00 pfSEnse.my-site.net unbound 8790 - - [8790:0] info: service stopped (unbound 1.15.0).
              <30>1 2022-07-19T13:00:22.083823+02:00 pfSEnse.my-site.net unbound 18042 - - [18042:0] info: start of service (unbound 1.15.0).
              <30>1 2022-07-20T12:00:34.296337+02:00 pfSEnse.my-site.net unbound 18042 - - [18042:0] info: service stopped (unbound 1.15.0).
              <30>1 2022-07-20T12:00:36.365909+02:00 pfSEnse.my-site.net unbound 24358 - - [24358:0] info: start of service (unbound 1.15.0).
              <30>1 2022-07-21T00:00:38.712404+02:00 pfSEnse.my-site.net unbound 24358 - - [24358:0] info: service stopped (unbound 1.15.0).
              <30>1 2022-07-21T00:00:40.611149+02:00 pfSEnse.my-site.net unbound 58053 - - [58053:0] info: start of service (unbound 1.15.0).
              <30>1 2022-07-21T10:45:44.144614+02:00 pfSEnse.my-site.net unbound 58053 - - [58053:0] info: service stopped (unbound 1.15.0).
              <30>1 2022-07-21T10:45:46.047555+02:00 pfSEnse.my-site.net unbound 93413 - - [93413:0] info: start of service (unbound 1.15.0).
              <30>1 2022-07-21T14:29:28.563979+02:00 pfSEnse.my-site.net unbound 93413 - - [93413:0] info: service stopped (unbound 1.15.0).
              <30>1 2022-07-21T14:29:32.039512+02:00 pfSEnse.my-site.net unbound 70486 - - [70486:0] info: start of service (unbound 1.15.0).
              <30>1 2022-07-21T14:29:44.413710+02:00 pfSEnse.my-site.net unbound 70486 - - [70486:0] info: service stopped (unbound 1.15.0).
              <30>1 2022-07-21T14:29:47.879285+02:00 pfSEnse.my-site.net unbound 50669 - - [50669:0] info: start of service (unbound 1.15.0).
              <30>1 2022-07-21T14:30:17.753945+02:00 pfSEnse.my-site.net unbound 50669 - - [50669:0] info: service stopped (unbound 1.15.0).
              <30>1 2022-07-21T14:30:21.189694+02:00 pfSEnse.my-site.net unbound 24816 - - [24816:0] info: start of service (unbound 1.15.0).
              <30>1 2022-07-25T00:00:41.414948+02:00 pfSEnse.my-site.net unbound 24816 - - [24816:0] info: service stopped (unbound 1.15.0).
              <30>1 2022-07-25T00:00:43.444923+02:00 pfSEnse.my-site.net unbound 53748 - - [53748:0] info: start of service (unbound 1.15.0).
              <30>1 2022-07-25T08:52:18.808277+02:00 pfSEnse.my-site.net unbound 53748 - - [53748:0] info: service stopped (unbound 1.15.0).
              <30>1 2022-07-25T08:52:19.890207+02:00 pfSEnse.my-site.net unbound 49444 - - [49444:0] info: start of service (unbound 1.15.0).
              <30>1 2022-07-25T08:53:59.193396+02:00 pfSEnse.my-site.net unbound 49444 - - [49444:0] info: service stopped (unbound 1.15.0).
              <30>1 2022-07-25T08:53:59.618754+02:00 pfSEnse.my-site.net unbound 78030 - - [78030:0] info: start of service (unbound 1.15.0).
              <30>1 2022-07-25T08:54:25.182703+02:00 pfSEnse.my-site.net unbound 78030 - - [78030:0] info: service stopped (unbound 1.15.0).
              <30>1 2022-07-25T08:54:27.240339+02:00 pfSEnse.my-site.net unbound 77318 - - [77318:0] info: start of service (unbound 1.15.0).
              <30>1 2022-07-26T09:27:12.426763+02:00 pfSEnse.my-site.net unbound 77318 - - [77318:0] info: service stopped (unbound 1.15.0).
              <30>1 2022-07-26T09:27:16.229963+02:00 pfSEnse.my-site.net unbound 17569 - - [17569:0] info: start of service (unbound 1.15.0).
              <30>1 2022-07-26T09:32:45.480474+02:00 pfSEnse.my-site.net unbound 17569 - - [17569:0] info: service stopped (unbound 1.15.0).
              <30>1 2022-07-26T09:32:48.194044+02:00 pfSEnse.my-site.net unbound 74846 - - [74846:0] info: start of service (unbound 1.15.0).
              
              

              Every "stopped" line should be followed by a "start" line like this :

              <30>1 2022-07-26T09:32:48.194044+02:00 pfSense.brit-hotel-fumel.net unbound 74846 - - [74846:0] info: start of service (unbound 1.15.0).
              

              After all, unbound is never just stopped. That only happens when you shut the system down, or when you disable unbound in the GUI yourself.
              All other process actions are stop + start actions.
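The stop/start pairing described above can be checked mechanically. Below is a minimal sketch, assuming the log uses the same syslog-style lines as the output above (timestamp in the second field); `unmatched_stops` is a hypothetical helper name, not something shipped with pfSense.

```shell
#!/bin/sh
# Sketch: report every "service stopped" line that is not followed by a
# "start of service" line before the next stop (or before end of file).
# Assumes the timestamp is the second whitespace-separated field.
unmatched_stops() {
    awk '
        /info: service stopped/ {
            # Two stops in a row: the earlier one never got its start.
            if (pending != "") print "no start after: " pending
            pending = $2
            next
        }
        /info: start of service/ { pending = "" }
        END { if (pending != "") print "no start after: " pending }
    '
}

# Possible usage on the firewall (path taken from the grep command above):
# unmatched_stops < /var/log/resolver.log
```

The function prints nothing when every stop has a matching start. It does not measure how long each restart took, so a long gap between a stop and its start would still need a manual look at the timestamps.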

              What I propose : if stopping + starting raises the chance of finding unbound dead in the water, then I vote for lowering the number of these stops and starts.

              Disabling "DHCP Leases option" is just one quick way to do this.

              There are some redmine bug reports about this.
              Better solutions have been proposed, such as using unbound-control to insert new host names into unbound's DNS cache when they get announced by the DHCP server (if they gave a valid, non-empty host name, as this is often the case, so no interaction is needed with DNS).
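To make the proposed alternative concrete, here is a rough sketch of what "insert a host name without restarting" could look like with `unbound-control local_data`. This is not how pfSense currently does it. The config path, the function name `lease_to_cmd`, and the example host and domain are all assumptions for illustration, and the function only prints the command (a dry run) rather than executing it.

```shell
#!/bin/sh
# Sketch: turn one DHCP lease (host name, domain, IPv4 address) into an
# unbound-control command that adds an A record to the running resolver.
# UNBOUND_CONF is an assumed path; adjust it to the actual installation.
UNBOUND_CONF="${UNBOUND_CONF:-/var/unbound/unbound.conf}"

lease_to_cmd() {
    name=$1 domain=$2 ip=$3
    # A lease without a host name gives us nothing to register.
    [ -n "$name" ] || return 1
    printf 'unbound-control -c %s local_data "%s.%s. IN A %s"\n' \
        "$UNBOUND_CONF" "$name" "$domain" "$ip"
}

# Dry run with made-up values: prints the command instead of running it.
lease_to_cmd printer home.arpa 192.168.1.50
```

Records added this way live in the running process only, so anything inserted would have to be replayed after a real restart.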

              Without any proof, I think that ARM-based devices are more sensitive to this issue.
              @ik13 : arm or intel ?

              No "help me" PM's please. Use the forum, the community will thank you.
              Edit : and where are the logs ??

              • J
                johnpoz LAYER 8 Global Moderator @Gertjan
                last edited by Jul 27, 2022, 9:00 AM

                @gertjan said in pfSense resolver stops working:

                Without any proof, I think that ARM-based devices are more sensitive to this issue.

                I think you might be on to something there, and I also think using tls forwarding doesn't help either..

                • G
                  Gertjan @johnpoz
                  last edited by Jul 27, 2022, 9:17 AM

                  @johnpoz said in pfSense resolver stops working:

                  and I also think using tls forwarding doesn't help either

                  Like the TLS hardware support that 'blocks' as seen several times, and only a power down - 10 seconds - power up can make it available to the system again.
                  Not being able to make a TLS connection, and thus not being able to contact the update servers of Netgate is a known visible part of the issue.
                  DNS failing to make TLS connections, or even blocking on it, would be a nasty thing.

                  Of course, I don't really know what I'm talking about here.

                  DNSSEC also 'signs' stuff, and checks signatures, so it uses TLS ? In that case ....

                  • J
                    johnpoz LAYER 8 Global Moderator @Gertjan
                    last edited by Jul 27, 2022, 9:30 AM

                    @gertjan said in pfSense resolver stops working:

                    so it uses TLS ? In that case ....

                    dnssec is not tls based.. The traffic between the dns and the client is not encrypted, the records and info are just signed, and can be verified with the public key.

                    https://www.cloudflare.com/dns/dnssec/how-dnssec-works/

                    • G
                      Gertjan @johnpoz
                      last edited by Jul 27, 2022, 10:13 AM

                      @johnpoz said in pfSense resolver stops working:

                      The traffic between

                      Correct. DNSSEC traffic is sent over the wire in the clear.
                      But it's the "checking the crypto" part, the checking of hashes, signing keys, etc., that makes me think: is the same openssl library used? I guess so: https://www.cloudflare.com/dns/dnssec/dnssec-complexities-and-considerations/
                      And if so, is openssl using hardware for this, if available? The same hardware it uses for "AES", TLS, etc.

                      • M
                        maverickws
                        last edited by maverickws Jul 27, 2022, 12:19 PM Jul 27, 2022, 12:03 PM

                        @johnpoz said in pfSense resolver stops working:

                        @maverickws if you're saying it's running, but not responding..

                        So when you query directly, you get a timeout, NX, refused? Which interfaces do you have unbound using - did you have an interface go down, like wan or vpn, or whatever?

                        It resolves nothing, not even pfsense's own name? Or local - or doesn't resolve public stuff like google.com?

                        For example, I just sent an empty query to unbound from my pc, and I got back the roots.. You're saying this fails? With what error, timeout?

                        Alright so back to the issue again. It happened again yesterday at local time 17:46 GMT +1 (daylight savings) - Not resolving.

                        In the meantime it recovered, and I waited until it failed again.
                        Today it didn't recover, so I had to restart the unbound service manually, but before doing so I ran all of the remaining tests.

                        1. DNS Resolver system logs: yesterday there were no entries after 16:50 (the issue occurred at ~17:45), and today there were no entries after 10:35 (the issue occurred after 12:00). No starts/stops;
                        2. The last log entry from the dhcpleases process is also from 16:50 yesterday, and from 10:36 today;
                        3. No interface changes or other issues within the timeframe where the issue started occurring, let's say the last 20 minutes;

                        Last entries on resolver log:

                        
                        Time	Process	PID	Message
                        Jul 26 16:50:08	unbound	94538	[94538:0] info: generate keytag query _ta-4f66. NULL IN
                        Jul 26 16:50:07	unbound	94538	[94538:0] info: start of service (unbound 1.15.0).
                        Jul 26 16:50:07	unbound	94538	[94538:0] notice: init module 1: iterator
                        Jul 26 16:50:07	unbound	94538	[94538:0] notice: init module 0: validator
                        Jul 26 16:50:07	unbound	94538	[94538:0] notice: Restart of unbound 1.15.0.
                        

                        On the general log:

                        
                        Time	Process	PID	Message
                        Jul 26 17:36:00	sshguard	75200	Now monitoring attacks.
                        Jul 26 17:36:00	sshguard	67697	Exiting on signal.
                        Jul 26 17:10:00	sshguard	67697	Now monitoring attacks.
                        Jul 26 17:10:00	sshguard	26927	Exiting on signal.
                        

                        When I do dig to the interface CARP VIP without any query:

                        # dig @10.0.0.254
                        
                        ; <<>> DiG 9.11.36-RedHat-9.11.36-3.el8 <<>> @10.0.0.254
                        ; (1 server found)
                        ;; global options: +cmd
                        ;; Got answer:
                        ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 45328
                        ;; flags: qr rd ra ad; QUERY: 1, ANSWER: 13, AUTHORITY: 0, ADDITIONAL: 1
                        
                        ;; OPT PSEUDOSECTION:
                        ; EDNS: version: 0, flags:; udp: 1332
                        ;; QUESTION SECTION:
                        ;.				IN	NS
                        
                        ;; ANSWER SECTION:
                        .			83046	IN	NS	j.root-servers.net.
                        .			83046	IN	NS	k.root-servers.net.
                        .			83046	IN	NS	l.root-servers.net.
                        .			83046	IN	NS	m.root-servers.net.
                        .			83046	IN	NS	a.root-servers.net.
                        .			83046	IN	NS	b.root-servers.net.
                        .			83046	IN	NS	c.root-servers.net.
                        .			83046	IN	NS	d.root-servers.net.
                        .			83046	IN	NS	e.root-servers.net.
                        .			83046	IN	NS	f.root-servers.net.
                        .			83046	IN	NS	g.root-servers.net.
                        .			83046	IN	NS	h.root-servers.net.
                        .			83046	IN	NS	i.root-servers.net.
                        
                        ;; Query time: 1 msec
                        ;; SERVER: 10.0.0.254#53(10.0.0.254)
                        ;; WHEN: Tue Jul 26 17:46:02 WEST 2022
                        ;; MSG SIZE  rcvd: 239
                        

                        When I do the dig with a query:

                        # dig @10.0.0.254 google.com
                        
                        ; <<>> DiG 9.11.36-RedHat-9.11.36-3.el8 <<>> @10.0.0.254 google.com
                        ; (1 server found)
                        ;; global options: +cmd
                        ;; Got answer:
                        ;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 10156
                        ;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1
                        
                        ;; OPT PSEUDOSECTION:
                        ; EDNS: version: 0, flags:; udp: 1332
                        ;; QUESTION SECTION:
                        ;google.com.			IN	A
                        
                        ;; Query time: 0 msec
                        ;; SERVER: 10.0.0.254#53(10.0.0.254)
                        ;; WHEN: Tue Jul 26 17:47:28 WEST 2022
                        ;; MSG SIZE  rcvd: 39
                        

                        So the error is SERVFAIL.

                        If the query is for an FQDN in the internal domain, it returns the results successfully without any error (NOERROR). So it does resolve locally.
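To compare several names quickly during a failure, the status field can be pulled out of the dig output. A minimal sketch follows; `dig_status` is a hypothetical helper name, and the names in the commented loop are only examples taken from this post.

```shell
#!/bin/sh
# Sketch: read dig output on stdin and print the response code found in
# the ";; ->>HEADER<<-" line (NOERROR, SERVFAIL, NXDOMAIN, ...).
dig_status() {
    sed -n 's/.*status: \([A-Z]*\),.*/\1/p' | head -n 1
}

# Possible usage against the resolver address used above:
# for name in google.com stackoverflow.com; do
#     printf '%s %s\n' "$name" "$(dig @10.0.0.254 "$name" | dig_status)"
# done
```

Running the loop once while things work and once while they fail gives a compact list of which names return SERVFAIL and which still return NOERROR.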

                        result of nslookup

                        # nslookup stackoverflow.com
                        ;; Got SERVFAIL reply from PUB.LIC.DNS.SV0, trying next server
                        ;; Got SERVFAIL reply from PUB.LIC.DNS.SV1, trying next server
                        Server:		10.0.0.254
                        Address:	10.0.0.254#53
                        
                        ** server can't find stackoverflow.com: SERVFAIL
                        

                        After restarting unbound it starts working again.
                        I have no stopped/started messages anywhere near the time it stops working.

                        EDIT:
                        In the meantime I disabled the Register DHCP Leases option on the Resolver to see how it goes.
                        But while looking for possible causes behind this, I found this blog article, which got me thinking that the issue may in fact occur with domain names that have many CNAME records, such as google, stripe, stackoverflow, etc. But I really don't know. I'm just looking everywhere to see if the culprit can be found.

                        This is causing us major concerns because, for example, customers try to log in to websites on servers behind this pfSense, and if the server can't resolve google for the recaptcha, people can't log in; if it can't resolve stripe, we get payment issues. So this is causing a fair bit of grief.

                        EDIT2:
                        I just remembered one of the issues that occur is with our email server, and that record is an A Record not a CNAME so what I mentioned before must be unrelated.

                        • J
                          johnpoz LAYER 8 Global Moderator @maverickws
                          last edited by johnpoz Jul 27, 2022, 12:39 PM Jul 27, 2022, 12:24 PM

                          @maverickws said in pfSense resolver stops working:

                          So it does resolve locally

                          This is good info, so unbound is running and can resolve locally - so the trick here is figuring out why servfail on specific domains or fqdns

                          Not sure exactly what this is

                          ;; Got SERVFAIL reply from PUB.LIC.DNS.SV0, trying next server
                          ;; Got SERVFAIL reply from PUB.LIC.DNS.SV1, trying next server
                          

                          To me that reads as though you're forwarding, and wherever you're forwarding to sent back a failure.

                          Did you obfuscate the server IP or something with that? Who exactly got asked that reported servfail?

                          Or is that just the client saying hey I asked these 2 servers and they both reported servfail - and what are those 2 servers?

                          • M
                            maverickws @johnpoz
                            last edited by maverickws Jul 27, 2022, 12:43 PM Jul 27, 2022, 12:42 PM

                            @johnpoz it's an obfuscation of the server IPs.

                            The records obfuscated correspond to the pfSense's CARP WAN VIP and CARP DMZ VIP.

                            In the DHCP Server options, besides only having static leases, we use those IPs as DNS servers (instead of using the LAN CARP VIP, which would be something like 10.0.0.254).

                            On normal conditions (like now, it recovered after the manual restart to unbound) it works normally and resolves without issues.

                            • J
                              johnpoz LAYER 8 Global Moderator @maverickws
                              last edited by Jul 27, 2022, 12:54 PM

                              @maverickws ok that makes more sense ;)

                              So when it fails like that with servfail - all things you try fail, or does anything work?

                              The problem with servfail is that it's sort of a catchall and isn't specific about what exactly failed. But knowing that local resources are resolving tells us unbound didn't go full belly up.

                              Might help to up the verbosity of the unbound logs all the way, but that can be a lot of logging ;)

                              I have a 3100 sitting here in a box. I am thinking of firing it up and then running some dnsperf testing on it: say, have it run through a million different queries at something like 100 queries a second, and then loop that to see if I can cause a failure. There are sample files you can download that have 10 million records in them to look up.
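
                              A rough sketch of preparing input for that kind of test. dnsperf reads a data file with one "name type" pair per line, so this just turns a list of hostnames into that shape; the helper names and the file name in the note are invented for illustration.

```python
from pathlib import Path


def build_query_lines(names, qtype="A"):
    """Return dnsperf-style input lines: one '<name> <type>' pair per entry."""
    lines = []
    for name in names:
        name = name.strip()
        # Skip blank lines and comment lines from the source list.
        if not name or name.startswith("#"):
            continue
        lines.append(f"{name} {qtype}")
    return lines


def write_query_file(names, path, qtype="A"):
    """Write the query lines to path and return how many were written."""
    lines = build_query_lines(names, qtype)
    Path(path).write_text("".join(line + "\n" for line in lines))
    return len(lines)
```

                              The resulting file (say, a hypothetical queries.txt) would then be handed to dnsperf together with the resolver address and a rate cap of about 100 queries a second; check the dnsperf man page for the exact options on your build.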

                              hmmmm - need to check my cal to what real work is going to be like today ;)

                              • bmeeksB
                                bmeeks
                                last edited by bmeeks Jul 27, 2022, 1:21 PM Jul 27, 2022, 1:16 PM

                                Here is my suspicion about the unbound problems.

                                pfSense is currently running the 1.15.0 version of unbound in the RELEASE branches. That version has a bug that is discussed at length here: https://github.com/NLnetLabs/unbound/issues/670. That bug should be fixed in the latest unbound package version (which is 1.16.1).

                                FreeBSD ports has the most recent unbound version (1.16.1). Because unbound is a built-in package within pfSense, I don't think it is easy for them to push an update unless they change the pfSense version.

                                And just to be clear, turning on the "Register DHCP Leases" option is also problematic because it results in a ton of unbound restarts. While updating to the latest unbound version, I would also like to see the Netgate team fix the "Register DHCP Leases" option so that it works properly and does not restart the resolver with each lease renewal.

                                  • M
                                    maverickws @johnpoz
                                    last edited by maverickws Jul 27, 2022, 1:20 PM Jul 27, 2022, 1:18 PM

                                    @johnpoz sorry for not being clearer!

                                    OK, so I'm not sure what your question is when you say

                                    So when it fails like that with servfail - all things you try fail, or does anything work?

                                    What things do you mean? Usually we start catching errors: captcha stops working or the API connection to Stripe drops, and our mail server sends warnings about failed resolutions, which is how we learn about the occurrence.
                                    We then log in to our jump box and to the server where the errors come from, which could be a web server, the mail server or another one; e.g. yesterday I was testing on the web server and the jump box, and today I was testing on the mail server. These are all VMs, and the hosts where they sit are maybe a thousand klicks apart: the host with the web server VM is at a DC in Germany, the jump box is at a DC in Scandinavia, and the mail server is also in Scandinavia but in another DC room.

                                    Since I disabled the DHCP leases option, I haven't had any more hiccups.

                                    EDIT:
                                    @bmeeks's comment and the linked issue do seem very much to the point.

                                    • G
                                      Gertjan @maverickws
                                      last edited by Jul 27, 2022, 1:56 PM

                                      @maverickws said in pfSense resolver stops working:

                                      the pfSense's CARP WAN VIP and CARP DMZ VIP.
                                      ....
                                      dig @10.0.0.254 google.com
                                      ....
                                      Server: 10.0.0.254
                                      Address: 10.0.0.254#53

                                      Is this 10.0.0.254 a virtual or 'software'-defined interface?
                                      (I never used VIP or CARP stuff.)

                                      While it is failing, what happens when you do the mighty:

                                      dig @127.0.0.1 google.com
                                      

                                      I recall (a couple of years ago) seeing on my own pfSense that "127.0.0.1" didn't exist any more.
                                      That was bad.
                                      It wasn't unbound's fault, and unbound didn't like this situation at all.
                                      I, as an admin, could still 'dig' using any of my LAN interface IPs.
                                      I didn't know what killed 127.0.0.1, and had to reboot.
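
                                      A small sketch for comparing what the loopback listener and the other listeners say during a failure. It only parses dig output that was already saved; it assumes the usual ";; ->>HEADER<<- ... status: ..." line is present, and the function names are invented for illustration.

```python
import re

# dig prints a header line such as:
# ";; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 4242"
STATUS = re.compile(r"->>HEADER<<-.*?status:\s*([A-Z]+)")


def dig_status(dig_output):
    """Return the status from a dig header line, or None if there is none."""
    match = STATUS.search(dig_output)
    return match.group(1) if match else None


def compare_listeners(outputs):
    """Map each queried address to its status.

    None means no header was found, for example because the query timed out.
    """
    return {address: dig_status(text) for address, text in outputs.items()}
```

                                      If 127.0.0.1 comes back as None while a LAN address still answers, that would point at the loopback listener rather than at resolving as such.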

                                      @bmeeks That bug report was already mentioned not so long ago.
                                      My thoughts: it is an OpenBSD 7 compiled version.
                                      Does the fact that "OpenBSD" is mentioned here mean that it is OpenBSD-related?
                                      One of the unbound coders is posting: wouldn't he know whether it could be an "any OS" issue?

                                      The patch goes into iterator/iterator.c: that is, for me, the core of the resolver.

                                      Btw : the patch :

                                      The green 'added' code :

                                      	iter_mark_cycle_targets(qstate, iq->dp);
                                      	missing = (int)delegpt_count_missing_targets(iq->dp);
                                      	log_assert(maxtargets != 0); /* that would not be useful */
                                      
                                      	/* Generate target requests. Basically, any missing targets
                                      	 * are queried for here, regardless if it is necessary to do
                                      	 * so to continue processing. */
                                      	if(maxtargets < 0 || maxtargets > missing)
                                      		toget = missing;
                                      	else	toget = maxtargets;
                                      	if(toget == 0) {
                                      		*num = 0;
                                      		return 1;
                                      	}
                                      

                                      The removed "red" code

                                      	iter_mark_cycle_targets(qstate, iq->dp);
                                      	missing = (int)delegpt_count_missing_targets(iq->dp);
                                      	log_assert(maxtargets != 0); /* that would not be useful */
                                      
                                      	/* Generate target requests. Basically, any missing targets 
                                      	 * are queried for here, regardless if it is necessary to do 
                                      	 * so to continue processing. */
                                      	if(maxtargets < 0 || maxtargets > missing)
                                      		toget = missing;
                                      	else	toget = maxtargets;
                                      	if(toget == 0) {
                                      		*num = 0;
                                      		return 1;
                                      	}
                                      

                                      The WTF part : both are identical to me.
                                      That's what I call a NOP.
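
                                      One possible explanation, assuming the paste above preserved whitespace faithfully: the two removed comment lines end in a trailing space, while the added ones do not. A quick sketch to check whether two hunks differ only in that way (the function name is invented for illustration):

```python
def trailing_whitespace_only(removed, added):
    """True if the hunks differ, but only in trailing spaces or tabs."""
    old_lines = removed.splitlines()
    new_lines = added.splitlines()
    if old_lines == new_lines:
        return False  # byte-for-byte identical: nothing changed at all
    # Compare again with trailing whitespace stripped from every line.
    return [line.rstrip() for line in old_lines] == [
        line.rstrip() for line in new_lines
    ]
```

                                      If this returns True for the pair, the hunk is whitespace cleanup only, and the functional part of the patch has to be somewhere else in the diff.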

                                      No "help me" PM's please. Use the forum, the community will thank you.
                                      Edit : and where are the logs ??

                                      • J
                                        johnpoz LAYER 8 Global Moderator @maverickws
                                        last edited by johnpoz Jul 27, 2022, 2:06 PM Jul 27, 2022, 2:06 PM

                                        @maverickws Yeah, I concur with @bmeeks: unbound should be updated if there are known issues in 1.15 that could cause failures of any sort, even if they are not directly related to this one.

                                        That thread makes mention of

                                        do-ip6: no

                                        And that user was unable to reproduce the problem... That could be something you could try. It's easy enough to add to the custom options box.

                                        What I meant with my question is: while you do mention that a few domains fail, is nothing resolving at all? I take it cached entries still work? When you were testing and seeing servfail, did anything respond, or was everything non-local you tried servfail? You can always look in the cache: if there is an issue with resolving but the cache still works, that is just another piece of the puzzle that could be helpful.

                                        • bmeeksB
                                          bmeeks @Gertjan
                                          last edited by Jul 27, 2022, 2:21 PM

                                          @gertjan:
                                          The new code is added to the source file up higher. That code is a type of "limit check". It is called earlier in the revised code than it was in the v1.15.0 code.

                                          It now makes its test earlier in the processing logic. That is the "fix" for the bug.

                                          Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.