Netgate Discussion Forum

    sg-1100 2.4.5 unbound python module + DHCP lease DNS registration memory leak

    DHCP and DNS
    18 Posts 3 Posters 2.4k Views
    • bruor
      last edited by bruor

      I'm hoping someone here might be able to help me sort out why unbound on pfSense isn't fond of this Python module.

      On previous versions of pfSense I used a custom install of BIND with domain overrides to do AAAA filtering: for specific domains (YouTube) that don't work well over my ISP's (Start.ca) 6RD tunnel, and on IPv6-enabled networks where I want to use IP whitelisting for Google's SMTP relay (which doesn't allow whitelisting entire IPv6 ranges). On version 2.4.5 that is no longer an option. johnpoz recommended switching to an unbound Python module to handle that type of filtering: https://forum.netgate.com/topic/151745/bind-filter-aaaa/8

      However, I've noticed that when the module is enabled, unbound steadily consumes all memory on the firewall until the firewall becomes unresponsive. If I disable the Python module, memory utilization is stable.

      Here's a graph where you can clearly see the moment I turned it on (11:46), and back off again.
      [Graph: firewall memory usage]

      Contents of the Python module file:

      def init_standard(id, cfg):
          return True
      
      def deinit(id):
          return True
      
      def inform_super(id, qstate, superqstate, qdata):
          return True
      
      domains = [
          "smtp-relay.gmail.com.",
          "youtube.com.",
          "googlevideo.com.",
          "ytimg.com.",
      #    "netflix.com.",
      #    "netflix.net.",
      #    "nflxext.com.",
      #    "nflximg.net.",
      #    "nflxvideo.net.",
      #    "nflxso.net.",
      ]
      
      def operate(id, event, qstate, qdata):
          if event == MODULE_EVENT_NEW or event == MODULE_EVENT_PASS:
              if qstate.qinfo.qtype != RR_TYPE_AAAA:
                  qstate.ext_state[id] = MODULE_WAIT_MODULE
                  return True
      
              for domain in domains:
                  if qstate.qinfo.qname_str == domain or qstate.qinfo.qname_str.endswith("." + domain):
                      msg = DNSMessage(qstate.qinfo.qname_str, RR_TYPE_A, RR_CLASS_IN, PKT_QR | PKT_RA | PKT_AA)
                      if not msg.set_return_msg(qstate):
                          qstate.ext_state[id] = MODULE_ERROR
                          return True
                      # We don't need validation, result is valid
                      qstate.return_msg.rep.security = 2
                      qstate.return_rcode = RCODE_NOERROR
                      qstate.ext_state[id] = MODULE_FINISHED
                      log_info("no-aaaa: blocking AAAA request for %s" % qstate.qinfo.qname_str)
                      return True
              qstate.ext_state[id] = MODULE_WAIT_MODULE
              return True
      
          if event == MODULE_EVENT_MODDONE:
              qstate.ext_state[id] = MODULE_FINISHED
              return True
      
          qstate.ext_state[id] = MODULE_ERROR
          return True
      
      log_info("pythonmod: script loaded")
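Because the filtering decision in `operate()` is plain string matching, it can be checked outside unbound before reloading the resolver. The sketch below mirrors the module's exact-or-subdomain test. `is_filtered`, `normalize` and `DOMAINS` are illustrative names, and the lowercasing and trailing-dot normalization are additions of this sketch (DNS names compare case-insensitively); the posted module does not do this, and nothing here assumes unbound does it for you.

```python
# Standalone copy of the AAAA-filter match logic, for testing outside unbound.
# Hypothetical helper names; lowercasing and dot normalization are additions
# that the posted module does not perform.
from typing import Iterable

DOMAINS = [
    "smtp-relay.gmail.com.",
    "youtube.com.",
    "googlevideo.com.",
    "ytimg.com.",
]


def normalize(name: str) -> str:
    """Lowercase the name and make it fully qualified (trailing dot)."""
    name = name.lower()
    return name if name.endswith(".") else name + "."


def is_filtered(qname: str, domains: Iterable[str] = DOMAINS) -> bool:
    """True if qname is a listed domain or any subdomain of one."""
    q = normalize(qname)
    for domain in domains:
        d = normalize(domain)
        # Exact match, or a label boundary right before the suffix, so that
        # "notyoutube.com." does not match "youtube.com.".
        if q == d or q.endswith("." + d):
            return True
    return False
```

For example, `is_filtered("www.youtube.com.")` is true while `is_filtered("notyoutube.com.")` is false, which is why the module compares against `"." + domain` rather than a bare suffix.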
      

      I have just noticed that an unbound update is available. I've applied it via SSH and will re-test on that version and post the results here.

      • Gertjan
        last edited by

        https://forum.netgate.com/topic/154111/prevent-unbound-resolving-ipv6-for-one-domain/4
        I've been using the very same Python module for a long time now.

        No "help me" PM's please. Use the forum, the community will thank you.
        Edit : and where are the logs ??

        • bruor
          last edited by

          Still present after updating unbound. I turned on the no-aaaa module last night around 10 pm and left it overnight; today unbound was using far more RAM than it should be.

          [Graph: unbound memory usage overnight]

          I just checked in on an SG-3100 and it seems to have the same issue, although far less pronounced because it has 4x the RAM to consume. RAM usage there has been slowly increasing over the last few months, but the only thing we filter with that script is smtp-relay.gmail.com.

          I will also try this on a freshly deployed SG-1100 to see if the issue happens there.

          • bruor @Gertjan
            last edited by

            @Gertjan Mind posting your unbound config? Also, are you running on an SG-1100 too? Here's mine:

            ##
            # Server configuration
            ##
            server:
            
            chroot: /var/unbound
            username: "unbound"
            directory: "/var/unbound"
            pidfile: "/var/run/unbound.pid"
            use-syslog: yes
            port: 53
            verbosity: 1
            hide-identity: yes
            hide-version: yes
            harden-glue: yes
            do-ip4: yes
            do-ip6: yes
            do-udp: yes
            do-tcp: yes
            do-daemonize: yes
            module-config: "python validator iterator"
            unwanted-reply-threshold: 0
            num-queries-per-thread: 512
            jostle-timeout: 200
            infra-host-ttl: 900
            infra-cache-numhosts: 10000
            outgoing-num-tcp: 10
            incoming-num-tcp: 10
            edns-buffer-size: 4096
            cache-max-ttl: 86400
            cache-min-ttl: 0
            harden-dnssec-stripped: yes
            msg-cache-size: 4m
            rrset-cache-size: 8m
            
            num-threads: 2
            msg-cache-slabs: 2
            rrset-cache-slabs: 2
            infra-cache-slabs: 2
            key-cache-slabs: 2
            outgoing-range: 4096
            #so-rcvbuf: 4m
            auto-trust-anchor-file: /var/unbound/root.key
            prefetch: no
            prefetch-key: yes
            use-caps-for-id: no
            serve-expired: no
            # Statistics
            # Unbound Statistics
            statistics-interval: 0
            extended-statistics: yes
            statistics-cumulative: yes
            
            # TLS Configuration
            tls-cert-bundle: "/etc/ssl/cert.pem"
            
            # Interface IP(s) to bind to
            interface-automatic: yes
            interface: 0.0.0.0
            interface: ::0
            
            # Outgoing interfaces to be used
            outgoing-interface: _removed_
            outgoing-interface: _removed_
            outgoing-interface: _removed_
            outgoing-interface: _removed_
            
            # DNS Rebinding
            # For DNS Rebinding prevention
            private-address: 127.0.0.0/8
            private-address: 10.0.0.0/8
            private-address: ::ffff:a00:0/104
            private-address: 172.16.0.0/12
            private-address: ::ffff:ac10:0/108
            private-address: 169.254.0.0/16
            private-address: ::ffff:a9fe:0/112
            private-address: 192.168.0.0/16
            private-address: ::ffff:c0a8:0/112
            private-address: fd00::/8
            private-address: fe80::/10
            
            
            # Access lists
            include: /var/unbound/access_lists.conf
            
            # Static host entries
            include: /var/unbound/host_entries.conf
            
            # dhcp lease entries
            include: /var/unbound/dhcpleases_entries.conf
            
            
            
            # Domain overrides
            include: /var/unbound/domainoverrides.conf
            # Forwarding
            forward-zone:
                    name: "."
                    forward-addr: 8.8.8.8
                    forward-addr: 8.8.4.4
                    forward-addr: 2001:4860:4860::8888
                    forward-addr: 2001:4860:4860::8844
            
            
            # Unbound custom options
            server: include: /var/unbound/pfb_dnsbl.*conf
            local-data: "_vlmcs._tcp.domain.com 3600 IN SRV 10 0 1688 pfsense.domain.com"
            
            ###
            # Remote Control Config
            ###
            include: /var/unbound/remotecontrol.conf
            
            # Python Module
            python:
            python-script: no-aaaa.py
            
              • bruor
              last edited by

              Looking at this again from the beginning, it appears that my script was owned by root:unbound. I've changed that to unbound:unbound and am monitoring to see if it makes any difference.

                • bruor
                last edited by bruor

                I turned off DHCP lease registration at the same time, and preliminary results look good here. I wonder if the reloads caused by DHCP renewals are the actual cause of the issue... Turning it back on to see.

                [Graph: memory usage with DHCP lease registration disabled]

                  • bruor
                  last edited by bruor

                  So this appears to be a memory leak that happens when you use a Python module and have "Register DHCP leases in the DNS Resolver" enabled.

                    • Gertjan
                    last edited by Gertjan

                    @bruor said in sg-1100 2.4.5 unbound python module + DHCP lease DNS registration memory leak:

                    "register DHCP leases in DNS resolver"

                    has the nice habit of restarting the Resolver on every new lease on a LAN.
                    The cache gets ditched while doing so.
                    When you use a package like pfBlockerNG with a lot of feeds, the restart takes (a lot of) time, during which DNS isn't working: another not-so-nice side effect.
                    You need special reasons to have "Register DHCP leases in DNS Resolver" activated (checked).

                    Anyway.
                    I'll activate (check) it on mine and see what happens with the memory.

                      • bruor @Gertjan
                      last edited by

                      @Gertjan Here I have about 40 devices on the network, and it takes about a day for unbound to consume 400-ish MB of RAM.

                      My best guess is that the restart of unbound is just a reload, and some stuff stays resident in memory and gets launched again. If I do a full restart of the unbound service, it frees up the RAM it's hogging and goes back to normal.

                      In order to diagnose this, I actually disabled pfBlockerNG, and the issue still happened.

                        • Gertjan @bruor
                        last edited by

                        @bruor said in sg-1100 2.4.5 unbound python module + DHCP lease DNS registration memory leak:

                        My best guess is that the restart of unbound is just a reload and some stuff stays resident in memory, and gets launched again.

                        The thing is, it's worse... Dunno... it depends on which issue we're talking about.
                        When unbound "reloads", the process is stopped and all its memory is freed. This is an OS thing, not an unbound thing.
                        Then it starts again...

                        This repeated process restarting cannot leak memory. If it did, millions of FreeBSD installs would fail after several minutes...
                        But... unbound instantiates Python, and I presume (^^) that when unbound stops, Python also 'stops', freeing the memory and resources it used.

                          • bruor
                          last edited by bruor

                          I agree, but I believe the DHCP leases cause an unbound reload and not a full restart. I have a feeling that a reload doesn't clean up the Python process but launches another one, whereas a full restart does. I think I could script 20 back-to-back unbound reloads to see if this is the case.
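A rough way to run that experiment is sketched below. It reads the PID from the `pidfile` path in the config posted above, sends SIGHUP, and samples resident memory with `ps`. The helper names are hypothetical, `ps -o rss=` reporting KiB is an assumption (it holds on FreeBSD and Linux), and the settle time is a guess. Re-reading the pidfile before every sample means a genuine restart would show up as a changed PID rather than being hidden.

```python
# Sketch: send unbound N reloads back to back and sample its memory after
# each one. Assumptions (not from the thread): `ps -o rss= -p PID` prints the
# resident set size in KiB, and a few seconds is enough for a reload to finish.
import os
import signal
import subprocess
import time
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")

PIDFILE = "/var/run/unbound.pid"  # the pidfile: setting in the posted config


def read_pid(path: str = PIDFILE) -> int:
    with open(path) as f:
        return int(f.read().strip())


def read_rss_kib(pid: int) -> int:
    result = subprocess.run(["ps", "-o", "rss=", "-p", str(pid)],
                            capture_output=True, text=True, check=True)
    return int(result.stdout.strip())


def reload_unbound() -> None:
    # Send SIGHUP to the PID recorded in the pidfile.
    os.kill(read_pid(), signal.SIGHUP)


def sample_unbound() -> Tuple[int, int]:
    # Re-read the pidfile each time so a changed PID (a real restart) shows up.
    pid = read_pid()
    return pid, read_rss_kib(pid)


def reload_and_sample(n: int, reload: Callable[[], None],
                      sample: Callable[[], T], settle_s: float = 3.0) -> List[T]:
    """Return [baseline, sample after reload 1, ..., sample after reload n]."""
    samples = [sample()]
    for _ in range(n):
        reload()
        time.sleep(settle_s)  # SIGHUP is asynchronous; give unbound time
        samples.append(sample())
    return samples


# Example, run as root on the firewall:
#   for pid, rss in reload_and_sample(20, reload_unbound, sample_unbound):
#       print(pid, rss)
```

If the PID column stays constant while the RSS column keeps climbing, the reloads are happening inside one long-lived process.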

                            • Gertjan @bruor
                            last edited by

                            @bruor said in sg-1100 2.4.5 unbound python module + DHCP lease DNS registration memory leak:

                            I agree, but I believe the DHCP leases cause an unbound reload and not a full restart

                            A boatload of forum threads have already handled that question over the last two or three years.
                            A couple of us actually looked at the source code of unbound. It's a free project anyway, so why choose to believe when you can see the thing in front of you: Unbound, when it receives a SIGHUP signal, stops. It exits. It does not implement anything that looks like "reload the config and go".

                            I activated (checked) "DHCP Registration: Register DHCP leases in the DNS Resolver". I have about 40 LAN devices and another LAN network for visitor/client/BYOD devices.

                            My pfSense memory usage is public: https://www.test-domaine.fr/munin/brit-hotel-fumel.net/pfsense.brit-hotel-fumel.net/memory.html
                            Active memory is actually... lowering. We'll see what happens after a few days.

                              • bruor
                              last edited by bruor

                              Thanks for trying to reproduce on your end.

                              When looking at https://www.freebsd.org/cgi/man.cgi?unbound.conf it indicates that a HUP is handled as a config reload and is different than a full restart. Mentioned in the interface and username sections
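The difference between an in-process reload and a full restart can be seen with a generic daemon. The sketch below is not unbound's code: it installs a SIGHUP handler that re-reads a stand-in config and, as a hypothetical leaky step, keeps a 64 KiB buffer from every reload referenced. Because the handler runs inside the existing process, the PID never changes, and whatever the reload path fails to release keeps adding up until the process actually exits.

```python
# Generic illustration of SIGHUP-as-reload (not unbound's source code).
# The handler runs inside the already-running process, so the PID never
# changes and anything the reload path keeps a reference to survives.
import os
import signal

state = {"reloads": 0, "pids": set(), "retained": [], "config": None}


def load_config() -> dict:
    # Stand-in for re-reading a config file such as unbound.conf.
    return {"module-config": "python validator iterator"}


def on_sighup(signum, frame) -> None:
    state["reloads"] += 1
    state["pids"].add(os.getpid())
    # Hypothetical reload step that builds a new 64 KiB buffer and keeps the
    # old ones referenced, so memory grows with every reload.
    state["retained"].append(bytearray(64 * 1024))
    state["config"] = load_config()


signal.signal(signal.SIGHUP, on_sighup)

for _ in range(5):
    # Same signal as `kill -HUP <pid>`, sent by the process to itself
    # (signal.raise_signal needs Python 3.8+ on a POSIX system).
    signal.raise_signal(signal.SIGHUP)
```

After the five signals there is still only one PID, with five buffers retained. A full stop and start would instead hand all of that memory back to the OS.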

                                • bruor
                                last edited by

                                Can you have a look at your unbound process and see if it is consuming more ram than expected? Based on your graphs it's definitely not impacting your setup as radically as it did mine at home.

                                  • Gertjan @bruor
                                  last edited by Gertjan

                                  @bruor said in sg-1100 2.4.5 unbound python module + DHCP lease DNS registration memory leak:

                                  Thanks for trying to reproduce on your end.

                                  When looking at https://www.freebsd.org/cgi/man.cgi?unbound.conf it indicates that a HUP is handled as a config reload and is different than a full restart. Mentioned in the interface and username sections

                                  Doc is nice.
                                  But pure BS, or at least the interpretation is wrong.
                                  The nice thing about open-source software is that the real manual is the code. You can read it yourself.
                                  Again: HUP = dump the process and start over. It's just a couple of lines of code.

                                  https://github.com/NLnetLabs/unbound/blob/68eab24db7b3ecbd9c13feeaa4e3f4e4aea9c08a/daemon/unbound.c
                                  There you'll find main() and run_daemon(), and here https://github.com/NLnetLabs/unbound/blob/334498d9b94f080cd29fa5e1d3d426b9d1edfe6b/daemon/worker.c you can see how the signals are handled in a worker thread.

                                  New, it seems: a HUP (SIGHUP) keeps the interfaces as they are and reloads the rest. That's not OK for pfSense, where interfaces can come and go: cables are unplugged, OpenVPN instances are started or stopped, etc.

                                  But, hey, who cares: see for yourself the old-fashioned way. Dump resolver.log in real time in a console/SSH window.
                                  In another console/SSH session, stop / start / restart / reload / whatever your unbound process.
                                  The log timestamps, to the second, will give you all the details.
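Besides the timestamps, the PID in each log line shows whether a new process was started. The sketch below assumes unbound's syslog lines look like `unbound: [PID:THREAD] info: start of service (...)` and that the log has already been exported as plain text (on 2.4.x the resolver log may be a circular log that needs a reader such as `clog` first). The regular expression and helper names are assumptions, not a pfSense or unbound API. If successive "start of service" lines around a reload carry the same PID, the reload happened in-process.

```python
# Sketch: list which PID logged each unbound "start of service" /
# "service stopped" message. Assumed line shape (may differ per version):
#   "... unbound: [PID:THREAD] info: start of service (unbound x.y.z)."
import re
from typing import Iterable, List, Tuple

EVENT_RE = re.compile(
    r"unbound: \[(\d+):\d+\] info: (start of service|service stopped)")


def service_events(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Return (event, pid) pairs in log order; other lines are ignored."""
    events = []
    for line in lines:
        match = EVENT_RE.search(line)
        if match:
            events.append((match.group(2), int(match.group(1))))
    return events


def start_pids(events: List[Tuple[str, int]]) -> List[int]:
    """PIDs of every "start of service" event, in order."""
    return [pid for event, pid in events if event == "start of service"]


# Example with a hypothetical plain-text export of the resolver log:
#   with open("resolver_log.txt") as f:
#       print(start_pids(service_events(f)))
```

A repeated PID across a reload points to an in-process reload; a new PID points to a real restart.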

                                  For me: stop + start, reload, or restart = identical.
                                  To make it sweat a little, I bloated my pfBlockerNG-devel more than usual. More than 1 GB of memory was being used.

                                  @bruor said in sg-1100 2.4.5 unbound python module + DHCP lease DNS registration memory leak:

                                  Can you have a look at your unbound process and see if it is consuming more ram than expected? Based on your graphs it's definitely not impacting your setup as radically as it did mine at home.

                                  I had pfBlockerNG-devel shut down for a day or two. It needed a day off.

                                    • bruor
                                    last edited by

                                    Thanks, perhaps this is limited to the SG-1100 in some way. I've opened a Redmine issue to see if they can reproduce it on their end.

                                      • bruor @bruor
                                      last edited by bruor

                                      pfBlockerNG has noted that Python modules are incompatible with unbound's DHCP lease registration updates, because they need a full restart rather than a SIGHUP.

                                      https://forum.netgate.com/topic/158592/pfblockerng-devel-v3-0-0-no-longer-bound-by-unbound/2

                                      This led me to another search, which seems to indicate that this is a long-standing issue.
                                      https://redmine.pfsense.org/issues/5413

                                        • cmcdonald Netgate Developer @bruor
                                        last edited by

                                        This has been addressed in the latest snapshots.

                                        We are testing the changes and will include them in 23.01 which is due soon.

                                        The issue is multifaceted.

                                        I've submitted upstream patches to both Unbound and the MaxMind DB Python module.

                                        The MaxMindDB Python module had several issues. The major issue though was a reference counting bug causing the Python garbage collector to prematurely free a heap-allocated structure. This led to a use-after-free causing Unbound to segfault.

                                        Unbound reloads the built-in Python interpreter every time Unbound is reloaded either by a SIGHUP signal or using the unbound-control interface. Python was not designed to be reloaded like that in the same process.

                                        I've fixed the refcounting bug in MaxMind and patched Unbound so Python is only initialized and unwound once. I've also upgraded Python from 3.9 to 3.11.

                                        The memory usage should be significantly improved.

                                        The next improvement would be to rewrite the ISC DHCPD integration to use a better interface with Unbound. That will likely have to wait until 23.05.

                                        Need help fast? https://www.netgate.com/support

                                        Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.