Netgate Discussion Forum
    • Categories
    • Recent
    • Tags
    • Popular
    • Users
    • Search
    • Register
    • Login

    23.09 Unbound killed failing to reclaim memory

    Scheduled Pinned Locked Moved General pfSense Questions
    34 Posts 7 Posters 5.1k Views
    Loading More Posts
    • Oldest to Newest
    • Newest to Oldest
    • Most Votes
    Reply
    • Reply as topic
    Log in to reply
    This topic has been deleted. Only users with topic management privileges can see it.
    • GertjanG
      Gertjan @Mission-Ghost
      last edited by

      @Mission-Ghost said in 23.09 Unbound killed failing to reclaim memory:

      unbound Message Cache Size to 4mb (default) from 512mb,

      Isn't that half the system memory > Memory: 1GB DDR4 ?

      No "help me" PM's please. Use the forum, the community will thank you.
      Edit : and where are the logs ??

      1 Reply Last reply Reply Quote 0
      • stephenw10S
        stephenw10 Netgate Administrator
        last edited by

        Yes that does seem large for a 1G device. Yet presumably it was running like that in 23.05.1 so some other change has caused that to now become too large.

        Unfortunately there's no way we can test every combination of custom values like that. So whilst we ran many upgrade tests on the 1100 none of them would have included that.

        Anyway I'm glad you found what looks like the cause here.

        M 1 Reply Last reply Reply Quote 0
        • M
          Mission-Ghost @stephenw10
          last edited by

          @stephenw10 In retrospect it does seem a bit high. But even so it didn't use up all the memory on the box. Memory usage was at about 80%, and, well, why waste it? 🤷

          Memory usage had gone down some with 23.09 and the old parameters, but with these parameters reset to defaults, it's stable at around 60%.

          The error message about being killed for not releasing memory is still interesting. The box wasn't short of memory when these happened. Are there process quotas for memory usage?

          1 Reply Last reply Reply Quote 0
          • stephenw10S
            stephenw10 Netgate Administrator
            last edited by

            There are for some things, like individual php processes for example. I suspect you might have hit that somehow since I'm not aware of a limit for Unbound directly.

            1 Reply Last reply Reply Quote 0
            • T
              tim_co @stephenw10
              last edited by

              @stephenw10

              Version 2.7.0-RELEASE (amd64)
              built on Wed Jun 28 03:53:34 UTC 2023
              FreeBSD 14.0-CURRENT

              The system is on the latest version.
              Version information updated at Mon Nov 20 10:38:18 MST 2023
              CPU Type Intel(R) Core(TM) i5-7200U CPU @ 2.50GHz
              Current: 1204 MHz, Max: 2700 MHz
              4 CPUs: 1 package(s) x 2 core(s) x 2 hardware threads
              AES-NI CPU Crypto: Yes (active)
              QAT Crypto: No
              Hardware crypto AES-CBC,AES-CCM,AES-GCM,AES-ICM,AES-XTS
              Kernel PTI Enabled
              MDS Mitigation VERW

              1 Reply Last reply Reply Quote 0
              • stephenw10S
                stephenw10 Netgate Administrator
                last edited by

                That's not the latest pfSense version. But yes there is a known issue with Suricata. There should be an update for that shortly.

                1 Reply Last reply Reply Quote 0
                • jimpJ
                  jimp Rebel Alliance Developer Netgate
                  last edited by

                  Might not be related but there is a newer version of unbound up now for 23.09 (1.18.0_1) which fixes a bug where unbound could get stuck, which may manifest itself in different ways.

                  See https://redmine.pfsense.org/issues/14980 for details there.

                  On 23.09 you can pull in the new version with pkg-static update; pkg-static upgrade -y unbound and then after that make sure you save/apply in the DNS Resolver settings (or reboot) to make sure the new daemon is running.

                  Remember: Upvote with the 👍 button for any user/post you find to be helpful, informative, or deserving of recognition!

                  Need help fast? Netgate Global Support!

                  Do not Chat/PM for help!

                  M T 2 Replies Last reply Reply Quote 2
                  • M
                    Mission-Ghost @jimp
                    last edited by Mission-Ghost

                    @jimp thank you very much for following up. I'll go get this update and apply.

                    Seemed to go in ok:

                    Updating pfSense-core repository catalogue...
                    Fetching meta.conf:
                    pfSense-core repository is up to date.
                    Updating pfSense repository catalogue...
                    Fetching meta.conf:
                    pfSense repository is up to date.
                    All repositories are up to date.
                    Updating pfSense-core repository catalogue...
                    Fetching meta.conf:
                    pfSense-core repository is up to date.
                    Updating pfSense repository catalogue...
                    Fetching meta.conf:
                    pfSense repository is up to date.
                    All repositories are up to date.
                    The following 1 package(s) will be affected (of 0 checked):
                    
                    Installed packages to be UPGRADED:
                    	unbound: 1.18.0 -> 1.18.0_1 [pfSense]
                    
                    Number of packages to be upgraded: 1
                    
                    1 MiB to be downloaded.
                    [1/1] Fetching unbound-1.18.0_1.pkg: .......... done
                    Checking integrity... done (0 conflicting)
                    [1/1] Upgrading unbound from 1.18.0 to 1.18.0_1...
                    ===> Creating groups.
                    Using existing group 'unbound'.
                    ===> Creating users
                    Using existing user 'unbound'.
                    [1/1] Extracting unbound-1.18.0_1: .......... done
                    ------System/General log:
                    Nov 20 18:43:08 	check_reload_status 	342 	Syncing firewall
                    Nov 20 18:43:07 	php-fpm 	51606 	/services_unbound.php: Configuration Change: admin@192.168.10.110 (Local Database): DNS Resolver configured.
                    Nov 20 18:41:46 	pkg-static 	51529 	unbound upgraded: 1.18.0 -> 1.18.0_1 
                    
                    1 Reply Last reply Reply Quote 0
                    • stephenw10S
                      stephenw10 Netgate Administrator
                      last edited by

                      Let us know if that helps.

                      M 1 Reply Last reply Reply Quote 0
                      • M
                        Mission-Ghost @stephenw10
                        last edited by Mission-Ghost

                        [Edit 1: Ooma box started flashing again for no apparent reason, but the 'not found' error discussed below isn't in the DHCP server logs this time. Confounding problem so far. I have tested the drop cable and cable run from the switch--both ok. I've cold-booted the Ooma box multiple times.]

                        [Edit 2: Could the ISC DHCP binary being different in 23.09 than in 23.05.1 be the issue, as in these discussions: https://www.reddit.com/r/PFSENSE/comments/17xi9ut/isc_dhcp_server_in_2309/ and https://redmine.pfsense.org/issues/15011 ? If so, the switch to Kea may resolve this. I'll report back.]

                        @stephenw10 unbound seems to continue to work ok with the patch, although it already was working ok after I lowered the advanced settings for buffers and memory.

                        Oddly, since the update my Ooma Telo2 box has periodically stopped responding to pings and the status light goes from steady blue (everything's fine) to flashing blue-red (not ok). During these episodes the telephone service still works but it's concerning that at some point it won't work. Prior to 23.09 it has worked fine for many months.

                        I can find no clear cause-effect of the pfSense update to this Ooma box problem beyond the coincidence. I've made no changes to the DHCP configurations nor firewall rules before, during or after the update.

                        I'm having a bit of a difficult time diagnosing this issue too. Ooma box has no logs, and the pfSense logs are all clear except DHCP server (still ISC DHCP) which while reissuing the address periodically seems to be saying it's "not found" occasionally at times that correspond with the loss of ping. The Ooma box is 192.168.30.103 (on VLAN 30):

                        Nov 21 10:33:08 	dhcpd 	89986 	DHCPACK on 192.168.30.103 to 00:18:61:59:56:75 via mvneta0.30
                        Nov 21 10:33:08 	dhcpd 	89986 	DHCPREQUEST for 192.168.30.103 from 00:18:61:59:56:75 via mvneta0.30
                        Nov 21 09:33:07 	dhcpd 	89986 	DHCPACK on 192.168.30.103 to 00:18:61:59:56:75 via mvneta0.30
                        Nov 21 09:33:07 	dhcpd 	89986 	DHCPREQUEST for 192.168.30.103 from 00:18:61:59:56:75 via mvneta0.30
                        Nov 21 09:33:07 	dhcpd 	89986 	DHCPRELEASE of 192.168.30.103 from 00:18:61:59:56:75 via mvneta0.30 (not found)
                        Nov 21 09:22:24 	dhcpd 	89986 	DHCPACK on 192.168.30.103 to 00:18:61:59:56:75 via mvneta0.30
                        Nov 21 09:22:24 	dhcpd 	89986 	DHCPREQUEST for 192.168.30.103 from 00:18:61:59:56:75 via mvneta0.30
                        Nov 21 09:20:31 	dhcpd 	89986 	DHCPACK on 192.168.30.103 to 00:18:61:59:56:75 via mvneta0.30
                        Nov 21 09:20:31 	dhcpd 	89986 	DHCPREQUEST for 192.168.30.103 from 00:18:61:59:56:75 via mvneta0.30 
                        

                        I am not observing this "not found" error from a sampling of other devices on the .30 or other VLANs on the network. Just this one.

                        1 Reply Last reply Reply Quote 0
                        • stephenw10S
                          stephenw10 Netgate Administrator
                          last edited by

                          The timing on those updates is interesting. Exactly 1h between the last updates looks expected. Where it eventually shows 'not found' looks like failed lease renewals. It would be interesting to see what the interval was before that initial failure. Did it go past the lease time for some reason maybe.

                          M 1 Reply Last reply Reply Quote 0
                          • M
                            Mission-Ghost @stephenw10
                            last edited by

                            @stephenw10 I switched to the Kea DHCP server at at 11:54:53, and I have not had the Ooma box fail to answer pings or function properly since:

                            Nov 21 15:43:53 	kea-dhcp4 	41486 	INFO [kea-dhcp4.leases.0x7f70e8815f00] DHCP4_LEASE_ALLOC [hwtype=1 00:18:61:59:56:75], cid=[01:00:18:61:59:56:75], tid=0xa77b51c9: lease 192.168.30.103 has been allocated for 7200 seconds
                            Nov 21 14:43:53 	kea-dhcp4 	41486 	INFO [kea-dhcp4.leases.0x7f70e8818200] DHCP4_LEASE_ALLOC [hwtype=1 00:18:61:59:56:75], cid=[01:00:18:61:59:56:75], tid=0xa77b51c9: lease 192.168.30.103 has been allocated for 7200 seconds
                            Nov 21 13:43:53 	kea-dhcp4 	41486 	INFO [kea-dhcp4.leases.0x7f70e8817400] DHCP4_LEASE_ALLOC [hwtype=1 00:18:61:59:56:75], cid=[01:00:18:61:59:56:75], tid=0xa77b51c9: lease 192.168.30.103 has been allocated for 7200 seconds
                            Nov 21 12:43:53 	kea-dhcp4 	41486 	INFO [kea-dhcp4.leases.0x7f70e8816600] DHCP4_LEASE_ALLOC [hwtype=1 00:18:61:59:56:75], cid=[01:00:18:61:59:56:75], tid=0xa77b51c9: lease 192.168.30.103 has been allocated for 7200 seconds
                            Nov 21 11:43:52 	dhcpd 	89986 	DHCPACK on 192.168.30.103 to 00:18:61:59:56:75 via mvneta0.30
                            Nov 21 11:43:52 	dhcpd 	89986 	DHCPREQUEST for 192.168.30.103 from 00:18:61:59:56:75 via mvneta0.30
                            Nov 21 11:43:52 	dhcpd 	89986 	DHCPRELEASE of 192.168.30.103 from 00:18:61:59:56:75 via mvneta0.30 (not found)
                            Nov 21 11:33:08 	dhcpd 	89986 	DHCPACK on 192.168.30.103 to 00:18:61:59:56:75 via mvneta0.30
                            Nov 21 11:33:08 	dhcpd 	89986 	DHCPREQUEST for 192.168.30.103 from 00:18:61:59:56:75 via mvneta0.30
                            Nov 21 11:31:15 	dhcpd 	89986 	DHCPACK on 192.168.30.103 to 00:18:61:59:56:75 via mvneta0.30
                            Nov 21 11:31:15 	dhcpd 	89986 	DHCPREQUEST for 192.168.30.103 from 00:18:61:59:56:75 via mvneta0.30
                            Nov 21 11:29:22 	dhcpd 	89986 	DHCPACK on 192.168.30.103 to 00:18:61:59:56:75 via mvneta0.30
                            Nov 21 11:29:22 	dhcpd 	89986 	DHCPREQUEST for 192.168.30.103 from 00:18:61:59:56:75 via mvneta0.30
                            Nov 21 11:25:38 	dhcpd 	89986 	DHCPACK on 192.168.30.103 to 00:18:61:59:56:75 via mvneta0.30
                            Nov 21 11:25:38 	dhcpd 	89986 	DHCPREQUEST for 192.168.30.103 from 00:18:61:59:56:75 via mvneta0.30 
                            

                            This certainly supports the theory there is something wrong with the 23.09 ISC DHCP server. This is a weird regression. I wouldn't think the ISC binary would have been fiddled with during the 23.09 development given it's being retired. Someone in the previously quoted forum threads replaced their 23.09 ISC binary with a 23.05.1 binary and their problem stopped.

                            Did something go wrong in the version control system for this module in the 23.09 release?

                            Something I don't quite get looking at this log why only the Ooma box is logging a lease reallocation every hour. The .30 network alone has half a dozen devices on it, never mind the other five VLANs.

                            So far the Kea server has been working fine, so I imagine I'll stay with it unless something malfunctions.

                            S 1 Reply Last reply Reply Quote 0
                            • S
                              SteveITS Galactic Empire @Mission-Ghost
                              last edited by

                              @Mission-Ghost said in 23.09 Unbound killed failing to reclaim memory:

                              something wrong with the 23.09 ISC DHCP server

                              It's noted in the redmine but the file did change even though the version number didn't. Per the redmine reporters it seems the updated program responds on a random port. Some devices don't particularly care, apparently, and others ignore the response, I suppose until happens to use the correct port, or a port to which it pays attention.

                              Overall some devices retry quite a bit at each renewal, and per reports others try and try but eventually lose their lease.

                              Also annoyingly, I found some IoT type devices renew whenever they heck they want to (e.g. 10 minutes not 1 day) so the log spam can be high in some cases.

                              For me at home, I don't need the things Kea DHCP doesn't have so switched to Kea.

                              This is all in forum thread https://forum.netgate.com/topic/184104/dhcp-weirdness-after-23-09-upgrade/ :)

                              Pre-2.7.2/23.09: Only install packages for your version, or risk breaking it. Select your branch in System/Update/Update Settings.
                              When upgrading, allow 10-15 minutes to restart, or more depending on packages and device speed.
                              Upvote 👍 helpful posts!

                              M 1 Reply Last reply Reply Quote 0
                              • M
                                Mission-Ghost @SteveITS
                                last edited by

                                @SteveITS indeed, this seems key: "This seems to imply the intent of RFC2131 was that a fixed source port be used, at least for relays. The new behavior doesn't comply with this."

                                Seems like a code review didn't happen or wasn't as detail-oriented as it should have been before someone allowed this new code to be committed. And in a module that's deprecated anyway? Why bother, and then breaking it now, at the end of life?

                                1 Reply Last reply Reply Quote 0
                                • J jrey referenced this topic on
                                • T
                                  tman222 @jimp
                                  last edited by tman222

                                  @jimp said in 23.09 Unbound killed failing to reclaim memory:

                                  Might not be related but there is a newer version of unbound up now for 23.09 (1.18.0_1) which fixes a bug where unbound could get stuck, which may manifest itself in different ways.

                                  See https://redmine.pfsense.org/issues/14980 for details there.

                                  On 23.09 you can pull in the new version with pkg-static update; pkg-static upgrade -y unbound and then after that make sure you save/apply in the DNS Resolver settings (or reboot) to make sure the new daemon is running.

                                  Quick follow up question on this: I upgraded the Unbound package from 1.18.0 to 1.18.0_1, but the Unbound logs and unbound-control -h still show the version as 1.18.0, even after rebooting. Is that expected or should I be seeing 1.18.0_1 instead? Thanks in advance.

                                  jimpJ 1 Reply Last reply Reply Quote 0
                                  • M
                                    Mission-Ghost @stephenw10
                                    last edited by

                                    @stephenw10 so now netstat has started being killed off for not reclaiming memory:

                                    
                                    Nov 26 17:39:38	kernel		pid 44612 (netstat), jid 0, uid 0, was killed: failed to reclaim memory
                                    Nov 25 11:29:01	kernel		pid 92717 (netstat), jid 0, uid 0, was killed: failed to reclaim memory
                                    Nov 24 18:18:39	kernel		pid 81047 (netstat), jid 0, uid 0, was killed: failed to reclaim memory
                                    Nov 22 12:48:27	kernel		pid 11499 (netstat), jid 0, uid 0, was killed: failed to reclaim memory
                                    
                                    

                                    I can’t tell where or how it’s run or what parameters affect it. Can you please point me in the right direction to troubleshoot?

                                    Thanks!

                                    1 Reply Last reply Reply Quote 0
                                    • jimpJ
                                      jimp Rebel Alliance Developer Netgate @tman222
                                      last edited by

                                      @tman222 said in 23.09 Unbound killed failing to reclaim memory:

                                      Quick follow up question on this: I upgraded the Unbound package from 1.18.0 to 1.18.0_1, but the Unbound logs and unbound-control -h still show the version as 1.18.0, even after rebooting. Is that expected or should I be seeing 1.18.0_1 instead? Thanks in advance.

                                      The _1 bit is from the FreeBSD package, it's not part of the Unbound version string. The _1 version of the package added a patch to address an Unbound bug but it did not increment the Unbound version.

                                      Remember: Upvote with the 👍 button for any user/post you find to be helpful, informative, or deserving of recognition!

                                      Need help fast? Netgate Global Support!

                                      Do not Chat/PM for help!

                                      M 1 Reply Last reply Reply Quote 0
                                      • J jrey referenced this topic on
                                      • M
                                        Mission-Ghost @jimp
                                        last edited by

                                        @jimp The Unbound crash happened again today. The Unbound crash has not happened in months, particularly since reducing memory size parameters. It's been so long, in fact, that I removed service watchdog a week or two ago, thinking the issue was resolved. So much for that.

                                        Here's various symptoms:

                                        1. The only relevant error message I found in the System>General log is:
                                        Nov 24 10:57:21 	kernel 		pid 54097 (unbound), jid 0, uid 59, was killed: a thread waited too long to allocate a page 
                                        

                                        Note this error is different than those in the past where Unbound was killed failing to reclaim memory. End result is the same: dead Unbound and dead production on my network (without service_watchdog, which I have now restored to service).

                                        1. I haven't found any relevant messages in the Unbound logs.

                                        2. The Status>Monitoring>System>Memory shows a puzzling zeroing of all parameters at about the same time as the Unbound crash:

                                        f65f0225-4352-4253-801a-02172f323524-image.png

                                        So, while I've been admonished in this forum to not use service_watchdog, I can't maintain production uptime without while these Unbound discrepancies live on.

                                        If there's something more I can do to assist Netgate in figuring this out, please let me know. I'll be happy to do whatever I'm able.

                                        Thanks!

                                        1 Reply Last reply Reply Quote 0
                                        • First post
                                          Last post
                                        Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.