Netgate Discussion Forum
    • Categories
    • Recent
    • Tags
    • Popular
    • Users
    • Search
    • Register
    • Login

    SG3100 needs to reboot every few days after 2.4.4 upgrade

    Scheduled Pinned Locked Moved General pfSense Questions
    23 Posts 5 Posters 1.8k Views
    Loading More Posts
    • Oldest to Newest
    • Newest to Oldest
    • Most Votes
    Reply
    • Reply as topic
    Log in to reply
    This topic has been deleted. Only users with topic management privileges can see it.
    • M
      maroon2019 @bbrendon
      last edited by

      @bbrendon said in SG3100 needs to reboot every few days after 2.4.4 upgrade:

      Are you sure DNS isn't broken but IP is still working? I recently had OOM killer go after unbound on a sg3100 because my DNSBL was too big.

      Also I'd try disabling or uninstalling some of the packages.

      Nope, my RAM is only 6% utilized and I don't see any evidence of Unbound dying. The only package I have installed is Traffic Totals.

      1 Reply Last reply Reply Quote 0
      • O
        onnastick @bbrendon
        last edited by

        @bbrendon No I am not sure of that at all.

        Just an update, it happened again, and I noticed that it happened pretty much exactly 4 days after the last one.

        Just in case it's related to DNS, I went poking around my Resolver settings. In previous releases, back when it wasn't exposed in the settings list, I had set some custom options in the resolver to use TLS for outbound queries to my forward targets. So I went ahead and removed all custom options, enabled forwarding mode, and checked the new box "Use SSL/TLS for outgoing DNS Queries to Forwarding Servers." I don't think I am using DNSBL, as that is part of pfBlockerNG is it not? My only installed packages are ntopng and snort. Though I admit the problem could be related to something leaking... even though it never happened before 2.4.4.

        Any general advice as to exactly WHAT I should do to capture logs once the problem starts happening, if it happens again?

        1 Reply Last reply Reply Quote 0
        • M
          maroon2019
          last edited by

          Another update on my end - I disabled the one OpenVPN client I have running and so far my WAN has been stable. The OpenVPN configuration does use push routes - is it possible that interruptions in OpenVPN connectivity could be affecting my WAN gateway at all?

          1 Reply Last reply Reply Quote 0
          • O
            onnastick
            last edited by

            I have another update to share. The problem happened again, after perhaps 6.5 days. It appears the problem isn't that connectivity goes away... DNS Resolver is dying. It appears there is some kind of memory leak in unbound when using DNS over TLS, but that is just an assumption. When I manually start the DNS Resolver service once it gets into this state, it starts working again with no reboot required. I have pasted the log entries below in case anyone has any insight to share about how to resolve this. No pun intended.

            Last 50 DNS Resolver Log Entries. (Maximum 50)
            Oct 21 12:12:25 	unbound 	71561:1 	notice: ssl handshake failed 1.1.1.1 port 853
            Oct 21 12:12:25 	unbound 	71561:1 	error: ssl handshake failed crypto error:1409C041:SSL routines:ssl3_setup_read_buffer:malloc failure
            Oct 21 12:12:25 	unbound 	71561:1 	notice: ssl handshake failed 1.1.1.1 port 853
            Oct 21 12:12:25 	unbound 	71561:1 	error: ssl handshake failed crypto error:1409C041:SSL routines:ssl3_setup_read_buffer:malloc failure
            Oct 21 12:12:25 	unbound 	71561:1 	notice: ssl handshake failed 1.1.1.1 port 853
            Oct 21 12:12:46 	unbound 	71561:1 	error: ssl handshake failed crypto error:07069041:memory buffer routines:BUF_MEM_grow_clean:malloc failure
            Oct 21 12:12:46 	unbound 	71561:1 	error: and additionally crypto error:0607B041:digital envelope routines:EVP_CipherInit_ex:malloc failure
            Oct 21 12:12:46 	unbound 	71561:1 	error: and additionally crypto error:140D1044:SSL routines:tls1_change_cipher_state:internal error
            Oct 21 12:12:46 	unbound 	71561:1 	notice: ssl handshake failed 1.1.1.1 port 853
            Oct 21 12:12:46 	unbound 	71561:1 	error: ssl handshake failed crypto error:07069041:memory buffer routines:BUF_MEM_grow_clean:malloc failure
            Oct 21 12:12:46 	unbound 	71561:1 	error: and additionally crypto error:0607B041:digital envelope routines:EVP_CipherInit_ex:malloc failure
            Oct 21 12:12:46 	unbound 	71561:1 	error: and additionally crypto error:140D1044:SSL routines:tls1_change_cipher_state:internal error
            Oct 21 12:12:46 	unbound 	71561:1 	notice: ssl handshake failed 1.1.1.1 port 853
            Oct 21 12:12:46 	unbound 	71561:1 	error: ssl handshake failed crypto error:07069041:memory buffer routines:BUF_MEM_grow_clean:malloc failure
            Oct 21 12:12:46 	unbound 	71561:1 	error: and additionally crypto error:0607B041:digital envelope routines:EVP_CipherInit_ex:malloc failure
            Oct 21 12:12:46 	unbound 	71561:1 	error: and additionally crypto error:140D1044:SSL routines:tls1_change_cipher_state:internal error
            Oct 21 12:12:46 	unbound 	71561:1 	notice: ssl handshake failed 1.1.1.1 port 853
            Oct 21 12:12:48 	unbound 	71561:0 	info: service stopped (unbound 1.7.3).
            Oct 21 12:12:48 	unbound 	71561:0 	info: server stats for thread 0: 92 queries, 16 answers from cache, 76 recursions, 0 prefetch, 0 rejected by ip ratelimiting
            Oct 21 12:12:48 	unbound 	71561:0 	info: server stats for thread 0: requestlist max 2 avg 0.276316 exceeded 0 jostled 0
            Oct 21 12:12:48 	unbound 	71561:0 	info: average recursion processing time 1.013904 sec
            Oct 21 12:12:48 	unbound 	71561:0 	info: histogram of recursion processing times
            Oct 21 12:12:48 	unbound 	71561:0 	info: [25%]=0.049152 median[50%]=0.109227 [75%]=0.220753
            Oct 21 12:12:48 	unbound 	71561:0 	info: lower(secs) upper(secs) recursions
            Oct 21 12:12:48 	unbound 	71561:0 	info: 0.000000 0.000001 12
            Oct 21 12:12:48 	unbound 	71561:0 	info: 0.032768 0.065536 14
            Oct 21 12:12:48 	unbound 	71561:0 	info: 0.065536 0.131072 18
            Oct 21 12:12:48 	unbound 	71561:0 	info: 0.131072 0.262144 19
            Oct 21 12:12:48 	unbound 	71561:0 	info: 0.262144 0.524288 9
            Oct 21 12:12:48 	unbound 	71561:0 	info: 16.000000 32.000000 4
            Oct 21 12:12:48 	unbound 	71561:0 	info: server stats for thread 1: 50 queries, 15 answers from cache, 35 recursions, 0 prefetch, 0 rejected by ip ratelimiting
            Oct 21 12:12:48 	unbound 	71561:0 	info: server stats for thread 1: requestlist max 1 avg 0.171429 exceeded 0 jostled 0
            Oct 21 12:12:48 	unbound 	71561:0 	info: average recursion processing time 1.271249 sec
            Oct 21 12:12:48 	unbound 	71561:0 	info: histogram of recursion processing times
            Oct 21 12:12:48 	unbound 	71561:0 	info: [25%]=0.053248 median[50%]=0.155648 [75%]=0.32768
            Oct 21 12:12:48 	unbound 	71561:0 	info: lower(secs) upper(secs) recursions
            Oct 21 12:12:48 	unbound 	71561:0 	info: 0.000000 0.000001 4
            Oct 21 12:12:48 	unbound 	71561:0 	info: 0.016384 0.032768 1
            Oct 21 12:12:48 	unbound 	71561:0 	info: 0.032768 0.065536 6
            Oct 21 12:12:48 	unbound 	71561:0 	info: 0.065536 0.131072 5
            Oct 21 12:12:48 	unbound 	71561:0 	info: 0.131072 0.262144 8
            Oct 21 12:12:48 	unbound 	71561:0 	info: 0.262144 0.524288 9
            Oct 21 12:12:48 	unbound 	71561:0 	info: 16.000000 32.000000 2
            Oct 21 12:12:48 	unbound 	71561:0 	notice: Restart of unbound 1.7.3.
            Oct 21 12:12:48 	unbound 	71561:0 	notice: init module 0: validator
            Oct 21 12:12:48 	unbound 	71561:0 	notice: init module 1: iterator
            Oct 21 12:12:48 	unbound 	71561:0 	error: malloc failed
            Oct 21 12:12:48 	unbound 	71561:0 	error: could not create outgoing sockets
            Oct 21 12:12:48 	unbound 	71561:0 	error: ./util/alloc.c at 167 could not pthread_spin_destroy(&alloc->lock): Invalid argument
            Oct 21 12:12:48 	unbound 	71561:0 	fatal error: Could not initialize main thread
            
            1 Reply Last reply Reply Quote 0
            • jimpJ
              jimp Rebel Alliance Developer Netgate
              last edited by jimp

              From the logs that definitely looks like a memory issue.

              I see they released unbound 1.8.1 earlier this month and it has some memory leak fixes. We'll get that pulled in soon, it will be in -p1.

              EDIT: Issue for tracking: https://redmine.pfsense.org/issues/9059 (also has a link to the unbound release notes)

              Remember: Upvote with the 👍 button for any user/post you find to be helpful, informative, or deserving of recognition!

              Need help fast? Netgate Global Support!

              Do not Chat/PM for help!

              1 Reply Last reply Reply Quote 0
              • B
                bbrendon
                last edited by

                So maybe my DNSBL wasn’t too big. I guess I’m still leaking memory and it’s just going to take longer to fail since I’m not using as much memory now. Interesting.

                jimpJ 1 Reply Last reply Reply Quote 0
                • jimpJ
                  jimp Rebel Alliance Developer Netgate @bbrendon
                  last edited by

                  @bbrendon said in SG3100 needs to reboot every few days after 2.4.4 upgrade:

                  So maybe my DNSBL wasn’t too big. I guess I’m still leaking memory and it’s just going to take longer to fail since I’m not using as much memory now. Interesting.

                  They fixed several leaks, from the release notes:

                  A memory leak in the TLS lookup code is fixed. Leaked requests in the requestlist are fixed.

                  • free memory leaks in config strlist and str2list insert functions.
                  • Free memory leak in config strlist append.
                  • Fix memory leak when message parse fails partway through copy.

                  So it is possible if you have an issue with unbound that happens over time, it may be due to a memory leak.

                  Remember: Upvote with the 👍 button for any user/post you find to be helpful, informative, or deserving of recognition!

                  Need help fast? Netgate Global Support!

                  Do not Chat/PM for help!

                  1 Reply Last reply Reply Quote 0
                  • ?
                    A Former User
                    last edited by

                    Is there a way to restart unbound every 24 hours to try and mitigate the problem until 2.4.4-p1?
                    Thanks!

                    1 Reply Last reply Reply Quote 0
                    • jimpJ
                      jimp Rebel Alliance Developer Netgate
                      last edited by

                      You can install the cron package and then make a cron entry that periodically does /usr/local/sbin/pfSsh.php playback svc restart unbound on whatever schedule you want.

                      Remember: Upvote with the 👍 button for any user/post you find to be helpful, informative, or deserving of recognition!

                      Need help fast? Netgate Global Support!

                      Do not Chat/PM for help!

                      1 Reply Last reply Reply Quote 0
                      • O
                        onnastick
                        last edited by

                        I don't know if I just caught it at a funny time, but I just went to go look at the DNS Resolver log page again, but it looks like the service restarted itself multiple times within just the last few minutes. I wonder if it has been doing this all day? I am wondering if installing cron and having it restart periodically (say daily) is even worthwhile if it is restarting itself frequently anyway...

                        I have a mind to disable DNS over TLS for now and wait until 2.4.4-p1 comes out before I try to re-enable it. DNS privacy appeals to me, but in principle only - no real reasons to hide. It's not worth periodically losing DNS functionality. I haven't really decided to turn it off yet, but what do other people think about the trade-off?

                        1 Reply Last reply Reply Quote 0
                        • B
                          bbrendon
                          last edited by

                          I woke up without internet this morning :(

                          Is there a way to install unbound 1.8.1 manually?

                          ? 1 Reply Last reply Reply Quote 0
                          • ?
                            A Former User @bbrendon
                            last edited by

                            @bbrendon said in SG3100 needs to reboot every few days after 2.4.4 upgrade:

                            I woke up without internet this morning :(

                            Is there a way to install unbound 1.8.1 manually?

                            Why don't you follow the advice of @jimp above? Install the cron package, then add a restart of unbound every ~6 hours (or hour or whatever works best) as a mitigation?

                            B 1 Reply Last reply Reply Quote 0
                            • O
                              onnastick
                              last edited by

                              For what it's worth, I have checked back at random periods at the DNS Resolver log entries (but NOT when the problem occurred when I had to manually start it). The few times I checked the DNS Resolver log, I found each time that unbound had stopped (message like info: service stopped (unbound 1.7.3). ) a couple of times within a 5 minute period. I am not sure why it stops so often, and why sometimes it will stop and not start back on its own without my intervention. But I'm not sure adding a restart every hour or few hours would make a difference. If the service is stopping and starting on its own fairly often, one would think adding a cron-based restart wouldn't make a big difference.

                              1 Reply Last reply Reply Quote 0
                              • B
                                bbrendon @A Former User
                                last edited by

                                @muppet said in SG3100 needs to reboot every few days after 2.4.4 upgrade:

                                restart of unbound every ~6 hours

                                I could but that's janky.

                                1 Reply Last reply Reply Quote 0
                                • ?
                                  A Former User
                                  last edited by A Former User

                                  Edit: Updated to remove the bit about it being unsupported.

                                  Unbound 1.8.1 has been pushed to the 2.4.4 repo.
                                  If you do a "pkg update && pkg upgrade" you can find it. If you wanted to be more careful (always a good idea) you could do "pkg update && pkg upgrade unbound" then you'll only get an upgraded unbound and strongswan (ipsec package)
                                  KEEP IN MIND YOU'LL BE RUNNING A SEMI-UNSUPPORTED VERSION OF PFSENSE BY DOING THIS - So only do it if you really really need to. But it should fix your unbound problems.

                                  Thanks heaps to netcat6549 in the IRC channel for explaining some of the above to me.

                                  1 Reply Last reply Reply Quote 0
                                  • jimpJ
                                    jimp Rebel Alliance Developer Netgate
                                    last edited by

                                    That is not unsupported. We pushed it out specifically to address the unbound memory leaks. It's perfectly fine to run it either way, whichever way works better for someone.

                                    In fact if someone were to upgrade to 2.4.4 today, they'd automatically pull in the new version.

                                    tl;dr: If you have problems with unbound eating memory, upgrade, don't suffer until -p1, even if it is coming soon.

                                    Remember: Upvote with the 👍 button for any user/post you find to be helpful, informative, or deserving of recognition!

                                    Need help fast? Netgate Global Support!

                                    Do not Chat/PM for help!

                                    ? 1 Reply Last reply Reply Quote 0
                                    • ?
                                      A Former User @jimp
                                      last edited by

                                      @jimp said in SG3100 needs to reboot every few days after 2.4.4 upgrade:

                                      That is not unsupported. We pushed it out specifically to address the unbound memory leaks. It's perfectly fine to run it either way, whichever way works better for someone.

                                      In fact if someone were to upgrade to 2.4.4 today, they'd automatically pull in the new version.

                                      tl;dr: If you have problems with unbound eating memory, upgrade, don't suffer until -p1, even if it is coming soon.

                                      Ahhh ok, thank you
                                      I was asking in the IRC channel about how "supported" the pkg update/pkg upgrade commands were. The feedback I got was that it would be an unsupported thing to do. Thank you for clarification!

                                      1 Reply Last reply Reply Quote 0
                                      • jimpJ
                                        jimp Rebel Alliance Developer Netgate
                                        last edited by

                                        I could see how someone might think that, but in this case we put out some updates for a few issues that some had concerns about (unbound, strongSwan, curl, libssh) and if someone needs to, they can update by hand. We don't do that often, but in this case the issues warranted an OOB update of that nature.

                                        Remember: Upvote with the 👍 button for any user/post you find to be helpful, informative, or deserving of recognition!

                                        Need help fast? Netgate Global Support!

                                        Do not Chat/PM for help!

                                        1 Reply Last reply Reply Quote 1
                                        • First post
                                          Last post
                                        Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.