Netgate Discussion Forum

    KEA service stopping through the day

    DHCP and DNS
    43 Posts 16 Posters 7.3k Views
    • D
      DavidIr @DavidIr
      last edited by

      Hi @marcosm, any joy looking at the dump files?

      For now I have implemented the changes in https://forum.netgate.com/post/1199521 in the hope that they will restart the DHCP service when it fails, but I would love to understand what's going on and help solve the potentially wider issue.

      Thank you

      • M
        marcosm Netgate @DavidIr
        last edited by

        @DavidIr It would help to have some additional info about the system. You can get that by going to /status.php.

        • D
          DavidIr @marcosm
          last edited by DavidIr

          @marcosm status_output.tgz uploaded to the same link provided above.

          Since my previous messages I have installed and configured the Service Watchdog package.

          • D
            DavidIr @DavidIr
            last edited by

            In case you need any additional info: I am now on holiday until Jan 5th, so I will not see or be able to respond to any posts or requests for info until I return.

            • R
              rafal.arciszewski
              last edited by rafal.arciszewski

              Hi,
              This weekend the same core dump happened on my Netgate 3100.

              70d2d633-116f-4776-834f-d9242c2ff14c-obraz.png

              c14717b7-180e-43f9-b5e0-2c01a931a165-obraz.png

              Is there a solution for this problem?

              Regards,

              • GertjanG
                Gertjan @rafal.arciszewski
                last edited by

                @rafal-arciszewski

                "Good news" is that the cause of the core dump was signal 6 (SIGABRT), which means the process itself chose to 'pull the brakes', most probably because a resource was missing, not enough RAM to name one.

                No "help me" PM's please. Use the forum, the community will thank you.
                Edit : and where are the logs ??

                • cmcdonaldC
                  cmcdonald Netgate Developer @Gertjan
                  last edited by cmcdonald

                  @Gertjan said in KEA service stopping through the day:

                  @rafal-arciszewski

                  "Good news" is that the cause of the core dump was signal 6 (SIGABRT), which means the process itself chose to 'pull the brakes', most probably because a resource was missing, not enough RAM to name one.

                  Yes, heap corruption in this case. This is turning into quite the rabbit hole. Unfortunately, this looks like an issue deeper than Kea, such as a failure in libcxxrt or jemalloc. We've got some test hardware set up with additional logging and jemalloc tuning to try to get a better view of the state of the world before the abort. But the core dump is gnarly: the heap is trashed. The effort required to fix this might be out of scope for an EOL platform, both for us and for upstream. We will know more soon.

                  Need help fast? https://www.netgate.com/support

                  • D
                    DavidIr @cmcdonald
                    last edited by DavidIr

                    @cmcdonald Thank you for looking into this. I was hoping that my submissions would help others, but it sounds quite challenging and, as you say, the EOL hardware (and no doubt the additional challenges of it running 32-bit) may bring an end to the investigations.

                    If I can contribute anything to help, let me know.

                    Not sure if this is relevant or helpful, but I do seem to have reduced the frequency of the service failing by removing NUT from the box (it was having issues talking to my UPS over USB), although this may be a coincidence rather than anything linked.

                    • R
                      rafal.arciszewski @DavidIr
                      last edited by

                      @DavidIr That sounds interesting. I also installed the NUT package recently. Maybe it is correlated?
                      I will uninstall it just in case.

                      • D
                        DavidIr @rafal.arciszewski
                        last edited by

                        @rafal-arciszewski There is no evidence of a connection to NUT. It was only an idea I had last night, so don't read too much into that bit. My reply was mainly in response to the post from cmcdonald, who is trying to analyse the error at a deep technical level.

                        • R
                          rafal.arciszewski @DavidIr
                          last edited by

                          @DavidIr Unfortunately, Kea crashed again even after NUT was uninstalled. This time the signal was 11.

                          Feb  3 13:03:52 netgate kernel: pid 66599 (kea-dhcp4), jid 0, uid 0: exited on signal 11 (core dumped)
                          
                          [24.11-RELEASE][admin@netgate.rnd.testlab]/root: ls -lHa *.core
                          -rw-------  1 root wheel  2891776 Jul 29  2024 bc.core
                          -rw-------  1 root wheel 12488704 Feb  3 13:03 kea-dhcp4.core
                          -rw-------  1 root wheel 94887936 Aug 23  2023 php-fpm.core
                          [24.11-RELEASE][admin@netgate.rnd.testlab]/root:
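
The signal numbers in these kernel messages can be decoded from the shell. Signal 6 is SIGABRT (the process called abort() itself) and signal 11 is SIGSEGV (an invalid memory access). The sketch below is only an illustration: the helper names `signal_from_log` and `signal_name` are made up for this example, and it assumes the log line has the `exited on signal N` form shown above.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; they are not part of pfSense or Kea.

# Extract the signal number from an "exited on signal N" kernel log line.
signal_from_log() {
    printf '%s\n' "$1" | sed -n 's/.*exited on signal \([0-9][0-9]*\).*/\1/p'
}

# Map a signal number to its name using the shell's own kill -l.
signal_name() {
    kill -l "$1"
}

# Log line copied from the post above.
line='Feb  3 13:03:52 netgate kernel: pid 66599 (kea-dhcp4), jid 0, uid 0: exited on signal 11 (core dumped)'

sig=$(signal_from_log "$line")
echo "kea-dhcp4 exited on signal $sig ($(signal_name "$sig"))"
```

The exact spelling of the name printed by `kill -l` can vary between shells, so treat it as a hint rather than something to match on exactly.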
                          
                          • GertjanG
                            Gertjan @rafal.arciszewski
                            last edited by

                            @rafal-arciszewski

                            As you have a 3100, which is ARM-based (32 bits!?), you would be much better off using ISC DHCP for the moment.


                            • P
                              propeto13
                              last edited by

                              before:
                              093a4fbb-563a-4a96-89e6-703fd376e06c-Screenshot 2025-02-05 083421.png

                              Diagnostics > Command Prompt >

                              Execute Shell Command

                              rm /tmp/kea4-ctrl-socket.lock
                              

                              Back to Dashboard and START the kea-dhcp4 service

                              after:
                              53e5cfe7-8d1c-4d72-b5bc-6fb1c4844e90-image.png

                              -this is the way.

                              • GertjanG
                                Gertjan @propeto13
                                last edited by

                                @propeto13 said in KEA service stopping through the day:

                                this is the way.

                                It's 'a' way.
                                If /tmp/kea4-ctrl-socket.lock exists or, as seen in other Kea-related posts here on the forum, the pid file exists when Kea starts, Kea will not core dump but will simply refuse to start.
                                And it's normal that these files exist: a core dump isn't a clean process exit, so the files are left behind, which is not good.
                                You can't start the Kea process again without manually deleting them.
                                I think the pfSense System Patches package (you have this package, right?) includes a patch that handles this issue.

                                Once these files are gone, you can start Kea.
                                And then, suddenly, it core dumps again... and it's rinse-and-repeat time.
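
The manual cleanup described above can be sketched as a small shell function that removes the lock file only when the recorded process is really gone. This is a rough sketch under assumptions, not the System Patches fix: `clear_stale_lock` is a hypothetical name, and the pid file location is passed in as an argument because the thread does not say where it lives. Only the lock path /tmp/kea4-ctrl-socket.lock comes from the posts above.

```shell
#!/bin/sh
# Hypothetical helper for illustration; not part of pfSense or Kea.
# Usage: clear_stale_lock <pidfile> <lockfile>
# Returns 0 if the stale files were removed (or were absent),
# 1 if the process recorded in the pid file is still alive.
clear_stale_lock() {
    pidfile=$1
    lockfile=$2
    if [ -r "$pidfile" ]; then
        pid=$(cat "$pidfile")
        # kill -0 sends no signal; it only checks that the process exists.
        if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
            echo "process $pid is still running; leaving $lockfile in place"
            return 1
        fi
        # The recorded process is gone, so the pid file is stale too.
        rm -f "$pidfile"
    fi
    rm -f "$lockfile"
    return 0
}

# Example call (the pid file path here is a placeholder, not a known location):
# clear_stale_lock /path/to/kea-dhcp4.pid /tmp/kea4-ctrl-socket.lock
```

Run as root, `kill -0` gives a reliable answer; as an unprivileged user it can fail for a process that is alive but owned by someone else. The service still has to be started afterwards, and none of this addresses the crash itself.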


                                Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.