KEA DHCP ERROR - Service stopped
-
Hi,
Netgate 4200, pfSense+ 24.03-RELEASE
KEA DHCP service stops working, can't be restarted.
In the log I see these messages:
2024-09-12 09:00:36.681328+02:00 kea-dhcp4 22718 ERROR [kea-dhcp4.dhcp4.0x3c1af2412000] DHCP4_INIT_FAIL failed to initialize Kea server: configuration error using file '/usr/local/etc/kea/kea-dhcp4.conf': cannot lock socket lockfile, /tmp/kea4-ctrl-socket.lock, : Resource temporarily unavailable
2024-09-12 09:00:36.681146+02:00 kea-dhcp4 22718 ERROR [kea-dhcp4.dhcp4.0x3c1af2412000] DHCP4_CONFIG_LOAD_FAIL configuration error using file: /usr/local/etc/kea/kea-dhcp4.conf, reason: cannot lock socket lockfile, /tmp/kea4-ctrl-socket.lock, : Resource temporarily unavailable
2024-09-12 09:00:36.680925+02:00 kea-dhcp4 22718 ERROR [kea-dhcp4.dhcp4.0x3c1af2412000] DHCP4_PARSER_COMMIT_FAIL parser failed to commit changes: cannot lock socket lockfile, /tmp/kea4-ctrl-socket.lock, : Resource temporarily unavailable
Leases are still handed out.
2024-09-12 09:14:03.486823+02:00 kea-dhcp4 37370 INFO [kea-dhcp4.leases.0x2f57b9416d00] DHCP4_LEASE_ADVERT [hwtype=1 02:42:a7:8d:f7:16], cid=[no info], tid=0xe94a2e9e: lease 10.0.11.6 will be advertised
2024-09-12 09:14:03.486429+02:00 kea-dhcp4 37370 INFO [kea-dhcp4.dhcpsrv.0x2f57b9416d00] EVAL_RESULT Expression pool_opt2_0 evaluated to 1
2024-09-12 09:14:03.486326+02:00 kea-dhcp4 37370 INFO [kea-dhcp4.dhcpsrv.0x2f57b9416d00] EVAL_RESULT Expression pool_lan_0 evaluated to 1
What are these numbers?
kea-dhcp4.leases.0x2f57b9416d00
kea-dhcp4.dhcp4.0x3c1af2412000
Any idea?
How do you guys monitor the services? I would like to get notified if a service goes down.
-
@sdi7 said in KEA DHCP ERROR - Service stopped:
kea4-ctrl-socket.lock
Pick any : kea4-ctrl-socket.lock
@sdi7 said in KEA DHCP ERROR - Service stopped:
How do you guys monitor the services?
For me : when I don't fool around with pfSense, it and everything in it keeps on humming for weeks if not months ...
And by any chance, if you've kids using your network, they will take care of the monitoring part for you.
Colleagues @work : same thing with somewhat more delay. If the coffee machine gets (too) crowded : check the Internet connection.
For now, switch to ISC-DHCP.
And before you ask : ISC-DHCP will do just fine for now. Before 2024 is over, a newer pfSense will take care of any known existing issue.
-
@Gertjan said in KEA DHCP ERROR - Service stopped:
For me : when I don't fool around with pfSense, it and everything in it keeps on humming for weeks if not months ...
Yep, should do so. Just wondering why I get these KEA errors, as KEA is in production with Netgate.
@Gertjan said in KEA DHCP ERROR - Service stopped:
For now, switch to ISC-DHCP.
Do you know if and how I can just revert back to ISC? Without losing any settings eventually?
With the next update, it seems Netgate will bring some updates/enhancements to KEA. Also, DNS hosts will be registered with DHCP again (as with ISC). Let's see and hope.
-
@sdi7 said in KEA DHCP ERROR - Service stopped:
Do you know if and how I can just revert back to ISC? Without losing any settings eventually?
Sure :
One click - one check, and a Save at the bottom of the page :
-
Is anybody else really fed up and annoyed that an essential service, a core component of a firewall product, has been and still remains broken? Still unfixed, after at least a year.
And one that was 'pushed', or at least encouraged, in a mass transition away from the legacy service, which always worked without issue.
Throw "/tmp/kea4-ctrl-socket.lock" into a search engine. It's just thread after thread after thread of people reporting the same thing.
What's being done to patch this? I'm buying into paid offerings, which includes buying Netgate hardware. And something as frustrating and serious as this is still screwing up in my networks. After this long.
Imagine using an enterprise level product and having to go run hacky rm commands via prompt every 2 days because the DHCP service kills itself. Your clients are unable to get on to the network until you log in and run that hacky fix. For over a year. And affecting everyone.
There's a patch system in my Netgate+. Where's the patch please? Can it be a priority maybe..? That'd be appreciated.
Personally, I couldn't imagine a worse situation than shipping a firewall product with an unresolved critical service bug and not resolving it immediately. But maybe that's just me.
It's been long enough. I'm allowed to be frustrated. No, I won't apologise for it either.
-
It's funny (well, not really) because when I mentioned something similar about switching back to ISC on another thread, as is mentioned on so many other threads, the almost immediate response was:
@cmcdonald said in Repeating problem: (unbound), jid 0, uid 59, was killed: failed to reclaim memory:
@jrey said in Repeating problem: (unbound), jid 0, uid 59, was killed: failed to reclaim memory:
The reason unbound could be restarting is that you are using the Kea DHCP server. There are numerous threads on why Kea in the current release causes unbound to restart. For now, switch to ISC DHCP and stop the Kea release from restarting unbound every time a client adds a DHCP lease, if that is your case.
This is not accurate. We have only introduced Kea integration with Unbound in the upcoming 24.08 release.
And yet here we are with yet another thread with similar problems and the same proposed solution: switch to ISC.
-
@jrey It can be justified away as much as anyone wishes to. The reality is, the pfSense+ image sent to me by Netgate, which I flashed onto my 6100 back in December, most certainly has Kea as the primary DHCP service. AKA the default.
And that's all that matters.
Almost can't let it go. Imagine nginx or apache just stopped serving webpages until you logged in to SSH and unf****d it. That's the comparison as I see it. Why is this not getting the pull it should? That it demands?
-
I get your point, me too, I consider 'DHCP' pretty 'core', so it should work out of the box.
But I'm somewhat privileged too, as I was there when this page was published.
That information isn't core 'pfSense', it's pretty core-entire-world.
This was decided a couple of years ago.
( there wasn't much of a reaction out there back then .... although I said to myself : "brace for impact ...." - but that didn't come. ISC are experts **)
So, the world's one and only (I exaggerate) DHCP service author announced that they were fed up with maintaining the good old "ISC DHCP".
They, ISC, decided to rewrite the thing from the ground up. You know the name : kea.
kea was introduced in pfSense Plus with 23.0x ? as preview ware (that's how I see it) - and they even made it 'by default'.
Not great, as there were a lot of gotchas. Way too many "don't use kea if you need anything 'special'" cases.
So, imho, kea isn't really a perfect option today with 24.03. But no problem, ISC DHCP is still there.
Netgate should have made kea an opt-in option, available only after the user has read and signed off on this page FIRST.
Guess what : not everybody can read.
But the info was there : (this image shows a more recent page, same subject)
I showed you the solution. Get ISC back and case closed.
This week, Netgate programmers posted about the progress being made on the implementation of kea, and that things are being tested right now.
And when it comes out, 24.08 ? there will be no bugs.
And we will find bugs, as one million users can test way more quickly than a couple of programmers and beta testers.
I switched back to ISC because I'm testing things for myself on my LANs and I needed some DHCP OPTIONS.
Kea with pfSense doesn't offer any options, so I had to go back.
I did use kea for a month or so, for a company btw, about 50 devices, and I saw the "/tmp/kea4-ctrl-socket.lock" bug thing right in the beginning, understood what it was, a real "too stupid to be true", and 'solved' the issue myself (by ditching the "/tmp/kea4-ctrl-socket.lock" file before the process was started = editing the startup script) and called it a day.
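The lock-file cleanup described above could look roughly like the sketch below. It is not the actual edit made to the startup script; the helper name `clear_stale_lock` is made up for illustration, and the `pgrep` guard (only delete the lock when no kea-dhcp4 process is left) is an added assumption. The lock file path and process name are the ones shown in the log messages.

```shell
#!/bin/sh
# Sketch only: remove a stale Kea control-socket lock file, but only when
# no kea-dhcp4 process is left that could still be holding it.

LOCKFILE="/tmp/kea4-ctrl-socket.lock"   # path as shown in the log messages
PROC_NAME="kea-dhcp4"                   # process name as shown in the log messages

# Hypothetical helper: prints what it did ("running", "removed" or "absent")
# and returns 0 when it is safe to start Kea afterwards.
clear_stale_lock() {
    lockfile="$1"
    proc="$2"
    if pgrep -x "$proc" >/dev/null 2>&1; then
        # A server process still exists, so leave its lock alone.
        echo "running"
        return 1
    fi
    if [ -e "$lockfile" ]; then
        rm -f "$lockfile" && echo "removed"
    else
        echo "absent"
    fi
    return 0
}

clear_stale_lock "$LOCKFILE" "$PROC_NAME"
```

Run before starting the service, it prints one word saying what happened; a result of "running" means a kea-dhcp4 process is still there and the lock was left in place.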
Note : Like you, I'm just a user, my opinion, etc etc. "I'm a plumber, I have my tools and I use them if needed".
** wait : ISC also wrote bind. bind is even bigger than DHCPd. bind probably serves half the planet right now, and very few know about it. bind is a cornerstone of the Internet.
bind, also known by the name named (from name daemon), is what DNS is today.
I'd better keep on watching this one.
edit :
@banalo said in KEA DHCP ERROR - Service stopped:
Imagine nginx or apache just stopped serving webpages until you
Oh .. like : wtf, like there are still humans @Netgate, that make errors and so ?
Remember the big shot system admin that made a 'simple' DNS error @facebook ?
facebook went off the Internet for hours - their entire AS vanished ?
Or Microsoft pushed out an update that broke every DHCP client on every Microsoft Windows PC (I don't recall which version).
I'll let your imagination work so you get what this did for 'users' ^^
My point : yep, people make errors ... it should be forbidden.
-
@Gertjan said in KEA DHCP ERROR - Service stopped:
Oh .. like : wtf, like there are still humans @Netgate, that make errors and so ?
This isn't really about making errors. Everybody makes mistakes. It's human. In fact, I suggest that making mistakes is a good thing in life generally.
But that's kinda not what I'm getting at here. What I'm getting at is - a default service and its critical bug that has been left virtually unacknowledged and unpatched, seemingly without priority, for a long time.
My position stems from disappointment more than anything. It always helps to place yourself in the shoes of a prospective user. Somebody (everyone) that spins up pfSense for the first time to give a go at running what is undeniably a rock solid enterprise solution otherwise, very immediately comes across a gaping bug that stops their clients being allowed on to the network with dhcp. Something a $30 home router manages to accomplish vs the Kea bug situation. Requiring a newbie user to go manually troubleshoot and apply command line temp fixes regularly. For a year and ongoing at this point. That's not remotely what any user should be experiencing. Especially considering the new users who have undoubtedly been turned off or away by this.
This can accomplish the 'break' in make or break very easily for swarms of people. Perhaps just stick with UniFi or Untangle or consumer, whatever it may be. That's not the experience I'd want somebody taking away when trying pfSense. It deserves to shine as it otherwise does.
Say I want to install Ghost to create a blogging site, but the installer has a bug and stops the setup process before succeeding, requiring manual intervention to fix. I'm looking for the next best thing, as it's ruined it right off the bat for me.
-
Not disagreeing with you. I understand the frustration.
As I said in the thread referenced, I don't do ANY DHCP on the Netgate.
However, the fact that the situation is similar here, and that the responses to the problem are similar to the one I got specifically from a developer, says to me that there has been a shift to denial. Yet no alternate solution has ever been suggested or provided.
The solution of turning off KEA and going back to ISC, still being mentioned on many threads by many people (yours included), seems to resolve the issue for most, if not all, people that have the issue.
Should KEA have been introduced in 24.03 -- likely NOT
Should KEA have been made the default (I can't say that it was, again I don't use DHCP on the device at ALL) but the answer here should be - Certainly NOT
I think others on the thread I referenced (including people who represent Netgate, and are respected) still frequently suggest the same thing. Switch to ISC and wait for 24.08, 09 or 10, or whatever it ends up being. It will be better.
I just took the response to my specific post there as my poor choice of words and crossed it out, because clearly "It is not accurate".
--
Yes, it seems generally agreed that the 24.0x (next) version of KEA will be better. But also, because DHCP is so "Core", I chose to never actually enable it on the Netgate. Going back to a time well before any of this was ever introduced. Nothing says DHCP "has to be" run on the Netgate.
I'll stop now, good luck
-
I see this is still ongoing. I found a workaround via the watchdog service. The script does a cleanup before attempting to restart the service.
- Make sure you install the watchdog service in pfSense.
- Add the kea dhcp 4 service.
- Shell into pfSense and change to /usr/local/etc/rc.d
- Create a backup of the kea service script: cp kea kea.old
- Edit the kea file and replace its contents with the script below.
#!/bin/sh

# PROVIDE: kea
# REQUIRE: NETWORK netif routing
# KEYWORD: shutdown

. /etc/rc.subr

name=kea
desc="Kea DHCP Server"
rcvar=kea_enable

load_rc_config $name

kea_enable=${kea_enable:-"NO"}

command="/usr/local/sbin/keactrl"
required_files="/usr/local/etc/${name}/keactrl.conf"

# Cleanup function
cleanup_kea()
{
    # Clean up stale lock files
    rm -f /tmp/kea4-ctrl-socket.lock
    # Kill any leftover processes
    pkill -9 kea-dhcp4
    # Wait for processes to die
    sleep 2
}

# Start with cleanup first
kea_start()
{
    cleanup_kea
    ${command} start
    logger -t kea-watchdog "Kea DHCP4 started with cleanup"
}

# Stop, then clean up
kea_stop()
{
    ${command} stop
    cleanup_kea
    logger -t kea-watchdog "Kea DHCP4 stopped with cleanup"
}

# rc.subr runs whatever the *_cmd variables name, so point them at the
# functions above (a shell function merely named start_cmd is never called)
start_cmd="kea_start"
stop_cmd="kea_stop"
status_cmd="$command status"
reload_cmd="$command reload"
extra_commands="reload"

run_rc_command "$1"
Watchdog should then auto restart the service.
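For the notification part of the original question, a stand-alone check is one possible starting point. This is a minimal sketch, not part of the watchdog package: the helper names `service_is_up` and `report_service` and the `service-check` log tag are made up for illustration. The process name kea-dhcp4 is the one from the log messages. It only prints a status line and writes a syslog entry; how that turns into an actual notification is left open.

```shell
#!/bin/sh
# Sketch only: report whether a service process is running and write a
# syslog entry when it is not.

# Hypothetical helper: returns 0 if a process with exactly this name exists.
service_is_up() {
    pgrep -x "$1" >/dev/null 2>&1
}

# Hypothetical helper: prints a one-line status; logs a warning when down.
report_service() {
    if service_is_up "$1"; then
        echo "$1: up"
    else
        echo "$1: DOWN"
        # Logging is best effort; ignore a missing or failing logger.
        logger -t service-check "$1 is not running" 2>/dev/null || true
    fi
}

report_service "kea-dhcp4"
```

Run from cron every few minutes, the syslog entry gives something to alert on, whatever alerting is already in place.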
-