After upgrading to 24.11 DHCP fails every 10-14 days
-
I did just find this in the main log:
Jan 4 11:16:23 kernel pid 90595 (kea-dhcp4), jid 0, uid 0: exited on signal 11 (core dumped)
And then about 5 seconds later, going by the timestamps, kea DNS unregisters all the DHCP clients.
Jan 4 11:16:28 kea2unbound 50780 Remove record: "[REDACTED].localdomain. 2400 IN A 192.168.107.100"
Jan 4 11:16:28 kea2unbound 50780 Remove record: "100.107.168.192.in-addr.arpa. 2400 IN PTR [REDACTED].localdomain."
(14 other clients similarly purged until the DHCP server could be restarted).
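For anyone wanting to spot this kind of crash in their own logs, here is a minimal sketch in Python. It assumes the kernel line has the same shape as the one quoted above; the regex and the `find_crashes` helper are illustrative names, not part of pfSense or Kea, and other releases may format the line differently.

```python
import re

# Pattern modelled on the kernel line quoted above (an assumption:
# other builds or log settings may format it differently).
CRASH_RE = re.compile(
    r"kernel pid (?P<pid>\d+) \((?P<proc>[^)]+)\), jid \d+, uid \d+: "
    r"exited on signal (?P<signal>\d+)(?P<core> \(core dumped\))?"
)

def find_crashes(lines, process="kea-dhcp4"):
    """Return a (pid, signal, core_dumped) tuple for each crash of `process`."""
    crashes = []
    for line in lines:
        match = CRASH_RE.search(line)
        if match and match.group("proc") == process:
            crashes.append((
                int(match.group("pid")),
                int(match.group("signal")),
                match.group("core") is not None,
            ))
    return crashes

sample = [
    "Jan 4 11:16:23 kernel pid 90595 (kea-dhcp4), jid 0, uid 0: "
    "exited on signal 11 (core dumped)",
    'Jan 4 11:16:28 kea2unbound 50780 Remove record: '
    '"[REDACTED].localdomain. 2400 IN A 192.168.107.100"',
]
print(find_crashes(sample))
```

Signal 11 is SIGSEGV, a segmentation fault, so the daemon crashed rather than being shut down on purpose.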
-
@lohphat personally I would not be running kea as of yet.. I still run ISC and have no plans to switch to kea until sometime down the road when all the kinks have been worked out. Sure, it's no longer being actively developed, but even if there was some new exploit of it that was a concern, dhcp is only exposed to my own secure local networks. And there is nothing of pfsense exposed to the internet other than me being able to vpn in.
I am looking forward to much better and adjustable log info, and for sure the dhcp registration without having to restart unbound, etc. But to be honest, while dhcp registration sure would be nice, and it has been a long time coming, I don't really have a need for it. All my devices I would want to resolve via name have a reservation so they are always on the same IP, and I just register those.
But yeah that kea-dhcp4 causing a core dump - not optimal for sure ;)
Might be in your best interest to just switch back vs dealing with any possible kinks that still need a bit of work.
But next time your client loses its dhcp lease and has no ip - I would set a static on it and attempt to get access to pfsense, or console, etc.. A power cycle should be the very last course of action after you have exhausted all other options.
While with zfs the box should be more likely to come back after a sudden loss of power.. Still not a good idea. I don't recall ever having to power cycle pfsense.. If it came down to - well just going to pull the plug, I would be getting my stuff in order to do a clean install.
-
You have an SG-3100 and that uses a 32-bit ARM processor. That binary code is totally different from what most of us are running on Intel/AMD 64-bit hardware. I'm not completely surprised there is an issue with the 32-bit ARM compiled version of Kea.
The SG-3100 is a very old platform now, and its 32-bit ARM CPU has always been a bit of a challenge in my opinion to run code on. This is coming from my past experience tracking down some Snort binary bugs that manifested only on the 32-bit ARM chip in the SG-3100.
It might be time to look at moving to an Intel/AMD 64-bit platform and retiring the SG-3100. FreeBSD is no longer officially supported on 32-bit platforms, so issues with binary code on 32-bit CPUs are not likely to be a high priority now.
I just switched to Kea and retired my former Microsoft Active Directory environment with the release of pfSense 24.11. I was waiting on DNS registration to get implemented hoping that would allow me to retire my old Windows Server 2012 R2 setup. So far it has been great. No DHCP or DNS issues with Kea and Unbound, and I retired my old AD setup I had used at home since way back when Microsoft TechNet and MSDN was still a thing.
-
@bmeeks said in After upgrading to 24.11 DHCP fails every 10-14 days:
way back when Microsoft TechNet and MSDN was still a thing.
hahah - dude I used to love those binders they would send with all the CDs..
Great point about old 32bit - that for sure could be more problematic.. But I got nothing to retire to push me towards kea.. Have they cleaned up the logging.. It was horrible in first preview.. Some of the reading I did on kea, it does look like the logging is customizable, etc.
-
If the SG-3100 is no longer supported (I know it's no longer sold) then draw a line in the release binaries and not allow half-baked modules which are of unknown stability on the platform be installed unless there are clear caveats and/or checkboxes to override the cautions.
These are purpose built infrastructure devices and released code should be CLEARLY marked as to what's supported/recommended vs experimental vs untested.
The Setting / Networking page indicates that ISC DHCP is deprecated. No mention that KEA is not tested/stable anywhere obvious.
ISC DHCP has reached end-of-life and will be removed in a future version of Netgate pfSense Plus. Visit System > Advanced > Networking to switch DHCP backend.
I know I have to upgrade my hardware, but what continues to frustrate me about Netgate documentation is that MAX MTU is not clearly documented anywhere on the spec sheets. Last time I looked it still wasn't there, despite my asking years ago for it to be included in plain sight after realizing my new SG-3100 couldn't support jumbo frames, and people avoided the simple issue by getting into the benefits of jumbo or not. I have a storage vlan set to jumbo and I can't terminate the vlan on the 3100.
All I am asking is for clarity so customers can make informed decisions about hardware and modules as to what they can and can not do.
-
@johnpoz said in After upgrading to 24.11 DHCP fails every 10-14 days:
Have they cleaned up the logging.. It was horrible in first preview.. Some of the reading I did on kea, it does look like the logging is customizable, etc.
The logging seems pretty sparse - at least the default setup. It logs some startup info and then a couple of entries when it registers or de-registers a DNS entry from a DHCP assignment.
I could have used DHCP reservations (static DHCP seems to be the new age term for those) and kept ISC, but I decided to move to Kea since the DNS registration was touted as working. So far that seems to be behaving well; at least in my small home network. I only have about 23 clients max showing up with DHCP leases.
-
@lohphat said in After upgrading to 24.11 DHCP fails every 10-14 days:
If the SG-3100 is no longer supported (I know it's no longer sold) then draw a line in the release binaries and not allow half-baked modules which are of unknown stability on the platform be installed unless there are clear caveats and/or checkboxes to override the cautions.
These are purpose built infrastructure devices and released code should be CLEARLY marked as to what's supported/recommended vs experimental vs untested.
The Setting / Networking page indicates that ISC DHCP is deprecated. No mention that KEA is not tested/stable anywhere obvious.
ISC DHCP has reached end-of-life and will be removed in a future version of Netgate pfSense Plus. Visit System > Advanced > Networking to switch DHCP backend.
I know I have to upgrade my hardware, but what continues to frustrate me about Netgate documentation is that MAX MTU is not clearly documented anywhere on the spec sheets. Last time I looked it still wasn't there, despite my asking years ago for it to be included in plain sight after realizing my new SG-3100 couldn't support jumbo frames, and people avoided the simple issue by getting into the benefits of jumbo or not. I have a storage vlan set to jumbo and I can't terminate the vlan on the 3100.
All I am asking is for clarity so customers can make informed decisions about hardware and modules as to what they can and can not do.
First off, just want to make clear that I am not affiliated with Netgate at all. I simply am the volunteer developer/maintainer for the Snort and Suricata packages on pfSense. I have no input into any decisions made by Netgate nor do I have any inside information on their product decisions.
I just know from previous experience with the particular ARM chip used in the SG-3100 that some FreeBSD code can have issues that are random and weird on that architecture. To be fair to the ARM guys, the issues are generally due to poorly written C code that runs fine on Intel/AMD CPUs because those processors do some auto-fixup things in their microcode that ARM chose not to do (at least not for all CPU opcodes).
One thing to understand is that ARM CPU code generation takes a different compiler than Intel/AMD CPU code generation. That's another way that code can run fine on an Intel/AMD CPU but have little gotchas on ARM. There is not a one-to-one match between C code and CPU binary opcodes on the two CPU platforms. The choices of the compiler are critical to the C code running properly. I'm sure Netgate does some preliminary testing on SG-3100 boxes for now as they state that 24.11 is supported there. But that does not mean they can test every single scenario. Perhaps it's some unique combination of configuration choices and data on your network that triggers the Kea issue on the 32-bit CPU.
The SG-3100 uses a SOC (system on a chip) architecture where the available LAN ports are actually just ports on an internal Marvell switch. VLANs are used to implement the isolation between ports and make them appear as pseudo individual physical NICs. I believe it is the internal Marvell switch that does not support jumbo frames. You will find jumbo frame support to be a hit or miss thing on various hardware platforms. One thing that is for sure with jumbo frames is that all devices on that layer 2 segment need to support them or you will have issues. Since apparently you can saturate 10G links with regular 1500 byte frames if you have decent CPU capacity, there does not appear to be a burning push to have jumbo frames everywhere yet. Perhaps in isolated network segments they make sense.
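To put rough numbers on the 1500 versus jumbo comparison, here is a back-of-the-envelope sketch in Python. It assumes untagged Ethernet with the standard per-frame overhead on the wire (preamble, header, FCS, and inter-frame gap) and uses 9000 bytes as a typical jumbo MTU; the function names are just for illustration.

```python
# Per-frame overhead on the wire for untagged Ethernet, in bytes:
# preamble + start delimiter (8) + MAC header (14) + FCS (4) + inter-frame gap (12).
WIRE_OVERHEAD = 8 + 14 + 4 + 12

def frames_per_second(link_bps, mtu):
    """Frames per second needed to fill the link with full-size frames."""
    return link_bps / ((mtu + WIRE_OVERHEAD) * 8)

def payload_efficiency(mtu):
    """Fraction of line rate that carries the layer-3 packet itself."""
    return mtu / (mtu + WIRE_OVERHEAD)

for mtu in (1500, 9000):
    fps = frames_per_second(10e9, mtu)
    eff = payload_efficiency(mtu)
    print(f"MTU {mtu}: {fps:,.0f} frames/s at 10G, {eff:.2%} payload efficiency")
```

At 10G that works out to roughly 813,000 frames per second with a 1500 byte MTU against about 138,000 with 9000, while wire efficiency only improves from about 97.5% to about 99.6%. So the gain from jumbo frames is mostly fewer packets for the CPU to handle, not much extra bandwidth, which fits the point about decent CPU capacity.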
-
@lohphat said in After upgrading to 24.11 DHCP fails every 10-14 days:
No mention that KEA is not tested/stable anywhere obvious.
Other than the blog they posted and the release notes.. Why would anyone read those ;)
https://www.netgate.com/blog/netgate-adds-kea-dhcp-to-pfsense-plus-software-version-23.09-1
https://docs.netgate.com/pfsense/en/latest/releases/23-09.html#rn-23-09-kea
Should they still have such a warning in the 24.11 release notes - sure ok.. I don't think anyone is giving away awards to netgate for their handling of how they rolled out the preview of kea to be honest ;)
They put up this blog for 24.08 before it got changed to be 24.11
https://www.netgate.com/blog/improvements-to-kea-dhcp
They could have maybe highlighted this statement a bit more I guess.
Migration Timeline
The migration to Kea DHCP has been ongoing for some time, and with the addition of High Availability support in pfSense Plus software version 24.08, we are approaching the final stages of this transition. Our goal is to reach feature parity between the Kea and ISC DHCP backends over the next few releases.
If anything, comments like a goal of reaching parity over the next few releases should flag you to maybe look into it a bit more before flipping the switch.
The warning about deprecation could have been worded better - don't think you're going to find many people that think it was perfect.
But information is available, and its not hidden away somewhere in a redmine.. And breeze through the forums now and then to see what users are running into, etc. While there is also this redmine
-
@johnpoz said in After upgrading to 24.11 DHCP fails every 10-14 days:
hahah - dude I used to love those binders they would send with all the CDs..
I had lots of those little binder books full of CDs! Finally with MSDN it moved to online downloads once Internet speeds picked up.
I did all that as part of my job before retirement. Had to maintain a lot of Microsoft stuff throughout my career after Netware finally was phased out and Bill Gates won the corporate networking wars. Cut my initial networking teeth on Novell Netware and IPX/SPX. Even had some Token Ring stuff for a bit. Later we toyed with TCP/IP on Netware before moving fully to the Microsoft world.
I guess we sound like a pair of old guys talking about the hard times when we were kids -- such as "having to walk to school and back home barefoot in a snow storm and it was uphill both ways"!
-
@johnpoz said in After upgrading to 24.11 DHCP fails every 10-14 days:
Other than the blog they posted and the release notes.. Why would anyone read those ;)
I did read them and none of them really raised the issues your previous message did.
@johnpoz said in After upgrading to 24.11 DHCP fails every 10-14 days:
I still run ISC and have no plans to switch to kea until sometime down the road when all the kinks have been worked out.
Those concerns are not in the Warning info box nor in the GUI.
Customers are not all developers and should be presented with the default stable choices and not led down a path of "Deprecated, this will be going away soon" in the GUI if there are "kinks" which still need to be worked out -- placing "EXPERIMENTAL on 32bit platforms" would have been a more appropriate and informative detail to at least raise an immediate sense of caution for those of us who are on those platforms.
I've switched DHCP modules and will report back if there are any issues.
Thank you for the developer insights, but now that we know that the kea DHCP server can just poop the bed randomly on 32-bit platforms and there's no watchdog to restart it by default, something still needs to be done for those of us in this situation.
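As a stopgap, the missing watchdog can be sketched in a few lines of Python. This is only an illustration under assumptions: it relies on `pgrep -x` to test for the process, and the restart command is deliberately left as a placeholder argument because the right way to restart Kea on a given pfSense release is not something this thread establishes. `is_running`, `watchdog_pass`, and `run_forever` are made-up names.

```python
import subprocess
import time

def is_running(name):
    """True if a process with exactly this name exists (pgrep exits 0 on a match)."""
    result = subprocess.run(["pgrep", "-x", name], stdout=subprocess.DEVNULL)
    return result.returncode == 0

def watchdog_pass(check, restart):
    """One pass: call restart() only when check() says the daemon is down.

    Returns True if a restart was attempted.
    """
    if check():
        return False
    restart()
    return True

def run_forever(name, restart_cmd, interval=60):
    """Poll every `interval` seconds.

    `restart_cmd` is a placeholder argv list supplied by the caller;
    no particular restart command is assumed here.
    """
    while True:
        watchdog_pass(
            lambda: is_running(name),
            lambda: subprocess.run(restart_cmd, check=False),
        )
        time.sleep(interval)
```

Keeping the check and the restart as separate callables means the logic can be tried out without touching a live DHCP server. Note that a restart only limits the outage; it does nothing about the underlying segfault.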
Given a new 4200 cost ($570) to replace the 3100, I might as well look at a PA-4xx series.
-
@bmeeks said in After upgrading to 24.11 DHCP fails every 10-14 days:
barefoot in a snow storm and it was uphill both ways
It was 20 miles for me ;) heheheh
-
@bmeeks
Sounds like we had similar facets to our careers. Started working in an integration house at 15 in 1980, then moved to ARCNET, Novell, then MSFT. Now my final gig: retrofitting a TV studio from older SDI cabling to ST2110 (video over IP) with a 100Gbit backbone.
-
are all 'solved' when you use 24.11 ...
That is, I wouldn't want to wait for Remote DNS server registration (kea calls it D2) and as the binary is there, I've put it to work.
I don't need the HA part of kea, as I've only one pfSense.
The rest : works for me.
I agree with what bmeeks said above about 32 bit arm code. That said, I don't have 'arm' (firewall) devices, I don't have '32 bit' device anymore. And testing these issues is pretty hard core.
Still, '32 bits' stuff was phased out more than a couple of years ago, and while Netgate promised to support it, this 'kea' core dumping is scary.
Just keep in mind that even if ISC DHCP earned the 'deprecated' tag, it is still rock solid for many moons to come.
Kea, right now, on a 4100 using 24.11 is pretty stable for me. I'll give it the title 'rock solid' in a month or two.
And, as stated above, I'm messing around with it.
@lohphat said in After upgrading to 24.11 DHCP fails every 10-14 days:
Given a new 4200 cost ($570) to replace the 3100, I might as well look at a PA-4xx series.
Remove 120 or so from that to get the bare metal cost. With a non Netgate device you would have to buy pfSense plus. 2.7.2 doesn't have the latest kea upgrades / GUI implementations yet, although that will change soon.
A '4200' will give you a 64 bits x86-64 device, and I haven't heard of the upcoming x86-128 yet ^^
-
Redmine bug opened: https://redmine.pfsense.org/issues/15973
-
@Gertjan said in After upgrading to 24.11 DHCP fails every 10-14 days:
With a non Netgate device you would have to buy pfSense plus.
If I get a PA-4xx unit it will be running PANOS (Palo Alto Networks), not pfSense.
-
@Gertjan said in After upgrading to 24.11 DHCP fails every 10-14 days:
I agree with what bmeeks said above about 32 bit arm code.
Then don't release 24.11 for 32-bit ARM and just EOL those platforms. Don't lead affected customers down an unstable path. Just end the support instead of introducing unstable code for the platform.
By telling customers that 24.11 is the preferred release that implies it works for your platform.
The conflicting messaging as to what's preferred, deprecated, supported, potentially unstable, etc. was not handled clearly.