Config history not pruning on HA pair, has 3400 files
-
While answering another post I went to pull up the config history in a router, and it timed out after a while. Looking into it, I see over 3400 files dating back to September 2024:
]/cf/conf/backup: ls -l |more
total 730824
-rw-r--r-- 1 root wheel 277474 Sep 10 2024 config-1725947104.xml
-rw-r--r-- 1 root wheel 277544 Sep 10 2024 config-1725968700.xml
-rw-r--r-- 1 root wheel 277299 Sep 10 2024 config-1725968704.xml
-rw-r--r-- 1 root wheel 277474 Sep 10 2024 config-1725968706.xml
-rw-r--r-- 1 root wheel 277544 Sep 10 2024 config-1725990300.xml
After it timed out and the "50x Error" page displayed in pfSense ["upstream timed out (60: Operation timed out) while reading response header from upstream"], it has only 39 files in the directory (one dated Nov 2024):
]/cf/conf/backup: ls -ltr
total 8015
-rw-r--r-- 1 root wheel 276589 Nov 2 2024 config-1730524664.xml
-rw-r--r-- 1 root wheel 272087 Mar 3 00:45 config-1740962704.xml
-rw-r--r-- 1 root wheel 273351 Mar 11 00:45 config-1741650304.xml
-rw-r--r-- 1 root wheel 273351 Mar 16 00:45 config-1742082304.xml
-rw-r--r-- 1 root wheel 319900 Mar 31 14:46 config-1743449909.xml
-rw-r--r-- 1 root wheel 316283 Apr 17 12:45 config-1744890304.xml
-rw-r--r-- 1 root wheel 323636 May 8 00:45 config-1746661505.xml
-rw-r--r-- 1 root wheel 326689 May 31 12:45 config-1748691905.xml
-rw-r--r-- 1 root wheel 326523 May 31 12:45 config-1748713501.xml
-rw-r--r-- 1 root wheel 326689 May 31 18:45 config-1748713504.xml
-rw-r--r-- 1 root wheel 326523 May 31 18:45 config-1748735103.xml
-rw-r--r-- 1 root wheel 326689 Jun 1 00:45 config-1748735105.xml
-rw-r--r-- 1 root wheel 326523 Jun 1 00:45 config-1748756701.xml
-rw-r--r-- 1 root wheel 326689 Jun 1 06:45 config-1748756704.xml
-rw-r--r-- 1 root wheel 326514 Jun 1 06:45 config-1748778302.xml
-rw-r--r-- 1 root wheel 326523 Jun 1 06:45 config-1748778303.xml
-rw-r--r-- 1 root wheel 326689 Jun 1 12:45 config-1748778305.xml
-rw-r--r-- 1 root wheel 326523 Jun 1 12:45 config-1748799901.xml
-rw-r--r-- 1 root wheel 326689 Jun 1 18:45 config-1748799904.xml
-rw-r--r-- 1 root wheel 326514 Jun 1 18:45 config-1748821503.xml
-rw-r--r-- 1 root wheel 326523 Jun 1 18:45 config-1748821504.xml
If I load the page again the older files are removed. Per the above timestamps, it seems the cron sync is triggering multiple updates on the backup.
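To make the "multiple updates per cron run" pattern easier to see, here is a rough Python sketch that groups the timestamps embedded in the backup filenames into bursts. The filenames are copied from the listing above; the helper names (`epoch_of`, `bursts`) and the 60-second window are my own arbitrary choices, not anything pfSense provides.

```python
import re

# Filenames copied from the backup node's listing above.
names = [
    "config-1748756701.xml",
    "config-1748756704.xml",
    "config-1748778302.xml",
    "config-1748778303.xml",
    "config-1748778305.xml",
    "config-1748799901.xml",
    "config-1748799904.xml",
]

def epoch_of(name):
    """Return the Unix timestamp embedded in config-<epoch>.xml, or None."""
    m = re.fullmatch(r"config-(\d+)\.xml", name)
    return int(m.group(1)) if m else None

def bursts(names, window=60):
    """Group timestamps that fall within `window` seconds of the previous one."""
    stamps = sorted(e for e in map(epoch_of, names) if e is not None)
    groups = []
    for ts in stamps:
        if groups and ts - groups[-1][-1] <= window:
            groups[-1].append(ts)  # same burst as the previous write
        else:
            groups.append([ts])    # start a new burst
    return groups

for group in bursts(names):
    print(len(group), group)
```

On this sample each six-hour run shows up as a burst of two or three writes a few seconds apart, which matches the amplification described.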
Since this is a backup in an HA pair, I logged in to the primary, and it has 1900 files in this folder:
]/cf/conf/backup: ls -l | wc
1969 17714 131869
It's "known" that pfBlocker's cron triggers a config update by changing a <time></time> field in the config file, even if no other changes are made. That is this 6 hour rotation on the primary:
-rw-r--r-- 1 root wheel 330401 May 31 12:45 config-1748691902.xml
-rw-r--r-- 1 root wheel 330401 May 31 18:45 config-1748713501.xml
-rw-r--r-- 1 root wheel 330401 Jun 1 00:45 config-1748735102.xml
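The identical 330401-byte sizes fit the idea that only a timestamp changes between these revisions. One way to check two backups for that is to compare them with every <time>...</time> element blanked out. This is a minimal sketch under assumptions: the function name and the XML fragments are made up for illustration, a real config.xml is much larger, and other revision metadata (such as a description) may differ too.

```python
import re

# Matches any <time>...</time> element, including multi-line content.
TIME_TAG = re.compile(r"<time>.*?</time>", re.DOTALL)

def same_except_time(xml_a, xml_b):
    """True if the two config texts match once every <time> element is blanked."""
    blank = "<time></time>"
    return TIME_TAG.sub(blank, xml_a) == TIME_TAG.sub(blank, xml_b)

# Hypothetical fragments standing in for two consecutive backups.
old = "<pfsense><revision><time>1748691902</time></revision><alias>lan</alias></pfsense>"
new = "<pfsense><revision><time>1748713501</time></revision><alias>lan</alias></pfsense>"

print(same_except_time(old, new))
```

To try it on real files, read two backups with `open(path).read()` and pass the strings in; a regex is crude compared to a proper XML diff, but it is enough for a quick spot check.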
On the primary router I went to the Configuration History page and after a while it did complete successfully. Similar to above, I end up with one extra file from March, which doesn't show in the web GUI:
-rw-r--r-- 1 root wheel 284868 Mar 25 17:15 config-1742940907.xml
-rw-r--r-- 1 root wheel 331123 May 28 12:45 config-1748449112.xml
-rw-r--r-- 1 root wheel 331024 May 28 18:45 config-1748454301.xml
-rw-r--r-- 1 root wheel 331024 May 29 00:45 config-1748475902.xml
I spot checked another router and it has 30 files in the folder. If I reload the config history page the old files are deleted.
What is supposed to trigger the backup file pruning? And why isn't it working? The default is 30 backups so 3400 is a bit over expected. (and yes that "Maximum Backups" field is blank on these routers)
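For reference, the pruning I would expect amounts to "keep the newest N by timestamp, drop the rest". Below is a minimal Python sketch of that idea. It is not pfSense's actual implementation, `prune_plan` is a made-up name, the sample filenames are generated rather than real, and it only reports what would be removed without deleting anything.

```python
import re

def prune_plan(names, keep=30):
    """Split backup names into (kept, removed), keeping the `keep` newest
    by the timestamp embedded in config-<epoch>.xml."""
    pattern = re.compile(r"config-(\d+)\.xml")
    stamped = []
    for name in names:
        m = pattern.fullmatch(name)
        if m:  # ignore anything that is not a config backup
            stamped.append((int(m.group(1)), name))
    stamped.sort(reverse=True)  # newest first
    kept = [name for _, name in stamped[:keep]]
    removed = [name for _, name in stamped[keep:]]
    return kept, removed

# Hypothetical directory: 3400 backups written six hours apart.
names = [f"config-{1725947104 + i * 21600}.xml" for i in range(3400)]
kept, removed = prune_plan(names)
print(len(kept), len(removed))  # 30 3370
```

With the default of 30, a directory of 3400 files should shrink by 3370 in a single pass, which is what makes the observed backlog look like pruning never ran at all.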
Both have 24.11 installed.
-
I logged in to my home router which has 76 files:
]/cf/conf/backup: ls -l |wc
76 677 5036
...dating back to March 5:
-rw-r--r-- 1 root wheel 133456 Mar 5 21:01 config-1741230074.xml
-rw-r--r-- 1 root wheel 134150 Mar 5 21:06 config-1741230100.xml
-rw-r--r-- 1 root wheel 134149 Mar 5 21:03 config-1741230191.xml
...
Similarly, opening the configuration history page deletes all but 30, though (presumably) since there aren't 1000+ it doesn't time out and does actually delete all the old ones.
So whatever the problem is, it is not related to HA.
On my home router the times are suspiciously similar to the last boot ("Mar 6 08:02 dmesg.boot", 89 days) however the HA pair was rebooted a month ago.
-
Yup that's a known bug: https://redmine.pfsense.org/issues/15994
It's fixed in 25.03.
-
@stephenw10 ah I see. And "Affected Plus Version set to 24.03" explains why it went back so far. Though we would have updated both the same day. Maybe someone opened the config history on one.
3400 is a bit alarming of course. Looks like a 2-3x sync/write amplification on our backup router for some reason. Not that it does much else writing. It's just a bit ironic because pfBlocker has had issues syncing changes to the backup router outside of cron.
-
Yup they can really stack up!
-
@stephenw10 Also wanted to add that it will stack up immensely higher on an HA pair with pfBlockerNG enabled, as PFB unnecessarily does a config write every hour it gets triggered, even when doing absolutely nothing. That should really be fixed! It triggers an HA sync run (without reason) and even doubles the number of config writes on the standby node (one from pfB itself on that node, one from the unnecessary sync operation). We had nodes with over 20k configs :((
See one of the posts from here:
https://forum.netgate.com/topic/188036/list-of-problems-bugs-in-ha-carp-setups/8
-
@JeGr said in Config history not pruning on HA pair, has 3400 files:
PFB unnecessarily does a config write every hour
I think that's this "new" issue from 2022, do you have DNSBL disabled? We do, in our data center.
https://forum.netgate.com/topic/174231/pfblockerng-fills-pfsense-config-history
https://redmine.pfsense.org/issues/14409
There have been other similar bugs in pfB in the past, I believe.
-
The patch to fix this applies cleanly to 24.11 and works for me here.
I don't have any instances that have a stack of backups right now though. If you can test let me know.
-
@stephenw10 I applied the patch on our secondary and after making a change on primary it removed a couple dozen extra files since my earlier posts.
I think the extra config writes on the backup router were because of this pfBlocker bug where it doesn't bother syncing changes to the backup unless one manually runs a force reload. So I think the backup router was adding a list and removing a (defunct) list every time the cron jobs ran (the one on primary, and the one on secondary).
-
@stephenw10 said in Config history not pruning on HA pair, has 3400 files:
stack of backups
(cleaning up my email for the day) You can actually generate them...install pfBlocker, don't enable DNSBL, and an hourly cron should update at least its <time> value even if nothing else is selected. Then just wait a few hours, or a day, or so.
And, the patch worked on our primary router also.
On most others, we have been enabling DNSBL just to have the DoH blocking, so at least the config spam isn't nearly as bad on them. And/or have the pfB update set to like 12 or 24 hours.
Our HA units are also using SSDs...5-10 GB of config files could eat up most of the space on an eMMC. (edit: though I suppose it is compressed)
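As a back-of-envelope check on that estimate, here is a small Python sketch that multiplies a file count by a typical file size. The counts and sizes are taken from this thread (3400 files at about 277 KB here, and the "over 20k configs" report paired with the 330401-byte size from our primary, which is a hypothetical combination). It gives uncompressed totals only and ignores any filesystem compression.

```python
def backup_bytes(count, avg_size):
    """Uncompressed total for `count` backups of `avg_size` bytes each."""
    return count * avg_size

scenarios = [
    ("3400 files at 277474 bytes", 3400, 277474),
    ("20000 files at 330401 bytes (hypothetical mix)", 20000, 330401),
]

for label, count, size in scenarios:
    total = backup_bytes(count, size)
    print(f"{label}: {total / 1e9:.2f} GB")
```

That works out to a bit under 1 GB for our backup router and roughly 6.6 GB for a 20k-file case, so the 5-10 GB range only applies to the really large backlogs.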
-
@SteveITS said in Config history not pruning on HA pair, has 3400 files:
I think that's this "new" issue from 2022, do you have DNSBL disabled? We do, in our data center.
Yes, no DNSBL at all. Just IP Block/Allowlists getting updated. And as these make no changes at all (the content of the files is not written into the XML file, and the alias is a file on disk), it just creates useless empty entries, as per my list of bugs in the pfB forum.
@stephenw10 said in Config history not pruning on HA pair, has 3400 files:
I don't have any instances that have a stack of backups right now though. If you can test let me know.
I'll give it a try, nearly all our clusters have that problem.
-
@JeGr said in Config history not pruning on HA pair, has 3400 files:
I'll give it a try, nearly all our clusters have that problem.
Up until now, the config files stayed below the 100 versions we configured, so not much to report for "going over". I also installed a second patch from the ticket about pfBlockerNG writing useless empty changes and that seemed to work very well - no empty "writing DNSBL changes" anymore, so no hourly hits that would drive the version count up. I'll have to wait for the next few changes to bring it up to a hundred to check if it goes over again, but right now it looks good.
Cheers
-
Great. Thanks for testing!