Something taking up all the space on my system
-
@troutpocket could be inodes?
What's the output from: df -h
-
df -h
Filesystem                    Size   Used  Avail  Capacity  Mounted on
/dev/ufsid/5b0df80d6c33863e    26G    22G   2.1G       91%  /
devfs                         1.0K   1.0K     0B      100%  /dev
tmpfs                         4.0M   152K   3.9M        4%  /var/run
devfs                         1.0K   1.0K     0B      100%  /var/dhcpd/dev
I was thinking inodes, too. Not sure how to clean those up other than fsck. I can failover to fw2 tonight and run fsck in single user. Is that my next option?
-
@troutpocket Checking the docs can often help: https://docs.netgate.com/pfsense/en/latest/troubleshooting/filesystem-usage.html
-
@rcoleman-netgate
Thanks. It's not inodes, I guess:
df -hi
Filesystem                    Size   Used  Avail  Capacity  iused  ifree  %iused  Mounted on
/dev/ufsid/5b0df80d6c33863e    26G    22G   2.0G       92%    37k   3.6M      1%  /
devfs                         1.0K   1.0K     0B      100%      0      0    100%  /dev
tmpfs                         4.0M   156K   3.8M        4%     47    14k      0%  /var/run
devfs                         1.0K   1.0K     0B      100%      0      0    100%  /var/dhcpd/dev
After hours fsck in a few...
-
@troutpocket probably a stuck log.
I would do this command and see:
du -a /var | sort -n -r | head -n 10
and then the same for /usr
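If you want to repeat that for several directories, here is a minimal sketch that wraps the same pipeline in a function. The name top_usage is made up for illustration; the added -k and -x flags are my assumption that you want sizes in KiB and want du to stay on one filesystem.

```shell
#!/bin/sh
# Sketch only: top_usage is a hypothetical helper, not a pfSense command.
# Usage: top_usage DIRECTORY [COUNT]
top_usage() {
    # -a: list files as well as directories
    # -k: report sizes in KiB so "sort -n" orders them correctly
    # -x: do not cross into other mounted filesystems
    # Permission errors are discarded so they do not clutter the list.
    du -akx "$1" 2>/dev/null | sort -n -r | head -n "${2:-10}"
}
```

Running top_usage /var and then top_usage /usr gives the two lists back to back. The first line of each list is normally the directory itself, since it contains everything below it.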
-
@rcoleman-netgate
Single user fsck seems to have cleared it up, but no errors were reported. I have a feeling I'll be right back where I started in a couple of days. This isn't the first time it has filled up with magic nothingness.
df -hi
Filesystem                    Size   Used  Avail  Capacity  iused  ifree  %iused  Mounted on
/dev/ufsid/5b0df80d6c33863e    26G   2.6G    21G       11%    37k   3.6M      1%  /
devfs                         1.0K   1.0K     0B      100%      0      0    100%  /dev
tmpfs                         4.0M   144K   3.9M        4%     42    14k      0%  /var/run
devfs                         1.0K   1.0K     0B      100%      0      0    100%  /var/dhcpd/dev
-
Yup, it's filling up again. Already at >60% full after only 24 hours. The actual directory sizes don't line up with the usage reported by df. I do a fair bit of logging with Suricata, but I limit that directory to 4GB in the Suricata config and it seems to honor that limit. But the space remaining continues to diminish.
Any other ideas what I can check?
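One way to put a number on that mismatch is to subtract what du can reach from what df says is used. This is a rough sketch, not something from the pfSense docs: the function names are invented here, and it assumes du is run as root so it can read every directory. A large positive gap only says that space is held by something a directory walk cannot see. On Unix systems in general, files that were deleted while a process still has them open are one common cause, but nothing in this thread confirms that is what is happening here.

```shell
#!/bin/sh
# Sketch only: these helper names are hypothetical.

# KiB used according to df (-P: one line per filesystem, -k: 1024-byte blocks).
df_used_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $3 }'
}

# KiB reachable by walking the tree (-s: total only, -x: stay on this filesystem).
du_used_kb() {
    du -skx "$1" 2>/dev/null | awk '{ print $1 }'
}

# Difference between the two, in KiB. Meaningful only when the argument
# is the mount point itself, for example "/".
usage_gap_kb() {
    echo $(( $(df_used_kb "$1") - $(du_used_kb "$1") ))
}
```

Calling usage_gap_kb / every hour or so and watching whether the gap grows along with the df figure would show whether the missing space is the part du cannot account for.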
-
@troutpocket I vaguely remember a similar post but only vaguely. Do you have log compression on? If so try disabling it.
-
Yes. It's set for bzip2. I'll turn it off for the night and see what's happening in the morning.
-
No dice. It's 109% full again. What else could it be?
-
Here's the fsck during single user:
Forcing filesystem check (5 times)...
** /dev/ufsid/5b0df80d6c33863e
** Last Mounted on /
** Root file system
** Phase 1 - Check Blocks and Sizes
uhub0: 8 ports with 8 removable, self powered
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cyl groups
36366 files, 638029 used, 6213394 free (4610 frags, 776098 blocks, 0.1% fragmentation)

***** FILE SYSTEM IS CLEAN *****
** /dev/ufsid/5b0df80d6c33863e
** Last Mounted on /
** Root file system
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cyl groups
36366 files, 638029 used, 6213394 free (4610 frags, 776098 blocks, 0.1% fragmentation)

***** FILE SYSTEM IS CLEAN *****
** /dev/ufsid/5b0df80d6c33863e
** Last Mounted on /
** Root file system
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cyl groups
36366 files, 638029 used, 6213394 free (4610 frags, 776098 blocks, 0.1% fragmentation)

***** FILE SYSTEM IS CLEAN *****
** /dev/ufsid/5b0df80d6c33863e
** Last Mounted on /
** Root file system
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cyl groups
36366 files, 638029 used, 6213394 free (4610 frags, 776098 blocks, 0.1% fragmentation)

***** FILE SYSTEM IS CLEAN *****
** /dev/ufsid/5b0df80d6c33863e
** Last Mounted on /
** Root file system
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cyl groups
36366 files, 638029 used, 6213394 free (4610 frags, 776098 blocks, 0.1% fragmentation)

***** FILE SYSTEM IS CLEAN *****
/dev/ufsid/5b0df80d6c33863e: FILE SYSTEM CLEAN; SKIPPING CHECKS
/dev/ufsid/5b0df80d6c33863e: clean, 6213394 free (4610 frags, 776098 blocks, 0.1% fragmentation)
Filesystems are clean, continuing...
Mounting filesystems...
-
And after running that it's back down to the expected usage?
-
When it reaches 100% full I reboot and do the fsck. It comes back with no config so I restore a good config from backup, reboot again, and the system is back to normal, but slowly filling up with invisible stuff.
-
@troutpocket said in Something taking up all the space on my system:
It comes back with no config
Hmm, well that's odd. There have been bugs in the past where the config file gets updated with bad data and grows exponentially. Do you see no config file at all in /conf? Or in /conf/backup?
-
After the reboot, the config.xml file is a fresh 8k file. /conf/backup is full of my backup configs, plus I have one off-line I can use. Everything looks good and healthy otherwise. Restoring the good config brings things back to "normal".
-
Hmm, maybe check the config file size periodically. Make sure it's not increasing before this happens.
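Something like the following could do the periodic check. It is only a sketch: file_size_line is a name invented for this post, the path /conf/config.xml is assumed from the file and directory mentioned above, and how you schedule it (cron or a loop with sleep) is up to you.

```shell
#!/bin/sh
# Sketch only: file_size_line is a hypothetical helper.
# Prints "<UTC timestamp> <size in bytes> <path>" for one file.
file_size_line() {
    if [ ! -f "$1" ]; then
        echo "missing: $1" >&2
        return 1
    fi
    # wc -c counts bytes; tr strips the padding some wc versions add.
    size=$(wc -c < "$1" | tr -d ' ')
    printf '%s %s %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$size" "$1"
}
```

For example, running file_size_line /conf/config.xml >> /root/config-size.log every few minutes (the log path is just an example) leaves a record you can compare against the df graph after the next fill-up.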
-
I did. It's not changing. There isn't any file or folder I can find that is dramatically increasing in size. Basically, 24GB is steadily growing on the root filesystem in some way not generally visible to regular filesystem tools. I have a good graph from grafana that I'll post later which helps visualize the linear growth.
-
Here's the last 48 hours. It gracefully fills up until about 30% then there's this weird jaggy thing. Maybe syslog is attempting to trim logs?
The graph goes back to zero when it's 100% full, probably because it can't send telegraf data to the logger any more. Then I reboot and fsck and we start again. This trend goes back at least a month. I don't keep logs like this longer so I'm not sure when it started.
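Since the growth looks linear, two df readings are enough for a rough estimate of when it will hit 100% again. A small sketch of that arithmetic follows; hours_until_full is a hypothetical helper, and its inputs are the Used column from two df -k runs, the seconds between them, and the current Avail column.

```shell
#!/bin/sh
# Sketch only: hours_until_full is a hypothetical helper.
# Usage: hours_until_full USED1_KB USED2_KB SECONDS_BETWEEN AVAIL_KB
hours_until_full() {
    awk -v u1="$1" -v u2="$2" -v dt="$3" -v avail="$4" 'BEGIN {
        rate = (u2 - u1) / dt          # growth in KiB per second
        if (rate <= 0) { print "not growing"; exit }
        printf "%.1f\n", avail / rate / 3600
    }'
}
```

This assumes the rate stays constant, so the jagged part of the graph would throw the estimate off.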
-
How are you pulling that data? I assume it lines up with the output from df at that time?
It's not something I've seen locally where there was no obvious process filling the filesystem.
-
Telegraf dumps timeseries data from the pfsense firewall to a separate "logger" system (influxdb). It's not stored locally. We do this on 50+ pfsense firewalls and it's not happening anywhere else. I've been comparing configs across multiple sites and they're all nearly identical. I bang these out a few times each month.
I guess at this point it has become an academic curiosity for me more than anything else. I can fail over to the other half of the HA pair (yay CARP!).