Something taking up all the space on my system

rcoleman-netgate

@troutpocket could be inodes?
What's the output from:

df -h

Troutpocket

@rcoleman-netgate

df -h
Filesystem                     Size    Used   Avail Capacity  Mounted on
/dev/ufsid/5b0df80d6c33863e     26G     22G    2.1G    91%    /
devfs                          1.0K    1.0K      0B   100%    /dev
tmpfs                          4.0M    152K    3.9M     4%    /var/run
devfs                          1.0K    1.0K      0B   100%    /var/dhcpd/dev

I was thinking inodes, too. Not sure how to clean those up other than fsck. I can failover to fw2 tonight and run fsck in single user. Is that my next option?

rcoleman-netgate

@troutpocket Checking the docs often can be of help: https://docs.netgate.com/pfsense/en/latest/troubleshooting/filesystem-usage.html

Troutpocket

@rcoleman-netgate
Thanks. It's not inodes I guess

df -hi
Filesystem                     Size    Used   Avail Capacity iused ifree %iused  Mounted on
/dev/ufsid/5b0df80d6c33863e     26G     22G    2.0G    92%     37k  3.6M    1%   /
devfs                          1.0K    1.0K      0B   100%       0     0  100%   /dev
tmpfs                          4.0M    156K    3.8M     4%      47   14k    0%   /var/run
devfs                          1.0K    1.0K      0B   100%       0     0  100%   /var/dhcpd/dev

After hours fsck in a few...

rcoleman-netgate

@troutpocket probably a stuck log.

I would do this command and see:

du -a /var | sort -n -r | head -n 10

and then the same for /usr
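If it helps, the two runs can be wrapped in one small helper. This is only a sketch: `top_usage` is a made-up name, and the extra `-k` and `-x` flags (report in 1K blocks, stay on one filesystem) are my additions to the command above, not something the pfSense docs prescribe.

```shell
#!/bin/sh
# top_usage DIR [COUNT]: list the COUNT largest entries under DIR, sizes in 1K blocks.
# (hypothetical helper name, chosen for illustration)
top_usage() {
    dir=$1
    count=${2:-10}
    # -a: include files as well as directories
    # -k: report sizes in 1K blocks
    # -x: do not cross into other mounted filesystems
    du -akx "$dir" 2>/dev/null | sort -rn | head -n "$count"
}

# Example (run as root on the firewall):
#   for d in /var /usr; do echo "== $d"; top_usage "$d"; done
```

The first line of each listing is the directory's own total, so the interesting entries start on the second line.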

Troutpocket

@rcoleman-netgate
Single-user fsck seems to have cleared it up, but no errors were reported. I have a feeling I'll be right back where I started in a couple of days. This isn't the first time it's filled up with magic nothingness.

df -hi
Filesystem                     Size    Used   Avail Capacity iused ifree %iused  Mounted on
/dev/ufsid/5b0df80d6c33863e     26G    2.6G     21G    11%     37k  3.6M    1%   /
devfs                          1.0K    1.0K      0B   100%       0     0  100%   /dev
tmpfs                          4.0M    144K    3.9M     4%      42   14k    0%   /var/run
devfs                          1.0K    1.0K      0B   100%       0     0  100%   /var/dhcpd/dev

Troutpocket

@troutpocket

Yup, it's filling up again. Already at >60% full after only 24 hours. The actual directory sizes don't line up with the size reported by df. I do a fair bit of logging with Suricata, but I limit that directory to 4GB (in the Suricata config) and it seems to honor that limit. But the space remaining continues to diminish.

Any other ideas what I can check?
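One way to put a number on that mismatch is to compare what df counts as used with what du can reach by path. A large gap often means space is held by files that were deleted while a process still had them open, though that is only a guess for this box and nothing in the thread confirms it. The sketch below assumes a POSIX-style `df -kP` and `du -skx`; `usage_gap` is a made-up name.

```shell
#!/bin/sh
# usage_gap [MOUNT_POINT]: compare filesystem-level usage (df) with
# path-visible usage (du). Hypothetical helper, for illustration only.
usage_gap() {
    mount_point=${1:-/}
    # Used 1K blocks as the filesystem accounts for them (-P keeps each entry on one line).
    df_used=$(df -kP "$mount_point" | awk 'NR == 2 { print $3 }')
    # Total 1K blocks of everything reachable by path, without crossing mounts (-x).
    du_used=$(du -skx "$mount_point" 2>/dev/null | awk '{ print $1 }')
    echo "df_used_kb=$df_used du_used_kb=$du_used gap_kb=$((df_used - du_used))"
}

# Example (run as root so du can read every directory):
#   usage_gap /
```

If the gap keeps growing between runs, the next step would be looking at which processes hold files open on that filesystem, for example with FreeBSD's fstat.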

SteveITS

@troutpocket I vaguely remember a similar post but only vaguely. Do you have log compression on? If so try disabling it.

Troutpocket

@steveits

Yes. It's set for bzip2. I'll turn it off for the night and see what's happening in the morning.

Troutpocket

@steveits

No dice. It's 109% full again. What else could it be?

Troutpocket

Here's the fsck during single user:

Forcing filesystem check (5 times)...
** /dev/ufsid/5b0df80d6c33863e
** Last Mounted on /
** Root file system
** Phase 1 - Check Blocks and Sizes
uhub0: 8 ports with 8 removable, self powered
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cyl groups
36366 files, 638029 used, 6213394 free (4610 frags, 776098 blocks, 0.1% fragmentation)

***** FILE SYSTEM IS CLEAN *****
** /dev/ufsid/5b0df80d6c33863e
** Last Mounted on /
** Root file system
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cyl groups
36366 files, 638029 used, 6213394 free (4610 frags, 776098 blocks, 0.1% fragmentation)

***** FILE SYSTEM IS CLEAN *****
** /dev/ufsid/5b0df80d6c33863e
** Last Mounted on /
** Root file system
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cyl groups
36366 files, 638029 used, 6213394 free (4610 frags, 776098 blocks, 0.1% fragmentation)

***** FILE SYSTEM IS CLEAN *****
** /dev/ufsid/5b0df80d6c33863e
** Last Mounted on /
** Root file system
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cyl groups
36366 files, 638029 used, 6213394 free (4610 frags, 776098 blocks, 0.1% fragmentation)

***** FILE SYSTEM IS CLEAN *****
** /dev/ufsid/5b0df80d6c33863e
** Last Mounted on /
** Root file system
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cyl groups
36366 files, 638029 used, 6213394 free (4610 frags, 776098 blocks, 0.1% fragmentation)

***** FILE SYSTEM IS CLEAN *****
/dev/ufsid/5b0df80d6c33863e: FILE SYSTEM CLEAN; SKIPPING CHECKS
/dev/ufsid/5b0df80d6c33863e: clean, 6213394 free (4610 frags, 776098 blocks, 0.1% fragmentation)
Filesystems are clean, continuing...
Mounting filesystems...

stephenw10

And after running that it's back down to the expected usage?

Troutpocket

@stephenw10

When it reaches 100% full I reboot and do the fsck. It comes back with no config so I restore a good config from backup, reboot again, and the system is back to normal, but slowly filling up with invisible stuff.

stephenw10

@troutpocket said in Something taking up all the space on my system:

It comes back with no config

Hmm, well that's odd. There have been bugs in the past where the config file gets updated with bad data and grows exponentially. Do you see no config file at all in /conf? Or in /conf/backup?

Troutpocket

@stephenw10

After the reboot, the config.xml file is a fresh 8k file. /conf/backup is full of my backup configs, plus I have one off-line I can use. Everything looks good and healthy otherwise. Restoring the good config brings things back to "normal".

stephenw10

Hmm, maybe check the config file size periodically. Make sure it's not increasing before this happens.
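A periodic check like that could be as simple as the sketch below. `log_size` and the log path in the example are hypothetical names; the config path is taken from the /conf and config.xml mentions in this thread.

```shell
#!/bin/sh
# log_size FILE: print a timestamp, the file name, and its size in bytes.
# (hypothetical helper name, chosen for illustration)
log_size() {
    file=$1
    # wc -c reads the byte count; tr strips the padding some wc versions add.
    size=$(wc -c < "$file" | tr -d ' ')
    echo "$(date '+%Y-%m-%d %H:%M:%S') $file $size"
}

# Example: record the size every 5 minutes (the log path is an assumption):
#   while :; do log_size /conf/config.xml >> /root/config_size.log; sleep 300; done
```

Comparing the last column over a day or two would show whether the file is growing before the disk fills.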

Troutpocket

@stephenw10

I did. It's not changing. There isn't any file or folder I can find that is dramatically increasing in size. Basically, 24GB is steadily growing on the root filesystem in some way not generally visible to regular filesystem tools. I have a good graph from grafana that I'll post later which helps visualize the linear growth.

Troutpocket

@stephenw10

Here's the last 48 hours. It gracefully fills up until about 30% then there's this weird jaggy thing. Maybe syslog is attempting to trim logs?

The graph goes back to zero when it's 100% full, probably because it can't send telegraf data to the logger any more. Then I reboot and fsck and we start again. This trend goes back at least a month. I don't keep logs like this longer so I'm not sure when it started.

[Image: Grafana graph of root filesystem usage over the last 48 hours]

stephenw10

How are you pulling that data? I assume it lines up with the output from df at that time?

It's not something I've seen locally where there was no obvious process filling the filesystem.

Troutpocket

@stephenw10

Telegraf dumps timeseries data from the pfsense firewall to a separate "logger" system (influxdb). It's not stored locally. We do this on 50+ pfsense firewalls and it's not happening anywhere else. I've been comparing configs across multiple sites and they're all nearly identical. I bang these out a few times each month.

I guess at this point it has become an academic curiosity for me more than anything else. I can fail over to the other half of the HA pair (yay CARP!).