SG-2220 Full HDD Problems



  • I noticed a few days ago that the firewall wasn't handing out new addresses and when I logged in I saw that several services were not started. I figured a reboot was worth a try and when I did things worked for a few minutes but the services were still acting a bit quirky (not starting). When I finally checked the bottom of the dashboard I saw the following for disk usage:

    Disk usage
    / (ufs): 105% of 1.8G
    /var/run (ufs in RAM): 3% of 3.4M

    So I searched for a similar problem and found the main page for full disks but I don't know where to go from here:

    $ df -hi
    Filesystem                     Size    Used   Avail  Capacity  iused  ifree  %iused  Mounted on
    /dev/ufsid/55f05ec197022192    1.8G    1.7G    -89M      105%    19k   223k      8%  /
    devfs                          1.0K    1.0K      0B      100%      0      0    100%  /dev
    /dev/md0                       3.4M    112K    3.0M        3%     35    987      3%  /var/run
    devfs                          1.0K    1.0K      0B      100%      0      0    100%  /var/dhcpd/dev
    fdescfs                        1.0K    1.0K      0B      100%     11    58k      0%  /dev/fd

    To me, it looks like the first item is the issue but I don't know what I'm supposed to do about it.

    Things I've already tried include reducing the log size for Snort by enabling the automatic log management with all the max sizes set to default, which comes to barely over 2MB, clearing all logs for all services (including Snort) and the firewall, and countless reboots.

    EDIT:
    After checking the logging area to clear logs again, I noticed that the log size was blank. I set it to 511488, saved, and hit reset logs. All the logs were cleared (again), but the following is still printed on the page:

    Disk space currently used by log files: 1.2G. Remaining disk space for log files: -72M.

    Is there some other way to clear logs that I'm unaware of?

    EDIT 2:
    Found out it was a bunch of pflog.bad.[random hex] files that were way larger than any other. Cleared them out with an rm and looks like that finally got my HDD back down to ~35%.
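
The search described in EDIT 2 can be sketched roughly as follows. This is only an illustration, not necessarily what the poster ran: the function name `largest_by_size` is made up for this example, and it assumes an `ls` that supports `-S` (sort by size), as the FreeBSD and GNU versions do.

```shell
# List the largest entries in a directory, biggest first.
# largest_by_size is a hypothetical helper name, not a pfSense command.
largest_by_size() {
    dir=${1:-/var/log}    # directory to inspect (defaults to /var/log)
    count=${2:-10}        # how many entries to show
    # -S sorts by file size, largest first; -1 prints one name per line
    ls -1S "$dir" | head -n "$count"
}

# Example: largest_by_size /var/log 15
```

Anything that shows up at the top and is far larger than the ordinary logs is a candidate for a closer look before it is removed.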



  • You are running snort off the eMMC install? Won't that degrade it fast?


  • Rebel Alliance Developer Netgate

    I wouldn't be worried about the longevity of the eMMC but it definitely doesn't have enough space to effectively handle large packages like snort with much grace. An M.2 disk would be much better if you want to run packages like that.



  • I believe there is a much more serious thing going on here, possibly a bug with the software.

    So I thought the problem was resolved by deleting those mysterious files, but then yesterday none of the nodes in the network were able to reach the Internet. They were being served IPs and I could still log into the firewall's configuration website. That's when things really got weird: my dashboard was completely different. Snort and service monitor were missing, but traffic/bandwidth monitors were there instead. I also verified that I still had a WAN IP (I'll come back to this in a minute), but when I checked services, Snort and a couple of others (I should have written this down, in hindsight) were completely missing. Also, on the dashboard my memory was 108% used now.

    I checked my installed packages to see if maybe Snort was acting up and blocking everyone for some reason. Come to find out, Snort (and every other package) was completely gone, as if I had uninstalled them. So I reinstalled Snort, thinking maybe it just needed to reload my settings, but it did not; it was as if I was doing it for the first time.

    Next I went over to the logging management page and saw that I was again in negative space. Then I went back to the trouble area and checked my log file sizes:

    $ ls -al /var/log
    total 2551544
    drwxr-xr-x  4 root  wheel      1024 Sep 29 02:57 .
    drwxr-xr-x  28 root  wheel        512 Sep 10 20:42 ..
    -rw-------  1 root  wheel    1024000 Sep 29 05:04 dhcpd.log
    -rw-r--r--  1 root  wheel      8294 Sep 29 04:08 dmesg.boot
    -rw-------  1 root  wheel    1024000 Sep 29 05:06 filter.log
    -rw-------  1 root  wheel    1024000 Sep 29 04:09 gateways.log
    -rw-------  1 root  wheel      27158 Sep  9 11:31 installer.log
    -rw-------  1 root  wheel    1024000 Sep 27 20:00 ipsec.log
    -rw-------  1 root  wheel    1024000 Sep 27 20:00 l2tps.log
    -rw-------  1 root  wheel    1024000 Sep 27 20:00 lighttpd.log
    drwxr-xr-x  2 root  wheel        512 Jul 14 20:02 ntp
    -rw-------  1 root  wheel    1024000 Sep 29 04:09 ntpd.log
    -rw-------  1 root  wheel    1024000 Sep 27 20:00 openvpn.log
    -rw-------  1 root  wheel    1115288 Sep 29 05:06 pflog
    -rw-------  1 root  wheel  135200460 Sep 29 04:07 pflog.bad.004650c8
    -rw-------  1 root  wheel  109641426 Sep 29 04:07 pflog.bad.041fa52f
    -rw-------  1 root  wheel  50953858 Sep 29 04:07 pflog.bad.15941f7d
    -rw-------  1 root  wheel  109805392 Sep 29 04:07 pflog.bad.272d3f0f
    -rw-------  1 root  wheel  156199060 Sep 29 04:07 pflog.bad.29009618
    -rw-------  1 root  wheel  131857533 Sep 29 04:07 pflog.bad.42d2be83
    -rw-------  1 root  wheel  10192268 Sep 29 04:07 pflog.bad.62912fe5
    -rw-------  1 root  wheel  20506904 Sep 29 04:07 pflog.bad.63c03df9
    -rw-------  1 root  wheel  285900368 Sep 29 04:07 pflog.bad.69bf7eb2
    -rw-------  1 root  wheel  57179566 Sep 29 04:07 pflog.bad.a51a6664
    -rw-------  1 root  wheel  122453124 Sep 29 04:07 pflog.bad.aac65a97
    -rw-------  1 root  wheel  95092018 Sep 29 04:07 pflog.bad.cae7490b
    -rw-------  1 root  wheel    1024000 Sep 27 20:00 poes.log
    -rw-------  1 root  wheel    1024000 Sep 27 20:00 portalauth.log
    -rw-------  1 root  wheel    1024000 Sep 27 20:00 ppp.log
    -rw-------  1 root  wheel    1024000 Sep 27 20:00 pptps.log
    -rw-------  1 root  wheel    1024000 Sep 27 20:00 relayd.log
    -rw-------  1 root  wheel    1024000 Sep 29 03:11 resolver.log
    -rw-------  1 root  wheel    1024000 Sep 27 21:16 routing.log
    drwxr-xr-x  4 root  wheel        512 Sep 18 18:22 snort
    -rw-------  1 root  wheel      10240 Sep 10 21:04 spamd.log
    -rw-------  1 root  wheel    1024000 Sep 29 04:48 system.log
    -rw-------  1 root  wheel      16079 Sep 29 04:09 userlog
    -rw-r--r--  1 root  wheel        197 Sep 29 04:09 utx.lastlogin
    -rw-------  1 root  wheel      2233 Sep 29 04:09 utx.log
    -rw-------  1 root  wheel    1024000 Sep 27 20:00 vpn.log
    -rw-------  1 root  wheel    1024000 Sep 27 20:00 wireless.log
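
As a quick check on the listing above, the byte counts of the twelve pflog.bad.* files can be added up. The sketch below only does that arithmetic on the sizes copied from the listing; the variable names are chosen for this example.

```shell
# Sizes in bytes of the pflog.bad.* files, copied from the listing above.
sizes="135200460 109641426 50953858 109805392 156199060 131857533
10192268 20506904 285900368 57179566 122453124 95092018"

total=0
for s in $sizes; do
    total=$((total + s))
done

# 1048576 bytes = 1 MiB
echo "pflog.bad.* total: $total bytes, about $((total / 1048576)) MiB"
```

The total comes to 1284981977 bytes, roughly 1.2 GiB, which lines up with the 1.2G that the logging page reported for log files.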

    You'll notice that those pflog.bad.[hex] files of absurd size were back. This time I cleared out all but one, and tried to read the file to see if I could further troubleshoot what is going on here:

    $ clog /var/log/pflog.bad.cae7490b
    V6F ll4igb03vʚ;E8

    V
    Voll4igb03vʚ;E8S@
    V+
    VY ll4igb03vʚ;E8
    V
    V?]ll4igb03vʚ;E8
    VT
    V[
    Ve
    VCg
    VXk
    V(x
    VBz
    V
    V
    V4R
    V?\

    Now I don't know what any of that means, and it was a lot more than that, this was just a snippet but it looked basically the same all the way through.

    At this point I thought it would be best to perform a factory reset and make this post asking for help/insight as to what to do next. Interestingly, after a factory reset, the pflog.bad file I read with clog above was still present.

    At this point my best guess as to what is going on is that my WAN situation is less than ideal.

    I live in a rural area and my only choices for Internet access are LTE/3G and satellite. I've opted for LTE/3G but run into issues every now and again. There is only one tower out here (12 miles away), and when upgrades/changes/maintenance/weather happen I'll drop out multiple times a day. Previously this wasn't much of an issue because my Cradlepoint would handle it. Recently, though, I got tired of being double NAT'd (a limitation of the cheapest Cradlepoint is no IP-Passthrough), so I bought a PocketPORT2, which can handle IPT for my USB modem. Unfortunately it's a lot more finicky with dropped connections and requires me to power the device off and on to reconnect.

    I've noticed some correlation in timing: when I go a few days with no drops, my pfSense memory usage stays normal, and when I have a lot of drops, I get these pflog.bad files overfilling memory.

    What can I do to prevent this from happening again? I'd rather not have to go and rm /var/log/pflog.bad* every single time my WAN drops.
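
Until the cause is found, the manual rm mentioned above could be wrapped in a small helper. This is a stopgap sketch only, and it treats the symptom rather than whatever is writing the files. The function name `purge_pflog_bad` is hypothetical, and the sketch assumes a `find` that supports `-maxdepth`, as the FreeBSD and GNU versions do.

```shell
# Stopgap only: delete stray pflog.bad.* files from a log directory.
# purge_pflog_bad is a hypothetical helper name, not part of pfSense.
purge_pflog_bad() {
    dir=${1:-/var/log}    # defaults to /var/log; pass another path to try it safely
    # -maxdepth 1 keeps the search out of subdirectories such as snort/
    # -type f and the quoted -name pattern leave pflog itself and other logs alone
    find "$dir" -maxdepth 1 -type f -name 'pflog.bad.*' -exec rm -f {} +
}

# Example: purge_pflog_bad /var/log
```

Trying it against a scratch directory first is a cheap way to confirm that it touches only the intended files.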


  • Rebel Alliance Developer Netgate

    Looks like that's all from spamd. The firewall itself doesn't touch pflog, but spamd does. Kill it with fire.



  • That would make sense; towards the end that was the only package I had with Snort. Thanks for the help, but could you please elaborate on how you determined that? If only to help me understand a bit more.


  • Rebel Alliance Developer Netgate

    I knew pflog wasn't used by the base system, so I grepped through the whole package repo to see what, if anything, touched it. The only thing that does is spamd, and you had a spamd.log, so it must have been installed there.
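
A search like the one described above might look like the sketch below. The exact command that was run is not given in the thread, so this is an assumption: the function name `files_mentioning` and the checkout path in the example are both placeholders.

```shell
# List files under a source tree that mention a given string.
# files_mentioning is a hypothetical helper name.
files_mentioning() {
    pattern=$1
    tree=${2:-.}    # directory to search (defaults to the current directory)
    # -r recurse, -l print only matching file names, -F match a fixed string
    grep -rlF -- "$pattern" "$tree"
}

# Example (the path is a placeholder for a local checkout of the package repo):
#   files_mentioning pflog ~/src/pfsense-packages
```

Printing only file names keeps the output short enough to see at a glance which packages are involved.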



  • That would make sense as to why I couldn't find any information about it, thanks.


  • Banned

    Please file a bug about the spamd thing before it gets forgotten… This is insane. Found ~3 GiB worth of crap here after a couple of test installs.

    https://redmine.pfsense.org/issues/5231
    https://github.com/pfsense/pfsense-packages/pull/1086

