RRD data doesn't survive a restart/shutdown
-
Anybody?
Is anyone here experiencing the same problem, or does this work for everyone else?
-
It should survive across a reboot.
Which install type are you running?
How are you rebooting the machine?
Steve
-
Thank you Steve for helping!
I run an ALIX 2d13 from a flash card (4 GB). I shut it down recently because I had to replace something to do with the power supply. In case packages are relevant: I have freeradius2 and pfBlocker (the latter disabled).
I used Diagnostics / Halt System to shut down. I disconnected the box from the power supply only after it had stopped answering pings and I had waited a bit longer.
Even after an unexpected power outage I would have expected the RRD data to be kept at least as of the latest backup, so at most 24 hours of data should be lost.
I have had this problem before; only this time did I look closely at what happened.
I'm not sure whether this makes a difference (I think it shouldn't):
-
Before the shutdown I duplicated slices (in the correct direction; I can be sure because I had made some configuration changes beforehand). I did not switch boot slices.
-
Before the shutdown I also backed up the configuration, excluding the RRD data.
-
I don't know if it matters, but under Diagnostics -> NanoBSD I have "Keep media mounted read/write at all times." checked (as I run Snort on the CF card, too).
-
At the moment it says "Read/Write (reference count 4)". I don't know how and when this changes, or how to find these references. I would have assumed that the RRD data backup would switch the media to read/write on its own.
But who knows, something is odd anyway. I have now checked "Keep media mounted read/write at all times." If that makes a difference, I should know when I reboot after 24 hours. I will let you know.
-
I checked the logs for obvious problems with RRD but didn't find any direct hint. However, I had a thought: from time to time I get log entries like this:
kernel: pid 73592 (php), uid 0, was killed: out of swap space
My memory utilization is at nearly 80%, so any process needing a large amount of memory could fail.
Perhaps this regularly happens to a dedicated backup script? Is there such a script? (I found this one running: /var/db/rrd/updaterrd.sh.) If so, how do I find out which one it is and when it is triggered?
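To see how often this happens, a small parser can pick every out-of-swap kill out of a set of log lines. This is a minimal sketch, assuming the message format matches the kernel line quoted above and that you already have the log as plain text (for example copied out of the web GUI's system log page); the name `find_oom_kills` is hypothetical and not part of pfSense.

```python
import re
from typing import Iterable, List, Tuple

# Modelled on the kernel message quoted above (assumed format):
#   kernel: pid 73592 (php), uid 0, was killed: out of swap space
OOM_KILL = re.compile(
    r"pid (?P<pid>\d+) \((?P<name>[^)]+)\), uid (?P<uid>\d+), "
    r"was killed: out of swap space"
)


def find_oom_kills(lines: Iterable[str]) -> List[Tuple[int, str, int]]:
    """Return (pid, process name, uid) for each out-of-swap kill in the lines."""
    kills = []
    for line in lines:
        match = OOM_KILL.search(line)
        if match:
            kills.append((int(match["pid"]), match["name"], int(match["uid"])))
    return kills
```

Passing it the lines of a saved log (for instance `find_oom_kills(open("system-log.txt"))`, where the file name is only an example) returns one tuple per killed process, which shows whether anything besides php is being killed.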
-
It should back up the RRD data both periodically and when you shut down correctly, as you are doing. The backup and the config file are held on the /conf/ slice, which is independent of the two boot slices in Nano. I suspect that you are running out of RAM when it tries to back up. Check the dates on the files /conf/rrd.tgz and /conf/backup/rrd.tgz.
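That check can be scripted. The sketch below reports each archive's age in hours and flags any that is missing or older than a day. It assumes the two paths above are where your install keeps the archives and that the periodic backup runs every 24 hours, as mentioned earlier in the thread; `backup_ages` and `stale_backups` are hypothetical helper names.

```python
import os
import time
from typing import Dict, Iterable, List, Optional

# Paths named in the thread; adjust them if your install differs.
RRD_BACKUPS = ("/conf/rrd.tgz", "/conf/backup/rrd.tgz")


def backup_ages(paths: Iterable[str] = RRD_BACKUPS,
                now: Optional[float] = None) -> Dict[str, Optional[float]]:
    """Age of each file in hours, or None if the file does not exist."""
    now = time.time() if now is None else now
    ages: Dict[str, Optional[float]] = {}
    for path in paths:
        try:
            ages[path] = (now - os.path.getmtime(path)) / 3600.0
        except FileNotFoundError:
            ages[path] = None
    return ages


def stale_backups(ages: Dict[str, Optional[float]],
                  max_age_hours: float = 24.0) -> List[str]:
    """Paths that are missing or older than max_age_hours (assumed 24 h interval)."""
    return [path for path, age in ages.items() if age is None or age > max_age_hours]
```

`stale_backups(backup_ages())` lists the archives that look out of date. Python may not be installed on the firewall itself; `ls -l` on the two files shows the same timestamps.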
Steve
-
Now that's interesting. There is a file /conf/rrd.tgz with a size of approx. 2 MB. Its timestamp is today 0:00.
(Unfortunately I have now destroyed it: the "Load" button in the Edit File dialog does NOT download the file to my local computer… shame on me ::).) By the way, how do I transfer a file from pfSense to my machine? scp didn't work.
But no damage done: I can now view the current graphs with data going back to the last reboot. This data will probably be contained in the next backup. If not, that's no problem either.
Is the current data persisted somewhere also or is this kept in memory only?
Result: the RRD backup facility seems to work. I can verify this when I get the next backup (tomorrow) and manage to download and examine the file.
This means there is probably a problem with using the backed-up data. How can this happen?
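Once the file is downloaded, its contents can be listed without extracting it. The sketch below uses Python's standard `tarfile` module to show which `.rrd` files the archive holds and when each was last written. It assumes the archive is a gzip-compressed tar, as the .tgz name suggests; `rrd_members` and `time_span` are hypothetical names. Member timestamps only tell you when a file was saved, not how far back the data inside it reaches; for that, `rrdtool first` and `rrdtool last` on the extracted files are the right tools.

```python
import tarfile
from datetime import datetime, timezone
from typing import List, Optional, Tuple

Member = Tuple[str, int, datetime]  # (name in archive, size in bytes, mtime)


def rrd_members(path=None, fileobj=None) -> List[Member]:
    """List the .rrd files in a gzip-compressed tar, e.g. a copy of rrd.tgz."""
    with tarfile.open(name=path, mode="r:gz", fileobj=fileobj) as tar:
        return [
            (m.name, m.size, datetime.fromtimestamp(m.mtime, tz=timezone.utc))
            for m in tar.getmembers()
            if m.isfile() and m.name.endswith(".rrd")
        ]


def time_span(members: List[Member]) -> Optional[Tuple[datetime, datetime]]:
    """Oldest and newest modification time among the members, or None if empty."""
    if not members:
        return None
    times = [mtime for _, _, mtime in members]
    return min(times), max(times)
```

For example, `for name, size, mtime in rrd_members("rrd.tgz"): print(name, size, mtime.isoformat())` prints one line per RRD file in a local copy of the archive.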
-
You can use the File Manager package. Once installed, look for it under Diagnostics - File Manager.
-
From Windows, WinSCP works (though it may not actually be using SCP). From Linux, use sftp://root@yourpfsensebox.
Steve
-
Oh well, sftp works when logging in as root (not admin…).
The backup files seem to contain data going back about 3 days, no longer.
That means there is a working backup procedure. These possibilities now come to mind: the backup might get destroyed on a reboot; it might be overwritten by the next backup after a reboot; or, when the data is restored from the backup for the first time, it cannot be read and is discarded.
-
If there's a fault with the file, the system logs should be full of errors at boot time. If you have it set to back up periodically (every 24 hours), I would have thought the files should be at most 1 day old.
Steve
-
I did a reboot just a few minutes ago to observe the log messages during boot. After the reboot the RRD graphs are as complete as they were before. No data got lost. Needless to say there were no apparent errors in the log.
That's fine, but I know there was a problem the last time I did this. Now I don't even know how to analyze it. :-(
Thank you for helping so far. I give up on this now.
-
Without logs from when this happened it's hard to say, but I would still guess your ALIX ran out of RAM.
Do you have the RRD memory usage graph for when it happened?
Steve
-
I only skimmed this thread, but this issue happens nearly every time I reboot/power off my ALIX board. If you hook up a serial cable during bootup, it'll say something about RRD graphs… (killed). If that happens, the data is all gone. The LEDs at this point will likely be in the knight rider mode.
The way I avoid it is to disable all OpenVPN instances prior to a reboot and not install any packages at all, especially the OpenVPN client export. This allows enough free memory during startup to avoid the RRD from getting killed.
-
How many interfaces do you have? It's easy to run the ALIX out of RAM, especially if you have more than a handful of interfaces, which can also fill up the tiny RAM disk on ALIX.
-
The fact that Flo is running Snort pretty much ensures low RAM. ;)
Steve
-
Actually I'm not running Snort! (I would have been interested, but I discovered before installing it that it would require more powerful hardware.) I have pfBlocker installed (but not activated) and I use freeradius2. I can uninstall pfBlocker.
Regarding the number of interfaces: I currently have 7 interfaces, of which I could delete 1 or 2. (One interface is no longer used. A MODEM interface on the WAN side collects incoming traffic that is generated by the modem and is therefore not VLAN-tagged by my provider. I want to block this traffic silently in the firewall. Apart from that, this MODEM interface is not really necessary.)
Thanks to darkcrucible for the information! Until now I had no clue where and when this problem actually occurs. It seems clear now: lack of memory kills the process responsible for restoring the backed-up data during boot.
I will now reduce the number of interfaces and remove pfBlocker. As I do not use OpenVPN there is nothing to gain here. I hope this will help otherwise I will probably have to live with the problem.
Btw.:
The LEDs at this point will likely be in the knight rider mode.
I have noticed that "knight rider mode", but I don't know when it occurs or what it is supposed to tell me. What is this all about?
-
Actually I'm not running Snort!
Ah, my bad. I attributed Chemlud's comment about Snort to you. :-[
My understanding is that when the RRD data is backed up, all the files are gathered in one place before being compressed into the tgz file you see in /conf. At some point in that process all the files are held in RAM, and with only 256 MB to play with it's easy to run out.
If you've upgraded from an earlier version, you may find RRD files for interfaces that have since been renamed; they are no longer in use but still take up space and are still being backed up. For example, I have files for 'WAN-quality' and 'GW_WAN-quality', but they are no longer updated; the active file is now 'WAN_PPPOE-quality'. I also have files for gateways I used to have on my modem interface, which you may have too. You can access the modem without having a gateway on the interface. I did once suggest (https://forum.pfsense.org/index.php?topic=75075.msg409844#msg409844) the possibility of moving either /var or /tmp onto a USB drive to free up RAM on low-RAM systems. As JimP said at the time, it would never be supported and could cause problems if/when the drive eventually failed. Maybe something to consider if all else fails.
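Finding such leftovers can be scripted. The sketch below lists `.rrd` files that have not been modified within a given window and totals their size. It assumes the RRD files live in /var/db/rrd (the directory where updaterrd.sh was seen running earlier in the thread) and that any file the graphs still use is rewritten at least once a day; the threshold and the helper names `scan_rrd_dir` and `pick_stale` are illustrative.

```python
import os
import time
from typing import Iterable, List, Optional, Tuple

# Assumed location of the RRD files (where updaterrd.sh was seen running).
RRD_DIR = "/var/db/rrd"

Entry = Tuple[str, float, int]  # (file name, mtime in seconds, size in bytes)


def scan_rrd_dir(directory: str = RRD_DIR) -> List[Entry]:
    """Collect name, modification time and size for every .rrd file."""
    entries = []
    with os.scandir(directory) as it:
        for entry in it:
            if entry.is_file() and entry.name.endswith(".rrd"):
                info = entry.stat()
                entries.append((entry.name, info.st_mtime, info.st_size))
    return entries


def pick_stale(entries: Iterable[Entry], now: Optional[float] = None,
               max_age_hours: float = 24.0) -> Tuple[List[str], int]:
    """Names of files not updated within max_age_hours, plus their total size."""
    now = time.time() if now is None else now
    cutoff = now - max_age_hours * 3600
    stale = [(name, size) for name, mtime, size in entries if mtime < cutoff]
    return sorted(name for name, _ in stale), sum(size for _, size in stale)
```

`pick_stale(scan_rrd_dir())` would then return candidates such as the renamed-interface files described above, along with the number of bytes they add to every backup.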
Steve
-
RRD data about deprecated interfaces is not a problem as far as disk space is concerned. My CF card has plenty of space.
Are you sure that all files must be held in memory simultaneously when the tgz file is created? Aren't there pipes for handling such things more memory-efficiently? I will now check from time to time whether the backups succeed. According to darkcrucible's comment, the problem is that the data is destroyed at boot time.
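On the pipes question: tar archives can be written as a stream, one file at a time, so the compressor's memory use does not have to grow with the total amount of data. I don't know how pfSense itself builds rrd.tgz, so the sketch below only illustrates the principle, using Python's standard `tarfile` module in its non-seekable stream mode (`"w|gz"`); `stream_backup` is a hypothetical name. On NanoBSD, /var is normally a RAM disk, so the RRD files already occupy memory; streaming avoids an extra copy but does not free that space.

```python
import os
import tarfile
from typing import Iterable


def stream_backup(files: Iterable[str], out_path: str) -> int:
    """Write files into a gzip-compressed tar one at a time; return the count."""
    count = 0
    # "w|gz" writes a compressed stream: each file is read and compressed in
    # fixed-size blocks and written out, so the whole archive is never
    # assembled in memory at once.
    with open(out_path, "wb") as out, tarfile.open(fileobj=out, mode="w|gz") as tar:
        for path in files:
            tar.add(path, arcname=os.path.basename(path))
            count += 1
    return count
```

The resulting file is an ordinary .tgz and can be read back with `tarfile.open(out_path, "r:gz")` or `tar -tzf`.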
Using a USB drive sounds dangerous to me. It would be OK only if the system could still boot when the USB drive fails. I have no clue how I would configure this anyway.
About the modem interface: this is somewhat complicated. I have two interfaces, WAN and WAN_IPTV, which carry VLAN-tagged traffic from my provider. My modem assigns itself an RFC 1918 IP address and unfortunately generates plenty of UPnP service discovery requests. I want to block these silently because otherwise they spam my firewall log, but I want to keep logging for the default block rule. These packets arrive on neither the WAN nor the WAN_IPTV interface, so firewall rules there don't match them. This doesn't even work with floating firewall rules. Nonetheless, in the firewall log the blocks get attributed to the WAN interface's default block rule. So to be able to write a firewall rule for the modem at all, I first have to create an interface for it. ::)