[SOLVED]check_reload_status: log is 1 hour behind

Slam · Jun 3, 2013, 1:49 PM

I am having a slight problem with my alix 2c2, while trying to figure out what causes "kernel: pid 71994 (php), uid 0, was killed: out of swap space" as reported elsewhere on the forum, I realised some scripts that run, report at least one hour behind on my syslog page.

An example of these are:

check_reload_status: Linkup starting vr0
check_reload_status: rc.newwanip starting vr0
php: : Resyncing OpenVPN instances
check_reload_status: rc.newwanip starting ovpnc1

My timezone is correct (Europe/London) and I am running i386 NanoBSD version, with the following enabled, OpenVPN, Captive Portal on ath0 card, Dyn Client, IPV6.

Is this a known issue? log follows in reverse order.

Cheers

sc_log.txt

jimp · May 31, 2013, 12:44 PM

Usually that just means that you changed the time zone but haven't rebooted since it was set.

On FreeBSD, the time zone is only read when a process starts, so to make sure that the time is correct on all running processes, a reboot is required.

Slam · May 31, 2013, 4:27 PM

Thanks for the reply.

This alix is rebooted more or less every 3 days (snapshot upgrades), so I know its not an issue with the time zone (infact the time zone was set a long time ago and never has been changed).

I was previously running 2.0.x then I decided to upgrade to 2.1 which borked the upgrade, I wrote a fresh nanoBSD install on to my cf card and imported my config.xml, this was like 6 weeks ago, no major problems since then, apart from the error with system out of swap space which I just thought might have something to do with the time differences reported on the attached log.

EDIT: I've just tried reproducing it again, if I go to >Interfaces>WAN & hit save and apply, the following shows in my syslog

May 31 17:07:30 php: : phpDynDNS (xxxxxx): No change in my IP address and/or 25 days has not passed. Not updating dynamic DNS entry.
May 31 17:07:29 php: : phpDynDNS (xxxxxx.no-ip.org): No change in my IP address and/or 25 days has not passed. Not updating dynamic DNS entry.
May 31 17:07:28 php: : Removing static route for monitor 8.8.8.8 and adding a new route through xx.xx.xx.1
May 31 17:07:28 php: : ROUTING: setting default route to xx.xx.xx.1
May 31 17:07:28 php: : ROUTING: setting IPv6 default route to 2001:xxx:xxxx:xxx::1
May 31 16:07:26 check_reload_status: Reloading filter
May 31 17:07:26 php: /interfaces.php: Creating rrd update script
May 31 17:07:25 php: /interfaces.php: Removing static route for monitor 8.8.8.8 and adding a new route through xx.xx.xx.1
May 31 17:07:25 php: : rc.newwanip: on (IP address: xx.xx.xx.108) (interface: wan) (real interface: vr0).
May 31 17:07:25 php: : rc.newwanip: Informational is starting vr0.
May 31 16:07:23 check_reload_status: updating dyndns wan
May 31 17:07:21 dhcpleases: Could not deliver signal HUP to process because its pidfile does not exist, No such process.
May 31 17:07:18 php: /interfaces.php: ROUTING: setting default route to xx.xx.xx.1
May 31 16:07:17 check_reload_status: rc.newwanip starting vr0
May 31 17:07:17 php: /interfaces.php: Clearing states to old gateway xx.xx.xx.1.

I doubt my imported config.xml has brought over with it this problem so Im a little baffled and just short of doing another full install.

P.S I dont get this problem on my ESXi vm ova (amd64) template from the snapshot server.

phil.davis · May 31, 2013, 5:39 PM

I thought this might be a quirk of British Summer Time. I set my timezone to Europe/London on an Alix 2D13. Sure enough, as Jim says, check_reload_status kept reporting based on the old time zone. Other newer processes reported based on the new time zone. I rebooted and everything is reporting the same time:

May 31 18:23:18 	check_reload_status: Syncing firewall
May 31 18:23:22 	php: /interfaces.php: ROUTING: setting default route to 192.168.12.1
May 31 18:23:26 	dhcpleases: Could not deliver signal HUP to process because its pidfile does not exist, No such process.
May 31 18:23:28 	php: /interfaces.php: The command '/usr/local/sbin/radvd -p /var/run/radvd.pid -C /var/etc/radvd.conf -m syslog' returned exit code '1', the output was ''
May 31 18:23:28 	check_reload_status: updating dyndns wan
May 31 18:23:30 	php: /interfaces.php: Removing static route for monitor 8.8.8.8 and adding a new route through 192.168.12.1
May 31 18:23:30 	php: /interfaces.php: Removing static route for monitor 8.8.4.4 and adding a new route through 192.168.44.1
May 31 18:23:31 	php: /interfaces.php: Creating rrd update script
May 31 18:23:31 	check_reload_status: Reloading filter

So the problem is not a general one.
system_timezone_configure(); is called during rc.bootup - I suspect that any process from before that will be running thinking the timezone is UTC (and British Summer Time is UTC+1:00). Is there some way that your boot sequence has something special/interesting, and the check_reload_status process is starting before system_timezone_configure() happens?

Slam · Jun 1, 2013, 8:57 PM

Your post gave me an idea and it seems my settings dont survive a firmware upgrade.

I set the timezone to Etc/Greenwich, rebooted, logs show timestamp an hour behind but all consistent.

I set the timezone to Europe/London, rebooted, logs show correct timestamp and all consistent.

Firmware upgrade puts me to where i started with log times inconsistent.

I dont have anything special about my setup, just cp enabled, ipv6, dyn update and openvpn.

Slam · Jun 3, 2013, 1:49 PM

Seems to have sorted itself out, automagically.

Slam · Jun 5, 2013, 2:57 PM

Just an update if anyone ever comes across this issue.

What I've discovered is that once a firmware upgrade is performed and the subsequent system is restarted, my TimeZone is not picked up on this first boot up, however without going through hurdles of changing TimeZone as I have previously done in the above posts, just a simple reboot seems to pick up the TimeZone settings correctly (as set in config.xml).

2.1-RC0 (i386) nanobsd (4g)

Slam · Jun 12, 2013, 4:43 PM

Just a bump to say that I still have the problem with time diff of an hour, only remedy is to reboot again after a firmware upgrade.

What I have noticed now though is some problems are solved that I had experienced before. (maybe due to the fact I didnt reboot after firmware upgrade?)

Dashboard now shows "NanoBSD boot slice…(ro)" where as before it was constantly (rw) even if I set it to (ro)
I finally get my historic RRD graphs back, so now they survive a firmware upgrade.
3)....

phil.davis · Jun 17, 2013, 8:16 AM

Dashboard now shows "NanoBSD boot slice…(ro)" where as before it was constantly (rw) even if I set it to (ro)

If it is left rw after the boot is finished, then that means something has died during the boot (killed: out of swap space = out of real memory). The process that died would have had the disk mounted rw to write its conf files etc, but because it died, it never got to put the filesystem back to ro.
If this happens to multiple processes during the boot, then the internal "mount count" of the filesystem is more than 1. You will see this on the page where you can manually put the file system back to RO. Each time you press the button, the mount count will go down by 1, and eventually the filesystem will revert to RO.
I have this issue if I have a lot of OpenVPN servers/clients on an Alix. Depending on the extra packages you have and so on, the boot process can be doing too much at once and run out of real memory. Once my systems come up, they run fine at around 45-55% memory used.

Slam · Jul 2, 2013, 1:57 PM

Thanks for the reply Phil and sorry for the delay.

I have had a look at this problem recently especially as its got worse for me the past few snapshots.
Additionally to the "out of swap space" which happens much more often now, I am now getting errors related to the captive portal and in some cases the cp clients are getting a 503 error.
Unfortunetly I dont have any logs at hand, the recently missing kernel in nanoBSD forced me to go out to the remote location and plug in a new cf and as always I just import a backed up config.xml.

The recent errors I am faced with for captive portal is that whenever cp prune function runs, I get "captive portal already running" error, it will continue to spew out this message until it crashes (explains the 503 clients are seeing), if I am fast enough to catch the errors before a crash, it seems option 11 (restart webconfigurator) from console temporarily fixes it, until a next process is run then back to square one.

From what I can remember of the logs and what you've wrote here it does seem /tmp runs out of space and is a cause of some of the problems I've come across on this post.

I dont run any packages what so ever, but what I did try was switching off non critical services and trying and test, these were - SNMP - NTP -Dyn DNS. I however needed things like OVPN and CP running so I couldnt switch them off to test.

Finally last night I switched off RRD graph, I rebooted the alix and found the memory usage according to the dashboard was at 56%, compared with 85%> with RRD enabled, however this morning I noticed that the dashboard again reports 80%> and I dont have RRD's enabled.

What I have noticed on first boot with nanoBSD when I perform a snapshot upgrade (personal experience - might not apply to everyone)

50% chance your RRD graphs wont survive the upgrade (reported elsewhere on the forum)
Logs show inconsistant timestamps - need a reboot and its solved (so far just me :0) )
Out of swap space (many many reports and a few possible causes - mainly not enough ram on alix's)
CP prune process causes errors as reported on syslog "cp already running", eventually cp dies (as seen on syslog) and CP clients are greeted with a 503 error
….

Now I am no coder or anything like that, just full time carer and part time wannabe geek :0) .....I could be wrong but is it possible the nanoBSD's are suffering a memory leak?

My alternative to these problems would be just doing a full install on a microdrive, plugging it in the alix and I'm sure these problems will disappear.

phil.davis · Jul 2, 2013, 3:25 PM

I don't use Captive Portal, so I can't comment on the memory use or leaks there.
Certainly, when OpenVPN servers and clients are establishing a connection they seem to use a lot of memory transiently (10-20 seconds?). Then the memory use drops back to "normal". With 2 OpenVPN instances I usually have about 45%-55% memory used.
If I turn on anything that produces output to the memory disks (/tmp and /var) then memory use goes up - the memory for the memory disk space is only allocated when files are written there. Most of the log files are fixed-size circular logs, so they don't eat up any more space over time.

Now I am no coder or anything like that, just full time carer and part time wannabe geek :0) …..I could be wrong but is it possible the nanoBSD's are suffering a memory leak?

I did troll around looking for this a while ago. I only found one small place that got patched some months ago. At the moment I seem to upgrade often enough that I don't have Alix systems up long enough (weeks to months) to really get a long-term trend for memory leaking slowly over time.

Slam · Jul 5, 2013, 2:59 AM

Well the logs got really bad for me (lots & lots of out of swap space + cp prunning & cp already running errors) so I decided to switch off all services and start them up again manually with the exception of ovpn as this alix is a remote location so I didnt want to risk losing connection.

I restarted:

ntp
dhcp
dns forwarder
snmp
cp

also:
restart ipv4 wan interface
restart ipv6 wan interface (not really necessary but done anyway)
dynamic dns (not really necessary but done anyway, picks up on wan* restarts)

Its been almost 24 hours and the logs are nice and clean of any errors (normally this would be swamped with errors by now), also have reenabled rrd graphs again, memory has settled at 76% with above services running, all nice for now.

I think I'll leave this alix as it is for now and wait for another rc candidate and stop pressing on the "you know you want to click me" dashboard update widget :0)

Slam · Jul 16, 2013, 1:11 PM

Managed to solve this in the end, after I remembered similar weirdness on my main firewall (full install) which I upgraded several weeks before.

I removed the following tags from my config

<dns1gwint>wan</dns1gwint>
<dns2gwint>wan</dns2gwint>
<dns3gwint>none</dns3gwint>
<dns4gwint>none</dns4gwint>

and kept what was already there

<dns1gw>WAN_IPV4GW</dns1gw>
<dns2gw>WAN_IPV4GW</dns2gw>
<dns3gw>WAN_IPV6GW</dns3gw>
<dns4gw>WAN_IPV6GW</dns4gw>

I performed a config restore and apart from an initial "out of swap space", everything has been back to normal now for a few days.

Im not sure if those tags I removed were migrated over from my 2.0.3 > 2.1 upgrade but it seems to sorted out a lot of weirdness I was experiencing, just thought I'd post in case someone was having similar issues.

Cheers