Nano issue - out of swap file?

bennyc

Hi all,

I've had some issues since a reboot yesterday, and I can't figure out why.
Since yesterday, during startup I get errors in the system log, and my openvpn servers won't load anymore.

Part of the log:

May 2 19:59:31 php: rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed its IP. Reloading endpoints that may use WAN_DHCP.
May 2 19:59:31 lighttpd[33063]: (mod_fastcgi.c.3346) response not received, request sent: 900 on socket: unix:/tmp/php-fastcgi.socket-1 for /diag_logs.php?, closing connection
May 2 19:59:31 lighttpd[33063]: (mod_fastcgi.c.2562) unexpected end-of-file (perhaps the fastcgi process died): pid: 33406 socket: unix:/tmp/php-fastcgi.socket-1
May 2 19:59:31 lighttpd[33063]: (mod_fastcgi.c.3346) response not received, request sent: 894 on socket: unix:/tmp/php-fastcgi.socket-0 for /index.php?, closing connection
May 2 19:59:31 lighttpd[33063]: (mod_fastcgi.c.2562) unexpected end-of-file (perhaps the fastcgi process died): pid: 33214 socket: unix:/tmp/php-fastcgi.socket-0
May 2 19:59:31 kernel: pid 87407 (php), uid 0, was killed: out of swap space
May 2 19:59:31 kernel: pid 86467 (php), uid 0, was killed: out of swap space
May 2 19:59:26 kernel: pid 51894 (php), uid 0, was killed: out of swap space
May 2 19:59:26 php: rc.openvpn: OpenVPN: Resync server1 blabla_Mobile_access
May 2 19:59:26 php: rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed its IP. Reloading endpoints that may use WAN_DHCP.
May 2 19:59:24 kernel: pid 53782 (php), uid 0, was killed: out of swap space
May 2 19:59:23 kernel: pid 54180 (php), uid 0, was killed: out of swap space
May 2 19:59:22 login: login on console as root
May 2 19:59:20 kernel: pid 81802 (php), uid 0, was killed: out of swap space
May 2 19:59:20 lighttpd[33063]: (mod_fastcgi.c.3346) response not received, request sent: 869 on socket: unix:/tmp/php-fastcgi.socket-0 for /getstats.php?, closing connection
May 2 19:59:20 lighttpd[33063]: (mod_fastcgi.c.2562) unexpected end-of-file (perhaps the fastcgi process died): pid: 33214 socket: unix:/tmp/php-fastcgi.socket-0
May 2 19:59:19 kernel: pid 40133 (php), uid 0, was killed: out of swap space
May 2 19:59:19 check_reload_status: rc.newwanip starting ovpns1
May 2 19:59:18 lighttpd[33063]: (mod_fastcgi.c.3346) response not received, request sent: 906 on socket: unix:/tmp/php-fastcgi.socket-1 for /diag_logs.php?, closing connection
May 2 19:59:18 kernel: pid 41176 (php), uid 0, was killed: out of swap space
May 2 19:59:18 lighttpd[33063]: (mod_fastcgi.c.2562) unexpected end-of-file (perhaps the fastcgi process died): pid: 33406 socket: unix:/tmp/php-fastcgi.socket-1
May 2 19:59:15 kernel: pid 291 (php), uid 0, was killed: out of swap space
May 2 19:59:14 kernel: ovpns1: link state changed to UP
May 2 19:59:13 kernel: pid 24661 (php), uid 0, was killed: out of swap space
May 2 19:59:10 check_reload_status: Reloading filter

Out of swap space? Why?

The errors are consistent, reboot does not cure it. And during previous boots, I had other errors like this one:

The command '/sbin/mount -u -r -f -o sync,noatime /cf' returned exit code '1', the output was 'mount: not currently mounted /cf'

In an attempt to narrow it down, I removed all packages so it is as clean as possible, but to no avail.
The next step might be a return to factory defaults (has anyone done this before? Does it just clear config.xml, or does it truly restore the nano image?), or a reinstall.

Googling turns up a variety of posts (also here on the forum), but none of them points me to a clear cause.
Because these issues appeared all at once and this box has been working for quite some time, I think there is a single cause resulting in multiple errors. I'm wondering whether it's a SW or HW issue.

System is a 2.1.2 running on ALIX 2D13. Open for suggestions here, all feedback is appreciated….

--edit subject, better description--

stephenw10

The basic cause is that the Alix only has 256MB of RAM.
Your box is running out of free memory and tries to use swap to compensate, but the Nano images do not have a swap partition, so it reports 'out of swap'.
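
To see which processes the kernel is killing and how often, the 'out of swap space' lines quoted in the first post can be tallied. This is only a sketch: it assumes the log has been saved as plain text (the file name in the usage comment is hypothetical), and the function name is made up for illustration.

```shell
#!/bin/sh
# Count "out of swap space" kills per process name in a plain-text log.
# Kernel lines look like:
#   May 2 19:59:31 kernel: pid 87407 (php), uid 0, was killed: out of swap space
count_oom_kills() {
    grep 'was killed: out of swap space' "$1" |
        sed -n 's/.*pid [0-9]* (\([^)]*\)).*/\1/p' |   # keep only the name in parentheses
        sort | uniq -c |
        awk '{print $2, $1}'                           # print "name count"
}

# Usage (hypothetical file name):
#   count_oom_kills saved-system.log
```

In the excerpt above every killed process is php, which fits the lighttpd 'fastcgi process died' messages appearing at the same time.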

What packages are you running? How many interfaces do you have assigned including VPN, VLANs etc?

Did this start after attempting the 2.1.3 upgrade?

Steve

bennyc

Hi Steve… tnx for jumping on topic :)

That's the odd thing, it just appeared after a reboot?
Nothing special here.... just a home FW (but a good one ;)) with dyndns and 2 openvpn server instances.
I had the autoconfig backup, Client VPN export and mailreport packages running, but none of them was memory-hungry, if I remember correctly. (I never saw memory usage beyond 50-60% after startup.)
And, the log as shown in my first post, was with all packages removed... so this would indicate that nanobsd consumed > 256MB during startup? Really wondering what has happened...

I've put the APU in place now, but still would like to know what is wrong with the ALIX?

Some things I can try:

see if I can do a memory test in the BIOS (I've seen that on the APU, but can't find it in the ALIX BIOS)
upgrade its BIOS (it's on 0.99h now)
reinstall pfSense from scratch...
For the last 2 options I need a CF adapter, and it seems I left mine at work (so that'll have to wait until Monday).

Do you happen to know how to check where the memory is going during startup?

stephenw10

Not exactly.
vmstat -m
ps -aux
top -d1

None of those list the two ramdisks used by /var and /tmp.
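
Since the commands above do not show the ramdisks, the space used on /var and /tmp can be read with df instead. A minimal sketch, assuming both are mounted as separate filesystems as described here; the function name is made up for illustration.

```shell
#!/bin/sh
# Print size and usage (in KB) of the filesystems backing the given paths.
# df -kP gives POSIX columns: filesystem, 1024-blocks, used, available,
# capacity, mount point.
mount_usage_kb() {
    df -kP "$@" | awk 'NR > 1 { printf "%s size=%sKB used=%sKB\n", $6, $2, $3 }'
}

mount_usage_kb /var /tmp
```

Whatever is stored on a ramdisk comes out of the same 256MB, so these numbers are worth comparing against what top reports as free.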

Steve

bennyc

So… this is getting interesting. Tried upgrading to 2.1.3, twice.

usbus1: Controller shutdown complete
Rebooting…
PC Engines ALIX.2 v0.99h
640 KB Base Memory
261120 KB Extended Memory

01F0 Master 044A CF 4GB
Phys C/H/S 7785/16/63 Log C/H/S 973/128/63

1 pfSense
2 pfSense

F6 PXE
Boot: 2
/boot/config: -h
Consoles: serial port
BIOS drive C: is disk0
BIOS 640kB/261120kB available memory

FreeBSD/x86 bootstrap loader, Revision 1.1
(root@pf2_1_1_i386.pfsense.org, Thu May 1 16:08:54 EDT 2014)
Loading /boot/defaults/loader.conf
/boot/kernel/kernel data=0x91c91c data=0x51dad4+0x9e0c4 syms=[0x4+0x9b090+0x4+0xd5cdf]

Hit [Enter] to boot immediately, or any other key for command prompt.
Booting [/boot/kernel/kernel]…

... some output omitted, otherwise this gets quite long...

ad0: 3831MB <CF 20110221 4GB> at ata0-master PIO4
Root mount waiting for: usbus1 usbus0
uhub0: 4 ports with 4 removable, self powered
Root mount waiting for: usbus1
uhub1: 4 ports with 4 removable, self powered
Trying to mount root from ufs:/dev/ufs/pfsense1
Configuring crash dumps...
Mounting filesystems...
mount: not currently mounted /cf
umount: /cf: not a file system root directory
Can't stat /dev/ufs/cf: No such file or directory
Can't stat /dev/ufs/cf: No such file or directory
mount: /dev/ufs/cf : No such file or directory
grep: /cf/conf/config.xml: No such file or directory
grep: /cf/conf/config.xml: No such file or directory
grep: /cf/conf/config.xml: No such file or directory
Setting up memory disks... done.
Disabling APM on /dev/ad0

___
/ f
/ p _/ Sense
_/
__/

Welcome to pfSense 2.1.3-RELEASE ...

Creating symlinks....grep: /cf/conf/config.xml: No such file or directory
grep: /cf/conf/config.xml: No such file or directory
[: : bad number
[: : bad number
[: : bad number
[: : bad number
[: : bad number
[: : bad number
[: : bad number
[: : bad number
[: : bad number
[: : bad number
[: : bad number
[: : bad number
[: : bad number
[: : bad number
[: : bad number
[: : bad number
[: : bad number
[: : bad number
[: : bad number
[: : bad number
[: : bad number
[: : bad number
[: : bad number
[: : bad number
[: : bad number
[: : bad number
[: : bad number
[: : bad number
[: : bad number
[: : bad number
[: : bad number
[: : bad number
[: : bad number
[: : bad number
[: : bad number
[: : bad number
..done.

Under 512 megabytes of ram detected. Not enabling APC.
ls: *.xml: No such file or directory
Config.xml is corrupted and is 0 bytes. Could not restore a previous backup.Launching the init system… done.
Initializing..................ls: *.xml: No such file or directory
Config.xml is corrupted and is 0 bytes. Could not restore a previous backup.Starting CRON... done.
ls: *.xml: No such file or directory
Config.xml is corrupted and is 0 bytes. Could not restore a previous backup.Bootup complete
grep: /conf/config.xml: No such file or directory
[: -gt: unexpected operator

FreeBSD/i386 (Amnesiac) (console)

ls: *.xml: No such file or directory
Config.xml is corrupted and is 0 bytes. Could not restore a previous backup.

Logout (SSH only)

Assign Interfaces

...

Enter an option:

And then… I'm unable to assign interfaces, can't set an IP address, can't reboot, ...

But when I pull the plug, I'm able to boot it on slice 1. However, it also reports "Disk is dirty. Running fsck -y" each time:

Trying to mount root from ufs:/dev/ufs/pfsense0
Configuring crash dumps…
Mounting filesystems...
mount: not currently mounted /cf
umount: /cf: not a file system root directory
Can't stat /dev/ufs/cf: No such file or directory
Can't stat /dev/ufs/cf: No such file or directory
mount: /dev/ufs/cf : No such file or directory
Setting up memory disks... done.
Disabling APM on /dev/ad0

___
/ f
/ p _/ Sense
_/
__/

Welcome to pfSense 2.1.2-RELEASE ...

Creating symlinks......done.

Under 512 megabytes of ram detected. Not enabling APC.
External config loader 1.0 is now starting...
Launching the init system... done.
Initializing............................. done.
Disk is dirty. Running fsck -y
Starting device manager (devd)...done.
Loading configuration......done.
Updating configuration...done.
Cleaning backup cache........done.
Setting up extended sysctls...done.
...

So… unless I missed something, I'm starting to think my CF might be bad...
However, a manual fsck -y doesn't reveal much. Are there other ways to check the CF?
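
Two simple checks suggest themselves from the boot output above, sketched here under assumptions: the label names (pfsense0, pfsense1, cf) and the device (/dev/ad0) are taken from the logs in this post, and both function names are made up for illustration. The first reports which expected labels are missing from /dev/ufs, and the second reads the whole card to see whether any block fails to read.

```shell
#!/bin/sh
# Print each expected label that is NOT present in the given directory.
# On the box in this thread the directory would be /dev/ufs.
missing_labels() {
    dir=$1; shift
    for label in "$@"; do
        [ -e "$dir/$label" ] || echo "$label"
    done
}

# Read a device or file end to end and discard the data.
# A non-zero exit status means dd hit a read error.
read_test() {
    dd if="$1" of=/dev/null bs=65536 2>/dev/null
}

# Usage on the ALIX (names taken from the boot log above):
#   missing_labels /dev/ufs pfsense0 pfsense1 cf
#   read_test /dev/ad0 && echo "no read errors" || echo "read error"
```

A clean read pass only shows that every block is currently readable. It says nothing about writes, so a card could still be failing in ways this does not catch.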

stephenw10

Hmm, doesn't look good. :(
Possibly related to this thread: https://forum.pfsense.org/index.php?topic=75069.0

Steve

bennyc

An update:
I did the ALIX firmware upgrade and a fresh 2.1.3 install on a new 4 GB CF card. For now it looks like all is fine again :)
I was thinking of doing some extensive testing on the old CF, but lacked the motivation. (After all, all it took was some time plus the price of the new CF. Though I really hate problems without a pinpointed cause… >:( I guess I'm going to blame the CF anyway and get over it ::))