Nano issue - out of swap file?



  • Hi all,

    I've been having issues since a reboot yesterday and can't work out why.
    Since then I get errors in the system log during startup, and my OpenVPN servers won't load anymore.

    Part of the log:

    May 2 19:59:31 php: rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed its IP. Reloading endpoints that may use WAN_DHCP.
    May 2 19:59:31 lighttpd[33063]: (mod_fastcgi.c.3346) response not received, request sent: 900 on socket: unix:/tmp/php-fastcgi.socket-1 for /diag_logs.php?, closing connection
    May 2 19:59:31 lighttpd[33063]: (mod_fastcgi.c.2562) unexpected end-of-file (perhaps the fastcgi process died): pid: 33406 socket: unix:/tmp/php-fastcgi.socket-1
    May 2 19:59:31 lighttpd[33063]: (mod_fastcgi.c.3346) response not received, request sent: 894 on socket: unix:/tmp/php-fastcgi.socket-0 for /index.php?, closing connection
    May 2 19:59:31 lighttpd[33063]: (mod_fastcgi.c.2562) unexpected end-of-file (perhaps the fastcgi process died): pid: 33214 socket: unix:/tmp/php-fastcgi.socket-0
    May 2 19:59:31 kernel: pid 87407 (php), uid 0, was killed: out of swap space
    May 2 19:59:31 kernel: pid 86467 (php), uid 0, was killed: out of swap space
    May 2 19:59:26 kernel: pid 51894 (php), uid 0, was killed: out of swap space
    May 2 19:59:26 php: rc.openvpn: OpenVPN: Resync server1 blabla_Mobile_access
    May 2 19:59:26 php: rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed its IP. Reloading endpoints that may use WAN_DHCP.
    May 2 19:59:24 kernel: pid 53782 (php), uid 0, was killed: out of swap space
    May 2 19:59:23 kernel: pid 54180 (php), uid 0, was killed: out of swap space
    May 2 19:59:22 login: login on console as root
    May 2 19:59:20 kernel: pid 81802 (php), uid 0, was killed: out of swap space
    May 2 19:59:20 lighttpd[33063]: (mod_fastcgi.c.3346) response not received, request sent: 869 on socket: unix:/tmp/php-fastcgi.socket-0 for /getstats.php?, closing connection
    May 2 19:59:20 lighttpd[33063]: (mod_fastcgi.c.2562) unexpected end-of-file (perhaps the fastcgi process died): pid: 33214 socket: unix:/tmp/php-fastcgi.socket-0
    May 2 19:59:19 kernel: pid 40133 (php), uid 0, was killed: out of swap space
    May 2 19:59:19 check_reload_status: rc.newwanip starting ovpns1
    May 2 19:59:18 lighttpd[33063]: (mod_fastcgi.c.3346) response not received, request sent: 906 on socket: unix:/tmp/php-fastcgi.socket-1 for /diag_logs.php?, closing connection
    May 2 19:59:18 kernel: pid 41176 (php), uid 0, was killed: out of swap space
    May 2 19:59:18 lighttpd[33063]: (mod_fastcgi.c.2562) unexpected end-of-file (perhaps the fastcgi process died): pid: 33406 socket: unix:/tmp/php-fastcgi.socket-1
    May 2 19:59:15 kernel: pid 291 (php), uid 0, was killed: out of swap space
    May 2 19:59:14 kernel: ovpns1: link state changed to UP
    May 2 19:59:13 kernel: pid 24661 (php), uid 0, was killed: out of swap space
    May 2 19:59:10 check_reload_status: Reloading filter

    Out of swap space? Why?

    The errors are consistent; a reboot does not cure them. During previous boots I also saw other errors, like this one:

    The command '/sbin/mount -u -r -f -o sync,noatime /cf' returned exit code '1', the output was 'mount: not currently mounted /cf'

    In an attempt to narrow it down, I removed all packages so the system is as clean as possible, but to no avail.
    The next step might be a return to factory defaults (has anyone done this before? Does it just clear config.xml, or does it truly restore the nano image?), or a reinstall.

    Googling turns up a variety of posts (also here on the forum), but none points me to a clear cause.
    Because these issues appeared all at once on a box that has been working for quite some time, I suspect a single cause behind the multiple errors. I'm wondering whether it's a software or hardware issue.

    The system is pfSense 2.1.2 running on an ALIX 2D13. I'm open to suggestions; all feedback is appreciated.

    --edit subject, better description--


  • Netgate Administrator

    The basic cause is that the Alix only has 256MB of RAM.
    Your box is running out of free memory and tries to use swap to compensate, but the Nano images do not have a swap partition, so it reports 'out of swap'.
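
To see which processes the kernel is killing, the "was killed: out of swap space" lines can be tallied per process name. This is a minimal sketch that assumes the message format shown in the log in the first post. `count_oom_kills` is a hypothetical helper name, and reading the log with `clog` is an assumption about how this pfSense version stores `system.log`.

```sh
# Tally kernel "was killed: out of swap space" messages per process name.
# Reads syslog-style lines on stdin and prints "<count> <process>" lines,
# most frequent first. Lines that do not match are ignored.
count_oom_kills() {
    sed -n 's/.*kernel: pid [0-9][0-9]* (\([^)]*\)), uid [0-9][0-9]*, was killed: out of swap space.*/\1/p' \
        | sort | uniq -c | sort -rn
}

# Hypothetical usage on the firewall (assumes the system log is a
# circular log that is read with clog rather than cat):
#   clog /var/log/system.log | count_oom_kills
```

In the excerpt from the first post every killed process is `php`, which fits the lighttpd "fastcgi process died" messages logged at the same timestamps.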

    What packages are you running? How many interfaces do you have assigned including VPN, VLANs etc?

    Did this start after attempting the 2.1.3 upgrade?

    Steve



  • Hi Steve… thanks for jumping on the topic  :)

    That's the odd thing: it just appeared after a reboot.
    Nothing special here.... just a home FW (but a good one ;)) with dyndns and 2 OpenVPN server instances.
    I was running autoconfig backup, the Client VPN export package and mailreport as packages, but none of them was hungry for memory if I remember correctly (I never saw memory usage beyond 50-60% after startup).
    And the log shown in my first post was taken with all packages removed... so this would indicate that nanobsd consumed > 256MB during startup? Really wondering what has happened...

    I've put the APU in place now, but I would still like to know what is wrong with the ALIX.

    Some things I can try:

    • see if I can do a memory test in the BIOS? (I've seen that on the APU, but can't find it in the ALIX BIOS)
    • upgrade its BIOS (it's on 0.99h now)
    • reinstall pfSense from scratch...
      For the last 2 options I need a CF adapter, and it seems I left mine at work (so that'll have to wait until Monday)

    Do you happen to know how to check where the memory is going during startup?


  • Netgate Administrator

    Not exactly.
    vmstat -m
    ps -aux
    top -d1

    None of those list the two ramdisks used by /var and /tmp.
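
As a rough complement to the commands above, the resident memory of all processes can be summed and compared with the ramdisk usage that `ps` and `top` do not show. This is a sketch only: `sum_rss_mb` is a hypothetical helper, and the `df` and `mdconfig` calls in the comments assume /var and /tmp are memory disks, as described in this thread.

```sh
# Sum resident set sizes and print the total in megabytes.
# Expects one RSS value in kilobytes per line on stdin.
sum_rss_mb() {
    awk '{ kb += $1 } END { printf "%.1f\n", kb / 1024 }'
}

# Hypothetical usage on the firewall:
#   ps -axo rss= | sum_rss_mb   # memory held by processes, in MB
#   df -k /var /tmp             # space used on the two ramdisks
#   mdconfig -l                 # list the configured memory disks
```

Shared pages are counted once per process, so the total is an upper bound rather than an exact figure.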

    Steve



  • So… this is getting interesting. Tried upgrading to 2.1.3, twice.

    usbus1: Controller shutdown complete
    Rebooting…
    PC Engines ALIX.2 v0.99h
    640 KB Base Memory
    261120 KB Extended Memory

    01F0 Master 044A CF 4GB
    Phys C/H/S 7785/16/63 Log C/H/S 973/128/63

    1  pfSense
    2  pfSense

    F6 PXE
    Boot:  2
    /boot/config: -h
    Consoles: serial port
    BIOS drive C: is disk0
    BIOS 640kB/261120kB available memory

    FreeBSD/x86 bootstrap loader, Revision 1.1
    (root@pf2_1_1_i386.pfsense.org, Thu May  1 16:08:54 EDT 2014)
    Loading /boot/defaults/loader.conf
    /boot/kernel/kernel data=0x91c91c data=0x51dad4+0x9e0c4 syms=[0x4+0x9b090+0x4+0xd5cdf]

    Hit [Enter] to boot immediately, or any other key for command prompt.
    Booting [/boot/kernel/kernel]…

    ... some output omitted, otherwise this gets quite long...

    ad0: 3831MB <CF 4GB 20110221> at ata0-master PIO4
    Root mount waiting for: usbus1 usbus0
    uhub0: 4 ports with 4 removable, self powered
    Root mount waiting for: usbus1
    uhub1: 4 ports with 4 removable, self powered
    Trying to mount root from ufs:/dev/ufs/pfsense1
    Configuring crash dumps...
    Mounting filesystems...
    mount: not currently mounted /cf
    umount: /cf: not a file system root directory
    Can't stat /dev/ufs/cf: No such file or directory
    Can't stat /dev/ufs/cf: No such file or directory
    mount: /dev/ufs/cf : No such file or directory
    grep: /cf/conf/config.xml: No such file or directory
    grep: /cf/conf/config.xml: No such file or directory
    grep: /cf/conf/config.xml: No such file or directory
    Setting up memory disks... done.
    Disabling APM on /dev/ad0

    ___
    / f
    / p _
    / Sense
    _

        _
    _/

    Welcome to pfSense 2.1.3-RELEASE  ...

    Creating symlinks....grep: /cf/conf/config.xml: No such file or directory
    grep: /cf/conf/config.xml: No such file or directory
    [: : bad number
    [: : bad number
    [: : bad number
    [: : bad number
    [: : bad number
    [: : bad number
    [: : bad number
    [: : bad number
    [: : bad number
    [: : bad number
    [: : bad number
    [: : bad number
    [: : bad number
    [: : bad number
    [: : bad number
    [: : bad number
    [: : bad number
    [: : bad number
    [: : bad number
    [: : bad number
    [: : bad number
    [: : bad number
    [: : bad number
    [: : bad number
    [: : bad number
    [: : bad number
    [: : bad number
    [: : bad number
    [: : bad number
    [: : bad number
    [: : bad number
    [: : bad number
    [: : bad number
    [: : bad number
    [: : bad number
    [: : bad number
    ..done.

    Under 512 megabytes of ram detected.  Not enabling APC.
    ls: *.xml: No such file or directory
    Config.xml is corrupted and is 0 bytes.  Could not restore a previous backup.Launching the init system… done.
    Initializing..................ls: *.xml: No such file or directory
    Config.xml is corrupted and is 0 bytes.  Could not restore a previous backup.Starting CRON... done.
    ls: *.xml: No such file or directory
    Config.xml is corrupted and is 0 bytes.  Could not restore a previous backup.Bootup complete
    grep: /conf/config.xml: No such file or directory
    [: -gt: unexpected operator

    FreeBSD/i386 (Amnesiac) (console)

    ls: *.xml: No such file or directory
    Config.xml is corrupted and is 0 bytes.  Could not restore a previous backup.

    1. Logout (SSH only)
    2. Assign Interfaces

    ...

    Enter an option:

    and then… I'm unable to assign interfaces, can't set an IP address, can't reboot, ...
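
The boot log in this post fails at `/dev/ufs/cf`, the label the config slice is mounted from, while the root slice label is found. A quick way to confirm which labels the kernel currently sees is sketched below. The three label names are taken from the logs in this thread, and `check_ufs_labels` is a hypothetical helper.

```sh
# Report which of the expected UFS labels exist.
# $1 is the label directory (defaults to /dev/ufs).
# Returns 0 if all labels are present, 1 if any is missing.
check_ufs_labels() {
    dir=${1:-/dev/ufs}
    missing=0
    for label in pfsense0 pfsense1 cf; do
        if [ -e "$dir/$label" ]; then
            echo "ok      $dir/$label"
        else
            echo "MISSING $dir/$label"
            missing=1
        fi
    done
    return "$missing"
}

# Hypothetical usage from a shell on the firewall:
#   check_ufs_labels
```

A missing `cf` entry would match the "Can't stat /dev/ufs/cf" lines, but it does not say whether the slice, its label, or the card itself is at fault.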

    But when I pull the plug, I'm able to boot it from slice 1. However, it then reports "Disk is dirty.  Running fsck -y" on every boot:

    Trying to mount root from ufs:/dev/ufs/pfsense0
    Configuring crash dumps…
    Mounting filesystems...
    mount: not currently mounted /cf
    umount: /cf: not a file system root directory
    Can't stat /dev/ufs/cf: No such file or directory
    Can't stat /dev/ufs/cf: No such file or directory
    mount: /dev/ufs/cf : No such file or directory
    Setting up memory disks... done.
    Disabling APM on /dev/ad0

    ___
    / f
    / p _
    / Sense
    _

        _
    _/

    Welcome to pfSense 2.1.2-RELEASE  ...

    Creating symlinks......done.

    Under 512 megabytes of ram detected.  Not enabling APC.
    External config loader 1.0 is now starting...
    Launching the init system... done.
    Initializing............................. done.
    Disk is dirty.  Running fsck -y
    Starting device manager (devd)...done.
    Loading configuration......done.
    Updating configuration...done.
    Cleaning backup cache........done.
    Setting up extended sysctls...done.
    ...

    So… unless I missed something, I'm starting to think my CF might be bad...
    However, a manual fsck -y doesn't reveal much. Are there other ways to check the CF?
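
One non-destructive check beyond fsck is to read the whole card once and watch for I/O errors. This is a sketch under assumptions: `read_scan` is a hypothetical helper, and `/dev/ad0` is the device name taken from the boot log above. A clean pass does not prove the card is healthy, since write failures and intermittent faults will not show up in a single read.

```sh
# Read every block of a device (or file) and discard the data.
# dd stops and returns non-zero on an unreadable block, so a bad
# sector typically shows up as an error message and a failed exit.
read_scan() {
    dd if="$1" of=/dev/null bs=65536 2>&1
}

# Hypothetical usage as root on the firewall:
#   read_scan /dev/ad0 || echo "read errors on /dev/ad0"
```

Running the same scan with the card in a CF adapter on another machine would help separate a bad card from a problem on the ALIX side.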


  • Netgate Administrator

    Hmm, doesn't look good.  :(
    Possibly related to this thread: https://forum.pfsense.org/index.php?topic=75069.0

    Steve



  • An update:
    I did the ALIX firmware upgrade and a fresh 2.1.3 install on a new 4G CF card. For now it looks like all is fine again  :)
    I was thinking of doing some extensive testing on the old CF, but lacked the motivation. (After all, all it took was some time plus the price of the new CF. Though I really hate problems without a pinned-down cause… >:( I guess I'm going to blame the CF anyway and get over it ::))