No space left on device - After upgrade from 2-Beta4 to 2-RC1

caleban

Server with 500GB HD
i386

I upgraded pfSense from 2-Beta4 to 2-RC1 this weekend. This morning the pfSense web interface shows these errors:

Warning: fopen(/tmp/config.lock): failed to open stream: Device not configured in /etc/inc/util.inc on line 123
Warning: flock() expects parameter 1 to be resource, null given in /etc/inc/util.inc on line 134
Warning: fclose(): supplied argument is not a valid stream resource in /etc/inc/util.inc on line 135
Warning: session_start(): open(/var/tmp//sess_beec81ce19ddde13d7d89e1eb420423a, O_RDWR) failed: No space left on device (28) in /etc/inc/auth.inc on line 1211

Warning: Unknown: open(/var/tmp//sess_beec81ce19ddde13d7d89e1eb420423a, O_RDWR) failed: No space left on device (28) in Unknown on line 0
Warning: Unknown: Failed to write session data (files). Please verify that the current setting of session.save_path is correct () in Unknown on line 0

I'm curious whether this has anything to do with the upgrade, but my real questions are these: How do I resolve this? I can't log in via the web interface, shell, or console, so I expect to have to power-cycle the system. After I power-cycle it, what settings should I change so this doesn't happen again?

Thanks in advance.

sleeprae

Run a good set of diagnostics on the drive. I had a similar issue, and it turned out that the drive was disappearing and that's the behavior that occurred after it vanished. It could be that your drive is having problems, either failure or some sort of incompatibility with the system. In my case, the drive (an SSD) had a poor NCQ implementation, and everything behaved properly once I moved the drive to a newer system and disabled NCQ support in the system BIOS.

You could also be running out of space, but with a 500G drive, that seems fairly unlikely.

caleban

I rebooted pfSense. Everything worked fine for a few days. Very little of the 500 GB drive was used. Today I see the same error again.

Do you have any recommendations for running diagnostics? Should I boot into a special diagnostics operating system CD etc. or run the diagnostics at the pfSense shell?

mikesamo

running squid?

caleban

I'm not running squid. I do have ntop installed.

The last time this happened I was unable to log into the web interface or shell or console. This time I'm able to log into the shell with admin (pfsense shell) and root (bsd shell).

Below is what I see when I log into the admin pfsense shell:

ssh admin@ip

/: create/symlink failed, no inodes free
PHP Fatal error: PHP Startup: apc_fcntl_create: open(/tmp/.apc.la1Ir1, O_RDWR|O_CREAT, 0666) failed: in Unknown on line 0
PHP Fatal error: PHP Startup: apc_fcntl_lock failed: in Unknown on line 0
PHP Fatal error: PHP Startup: apc_fcntl_unlock failed: in Unknown on line 0
PHP Fatal error: PHP Startup: apc_fcntl_create: open(/tmp/.apc.VrXjD0, O_RDWR|O_CREAT, 0666) failed: in Unknown on line 0
PHP Fatal error: PHP Startup: apc_fcntl_create: open(/tmp/.apc.UmkApX, O_RDWR|O_CREAT, 0666) failed: in Unknown on line 0
PHP Fatal error: PHP Startup: apc_fcntl_lock failed: in Unknown on line 0
PHP Fatal error: PHP Startup: apc_fcntl_unlock failed: in Unknown on line 0
PHP Fatal error: PHP Startup: apc_fcntl_create: open(/tmp/.apc.cV1api, O_RDWR|O_CREAT, 0666) failed: in Unknown on line 0
PHP Fatal error: PHP Startup: apc_fcntl_create: open(/tmp/.apc.PeY0yy, O_RDWR|O_CREAT, 0666) failed: in Unknown on line 0

Fatal error: Unknown: apc_fcntl_lock failed: in Unknown on line 0

 0) Logout (SSH only)                  8) Shell
 1) Assign Interfaces                  9) pfTop
 2) Set interface(s) IP address       10) Filter Logs
 3) Reset webConfigurator password    11) Restart webConfigurator
 4) Reset to factory defaults         12) pfSense Developer Shell
 5) Reboot system                     13) Upgrade from console
 6) Halt system                       14) Disable Secure Shell (sshd)
 7) Ping host

Enter an option:

Below is what I see when I log into the root bsd shell:

pfSense/BSD appears to have lost access to the hard disk: it's not listed in /dev/.
Why is the hard disk disappearing from /dev, and why does pfSense start working normally again each of the 3 times I've restarted it?

~ > ssh root@ip

df -ih

Filesystem Size Used Avail Capacity iused ifree %iused Mounted on
/dev/ad4s1a 447G 591M 411G 0% 37k 61M 0% /
devfs 1.0K 1.0K 0B 100% 0 0 100% /dev
/dev/md0 3.6M 44K 3.3M 1% 22 744 3% /var/run

diskinfo -t /dev/ad4s1a

diskinfo: Device not configured

ls -al /dev

total 5
dr-xr-xr-x 6 root wheel 512 May 9 21:27 .
drwxr-xr-x 25 root wheel 512 May 9 21:25 ..
crw-r--r-- 1 root wheel 0, 35 May 9 21:27 acpi
crw------- 1 root operator 0, 34 May 9 21:27 ata
crw------- 1 root wheel 0, 36 May 9 21:27 atkbd0
crw------- 1 root kmem 0, 22 May 9 21:27 audit
crw------- 1 root wheel 0, 8 May 9 21:27 bpf

touch /tmp/abcd

/: create/symlink failed, no inodes free
touch: /tmp/abcd: No space left on device

ls -al /usr/bin

ls: addr2line: Device not configured
ls: ar: Device not configured
ls: as: Device not configured
ls: at: Device not configured

dmesg

pid 52441 (ntop), uid 0 inumber 52780131 on /: out of inodes
pid 56662 (mktemp), uid 0 inumber 31983616 on /: out of inodes
pid 11338 (mktemp), uid 0 inumber 31983616 on /: out of inodes
pid 31970 (mktemp), uid 0 inumber 31983616 on /: out of inodes
pid 50668 (mktemp), uid 0 inumber 31983616 on /: out of inodes

jimp

If df lists free inodes, but you get an out of inodes error, it means your media is failing (or has already failed).
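The check described above can be sketched as a small script. This is only an illustration, not part of pfSense: the function names `classify_inode_state` and `ifree_for` are made up for this example, and the probe file name in the usage comment is arbitrary.

```sh
#!/bin/sh
# Classify the situation from two observations:
#   $1 = free inode count reported by df for the filesystem (plain integer)
#   $2 = exit status of an attempt to create a file there (0 = success)
classify_inode_state() {
    ifree=$1
    create_status=$2
    if [ "$create_status" -eq 0 ]; then
        echo "ok"          # file creation works, nothing to report
    elif [ "$ifree" -gt 0 ]; then
        echo "mismatch"    # df says inodes are free, yet creation failed
    else
        echo "exhausted"   # the filesystem really has no free inodes
    fi
}

# Print the free inode count for the filesystem holding path $1.
# The ifree column is located by its header name, so its position
# in the df output does not matter.
ifree_for() {
    df -i "$1" | awk '
        NR == 1 { for (i = 1; i <= NF; i++) if (tolower($i) == "ifree") col = i }
        NR == 2 && col { print $col }'
}

# Example use on the affected box:
#   ifree=$(ifree_for /)
#   touch "/tmp/inode_probe.$$" 2>/dev/null; status=$?
#   rm -f "/tmp/inode_probe.$$"
#   classify_inode_state "$ifree" "$status"
```

With the numbers from this thread (about 61M free inodes and a failing touch), the result would be "mismatch", which is the case that points at the disk or its connection rather than at a full filesystem.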

caleban

This is interesting.

I installed pfSense 2-RC1 on a second Rackable Systems server that had been unused (this was not an upgrade from a pfSense 2 beta; I wiped Debian and installed pfSense). After a few days I see the same issue on that second server.

Is it possible pfSense 2 has a problem with my hardware and my hardware isn't failing? How would I determine that?

I updated both of these servers to pfSense 2-RC2 last night. I'm curious to see if that makes any difference. If they both fail again I'll look into replacing one of the hard drives.

I'm going to install pfSense 2-RC2 on a different make and model of server, a SuperMicro server, and see if I have the same issue there.

jimp

You could run an OS-independent hardware test on it. There are numerous testing programs for various types of hardware out there.

Since that hard drive is so large you might also consider doing a custom install and using a much smaller slice, maybe 8GB for /, and then the rest for /usr and swap, then see what happens.

wallabybob

It might also be useful to do # df -ih after you get the reports of inode exhaustion.

caleban

Thanks. I'll try those suggestions. It looks like http://www.ultimatebootcd.com/ is a boot CD with a lot of diagnostics so I'll try that.

When I run df -ih I see the disk listed and I see many inodes available.

~ > ssh root@ip

df -ih

Filesystem Size Used Avail Capacity iused ifree %iused Mounted on
/dev/ad4s1a 447G 591M 411G 0% 37k 61M 0% /
devfs 1.0K 1.0K 0B 100% 0 0 100% /dev
/dev/md0 3.6M 44K 3.3M 1% 22 744 3% /var/run

When I list /dev I see the drive isn't listed

ls -al /dev

total 5
dr-xr-xr-x 6 root wheel 512 May 9 21:27 .
drwxr-xr-x 25 root wheel 512 May 9 21:25 ..
crw-r--r-- 1 root wheel 0, 35 May 9 21:27 acpi
crw------- 1 root operator 0, 34 May 9 21:27 ata
crw------- 1 root wheel 0, 36 May 9 21:27 atkbd0
crw------- 1 root kmem 0, 22 May 9 21:27 audit
crw------- 1 root wheel 0, 8 May 9 21:27 bpf
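The comparison above (df still lists /dev/ad4s1a, but the node is gone from /dev) could be automated with something like the following. It is a rough sketch only, and the function name `missing_device_nodes` is made up for this example.

```sh
#!/bin/sh
# Read df output on stdin and print every /dev/... filesystem whose
# device node no longer exists. Entries such as devfs are skipped
# because their first column is not a /dev/ path.
missing_device_nodes() {
    awk 'NR > 1 && $1 ~ /^\/dev\// { print $1 }' | while read -r dev; do
        [ -e "$dev" ] || echo "$dev"
    done
}

# Example use on the affected box:
#   df | missing_device_nodes
```

In the state shown above this would be expected to print /dev/ad4s1a, assuming /dev/md0 is still present.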

wallabybob

What sort of disk is ad4 and how is it attached?

Is there anything in the system log about the disk going away or any other problem with the disk?
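One way to look for this is to filter the kernel messages for lines that mention the disk. The sketch below is only a starting point: the function name `disk_trouble_lines` is made up, the keyword list is a guess at what trouble looks like, and it will miss messages logged against the controller channel rather than the disk itself. It assumes the system log is read with clog, as on pfSense 2.0.

```sh
#!/bin/sh
# Read log text on stdin and print the lines that mention the given
# disk name followed by a word that usually signals trouble.
disk_trouble_lines() {
    disk=$1
    grep -i -E "${disk}.*(timeout|failure|detached|error)"
}

# Example use on the affected box:
#   dmesg | disk_trouble_lines ad4
#   clog /var/log/system.log | disk_trouble_lines ad4
```

Running it right after the problem shows up, and before rebooting, gives the best chance of catching the moment the disk went away.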

caleban

Internal
Hitachi Deskstar
500 GB

I rebooted these firewalls last night, so the issue isn't happening at the moment, and the logs don't appear to show anything from before the reboot. I'll have to look the next time it happens.