Ran out of inodes

wallabybob

Hi, using a snapshot around early Dec 2010 ish (2.0 beta4), I have tried (and failed) to log in to the web interface.

That is now a fairly old build. I suggest you upgrade to a much more recent snapshot.

Speculation: You have run into a problem where the kernel has exhausted one or more resources. Allocation of some critical resource (for example, a chunk of heap memory) fails, and that failure gets reported up the line as "out of inodes" because it occurs in file system code whose authors didn't take care to distinguish "out of inodes" from "can't allocate heap memory for inode processing".
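
One way to test this guess is to look at the actual inode usage before the box locks up. Below is a minimal sketch, assuming a POSIX shell with awk and a df that supports -i (FreeBSD's does). The inode_usage helper is a hypothetical name, not part of pfSense. It picks the inode columns by counting from the end of each line, so it does not depend on how many block columns come first. If plenty of inodes are free while the kernel still logs "out of inodes", the real failure is probably somewhere else.

```shell
#!/bin/sh
# Hypothetical helper: read `df -i` output on stdin and print
# "iused ifree %iused" for the given mount point.
# The mount point is the last field; the three inode columns precede it.
inode_usage() {
    awk -v mnt="$1" '$NF == mnt { print $(NF-3), $(NF-2), $(NF-1) }'
}

# Usage on a live system:
#   df -i / | inode_usage /
```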

Which build are you using: i386 or amd64? If i386, can you use amd64?

sleeprae

While I don't want to complicate matters with an issue that has similar symptoms but perhaps completely unrelated causes, I thought I would share my recent experience as well. After running for 2 or 3 days, I would get similar messages (no space left on device, etc.). I couldn't even drop into a shell on the console, but a reboot would clear it up. After a couple of occurrences, I left an SSH session open. The next time it happened, I looked at the dmesg output and saw the following:

unknown: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=2257839
ata2: timeout waiting to issue command
ata2: error issuing WRITE_DMA command
g_vfs_done():ad4s1a[WRITE(offset=578043904, length=16384)]error = 6
g_vfs_done():ad4s1a[WRITE(offset=963330048, length=16384)]error = 6
g_vfs_done():ad4s1a[WRITE(offset=1155956736, length=16384)]error = 6
g_vfs_done():ad4s1a[WRITE(offset=1541521408, length=16384)]error = 6
g_vfs_done():ad4s1a[WRITE(offset=1541947392, length=16384)]error = 6
g_vfs_done():ad4s1a[WRITE(offset=1541980160, length=16384)]error = 6
g_vfs_done():ad4s1a[WRITE(offset=1733967872, length=16384)]error = 6
g_vfs_done():ad4s1a[WRITE(offset=1733984256, length=16384)]error = 6
g_vfs_done():ad4s1a[WRITE(offset=1734000640, length=16384)]error = 6
g_vfs_done():ad4s1a[WRITE(offset=1734098944, length=16384)]error = 6
g_vfs_done():ad4s1a[WRITE(offset=1734115328, length=16384)]error = 6
g_vfs_done():ad4s1a[WRITE(offset=2311847936, length=16384)]error = 6
g_vfs_done():ad4s1a[WRITE(offset=2311864320, length=16384)]error = 6
g_vfs_done():ad4s1a[WRITE(offset=2311880704, length=16384)]error = 6
g_vfs_done():ad4s1a[WRITE(offset=2311946240, length=16384)]error = 6
g_vfs_done():ad4s1a[WRITE(offset=2697117696, length=16384)]error = 6
g_vfs_done():ad4s1a[WRITE(offset=2889760768, length=16384)]error = 6
g_vfs_done():ad4s1a[WRITE(offset=2889777152, length=16384)]error = 6
g_vfs_done():ad4s1a[WRITE(offset=1155973120, length=16384)]error = 5
Device ad4s1a went missing before all of the data could be written to it; expect data loss.
pid 26192 (php), uid 0 inumber 141312 on /: out of inodes
pid 38718 (php), uid 0 inumber 141312 on /: out of inodes
pid 17290 (php), uid 0 inumber 141312 on /: out of inodes
pid 499 (php), uid 0 inumber 141312 on /: out of inodes
pid 43275 (sh), uid 0 inumber 353389 on /: out of inodes
pid 7655 (php), uid 0 inumber 141312 on /: out of inodes
pid 55110 (php), uid 0 inumber 141312 on /: out of inodes
pid 56772 (php), uid 0 inumber 141312 on /: out of inodes
pid 60221 (php), uid 0 inumber 141312 on /: out of inodes
pid 63456 (php), uid 0 inumber 141312 on /: out of inodes
pid 11435 (php), uid 0 inumber 141312 on /: out of inodes
pid 24754 (php), uid 0 inumber 141312 on /: out of inodes
pid 27519 (php), uid 0 inumber 141312 on /: out of inodes
pid 42681 (php), uid 0 inumber 141312 on /: out of inodes
pid 17285 (php), uid 0 inumber 141312 on /: out of inodes
pid 18570 (php), uid 0 inumber 141312 on /: out of inodes

In my case, pfSense was installed on an older nForce3 Ultra board and a new Kingston 8GB V100 series SSD. I was running one of the late December -BETA4 (x64) builds initially, and upgrading to newer -BETA5 builds didn't seem to help. (latest installed: 2.0-BETA5 (amd64) built on Tue Dec 28 03:03:03 EST 2010). Four days ago, I cloned the SSD to a generic magnetic HDD, and have had no problem since. I chalked it up to some incompatibility between the older motherboard and the new SSD, but it's similar enough to your issue that I thought I would at least mention it.

wallabybob

sleeprae: If the system hard drive suddenly goes walkabout, all sorts of weirdness will happen. It's not hard to imagine that "the hard drive has gone" might translate into "out of inodes". Thanks for reporting that; I think it's similar enough to the originally reported problem to be interesting.

wildgoose: are you running off a solid state drive (e.g. a flash card or ssd)?

Cry Havok

ewildgoose, I think you've got hardware problems there.

ewildgoose

Hmm, checking dmesg further back I see the same problem. Snippet from the log files:

em1: link state changed to UP
g_vfs_done():ad0s1a[WRITE(offset=1926676480, length=16384)]error = 6
ug_vfs_done():ad0s1a[WRITE(offset=1926692864, length=16384)]error = 6
ng_vfs_done():ad0s1a[WRITE(offset=k770686976, length=16384)]error = 6
g_vfs_done():ad0s1a[WRITE(offset=1163100160, length=16384)]error = 6
ng_vfs_done():ad0s1a[WRITE(offset=1163771904, length=16384)]error = 6
owg_vfs_done():ad0s1a[WRITE(offset=1166508032, length=16384)]error = 6
n:g_vfs_done():ad0s1a[WRITE(offset=1166704640, length=16384)]error = 6
g_vfs_done():ad0s1a[WRITE(offset=1166901248, length=16384)]error = 6
g_vfs_done():ad0s1a[WRITE(offset=1167097856, length=16384)]error = 6
Tg_vfs_done():ad0s1a[WRITE(offset=1167294464, length=16384)]error = 6
Ig_vfs_done():ad0s1a[WRITE(offset=1167491072, length=16384)]error = 6
MEg_vfs_done():ad0s1a[WRITE(offset=1171144704, length=16384)]error = 6
g_vfs_done():ad0s1a[WRITE(offset=1171341312, length=16384)]error = 6
Og_vfs_done():ad0s1a[WRITE(offset=1168130048, length=16384)]error = 6
Ug_vfs_done():ad0s1a[WRITE(offset=1167687680, length=32768)]error = 6
T g_vfs_done():ad0s1a[WRITE(offset=1167998976, length=32768)]error = 6

g_vfs_done():ad0s1a[WRITE(offset=1168310272, length=32768)]error = 6
Wg_vfs_done():ad0s1a[WRITE(offset=1168637952, length=16384)]error = 6
Rg_vfs_done():ad0s1a[WRITE(offset=1168867328, length=16384)]error = 6
Ig_vfs_done():ad0s1a[WRITE(offset=1169113088, length=16384)]error = 6
g_vfs_done():ad0s1a[WRITE(offset=777910272, length=2048)]error = 6
Tg_vfs_done():ad0s1a[WRITE(offset=777889792, length=6144)]error = 6
g_vfs_done():ad0s1a[WRITE(offset=6144000, length=2048)]error = 6
Eg_vfs_done():ad0s1a[WRITE(offset=65536, length=2048)]error = 6
_g_vfs_done():ad0s1a[READ(offset=1164279808, length=4096)]error = 6
Dvnode_pager_getpages: I/O read error
vm_fault: pager read error, pid 29556 (rrdtool)
Mg_vfs_done():ad0s1a[WRITE(offset=578158592, length=16384)]error = 6
A
pid 29556 (rrdtool), uid 0: exited on signal 11
retrying (1 retry left) LBA=3762959
ata0: timeout waiting to issue command
ata0: error issuing WRITE_DMA command
g_vfs_done():ad0s1a[WRITE(offset=1926594560, length=16384)]error = 5
Device ad0s1a went missing before all of the data could be written to it; expect data loss.
pid 24822 (php), uid 0 inumber 94249 on /: out of inodes
vnode_pager_getpages: I/O read error
vm_fault: pager read error, pid 46772 (rrdtool)
pid 46772 (rrdtool), uid 0 inumber 70656 on /: out of inodes
pid 46772 (rrdtool), uid 0: exited on signal 11
arp: 192.168.105.56 moved from 58:b0:35:78:0d:f5 to 00:24:36:9e:fe:13 on em1
pid 51725 (php), uid 0 inumber 94249 on /: out of inodes
vnode_pager_getpages: I/O read error
vm_fault: pager read error, pid 35067 (rrdtool)
pid 35067 (rrdtool), uid 0 inumber 70656 on /: out of inodes
pid 35067 (rrdtool), uid 0: exited on signal 11
pid 58977 (php), uid 0 inumber 94249 on /: out of inodes
vnode_pager_getpages: I/O read error

Yes, I'm using the CF card slot on this Lanner board. I'm using a more expensive SLC card though; it's brand new and there's no reason to think it should have gone bad. It seems more likely that there is a driver issue with the controller?

atapci0: <Intel ICH8M UDMA100 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xffa0-0xffaf at device 31.1 on pci0
ata0: <ATA channel 0> on atapci0
ata0: [ITHREAD]

Hmm..

Cry Havok

It isn't a driver issue, or many other people would have reported problems. Just because your card is new doesn't stop it going bad - hardware can fail at any time.

Try another card. If there are no problems then you know the problem was with that card. If there are then it may be a problem with your motherboard.

cmb

"out of inodes" can indeed be reported on a failing disk/CF, I've seen that on occasion.

jimp

Between the out of inode error and the other errors you are seeing, all signs point toward failing media.

wallabybob

Here are a couple of tests you could do on your "hard drive". These examples assume your hard drive is /dev/ad1 (change the name as appropriate for your configuration).

Read the whole hard drive (copy the whole drive to /dev/null, the "null" device):

# dd bs=65536 if=/dev/ad1 of=/dev/null

Write zeroes to the free space (dd stops when the file system is full), then free up the space filled with zeroes:

# dd bs=65536 if=/dev/zero of=/tmp/zero; rm /tmp/zero

If the drive is good neither of these tests should produce any error report relating to the drive.
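
To see at a glance whether either test (or normal operation) produced drive errors, the kernel log can be summarised. This is a rough sketch, assuming a POSIX shell with sed, sort, uniq and awk, and log lines in the g_vfs_done() format posted earlier in this thread. The summarise_io_errors name is hypothetical. On FreeBSD, error 5 is EIO (input/output error) and error 6 is ENXIO (device not configured), so a run of error = 6 lines suggests the device has dropped off the bus.

```shell
#!/bin/sh
# Hypothetical helper: read kernel log text on stdin and print one line per
# (device, errno) pair in the form "<device> error=<n> count=<c>".
# Only g_vfs_done() lines are considered; everything else is ignored.
summarise_io_errors() {
    sed -n 's/.*g_vfs_done():\([^[]*\)\[.*error = \([0-9][0-9]*\).*/\1 \2/p' |
        sort | uniq -c |
        awk '{ printf "%s error=%s count=%s\n", $2, $3, $1 }'
}

# Usage on a live system:
#   dmesg | summarise_io_errors
```

An empty result means no g_vfs_done() errors were found in the log; it says nothing about errors that have already scrolled out of the kernel message buffer.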

I believe some types of solid state "disks" do some sort of wear levelling which could involve the drive being "busy" for a while. I have no idea what the appropriate standards say about how long a drive might be allowed to "lock out" i/o requests while it is busy with its housekeeping. It's possible the FreeBSD disk driver might need some tweaking to accommodate some types of solid state media.

I have two pfSense boxes using Transcend DOM 1GB solid state disk modules. I've not seen this sort of problem on them. But these devices are intended for high i/o rate and sustained i/o environments. I suspect commodity type memory cards are not intended for high i/o rate and sustained i/o environments and consequently the designers might have taken some shortcuts.

Tidder

@wallabybob:

wildgoose: are you running off a solid state drive (e.g. a flash card or ssd)?

I just wanted to mention here that we had 2 routers running a recent build of BETA5 working flawlessly for a couple of weeks. I have switched from a normal platter-based 2.5" hard drive to a Transcend 2.5" solid state 8 GB IDE drive (SLC) and have installed RC1 fresh onto both of them. Both of them are now exhibiting this behavior. I do believe this problem could be linked to SSDs. Anything I can do to help/fix? Start a new thread and not hijack this one? ;)