Heavy Disk I/O

pdg

Hi,

I see syncer and bufdaemon hitting 100% usage.

Best,
Pd

heper

I'd look for a process that holds 100% for more than 10-15 seconds.

pdg

I have observed that the system stops responding the moment bufdaemon goes to 100%.

I find that happening frequently.

Best,

bmeeks

I support the Snort and Suricata packages on pfSense. Neither of those packages will generate much disk I/O unless you are getting huge numbers of alerts per second. If that is true, then they will be busy writing to their log files.

Those two packages can easily be removed and then reinstalled without losing their settings, so you could temporarily remove them to see if that impacted the disk I/O issue.

Bill

pdg

Hi, Snort and Suricata were my first suspects, so I disabled them, but the problem persists.

Any other ideas?

charliem

bufdaemon is an internal daemon started by the kernel. With the HW you listed, it should run very quickly, but it appears to be failing while still holding a lock.

Do you see any HW-related error messages in your logs? Do you have remote syslogging enabled? As a test, you could run on a single HDD rather than the two-disk mirror.

pdg

Hi Charliem,

1. I have tried disabling one of the mirrored disks, but it made no difference
2. I do not have remote syslogging enabled

This may be a long shot, but do you think having multiple VLANs on a single physical NIC could cause this effect in any way? This machine is also the router for the org. But considering we have ~200 machines, I wonder whether it really matters.

Is there anything else I could do to narrow down the root cause?

Thanks!

charliem

Sorry I don't have specific suggestions. Possibilities as I see them are:

Something really is generating so much I/O that it starves the other system threads. The standard way to track these hogs down is with the various systat(1) displays. It sounds like you are already doing this. As a side note, I see pfSense does not include the gstat tool.

Or, alternatively, something is wrong and interfering with these system threads, not allowing them to complete in a timely manner. This is where I was thinking of a HW error (disk I/O), network errors, waiting on remote logging, or something similar.

I guess a third possibility is that the number or size of the buffers is somehow too high, because an estimate based on the amount of RAM has gone wrong. But 16G should not be a problem AFAIK.

Multiple VLANs on a single NIC should be OK, and not cause pauses like this, but if you have another NIC to throw in there that might be an easy check. Do you see any I/O errors on the interfaces?

pdg

Hi,

I ran netstat -ni and netstat -s to check for errors and failures, but didn't find any.

Is there anything else I could look at? I am really running out of clues.

Thanks.

Derelict

top -m io

maybe that'll catch something?

Like heper said.

You're looking for processes using I/O.

I know when I do cat /dev/zero > /root/delete.me, cat shoots right to the top. :)
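
If watching the interactive display is awkward, a batch-mode snapshot can be pasted into a post or redirected to a file. This is only a rough sketch, assuming the FreeBSD top that ships with pfSense; io_snapshot is a made-up helper name, and the two-display, five-second timing is an arbitrary choice. The first display may not cover a full interval, so the second one is usually the one to read.

```sh
# Batch-mode snapshot of the busiest I/O processes on FreeBSD/pfSense.
# io_snapshot is a hypothetical helper name; the optional argument is how
# many processes to list (default 15).
io_snapshot() {
    # -b      batch (non-interactive) output
    # -S      include system processes such as bufdaemon and syncer
    # -m io   show I/O statistics instead of CPU statistics
    # -o total  sort by total I/O operations
    # -d 2    print two displays, -s 5 waits five seconds between them
    top -b -S -m io -o total -d 2 -s 5 "${1:-15}"
}

# Example, run on the firewall itself:
#   io_snapshot 10
```

Anything that stays near the top across several snapshots is worth a closer look.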

mir

What are the numbers for MBUF usage?
I have heard someone mention that multiport NICs sometimes consume all the mbufs.

pdg

Hi
I am not using multiport NICs.
I am attaching the Mbuf graph for the week.

The value set in /boot/loader.conf is kern.ipc.nmbclusters="0"
and sysctl output is
kern.ipc.nmbclusters: 26584
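
For a number to go with the graph, the current usage can be compared against that limit from the shell. A minimal sketch, assuming netstat -m prints its usual "mbuf clusters in use (current/cache/total/max)" line; mbuf_cluster_pct is a made-up helper name, and using the total figure (rather than current) as the numerator is my own choice.

```sh
# Print mbuf cluster usage as a percentage of the configured maximum.
# Reads netstat -m style text on stdin; mbuf_cluster_pct is a
# hypothetical helper name.
mbuf_cluster_pct() {
    awk '/mbuf clusters in use/ {
        # The first field looks like current/cache/total/max.
        split($1, f, "/")
        # Assumption: compare total (current + cache) against max.
        if (f[4] > 0) printf "%.1f\n", 100 * f[3] / f[4]
    }'
}

# Example, run on the firewall itself:
#   netstat -m | mbuf_cluster_pct
```

A value that creeps toward 100 would point at mbuf exhaustion; a low, flat value would suggest looking elsewhere.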

Looking forward to further guidance.

Thanks,
Pd

[Attachment: MBuf.png]