Netgate Discussion Forum

    Heavy Disk I/O

    General pfSense Questions
    • pdg

      Hi,

      I am new to pfSense and am seeing heavy disk I/O every few minutes. My configuration is below; I would like to know how to go about diagnosing the root cause of this problem.

      Please note that the disk I/O is so high that the system stalls completely: RRD graphs show gaps, and keyboard input on the console gets no response. However, commands such as top and iostat keep running fine.

      H/w
      Processor: 3.2GHz i5 4th generation (CPU Intel 4570)
      RAM : 16 GB
      HDD : 500 GB * 2 (GEOM Mirror)

      Pkgs Installed:
      arping-2.14_1-amd64
      bandwidthd-2.0.1_6-amd64
      iftop-0.17-amd64
      iperf-2.0.5-amd64
      lightsquid-1.8_2-amd64
      mtr-0.85_1-amd64
      nmap-6.47-amd64
      ntopng-1.2.1-amd64
      p7zip-9.20.1_2-amd64
      sarg-2.3.9-amd64
      snort-2.9.7.0-amd64
      squid-3.4.10_2-amd64
      squidguard-squid3-1.4_7-amd64
      suricata-2.0.6-amd64
      zip-3.0_1-amd64

      Thanks,
      Pd

      • heper

        Go to the shell/console.

        Type:  top -SH -m io

        Usual suspects: squid/lightsquid/ntop/snort/suricata.

        Lots of processes use (almost) 100% I/O for a very short time, because data wants to move as fast as possible.

        Probably one of them is hogging 100% I/O for a longer time. Find it and let us know; perhaps someone can help you out with other settings.

        • pdg

          Dear heper,

          Thanks for the prompt response. I have run the command. Is there anything specific I should watch for, in terms of fields or values?

          Thanks,

          • pdg

            Hi,

            I see syncer and bufdaemon hitting 100% usage.

            Best,
            Pd

            • heper

              I'd look for a process that holds 100% for more than 10-15 seconds.
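
One way to apply the "holds 100% for more than 10-15 seconds" rule without staring at the screen is to capture several snapshots and count how long each thread stays pegged. The sketch below is a hypothetical helper (io_hogs is not part of pfSense or FreeBSD). It assumes the snapshots use the column layout PID USERNAME VCSW IVCSW READ WRITE FAULT TOTAL PERCENT COMMAND, so check that against your own top output before trusting it.

```sh
# io_hogs THRESHOLD MIN_SNAPSHOTS
# Reads concatenated "top -m io" snapshots on stdin and prints
# "PID COMMAND LONGEST_RUN" for every entry whose PERCENT column stayed at or
# above THRESHOLD for at least MIN_SNAPSHOTS consecutive snapshots.
# Assumption: PERCENT is field 9 and COMMAND is field 10.
io_hogs() {
    awk -v thr="$1" -v need="$2" '
        # At each snapshot boundary, reset the run of anything that was not
        # above the threshold in the snapshot that just ended.
        function flush(   k) {
            for (k in run) if (!(k in seen)) run[k] = 0
            for (k in seen) delete seen[k]
        }
        $1 == "PID" { flush(); next }            # header line starts a snapshot
        $1 ~ /^[0-9]+$/ && NF >= 10 && $9 ~ /%$/ {
            pct = $9
            sub(/%/, "", pct)
            if (pct + 0 >= thr) {
                key = $1 " " $10
                seen[key] = 1
                run[key]++
                if (run[key] > best[key]) best[key] = run[key]
            }
        }
        END { for (k in best) if (best[k] >= need) print k, best[k] }
    '
}

# Possible use on the firewall (flags assumed from FreeBSD top: -b batch,
# -s seconds between snapshots, -d number of snapshots):
#   top -b -S -H -m io -s 5 -d 6 | io_hogs 90 3
```

With 5-second snapshots, three consecutive hits correspond roughly to the 10-15 second window mentioned above.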

              • pdg

                I have observed that the system stalls the moment bufdaemon hits 100%.

                This happens frequently.

                Best,

                • bmeeks

                  I support the Snort and Suricata packages on pfSense.  Neither of those packages will generate much disk I/O unless you are getting huge numbers of alerts per second.  If that is true, then they will be busy writing to their log files.

                  Those two packages can easily be removed and then reinstalled without losing their settings, so you could temporarily remove them to see if that impacted the disk I/O issue.

                  Bill

                  • pdg

                    Hi, Snort and Suricata were my first suspects, so I have disabled them, but the problem persists.

                    Any other ideas?

                    • charliem

                      bufdaemon is an internal daemon started by the kernel.  With the HW you listed, it should run very quickly, but it appears to be failing while still holding a lock.

                      Do you see any HW related error messages in your logs?  Do you have remote syslogging enabled?  As a test you could use only a single HDD rather than the 2 * mirror.
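
A quick way to act on the "HW related error messages" question is to filter the kernel messages for disk-related complaints. This is only a sketch: disk_errors is a hypothetical helper, and the patterns are illustrative fragments of messages FreeBSD disk and GEOM drivers are known to print, not a complete list.

```sh
# disk_errors
# Reads log text on stdin and prints only the lines that look like disk,
# controller, or mirror trouble. The pattern list is an assumption and may
# need extending for your controller and driver.
disk_errors() {
    grep -E -i 'g_vfs_done|CAM status|ATA Status Error|READ_DMA|WRITE_DMA|timeout on slot|GEOM_MIRROR.*(disconnected|degraded)'
}

# Possible use on the firewall:
#   dmesg | disk_errors
#   gmirror status        # shows whether the mirror is COMPLETE or DEGRADED
```

No output from the filter does not prove the disks are healthy; it only means none of these particular messages were logged.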

                      • pdg

                        Hi Charliem,

                        1. I have tried disabling one of the mirrors, but it made no difference.
                        2. I do not have remote syslogging enabled.

                        It may be a long shot, but do you think having multiple VLANs on a single physical NIC could cause this in any way? This machine is also the router for the org. But considering we have ~200 machines, I wonder if it really matters.

                        Is there anything else I could do to narrow down the root cause?

                        Thanks!

                        • charliem

                          Sorry I don't have specific suggestions.  Possibilities as I see them are:

                          Something really is generating so much I/O as to starve the other system threads.  The standard way to track these hogs down is by using the different systat displays.  It sounds like you are already doing this.  As a side note, I see pfSense does not include the gstat tool.

                          Or alternately something is wrong and interfering with these system threads, not allowing them to complete in a timely manner.  This is where I was thinking HW error (disk i/o), network errors, waiting on remote logging, or something similar.

                          I guess a third possibility is somehow the number of buffers or size of buffers is too high, an estimation based on RAM amount that goes wrong.  But 16G should not be a problem AFAIK.

                          Multiple VLANs on a single NIC should be OK, and not cause pauses like this, but if you have another NIC to throw in there that might be an easy check.  Do you see any I/O errors on the interfaces?

                          • pdg

                            Hi,

                            I ran netstat -ni and netstat -s to look for errors/failures, but didn't find any.

                            Is there anything else I could look at? I am really running out of ideas.

                            Thanks.
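
The netstat -ni check can be made less error-prone by letting a small filter pick out the error columns. The sketch below is a hypothetical helper (iface_errors is an illustrative name). It assumes the FreeBSD header row contains Ierrs and Oerrs, and it locates those columns by counting from the right because the Address column is empty for some interfaces.

```sh
# iface_errors
# Reads "netstat -ni" output on stdin and prints "IFACE IERRS OERRS" for each
# link-level row (<Link#N>) whose input or output error counter is non-zero.
iface_errors() {
    awk '
        $1 == "Name" {
            # Remember how far from the right edge each counter sits.
            for (i = 1; i <= NF; i++) {
                if ($i == "Ierrs") ie = NF - i
                if ($i == "Oerrs") oe = NF - i
            }
            next
        }
        $3 ~ /^<Link#/ {
            ierr = $(NF - ie)
            oerr = $(NF - oe)
            if (ierr + 0 > 0 || oerr + 0 > 0) print $1, ierr, oerr
        }
    '
}

# Possible use on the firewall:
#   netstat -ni | iface_errors
```

An empty result matches what was reported above: no interface errors.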

                            • Derelict LAYER 8 Netgate

                              top -m io

                              maybe that'll catch something?

                              Like heper said.

                              You're looking for processes using I/O.

                              I know when I do cat /dev/zero > /root/delete.me cat shoots right to the top.  :)

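
For anyone who wants to try the cat /dev/zero test without filling the disk, a bounded variant might look like the sketch below. io_burst is a hypothetical helper, not an existing tool; it writes a fixed amount of data and removes the file afterwards. Watch top -m io in a second session while it runs.

```sh
# io_burst FILE [MIB]
# Writes MIB mebibytes of zeros (default 64) to FILE, then deletes FILE.
# Returns dd's exit status so a failed write is not hidden by the cleanup.
io_burst() {
    target=$1
    mib=${2:-64}
    dd if=/dev/zero of="$target" bs=1048576 count="$mib" 2>/dev/null
    status=$?
    rm -f "$target"
    return "$status"
}

# Possible use on the firewall:
#   io_burst /root/delete.me 512
```

Small writes may be absorbed by the buffer cache, so a larger size is more likely to show up in the I/O view.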

                              • mir

                                What are the numbers for mbuf usage?
                                I have heard someone mention that multiport NICs sometimes consume all the mbufs.

                                • pdg

                                  Hi
                                  I am not using multiport NICs.
                                  I am attaching the Mbuf graph for the week.

                                  The value in /boot/loader.conf is kern.ipc.nmbclusters="0",
                                  and the sysctl output is:
                                  kern.ipc.nmbclusters: 26584

                                  Looking forward to further guidance.

                                  Thanks,
                                  Pd

                                  [Attachment: MBuf.png]
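
To put a number on mbuf usage rather than reading it off a graph, the cluster line from netstat -m can be compared against the kern.ipc.nmbclusters limit. The sketch below is a hypothetical helper (mbuf_cluster_usage is an illustrative name). It assumes the FreeBSD line format "current/cache/total/max mbuf clusters in use (current/cache/total/max)".

```sh
# mbuf_cluster_usage
# Reads "netstat -m" output on stdin and prints "TOTAL/MAX PERCENT%" for the
# mbuf cluster pool, where MAX is the limit set by kern.ipc.nmbclusters.
mbuf_cluster_usage() {
    awk '
        /mbuf clusters in use/ {
            split($1, n, "/")               # n[3] = total, n[4] = max
            if (n[4] + 0 > 0)
                printf "%d/%d %.0f%%\n", n[3], n[4], 100 * n[3] / n[4]
        }
    '
}

# Possible use on the firewall:
#   netstat -m | mbuf_cluster_usage
```

A figure that sits well below 100% would suggest mbuf exhaustion is not what is causing the stalls.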

                                  Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.