What can cause a box to crash? swi5 interrupt using high CPU

  • I have three sizes of embedded boxes, all bought with pfSense preinstalled and ready to run.

    • 500MHz
    • 1.6GHz
    • Quad core 2.6GHz

    The small and the large one run rock solid! I have never had a crash that I remember.
    But… the 1.6GHz one crashes over and over. When the load is generally low there can be weeks between crashes, but under higher load (150 clients) it won't last a week!

    Earlier I was able to provoke a crash by pushing lots of data between two computers through the router, from LAN to OPT1.

    I am not too sure what happens. I think traffic keeps flowing for clients that are already online, but new clients do not get a DHCP lease. Later I cannot access the web interface at all.

    I just placed a box at a customer (stupid test) and it crashed after three days. It happened after a sustained download of 50 Mbit/s and an upload of approx. 10 Mbit/s had been running for a while. I have actually left it at the customer but set up a cron job to reboot it every night. I do not know whether that will do the trick; we will see in a few days.

    Now… I don't want to throw these two boxes out, but I don't want boxes that keep crashing either. I just have no clue where to start looking for a misconfiguration. What is the first thing to look for, or what steps should I take, to find out why hardware running pfSense fails?

    Specs are:
    2.0.2-RELEASE (i386)
    built on Fri Dec 7 16:30:51 EST 2012
    FreeBSD 8.1-RELEASE-p13

    nanobsd (512mb)
    Intel(R) Atom(TM) CPU Z530 @ 1.60GHz

    Processor 35%
    States: 8061/198000
    MBUF: 902/25600

    Br. Anders

    There seems to be high CPU load on the 1.6GHz box that keeps crashing. Under "System Activity", the difference between this box and the others is a "swi5" entry that takes up a lot of CPU.

    Is it possible to figure out what this "swi5" is? This post comes with some suggestions. I tried them all but am not skilled enough to read the output.

    Also, viewing the traffic graphs for LAN and WAN at the same time shows some gaps in the graphs, but the gaps are not synchronized between the two. I asked about this earlier, and the answer was that it should be caused by too little RAM.

    Here are the first rows from the "top -S" command. I guess it is the interrupt process (PID 11) that should not be there and that is causing the trouble:

    ```
       10 root       1 171 ki31     0K     8K    RUN  294:42 51.95% idle
       11 root      13 -48    -     0K   104K   WAIT   48:46 37.99% intr
    54507 root       1  76   20 56020K 20308K accept    0:17  0.98% php
      261 root       1  76   20  3408K  1212K kqread    5:08  0.00% check_reload_status
    57231 nobody     1  45    0  4596K  3020K select    4:39  0.00% darkstat
       13 root       1 -16    -     0K     8K      -    2:59  0.00% yarrow
     7964 root       1  44    0  4956K  2432K select    2:44  0.00% syslogd
    ```

    Update: the traffic graph started behaving strangely, and then the web interface stopped responding, so I could not reboot from the GUI. The swi5 process had been climbing to something like 70% (maybe more) before the GUI stopped responding.

    I did have a shell open to the router, and it still responded! So I quickly ran "shutdown -r now", and the router is up and running again…
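    On FreeBSD, swi5 is one of the interrupt threads that all live inside the single "intr" process (PID 11 in the output above), which is why `top -S` only shows "intr"; `top -SH` should list those threads individually. Because the shell kept responding when the GUI did not, a small logger could record how that load climbs before a hang. This is a minimal sketch, assuming FreeBSD `top` batch mode (`-b`, `-d 1`) and the `top -S` column layout shown above, with WCPU as the second-to-last column. The function names and the `/tmp/intr.log` path are hypothetical.

    ```shell
    #!/bin/sh
    # Sketch: sample the CPU share of the "intr" process (which hosts swi5 and
    # the other interrupt threads) and append it to a log file.
    # Assumes the FreeBSD `top -S` layout: ... TIME WCPU COMMAND, so the CPU
    # percentage is the second-to-last field on the line whose COMMAND is "intr".

    # Read `top -S` output on stdin; print the intr WCPU value without the "%".
    intr_cpu() {
        awk '$NF == "intr" {
            v = $(NF - 1)
            sub(/%$/, "", v)
            print v
            exit
        }'
    }

    # Append one timestamped sample; the default log path is hypothetical.
    log_intr_sample() {
        logfile=${1:-/tmp/intr.log}
        pct=$(top -S -b -d 1 | intr_cpu)
        echo "$(date '+%Y-%m-%d %H:%M:%S') intr_wcpu=${pct:-unknown}" >> "$logfile"
    }
    ```

    After sourcing this in the SSH session, something like `while true; do log_intr_sample; sleep 60; done` would keep a per-minute record, which could show whether the intr load ramps up over hours or jumps just before the GUI stops responding. It only sees processes that `top` includes in its batch display.
    
    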


  • A leak and consequent exhaustion of mbufs (network buffers) could cause what you are seeing.

    pfSense shell command:

    ```
    netstat -m
    ```

    How are the three systems different? Different physical interfaces (maybe a driver leaks mbufs occasionally)? Different traffic types? Different packages?
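    One way to tell a leak from a burst is to record `netstat -m` periodically and check whether mbuf cluster usage keeps rising without ever falling back. This is a minimal sketch, assuming the FreeBSD `netstat -m` line of the form `C/K/T/M mbuf clusters in use (current/cache/total/max)`; the function names and the `/tmp/mbuf.log` path are hypothetical.

    ```shell
    #!/bin/sh
    # Sketch: log mbuf cluster usage so a slow leak shows up as a count that
    # keeps climbing toward the maximum. Assumes the FreeBSD `netstat -m` line:
    #   C/K/T/M mbuf clusters in use (current/cache/total/max)

    # Read `netstat -m` output on stdin; print "current max" for mbuf clusters.
    mbuf_clusters() {
        awk '/ mbuf clusters in use / {
            split($1, f, "/")    # f[1]=current f[2]=cache f[3]=total f[4]=max
            print f[1], f[4]
            exit
        }'
    }

    # Append one timestamped sample; the default log path is hypothetical.
    log_mbuf_sample() {
        logfile=${1:-/tmp/mbuf.log}
        echo "$(date '+%Y-%m-%d %H:%M:%S') $(netstat -m | mbuf_clusters)" >> "$logfile"
    }
    ```

    Sampled every few minutes, a "current" value that keeps growing even when traffic is quiet would suggest a leak, while one that rises and falls with load would not.
    
    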

  • Sorry for the late response to your post.
    I had another crash right after my last post and had to take the box offline and replace it with an older router. It only behaves like this when a lot of traffic is flowing through it.

    It is an embedded system that was shipped to me with pfSense preinstalled. I wonder whether I can actually do anything to correct the problem myself.

    Searching for "freebsd intr processor" gives a lot of results but no answers.

    The following is marked solved, but I have not tried any of the changes yet:

    BR. Anders


    Hardware issue with the Realtek NIC.

    Jumbo frames would make the whole system crash for some reason. Applianceshop.eu made a fix (a custom kernel) for 2.0.x, but it should be solved in the regular 2.1 release.

    More information in this thread:

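    Since the fix above ties the crashes to jumbo frames on the Realtek NIC (usually `re0`, `re1` under FreeBSD's re(4) driver), a quick check is whether any Ethernet interface is configured with an MTU above the standard 1500 bytes. This is a minimal sketch, assuming FreeBSD `ifconfig` output where each interface starts on an unindented line ending in `mtu N`; the function name is hypothetical. Only interfaces with the BROADCAST flag are considered, so loopback and pflog, which normally carry large MTUs, are skipped. A result of nothing does not rule out the driver problem; it only shows that jumbo frames are not enabled on the router itself.

    ```shell
    #!/bin/sh
    # Sketch: list Ethernet-type interfaces whose MTU is above 1500.
    # Assumes FreeBSD `ifconfig` output, e.g. an unindented header line like:
    #   re0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500

    # Read `ifconfig` output on stdin; print "interface mtu" for each jumbo MTU.
    jumbo_interfaces() {
        awk '/^[a-z]/ && /[<,]BROADCAST[,>]/ {
            name = $1
            sub(/:$/, "", name)
            for (i = 1; i < NF; i++)
                if ($i == "mtu" && $(i + 1) + 0 > 1500)
                    print name, $(i + 1)
        }'
    }

    # Usage on the router: ifconfig | jumbo_interfaces
    ```
    
    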
