What can cause a box to crash? swi5 interrupt uses high CPU



  • I have three sizes of embedded boxes, all bought with pfSense preinstalled and ready to run.

    • 500MHz
    • 1.6GHz
    • Quad core 2.6GHz

    The small and the large run dead stable! Never had a crash as far as I remember.
    But… the 1.6GHz crashes over and over. There can be weeks between crashes if the load is generally low, but if the load is higher (150 clients) it won't last a week!

    Earlier I was able to provoke a crash by pushing lots of data between two computers through the router, from LAN to OPT1.

    I am not too sure what happens. I think traffic keeps running for users who are already online, but new clients do not get a DHCP lease. Later I cannot access the web interface at all.

    I just placed a box at a customer (stupid test) and it crashed after three days. It happened after a sustained medium load of about 50Mbit download and approx. 10Mbit upload. I have actually left it at the customer but set up a cron job to restart it once every night. I do not know if that will do the trick or not. We will see in a few days.

    Now… I don't want to throw these two boxes out. But I don't want boxes that keep crashing either. I just have no clue how to start looking for a misconfiguration. What is the first thing to look for, or what steps should I take, to find out why hardware with pfSense fails?

    Specs are:
    2.0.2-RELEASE (i386)
    built on Fri Dec 7 16:30:51 EST 2012
    FreeBSD 8.1-RELEASE-p13

    nanobsd (512mb)
    Intel(R) Atom(TM) CPU Z530 @ 1.60GHz

    Current:
    Processor 35%
    States: 8061/198000
    MBUF: 902/25600

    Br. Anders



  • UPDATE:
    There seems to be high CPU load on the 1.6GHz box that often crashes. Under "System activity", one difference between this box and the others is a "swi5" process that takes up a lot of CPU.
    https://dl.dropbox.com/u/1652656/NM/pfsense_smi5_process.png

    Is it possible to figure out what this "swi5" is? This post has some suggestions. I tried them all but am not skilled enough to read the output.
    http://forum.pfsense.org/index.php?action=printpage;topic=38426.0

    Also, viewing the traffic graphs for LAN and WAN at the same time shows some holes in the graphs, but the holes are not synchronized. I have asked about this earlier, and it is supposedly caused by too little RAM.
    https://dl.dropbox.com/u/1652656/NM/pfsense_16GHz_holesInGrapphs.png

    Here are the first rows from the "top -S" command. I guess it is the interrupt process (PID 11) that should not be there and that is causing the trouble:

      PID USERNAME THR PRI NICE   SIZE    RES  STATE    TIME   WCPU COMMAND
       10 root       1 171 ki31     0K     8K    RUN  294:42 51.95% idle
       11 root      13 -48    -     0K   104K   WAIT   48:46 37.99% intr
    54507 root       1  76   20 56020K 20308K accept    0:17  0.98% php
      261 root       1  76   20  3408K  1212K kqread    5:08  0.00% check_reload_status
    57231 nobody     1  45    0  4596K  3020K select    4:39  0.00% darkstat
       13 root       1 -16    -     0K     8K      -    2:59  0.00% yarrow
     7964 root       1  44    0  4956K  2432K select    2:44  0.00% syslogd
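
    One way to narrow down which device is behind the interrupt load, as a rough sketch: "top -SH" lists kernel threads individually, and "vmstat -i" prints per-device interrupt totals and rates. The helper below (the name top_interrupts is made up for illustration) ranks "vmstat -i" output by rate. It assumes the usual three-column layout with a header line and a final "Total" line. Software interrupt threads such as swi5 may not be listed there, so this only helps if a hardware interrupt source is the culprit.

    ```sh
    # Sketch: rank interrupt sources by rate, reading `vmstat -i` output on stdin.
    # Assumed layout: "<name...> <total> <rate>", one header line, one "Total" line.
    top_interrupts() {
        awk 'NR > 1 && $1 != "Total" {
            rate = $NF                      # last column is the rate
            name = $1                       # name may contain spaces (e.g. shared IRQs)
            for (i = 2; i <= NF - 2; i++) name = name " " $i
            print rate, name
        }' | sort -rn | head -n 3           # three busiest sources, highest rate first
    }
    ```

    On the router this would be run as "vmstat -i | top_interrupts".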
    


  • JUST HAD A CRASH
    That is to say: the traffic graph behaved strangely and the web interface stopped responding, so I could not get to the reboot page. The swi5 process had been climbing to something like 70% (and maybe more) before the GUI stopped responding.

    I did have a shell open to the router and it still responded! So I did a quick "shutdown -r now" and the router is up and running again…

    Scary



  • A leak and consequent exhaustion of mbufs (network buffers) could cause what you are seeing.

    pfSense shell command:
    ```
    netstat -m
    ```

    How are the three systems different? Different physical interfaces (maybe a driver leaks mbufs occasionally)? Different traffic types? Different packages?
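
    To see whether mbuf clusters climb toward the limit before a crash, something like the following could log usage over time. It is only a sketch: the function name mbuf_cluster_usage is made up, and it assumes "netstat -m" prints a line of the form "current/cache/total/max mbuf clusters in use".

    ```sh
    # Sketch: print "<current> <max> <percent used>" for mbuf clusters,
    # reading `netstat -m` output on stdin.
    # Assumed line format:
    #   <current>/<cache>/<total>/<max> mbuf clusters in use (current/cache/total/max)
    mbuf_cluster_usage() {
        awk '/mbuf clusters in use/ {
            split($1, f, "/")               # f[1] = current, f[4] = max
            if (f[4] > 0)
                printf "%d %d %d\n", f[1], f[4], (f[1] * 100) / f[4]
        }'
    }

    # Possible use on the router (uncomment to log one sample per minute):
    # while :; do
    #     echo "$(date '+%F %T') $(netstat -m | mbuf_cluster_usage)"
    #     sleep 60
    # done
    ```

    If the first number keeps growing toward the second while traffic is steady, that would point to a leak.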


  • Sorry for this late response to your post.
    Had another crash right after my last post and had to take the box offline and replace it with an older router. It only behaves like this when a lot of traffic is flowing through it.

    It is an embedded system shipped to me with pfSense preinstalled. I wonder if I can actually do anything to correct the problem myself.

    Searching for "freebsd intr processor" gives a lot of results but no answers.

    This one is marked solved, but I have not tried any of the changes yet:
    http://forums.freebsd.org/archive/index.php/t-24511.html

    BR. Anders



  • SOLVED.

    Hardware issue with the Realtek NIC.

    Jumbo frames would make the whole system crash for some reason. Applianceshop.eu made a fix (a custom kernel) for 2.0.x, but it should be solved in the normal 2.1 release.

    More information in this thread:
    https://forum.pfsense.org/index.php/topic,62032.0.html

    UPDATE
    Fixed the link according to the post below. Thanks.

