What can cause a box to crash? swi5 interrupt uses high CPU
-
I have three sizes of embedded boxes, all bought with pfSense preinstalled and ready to run.
- 500MHz
- 1.6GHz
- Quad core 2.6GHz
The small and the large ones run dead stable! Never had a crash as far as I remember.
But… the 1.6GHz crashes over and over. There can be weeks in between if the load is generally low, but if the load is higher (150 clients) it won't last a week! Earlier I was able to provoke a crash by running lots of data between two computers through the router, from LAN to OPT1.
I am not too sure what happens. I think traffic keeps flowing for users who are already online, but new clients do not get a DHCP lease. Later I am not able to access the web interface at all.
I just placed a box at a customer (stupid test) and it crashed after three days. It happened after a sustained medium load of about 50Mbit download and approx. 10Mbit upload. I have actually left it at the customer but set up a cron job to restart it once every night. I do not know if that will do the trick or not. We will see in a few days.
Now… I don't want to throw these two boxes out. But I don't want boxes that keep crashing either. I just have no clue where to start looking for a misconfiguration. What is the first thing to look for, or what steps should I take, to find out why hardware running pfSense fails?
Specs are:
2.0.2-RELEASE (i386)
built on Fri Dec 7 16:30:51 EST 2012
FreeBSD 8.1-RELEASE-p13
nanobsd (512mb)
Intel(R) Atom(TM) CPU Z530 @ 1.60GHz
Current:
Processor 35%
States: 8061/198000
MBUF: 902/25600
Br. Anders
-
UPDATE:
There seems to be high CPU load on the 1.6GHz box that crashes so often. Under "System activity", one difference between this box and the others is a "swi5" process that takes up a lot of CPU power.
https://dl.dropbox.com/u/1652656/NM/pfsense_smi5_process.png
Is it possible to figure out what this "swi5" is? This post has some suggestions. I tried them all but am not skilled enough to read the output.
http://forum.pfsense.org/index.php?action=printpage;topic=38426.0
Also, viewing the traffic graphs for LAN and WAN at the same time shows some holes in the graphs, but they are not synchronized. I asked about this earlier, and it is supposedly caused by too little RAM.
https://dl.dropbox.com/u/1652656/NM/pfsense_16GHz_holesInGrapphs.png
Here are the first rows from the "top -S" command. I guess it is the interrupt process (PID 11) that should not be there and that is causing the trouble:
PID   USERNAME THR PRI NICE   SIZE    RES STATE    TIME   WCPU COMMAND
10    root       1 171 ki31     0K     8K RUN    294:42 51.95% idle
11    root      13 -48    -     0K   104K WAIT    48:46 37.99% intr
54507 root       1  76   20 56020K 20308K accept   0:17  0.98% php
261   root       1  76   20  3408K  1212K kqread   5:08  0.00% check_reload_status
57231 nobody     1  45    0  4596K  3020K select   4:39  0.00% darkstat
13    root       1 -16    -     0K     8K -        2:59  0.00% yarrow
7964  root       1  44    0  4956K  2432K select   2:44  0.00% syslogd
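To keep an eye on that intr line without watching top by hand, something like the following could pull the WCPU figure out of the output. This is only a sketch: the function names `intr_wcpu` and `over_threshold` are made up for illustration, and it assumes the column layout of the rows above, with WCPU as the second-to-last field and the command name last.

```sh
# Hypothetical helpers, not part of pfSense or FreeBSD.

# Read "top -S" style rows on stdin and print the WCPU value (without the
# percent sign) of the row whose COMMAND column is exactly "intr".
intr_wcpu() {
    awk '$NF == "intr" { v = $(NF - 1); sub(/%$/, "", v); print v }'
}

# Succeed (exit status 0) when the first argument is greater than the second.
# awk does the comparison because sh arithmetic is integer-only.
over_threshold() {
    awk -v v="$1" -v t="$2" 'BEGIN { exit !(v + 0 > t + 0) }'
}
```

With the row above, `intr_wcpu` would print 37.99. Feeding it from something like `top -S -b` is an assumption about how top behaves in batch mode on this box, so check top(1) there before relying on it.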
-
JUST HAD A CRASH
That is to say, the traffic graph behaved strangely and the web interface stopped responding, so I could not get to the reboot page. The swi5 process had been climbing to something like 70% (maybe more) before the GUI stopped responding.
I did have a shell open to the router and it still responded! So I did a quick "shutdown -r now" and the router is up and running again…
Scary
-
A leak and consequent exhaustion of mbufs (network buffers) could cause what you are seeing.
pfSense shell command:
```
netstat -m
```
How are the three systems different? Different physical interfaces (maybe a driver leaks mbufs occasionally)? Different traffic types? Different packages?
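On the mbuf question: since a leak only shows up over time, it may help to log the figure periodically rather than look once. A rough sketch, assuming `netstat -m` prints a line of the form `current/cache/total/max mbuf clusters in use` (that layout is from memory of FreeBSD 8.x and should be checked on the box). The function name `mbuf_cluster_pct` is hypothetical.

```sh
# Hypothetical helper: reads "netstat -m" output on stdin and prints the
# share of the mbuf cluster limit currently in use, as a whole percent.
# Assumes the first field of the matching line is current/cache/total/max.
mbuf_cluster_pct() {
    awk '/mbuf clusters in use/ {
        n = split($1, f, "/")
        if (n == 4 && f[4] + 0 > 0) print int(f[1] * 100 / f[4])
    }'
}
```

Run as `netstat -m | mbuf_cluster_pct` every few minutes and append the result to a file. A number that keeps climbing and never comes back down would point at a leak.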
-
Sorry for this late response to your post.
Had another crash right after my last post and had to take the box offline and replace it with an older router. It only behaves like this when a lot of traffic is flowing through it.
It is an embedded system shipped to me with pfSense preinstalled. I wonder if I can actually do anything to correct the problem myself.
Searching for "freebsd intr processor" gives a lot of results but no answers.
This one is marked solved, but I have not tried any of the changes yet:
http://forums.freebsd.org/archive/index.php/t-24511.html
BR. Anders
-
SOLVED.
Hardware issue with the Realtek NIC.
Jumbo frames would make the whole system crash for some reason. Applianceshop.eu made a fix (a custom kernel) for 2.0.x, but it should be solved in the normal 2.1 release.
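If you want to check whether a box is exposed to this, one quick way is to look for an MTU above 1500 in the ifconfig output. A minimal sketch, assuming the FreeBSD ifconfig layout where the first line for each interface ends in `mtu N`. The function name `jumbo_ifaces` is made up.

```sh
# Hypothetical helper: reads ifconfig output on stdin and prints the name
# and MTU of every interface whose MTU is above the standard 1500 bytes.
jumbo_ifaces() {
    awk '{
        for (i = 1; i < NF; i++)
            if ($i == "mtu" && $(i + 1) + 0 > 1500) {
                name = $1
                sub(/:$/, "", name)
                print name, $(i + 1)
            }
    }'
}
```

Run as `ifconfig | jumbo_ifaces`. Realtek ports handled by the re(4) driver typically show up as re0, re1 and so on, and no output means no interface is above 1500.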
More information in this thread:
https://forum.pfsense.org/index.php/topic,62032.0.html
UPDATE
Fixed the link according to the post below. Thanks.
-
Link should be:
https://forum.pfsense.org/index.php/topic,62032.0.html
Steve