What can cause a box to crash? swi5 interrupt uses high CPU
I have three sizes of embedded boxes, all bought with pfSense preinstalled and ready to run.
- Quad core 2.6GHz
The small and the large ones run dead stable! I have never had a crash as far as I remember.
But… the 1.6GHz box crashes over and over. There can be weeks in between if the load is generally low. But if the load is higher (150 clients) then it won't last a week!
I have earlier been able to provoke a crash by running lots of data between two computers through the router from LAN to OPT1.
I am not too sure what happens. I think that traffic keeps running for users who are already online, but new clients do not get a DHCP lease. Later I am not able to access the web interface at all.
I just placed a box at a customer (stupid test) and it crashed after three days. It happened after a moderate, sustained download of 50Mbit and upload of approx. 10Mbit for a while. I have actually left it at the customer but set up a cron job to restart it once every night. I do not know if that will do the trick or not. We will see in a few days.
Now… I don't want to throw these two boxes out. But I don't want boxes that keep crashing either. I just have no clue how to start looking for a misconfiguration. What is the first thing to look for, or which steps should I take, to find out why hardware with pfSense fails?
built on Fri Dec 7 16:30:51 EST 2012
Intel(R) Atom(TM) CPU Z530 @ 1.60GHz
There seems to be high CPU load on the 1.6GHz box that often crashes. When going to "System activity", one difference between this box and the others is a "swi5" process that takes up a lot of CPU power.
Is it possible to figure out what this "swi5" is? This post comes with some suggestions. I tried them all but am not skilled enough to read the output.
Also, viewing the traffic graphs for LAN and WAN at the same time shows some holes in the graphs, but they are not synchronized. I asked about this earlier and was told it is caused by too little RAM.
Here are the first rows from the "top -S" command. I guess it is the interrupt process (PID 11) that should not be there and that is causing the trouble:
  PID USERNAME  THR PRI NICE    SIZE     RES STATE     TIME   WCPU COMMAND
   10 root        1 171 ki31      0K      8K RUN     294:42 51.95% idle
   11 root       13 -48    -      0K    104K WAIT     48:46 37.99% intr
54507 root        1  76   20  56020K  20308K accept    0:17  0.98% php
  261 root        1  76   20   3408K   1212K kqread    5:08  0.00% check_reload_status
57231 nobody      1  45    0   4596K   3020K select    4:39  0.00% darkstat
   13 root        1 -16    -      0K      8K -         2:59  0.00% yarrow
 7964 root        1  44    0   4956K   2432K select    2:44  0.00% syslogd
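One possible way to narrow down what is behind the "intr" load is to rank the interrupt sources reported by `vmstat -i` by their rate. This is only a rough sketch and not something taken from the thread: the function name `top_irq_sources` is made up, and it assumes the usual `vmstat -i` layout (a header line, one line per source ending in a total and a rate, and a closing "Total" line). `vmstat -i` mainly lists hardware interrupt sources; for software interrupt threads such as swi5, `top -SH` shows the individual kernel threads by name.

```shell
# Hypothetical helper: rank interrupt sources by rate, highest first.
# Reads `vmstat -i` style text on stdin and prints the five busiest sources.
# Assumes the last field on every source line is the interrupt rate.
top_irq_sources() {
    # Skip the header (line 1) and the "Total" summary line, then
    # prefix every line with its rate so that sort can order on it.
    awk 'NR > 1 && $1 != "Total" { print $NF "\t" $0 }' |
        sort -rn |
        head -n 5 |
        cut -f2-    # drop the helper column again
}

# Usage on the box itself (not run here):
#   vmstat -i | top_irq_sources
```

If one NIC's irq line dominates the list while traffic is flowing, that would point towards that interface or its driver rather than towards the configuration.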
JUST HAD A CRASH
That is to say: the traffic graph behaved strangely and the web interface stopped responding, so I could not use it to reboot. The swi5 process had been climbing up to something like 70% (and maybe more) before the GUI stopped responding.
I did have a shell open to the router and it still did respond! So I did a quick "shutdown -r now" and the router is up and running again…
wallabybob last edited by
A leak and consequent exhaustion of mbufs (network buffers) could cause what you are seeing.
pfSense shell command```
How are the three systems different? Different physical interfaces (maybe a driver leaks mbufs occasionally)? Different traffic types? Different packages?
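To check the mbuf theory, one option on FreeBSD-based systems is `netstat -m`, which prints mbuf statistics. Whether that is the command originally given above cannot be recovered from the post, so treat it as an assumption. The sketch below, with the made-up function name `mbuf_cluster_pct`, assumes the usual output line of the form "current/cache/total/max mbuf clusters in use" and prints the current cluster usage as a percentage of the maximum.

```shell
# Hypothetical helper: percentage of mbuf clusters in use.
# Reads `netstat -m` style text on stdin. Assumes a line such as
#   1024/1030/2054/25600 mbuf clusters in use (current/cache/total/max)
mbuf_cluster_pct() {
    awk '/mbuf clusters in use/ {
        split($1, f, "/")            # f[1] = current, f[4] = max
        if (f[4] + 0 > 0)
            printf "%d\n", (f[1] * 100) / f[4]
    }'
}

# Usage on the box itself (not run here):
#   netstat -m | mbuf_cluster_pct
```

Running this every few minutes under load and watching whether the number keeps climbing towards 100 without dropping back would be one way to spot a leak before the box stops responding.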
Sorry for this late response to your post.
I had another crash right after my last post and had to take the box offline and replace it with an older router. It only behaves like this when a lot of traffic is flowing through it.
It is an embedded system shipped to me with pfSense preinstalled. I wonder if I can actually do anything to correct the problem myself.
Searching for "freebsd intr processor" gives a lot of results but no answers.
This is marked solved, but I have not tried any of the changes yet:
Hardware issue with the Realtek nic.
Jumbo frames would make the whole system crash for some reason. Applianceshop.eu did make a fix (custom kernel) for it for 2.0.x, but it should be solved in the normal 2.1 release.
More information in this thread:
Fix the link according to the post below. Thanks.
Link should be: