MBUF, kernel panics and Alix
-
Here is also the output of netstat -m on both systems, as of right now:
SYSTEM THAT SHOWS MBUF INCREASES:
1568/3052/4620 mbufs in use (current/cache/total)
1481/2617/4098/8640 mbuf clusters in use (current/cache/total/max)
1480/2616 mbuf+clusters out of packet secondary zone in use (current/cache)
0/27/27/4320 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/2160 9k jumbo clusters in use (current/cache/total/max)
0/0/0/1080 16k jumbo clusters in use (current/cache/total/max)
3419K/6105K/9524K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0/4/2416 sfbufs in use (current/peak/max)
0 requests for sfbufs denied
0 requests for sfbufs delayed
0 requests for I/O initiated by sendfile
0 calls to protocol drain routines
SYSTEM THAT WORKS OK:
430/470/900 mbufs in use (current/cache/total)
426/92/518/8640 mbuf clusters in use (current/cache/total/max)
425/87 mbuf+clusters out of packet secondary zone in use (current/cache)
0/27/27/4320 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/2160 9k jumbo clusters in use (current/cache/total/max)
0/0/0/1080 16k jumbo clusters in use (current/cache/total/max)
959K/409K/1369K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0/4/2416 sfbufs in use (current/peak/max)
0 requests for sfbufs denied
0 requests for sfbufs delayed
0 requests for I/O initiated by sendfile
0 calls to protocol drain routines
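For anyone comparing the two outputs, a rough way to put a number on the difference is to pull the "mbuf clusters in use" line out of each and express the total as a share of the max. This is only a sketch in Python: the function names are made up for illustration, and it assumes the line format shown in the outputs above.

```python
import re

# Matches the "mbuf clusters in use" line: current/cache/total/max
CLUSTER_RE = re.compile(r"(\d+)/(\d+)/(\d+)/(\d+) mbuf clusters in use")


def cluster_usage(netstat_output):
    """Return (current, cache, total, max) cluster counts from `netstat -m` text."""
    match = CLUSTER_RE.search(netstat_output)
    if match is None:
        raise ValueError("no 'mbuf clusters in use' line found")
    return tuple(int(n) for n in match.groups())


def percent_of_max(netstat_output):
    """Total clusters (current + cache) as a percentage of the configured max."""
    _current, _cache, total, maximum = cluster_usage(netstat_output)
    return 100.0 * total / maximum


# Cluster lines copied from the two outputs posted above.
leaking = "1481/2617/4098/8640 mbuf clusters in use (current/cache/total/max)"
healthy = "426/92/518/8640 mbuf clusters in use (current/cache/total/max)"

print(f"system with mbuf increases: {percent_of_max(leaking):.1f}% of max")
print(f"system that works OK:       {percent_of_max(healthy):.1f}% of max")
```

The same functions should also work on the full netstat -m text, since the pattern only matches the four-number cluster line.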
-
This may not be a root-cause fix, but you may be able to raise the MBUF limit to get some relief while researching further. I outlined the procedure a while ago in the linked thread. Hope it helps.
http://forum.pfsense.org/index.php/topic,37754.msg194854.html#msg194854
-
This is driving me crazy ???
This is certainly related to the Wi-Fi, and furthermore, I am pretty sure I have identified the problem: this happens when a Samsung Galaxy S2 connects to the network
WTF?
Does this make any sense to anybody??
-
HOLY SH*T I FINALLY FIGURED IT OUT.
It was the traffic shaper on the ath0 interface.
On my 5884592072th attempt to find something related on the Internet I came across this post (which references this post) in which Ermal suggested disabling the traffic shaper on the interface. Since I did that, no problems whatsoever.
Anyway, I can't 100% replicate it, since in my case it was not happening with all clients or all traffic. For some reason a manager's Android device was triggering the problem most of the time, but I couldn't find a specific pattern.
Now I can stop making up excuses to reset the network connectivity every day =P
Should this be reported on the FreeBSD list? Anyway, I don't have much more info than this…
Regards and thanks to everybody
-
Should this be reported on the FreeBSD list?
It is probably worthwhile at least investigating further.
1. pfSense 2.0.x is based on a now old version (8.1) of FreeBSD. Maybe the problem is fixed in pfSense 2.1 snapshot builds which use FreeBSD 8.3.
2. I haven't seen this problem when my Android phone uses my pfSense 2.0.1 Atheros based AP, but that might be because I don't use my phone "enough" or do the right sort of things on it.
3. It would probably be very helpful if you could run some carefully controlled tests to see if you can further identify what activity on the phone provokes this problem. What sorts of things are done on the phone? Do them one at a time and watch mbuf usage before, during and after.
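One way to structure the controlled test suggested in point 3 is to note the mbuf count before, during and after each activity and look at the differences. A minimal Python sketch follows. The function names, the tolerance value and the sample numbers are all invented for illustration; they are not measurements from any of these systems.

```python
def mbuf_deltas(samples):
    """Given ordered (label, mbufs_in_use) pairs, return (label, change) per step."""
    deltas = []
    for (_, before), (label, after) in zip(samples, samples[1:]):
        deltas.append((label, after - before))
    return deltas


def looks_like_leak(samples, tolerance=50):
    """True if the last sample is more than `tolerance` above the first.

    A count that rises during the activity is expected; one that does not
    come back down afterwards suggests buffers are not being returned.
    """
    return samples[-1][1] - samples[0][1] > tolerance


# Hypothetical readings for one activity on the phone (made-up numbers).
samples = [("before", 430), ("during", 900), ("after", 880)]

print(mbuf_deltas(samples))
print("possible leak" if looks_like_leak(samples) else "usage returned to baseline")
```

Repeating this per activity (browsing, video, sync and so on) would give one verdict per activity rather than one for the whole session.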
-
As a further data point I am running an atheros card as an AP in 2.0.1 and use an (android based) nexus 7 extensively. I've never seen mbufs climbing. I have a single antenna setup in case that makes any odds. No traffic shaping on that interface either.
Steve
-
No traffic shaping on that interface either.
This was the key here. I believe there is some leak between the traffic shaper and the ath driver. This must be triggered by some specific traffic, which in my case, originated from that phone.
Could any of you enable the shaper on the WLAN for testing purposes? In my case, I had a very simple setup with PRIQ, with only 1 rule to prioritize SQL traffic going to a server in my LAN (which I don't think has anything to do here, no way that traffic was originating from the phone).
Regards!
-
Just wanted to add to my experience:
Alix 2D13 with USB Wifi adapter based on RT3070 chipset (run driver). At some point in the last couple of weeks I started having random reboots which I think are related to the wireless adapter. Yesterday I opened a thread on the 2.1 forum with more details.
I'm using WPA2/AES and I am seeing a continuous stream of "WPA: EAPOL-Key timeout" from Android devices (Nexus 7s and Nexus S) in the logs. I am also using the traffic shaper on a bridged interface (LAN + WLAN).
In my post in the 2.1 forum I said I couldn't pinpoint when the reboot started, but after finding this, I remembered that the reboots started not long after I enabled traffic shaping.
I will also monitor the MBUF usage and see if it coincides with a reboot.
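A simple way to check for that coincidence would be to log timestamped mbuf counts somewhere that survives the reboot, and afterwards look up the last entry before the reboot time. Rough Python sketch only: the function names, the log line format and the example values are assumptions made for illustration.

```python
from datetime import datetime


def format_sample(when, current, maximum):
    """One log line: ISO timestamp, a space, then current/max mbuf counts."""
    return f"{when.isoformat(timespec='seconds')} {current}/{maximum}"


def last_sample_before(log_lines, reboot_time):
    """Return the latest (timestamp, current, max) logged before reboot_time, or None."""
    latest = None
    for line in log_lines:
        stamp, counts = line.split()
        when = datetime.fromisoformat(stamp)
        if when < reboot_time:
            current, maximum = (int(n) for n in counts.split("/"))
            if latest is None or when > latest[0]:
                latest = (when, current, maximum)
    return latest


# Hypothetical log: usage climbs, the box reboots, usage starts low again.
log = [
    format_sample(datetime(2013, 1, 1, 10, 0, 0), 1000, 25600),
    format_sample(datetime(2013, 1, 1, 11, 0, 0), 25000, 25600),
    format_sample(datetime(2013, 1, 1, 12, 0, 0), 500, 25600),
]
print(last_sample_before(log, datetime(2013, 1, 1, 11, 30, 0)))
```

If the last sample before each reboot is always close to the max, that would support the idea that the reboots and the mbuf growth are connected.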
-
Thanks for all your information. I just want to share mine. I only recently noticed the increasing MBUF usage and the system lock-ups. My installation dates from late January; since then the router had been running fine for 2 months.
Last week I noticed the VoIP quality got a little worse, so I set up the traffic shaper with the wizard. A week later I got my first lock-up, and today I saw the MBUF is at 25374/25600.
The system:
Version 2.0.2-RELEASE (i386)
built on Fri Dec 7 16:30:25 EST 2012
FreeBSD 8.1-RELEASE-p13
Platform nanobsd (2g)
NanoBSD Boot Slice pfsense0 / ad0s1
CPU Type VIA Nehemiah
Hardware crypto VIA Padlock
WAN VR0
LAN UE0
OPT1 Ath0
-
It is definitely some sort of problem between the ath0 driver and the traffic shaper…
At work we recently started dealing with VoIP traffic. The shaper on the other interfaces works great, but I will need to come back to this and figure it out soon. I need the shaper on the WLAN as well... :-\
I'll keep you updated on any findings.
Regards!