MBUF, kernel panics and Alix
-
MBUFS are basically blocks of kernel memory used for network i/o. Normally MBUF allocation will grow from system startup as network activity occurs then plateau with maybe an small occasional rise.
MBUF usage can continue to grow if there is a programming error such as failing to deallocate a now unused MBUF. If pfSense runs out of MBUFs (it has allocated up to the maximum) it is likely to panic or hang.
What version of pfSense are you running?
What is the brand and model of your WiFi NIC?
Please post the output of the pfSense shell command```
/etc/rc.banner -
This is the output:
*** Welcome to pfSense 2.0.1-RELEASE-nanobsd (i386) on XXX *** WAN (wan) -> vr0 -> XXX.XXX.XXX.XXX LAN (lan) -> vr1 -> 172.21.2.254 LINK (opt1) -> vr2 -> 10.255.255.2 WLAN (opt2) -> ath0_wlan0 -> 172.21.202.254
I am currently using a Wistron DCMA-82 which has an Atheros AR5414 chipset. Running on an Alix 2D3 with the latest BIOS.
Thanks in advance for your help! :)
-
pfSense shell command```
netstat -m
You could write a shell script to (say) append a time stamp to a file, append mbuf usage files to file, sleep for an hours and repeat. The file contents will then show you the mbuf usage trends. You could run it on a couple of systems for comparison. If you see sudden jumps in mbuf usage after (say) a couple of hours uptime you could look in the system log for clues to what might be causing the jump.
-
Thanks for the clues, I'll post whenever I figure out what is going on
Regards!
-
OK, I installed a WLAN Atheros card on my 2nd Alix. Now I have 2 systems configured exacly in the same way, same hardware, same pfSense, same rules, etc (only difference is the subnet they care about).
To my surprise, the system that originally had the problem continues to show it, but the other one does not.
Only difference I can see in the logs by the time the MBUF increases is related to the wireless:
Wireless log for the one that SHOWS MBUFs INCREASE:
Oct 29 13:05:44 hostapd: ath0_wlan0: STA 3c:d0:f8:12:3b:58 WPA: sending 3/4 msg of 4-Way Handshake Oct 29 13:05:44 hostapd: ath0_wlan0: STA 98:0c:82:bd:07:e3 WPA: received EAPOL-Key frame (4/4 Pairwise) Oct 29 13:05:44 hostapd: ath0_wlan0: STA 98:0c:82:bd:07:e3 IEEE 802.1X: authorizing port Oct 29 13:05:44 hostapd: ath0_wlan0: STA 98:0c:82:bd:07:e3 RADIUS: starting accounting session 508C825B-00000059 Oct 29 13:05:44 hostapd: ath0_wlan0: STA 98:0c:82:bd:07:e3 WPA: pairwise key handshake completed (RSN) Oct 29 13:05:44 hostapd: ath0_wlan0: STA 3c:d0:f8:12:3b:58 WPA: received EAPOL-Key frame (4/4 Pairwise) Oct 29 13:05:44 hostapd: ath0_wlan0: STA 3c:d0:f8:12:3b:58 IEEE 802.1X: authorizing port Oct 29 13:05:44 hostapd: ath0_wlan0: STA 3c:d0:f8:12:3b:58 RADIUS: starting accounting session 508C825B-0000005A Oct 29 13:05:44 hostapd: ath0_wlan0: STA 3c:d0:f8:12:3b:58 WPA: pairwise key handshake completed (RSN) Oct 29 13:05:47 hostapd: ath0_wlan0: STA 3c:d0:f8:12:3b:58 IEEE 802.11: associated Oct 29 13:05:47 hostapd: ath0_wlan0: STA 3c:d0:f8:12:3b:58 WPA: event 1 notification Oct 29 13:05:47 hostapd: ath0_wlan0: STA 3c:d0:f8:12:3b:58 WPA: event 4 notification Oct 29 13:05:47 hostapd: ath0_wlan0: STA 3c:d0:f8:12:3b:58 WPA: sending 1/4 msg of 4-Way Handshake Oct 29 13:05:47 hostapd: ath0_wlan0: STA 3c:d0:f8:12:3b:58 WPA: EAPOL-Key timeout Oct 29 13:05:47 hostapd: ath0_wlan0: STA 3c:d0:f8:12:3b:58 WPA: sending 1/4 msg of 4-Way Handshake Oct 29 13:05:47 hostapd: ath0_wlan0: STA 3c:d0:f8:12:3b:58 WPA: received EAPOL-Key frame (2/4 Pairwise) Oct 29 13:05:47 hostapd: ath0_wlan0: STA 3c:d0:f8:12:3b:58 WPA: sending 3/4 msg of 4-Way Handshake Oct 29 13:05:47 hostapd: ath0_wlan0: STA 3c:d0:f8:12:3b:58 WPA: received EAPOL-Key frame (4/4 Pairwise) Oct 29 13:05:47 hostapd: ath0_wlan0: STA 3c:d0:f8:12:3b:58 IEEE 802.1X: authorizing port Oct 29 13:05:47 hostapd: ath0_wlan0: STA 3c:d0:f8:12:3b:58 RADIUS: starting accounting session 508C825B-0000005B Oct 29 13:05:47 hostapd: ath0_wlan0: STA 3c:d0:f8:12:3b:58 WPA: pairwise key handshake completed (RSN) Oct 29 13:06:27 hostapd: ath0_wlan0: WPA rekeying GTK Oct 29 13:06:27 hostapd: ath0_wlan0: STA 3c:d0:f8:12:3b:58 WPA: sending 1/2 msg of Group Key Handshake Oct 29 13:06:27 hostapd: ath0_wlan0: STA 98:0c:82:bd:07:e3 WPA: sending 1/2 msg of Group Key Handshake Oct 29 13:06:28 hostapd: ath0_wlan0: STA 3c:d0:f8:12:3b:58 WPA: EAPOL-Key timeout Oct 29 13:06:28 hostapd: ath0_wlan0: STA 3c:d0:f8:12:3b:58 WPA: sending 1/2 msg of Group Key Handshake Oct 29 13:06:28 hostapd: ath0_wlan0: STA 98:0c:82:bd:07:e3 WPA: EAPOL-Key timeout Oct 29 13:06:28 hostapd: ath0_wlan0: STA 98:0c:82:bd:07:e3 WPA: sending 1/2 msg of Group Key Handshake Oct 29 13:06:28 hostapd: ath0_wlan0: STA 3c:d0:f8:12:3b:58 WPA: received EAPOL-Key frame (2/2 Group) Oct 29 13:06:28 hostapd: ath0_wlan0: STA 3c:d0:f8:12:3b:58 WPA: group key handshake completed (RSN) Oct 29 13:06:29 hostapd: ath0_wlan0: STA 98:0c:82:bd:07:e3 WPA: EAPOL-Key timeout Oct 29 13:06:29 hostapd: ath0_wlan0: STA 98:0c:82:bd:07:e3 WPA: sending 1/2 msg of Group Key Handshake Oct 29 13:06:30 hostapd: ath0_wlan0: STA 98:0c:82:bd:07:e3 WPA: EAPOL-Key timeout Oct 29 13:06:30 hostapd: ath0_wlan0: STA 98:0c:82:bd:07:e3 WPA: sending 1/2 msg of Group Key Handshake Oct 29 13:06:31 hostapd: ath0_wlan0: STA 98:0c:82:bd:07:e3 WPA: EAPOL-Key timeout Oct 29 13:06:31 hostapd: ath0_wlan0: STA 98:0c:82:bd:07:e3 IEEE 802.1X: unauthorizing port Oct 29 13:06:31 hostapd: ath0_wlan0: STA 98:0c:82:bd:07:e3 IEEE 802.11: deauthenticated due to local deauth request Oct 29 13:06:31 hostapd: ath0_wlan0: STA 98:0c:82:bd:07:e3 IEEE 802.11: deassociated Oct 29 13:07:27 hostapd: ath0_wlan0: WPA rekeying GTK Oct 29 13:07:27 hostapd: ath0_wlan0: STA 3c:d0:f8:12:3b:58 WPA: sending 1/2 msg of Group Key Handshake Oct 29 13:07:28 hostapd: ath0_wlan0: STA 3c:d0:f8:12:3b:58 WPA: EAPOL-Key timeout Oct 29 13:07:28 hostapd: ath0_wlan0: STA 3c:d0:f8:12:3b:58 WPA: sending 1/2 msg of Group Key Handshake Oct 29 13:07:28 hostapd: ath0_wlan0: STA 3c:d0:f8:12:3b:58 WPA: received EAPOL-Key frame (2/2 Group) Oct 29 13:07:28 hostapd: ath0_wlan0: STA 3c:d0:f8:12:3b:58 WPA: group key handshake completed (RSN) Oct 29 13:08:27 hostapd: ath0_wlan0: WPA rekeying GTK Oct 29 13:08:27 hostapd: ath0_wlan0: STA 3c:d0:f8:12:3b:58 WPA: sending 1/2 msg of Group Key Handshake Oct 29 13:08:28 hostapd: ath0_wlan0: STA 3c:d0:f8:12:3b:58 WPA: EAPOL-Key timeout Oct 29 13:08:28 hostapd: ath0_wlan0: STA 3c:d0:f8:12:3b:58 WPA: sending 1/2 msg of Group Key Handshake Oct 29 13:08:28 hostapd: ath0_wlan0: STA 3c:d0:f8:12:3b:58 WPA: received EAPOL-Key frame (2/2 Group) Oct 29 13:08:28 hostapd: ath0_wlan0: STA 3c:d0:f8:12:3b:58 WPA: group key handshake completed (RSN)
Log for the one that WORKS OK (no MBUFs increase at all):
Oct 29 13:00:29 hostapd: ath0_wlan0: STA e8:39:df:dd:3a:fd WPA: EAPOL-Key timeout Oct 29 13:00:29 hostapd: ath0_wlan0: STA e8:39:df:dd:3a:fd WPA: sending 1/2 msg of Group Key Handshake Oct 29 13:00:29 hostapd: ath0_wlan0: STA e8:39:df:dd:3a:fd WPA: received EAPOL-Key frame (2/2 Group) Oct 29 13:00:29 hostapd: ath0_wlan0: STA e8:39:df:dd:3a:fd WPA: group key handshake completed (RSN) Oct 29 13:01:29 hostapd: ath0_wlan0: WPA rekeying GTK Oct 29 13:01:29 hostapd: ath0_wlan0: STA e8:39:df:dd:3a:fd WPA: sending 1/2 msg of Group Key Handshake Oct 29 13:01:29 hostapd: ath0_wlan0: STA e8:39:df:dd:3a:fd WPA: EAPOL-Key timeout Oct 29 13:01:29 hostapd: ath0_wlan0: STA e8:39:df:dd:3a:fd WPA: sending 1/2 msg of Group Key Handshake Oct 29 13:01:29 hostapd: ath0_wlan0: STA e8:39:df:dd:3a:fd WPA: received EAPOL-Key frame (2/2 Group) Oct 29 13:01:29 hostapd: ath0_wlan0: STA e8:39:df:dd:3a:fd WPA: group key handshake completed (RSN) Oct 29 13:02:29 hostapd: ath0_wlan0: WPA rekeying GTK Oct 29 13:02:29 hostapd: ath0_wlan0: STA e8:39:df:dd:3a:fd WPA: sending 1/2 msg of Group Key Handshake Oct 29 13:02:29 hostapd: ath0_wlan0: STA e8:39:df:dd:3a:fd WPA: EAPOL-Key timeout Oct 29 13:02:29 hostapd: ath0_wlan0: STA e8:39:df:dd:3a:fd WPA: sending 1/2 msg of Group Key Handshake Oct 29 13:02:29 hostapd: ath0_wlan0: STA e8:39:df:dd:3a:fd WPA: received EAPOL-Key frame (2/2 Group) Oct 29 13:02:29 hostapd: ath0_wlan0: STA e8:39:df:dd:3a:fd WPA: group key handshake completed (RSN) Oct 29 13:03:29 hostapd: ath0_wlan0: WPA rekeying GTK Oct 29 13:03:29 hostapd: ath0_wlan0: STA e8:39:df:dd:3a:fd WPA: sending 1/2 msg of Group Key Handshake Oct 29 13:03:30 hostapd: ath0_wlan0: STA e8:39:df:dd:3a:fd WPA: EAPOL-Key timeout Oct 29 13:03:30 hostapd: ath0_wlan0: STA e8:39:df:dd:3a:fd WPA: sending 1/2 msg of Group Key Handshake Oct 29 13:03:31 hostapd: ath0_wlan0: STA e8:39:df:dd:3a:fd WPA: EAPOL-Key timeout Oct 29 13:03:31 hostapd: ath0_wlan0: STA e8:39:df:dd:3a:fd WPA: sending 1/2 msg of Group Key Handshake Oct 29 13:03:31 hostapd: ath0_wlan0: STA e8:39:df:dd:3a:fd WPA: received EAPOL-Key frame (2/2 Group) Oct 29 13:03:31 hostapd: ath0_wlan0: STA e8:39:df:dd:3a:fd WPA: group key handshake completed (RSN) Oct 29 13:04:29 hostapd: ath0_wlan0: WPA rekeying GTK Oct 29 13:04:29 hostapd: ath0_wlan0: STA e8:39:df:dd:3a:fd WPA: sending 1/2 msg of Group Key Handshake Oct 29 13:04:29 hostapd: ath0_wlan0: STA e8:39:df:dd:3a:fd WPA: received EAPOL-Key frame (2/2 Group) Oct 29 13:04:29 hostapd: ath0_wlan0: STA e8:39:df:dd:3a:fd WPA: group key handshake completed (RSN) Oct 29 13:05:29 hostapd: ath0_wlan0: WPA rekeying GTK Oct 29 13:05:29 hostapd: ath0_wlan0: STA e8:39:df:dd:3a:fd WPA: sending 1/2 msg of Group Key Handshake Oct 29 13:05:30 hostapd: ath0_wlan0: STA e8:39:df:dd:3a:fd WPA: EAPOL-Key timeout Oct 29 13:05:30 hostapd: ath0_wlan0: STA e8:39:df:dd:3a:fd WPA: sending 1/2 msg of Group Key Handshake Oct 29 13:05:31 hostapd: ath0_wlan0: STA e8:39:df:dd:3a:fd WPA: EAPOL-Key timeout Oct 29 13:05:31 hostapd: ath0_wlan0: STA e8:39:df:dd:3a:fd WPA: sending 1/2 msg of Group Key Handshake Oct 29 13:05:32 hostapd: ath0_wlan0: STA e8:39:df:dd:3a:fd WPA: EAPOL-Key timeout Oct 29 13:05:32 hostapd: ath0_wlan0: STA e8:39:df:dd:3a:fd WPA: sending 1/2 msg of Group Key Handshake Oct 29 13:05:32 hostapd: ath0_wlan0: STA e8:39:df:dd:3a:fd WPA: received EAPOL-Key frame (2/2 Group) Oct 29 13:05:32 hostapd: ath0_wlan0: STA e8:39:df:dd:3a:fd WPA: group key handshake completed (RSN) Oct 29 13:06:29 hostapd: ath0_wlan0: WPA rekeying GTK Oct 29 13:06:29 hostapd: ath0_wlan0: STA e8:39:df:dd:3a:fd WPA: sending 1/2 msg of Group Key Handshake Oct 29 13:06:29 hostapd: ath0_wlan0: STA e8:39:df:dd:3a:fd WPA: received EAPOL-Key frame (2/2 Group) Oct 29 13:06:29 hostapd: ath0_wlan0: STA e8:39:df:dd:3a:fd WPA: group key handshake completed (RSN) Oct 29 13:07:29 hostapd: ath0_wlan0: WPA rekeying GTK Oct 29 13:07:29 hostapd: ath0_wlan0: STA e8:39:df:dd:3a:fd WPA: sending 1/2 msg of Group Key Handshake Oct 29 13:07:29 hostapd: ath0_wlan0: STA e8:39:df:dd:3a:fd WPA: received EAPOL-Key frame (2/2 Group) Oct 29 13:07:29 hostapd: ath0_wlan0: STA e8:39:df:dd:3a:fd WPA: group key handshake completed (RSN) Oct 29 13:08:29 hostapd: ath0_wlan0: WPA rekeying GTK Oct 29 13:08:29 hostapd: ath0_wlan0: STA e8:39:df:dd:3a:fd WPA: sending 1/2 msg of Group Key Handshake Oct 29 13:08:29 hostapd: ath0_wlan0: STA e8:39:df:dd:3a:fd WPA: received EAPOL-Key frame (2/2 Group) Oct 29 13:08:29 hostapd: ath0_wlan0: STA e8:39:df:dd:3a:fd WPA: group key handshake completed (RSN)
They certainly look different. The one that increases MBUFs constantly shows 802.1X related entries, every minute or so. I do not have any RADIUS server configured, nor any 802.1X setting in any of the boxes. I really don't know if this is normal or not.
I would appreciate if someone could tell me if this might be related. I'm sure there is something messed up with the config. Otherwise, I guess I will reformat the CF card and reinstall and configure everything from scratch (I'm trying to avoid this since this is happening on a production system).
Thanks!
Regards!
-
Here's also the output on netstat -m on both systems, as of right now:
SYSTEM THAT SHOWS MBUF INCREASES:
1568/3052/4620 mbufs in use (current/cache/total) 1481/2617/4098/8640 mbuf clusters in use (current/cache/total/max) 1480/2616 mbuf+clusters out of packet secondary zone in use (current/cache) 0/27/27/4320 4k (page size) jumbo clusters in use (current/cache/total/max) 0/0/0/2160 9k jumbo clusters in use (current/cache/total/max) 0/0/0/1080 16k jumbo clusters in use (current/cache/total/max) 3419K/6105K/9524K bytes allocated to network (current/cache/total) 0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters) 0/0/0 requests for jumbo clusters denied (4k/9k/16k) 0/4/2416 sfbufs in use (current/peak/max) 0 requests for sfbufs denied 0 requests for sfbufs delayed 0 requests for I/O initiated by sendfile 0 calls to protocol drain routines
SYSTEM THAT WORKS OK:
430/470/900 mbufs in use (current/cache/total) 426/92/518/8640 mbuf clusters in use (current/cache/total/max) 425/87 mbuf+clusters out of packet secondary zone in use (current/cache) 0/27/27/4320 4k (page size) jumbo clusters in use (current/cache/total/max) 0/0/0/2160 9k jumbo clusters in use (current/cache/total/max) 0/0/0/1080 16k jumbo clusters in use (current/cache/total/max) 959K/409K/1369K bytes allocated to network (current/cache/total) 0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters) 0/0/0 requests for jumbo clusters denied (4k/9k/16k) 0/4/2416 sfbufs in use (current/peak/max) 0 requests for sfbufs denied 0 requests for sfbufs delayed 0 requests for I/O initiated by sendfile 0 calls to protocol drain routines
-
May not be a root cause fix, but you may be able to raise the MBUF value to provide relief while researching further. I outlined the procedure a while ago in the linked thread. Hope it helps.
http://forum.pfsense.org/index.php/topic,37754.msg194854.html#msg194854
-
This is driving me crazy ???
This is certainly related to the Wi-Fi, and furthermore, I am pretty sure I have identified the problem: this happens when a Samsung Galaxy S2 connects to the network
WTF?
Does this make any sense to anybody??
-
HOLY SH*T I FINALLY FIGURED IT OUT.
It was the traffic shaper on the ath0 interface.
On my 5884592072th attempt to find something related on the Internet I came across this post (which references this post) on which Ermal suggested to disable the traffic shaper on the interface. Since I did that, no problems whatsoever.
Anyway I can't 100% replicate it, since in my case was not happening with all clients or traffic. For some reason a manager's Android device was triggering the problem most of the times, but I couldn't find a specific pattern.
Now I can stop making up excuses to reset the network connectivity every day =P
Should this be reported on the FreeBSD list? Anyway, I don't have much more info than this…
Regards and thanks to everybody
-
Should this be reported on the FreeBSD list?
It is probably worthwhile at least investigating further.
1. pfSense 2.0.x is based on a now old version (8.1) of FreeBSD. Maybe the problem is fixed in pfSense 2.1 snapshot builds which use FreeBSD 8.3.
2. I haven't seen this problem when my Android phone uses my pfSense 2.0.1 Atheros based AP, but that might be because I don't use my phone "enough" or do the right sort of things on it.
3. It would probably be very helpful if you could run some carefully controlled test to see if you can further identify what activity on the phone provokes this problem. What sorts of things are done on the phone? Do the one at time and watch mbuf usage before during and after.
-
As a further data point I am running an atheros card as an AP in 2.0.1 and use an (android based) nexus 7 extensively. I've never seen mbufs climbing. I have a single antenna setup in case that makes any odds. No traffic shaping on that interface either.
Steve
-
No traffic shaping on that interface either.
This was the key here. I believe there is some leak between the traffic shaper and the ath driver. This must be triggered by some specific traffic, which in my case, originated from that phone.
Could any of you enable the shaper on the WLAN for testing purposes? In my case, I had a very simple setup with PRIQ, with only 1 rule to prioritize SQL traffic going to a server in my LAN (which I don't think has anything to do here, no way that traffic was originating from the phone).
Regards!
-
Just wanted to add to my experience:
Alix 2D13 with USB Wifi adapter based on RT3070 chipset (run driver). At some point in the last couple of weeks I started having random reboots which I think are related to the wireless adapter. Yesterday I opened a thread on the 2.1 forum with more details.
I'm using WPA2/AES and I am seeing a continuous stream of "WPA: EAPOL-Key timeout" from Android devices (Nexus 7s and Nexus S) in the logs. I am also using the traffic shaper on a bridged interface (LAN + WLAN).
In my post in the 2.1 forum I said I couldn't pinpoint when the reboot started, but after finding this, I remembered that the reboots started not long after I enabled traffic shaping.
I will also monitor the MBUF usage and see if it coincides with a reboot.
-
Thanks for all your information. I just want to share mine. Just recently noticed the increasing of MBUF and system lock up. My installation was on late January, since then the router was running fine for 2 months.
Last week I noticed the VOIP quality got a little worse, then I setup the traffic shaper with the wizard. A week later I got my first lock up, then today I saw the MBUF is on 25374/25600.
The system:
Version 2.0.2-RELEASE (i386)
built on Fri Dec 7 16:30:25 EST 2012
FreeBSD 8.1-RELEASE-p13
Platform nanobsd (2g)
NanoBSD Boot Slice pfsense0 / ad0s1
CPU Type VIA Nehemiah
Hardware crypto VIA PadlockWAN VR0
LAN UE0
OPT1 Ath0 -
It is definetely some sort of problem between the ath0 driver and the traffic shaper…
At work we recently started dealing with VoIP traffic. The shaper on the other interfaces works great, but I will need to come back to this and figure it out soon. I need the shaper on the WLAN as well... :-\
I'll keep you updated on any findings.
Regards!