Em kernel panic
-
Hi,
I'm fairly sure that since updating to 2.1.5 I've been getting regular kernel panics, always stopping in the em0/em1 que taskqueue thread. I have an Intel 1000/PT dual-port NIC. There doesn't seem to be anything I can do to reliably reproduce the problem, and it happens at any time of day, regardless of network traffic. I've submitted a number of error reports already, the latest one earlier today at around 10:00 GMT.
The panic is always a fatal trap 9 (general protection fault) or 12 (page fault).
What could be causing it?
-
A card working loose in its slot is something that's caught me out before. Have you moved the box recently, or done anything else that might have shifted the card, like pulling the cables?
Steve
-
I'm fairly sure I haven't moved it since it started getting upset with me, but you might be on to something there. I'll reseat the card and cables to make sure they're firmly in place. Could another card that's loose but not in use be causing the problems?
-
Could be, if that other card is causing some problem on the bus. How old is the machine?
Your crash reports might be more revealing to the devs; it's all speculation at this point. ;)
What did you update to 2.1.5 from? The em driver was updated for 2.1.2, I think.
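If you're not sure exactly what you're running now, the installed version string lives in a flat file; that's the path on my box, so treat it as a guess for other installs:

# Show the installed pfSense version
cat /etc/version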
Steve
-
It's a repurposed Acer Aspire E380 with the added NIC. It's definitely getting on a bit, but it's been running as my router for 4 years without issue. I'm not sure what version I updated from; I don't think it was a large jump. Is there a way to find out? I'm hoping one of the devs can find my crash report and shed some more light on it for me.
-
It crashed again this evening, so I managed to capture the panic info. I've stripped out some of it to keep the post shorter. Does this shed any additional light on the problem?
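For anyone following along, everything below came out of the files under /var/crash on the box; listing that directory shows what savecore captured after the panic (filenames may differ per crash):

# List the crash artefacts written after the panic
ls -l /var/crash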
Crash report begins. Anonymous machine information:
amd64
8.3-RELEASE-p16
FreeBSD 8.3-RELEASE-p16 #0: Mon Aug 25 08:27:11 EDT 2014 root@pf2_1_1_amd64.pfsense.org:/usr/obj.amd64/usr/pfSensesrc/src/sys/pfSense_SMP.8
Crash report details:
Filename: /var/crash/bounds
1
Filename: /var/crash/info.0
Dump header from device /dev/ad4s1b
Architecture: amd64
Architecture Version: 1
Dump Length: 70144B (0 MB)
Blocksize: 512
Dumptime: Mon Dec 1 21:45:13 2014
Hostname:
Magic: FreeBSD Text Dump
Version String: FreeBSD 8.3-RELEASE-p16 #0: Mon Aug 25 08:27:11 EDT 2014
root@pf2_1_1_amd64.pfsense.org:/usr/obj.amd64/usr/pfSensesrc/src/sys/pfSense_SMP.8
Panic String:
Dump Parity: 2139175447
Bounds: 0
Dump Status: good
Filename: /var/crash/textdump.tar.0
ddb.txt
db:0:kdb.enter.default> run lockinfo
db:1:lockinfo> show locks
No such command
db:1:locks> show alllocks
No such command
db:1:alllocks> show lockedvnods
Locked vnodes
db:0:kdb.enter.default> show pcpu
cpuid = 1
dynamic pcpu = 0xffffff807ecc0d80
curthread = 0xffffff00026c5460: pid 0 "em0 que"
curpcb = 0xffffff8023bbfd10
fpcurthread = none
idlethread = 0xffffff00022a7460: tid 100003 "idle: cpu1"
curpmap = 0xffffffff81392330
tssp = 0xffffffff81416e68
commontssp = 0xffffffff81416e68
rsp0 = 0xffffff8023bbfd10
gs32p = 0xffffffff81415ca0
ldt = 0xffffffff81415ce0
tss = 0xffffffff81415cd0
db:0:kdb.enter.default> bt
Tracing pid 0 tid 100042 td 0xffffff00026c5460
rn_match() at rn_match+0x25
pfr_match_addr() at pfr_match_addr+0xcb
pf_test_udp() at pf_test_udp+0x8bb
pf_test() at pf_test+0x1ff0
pf_check_in() at pf_check_in+0x39
pfil_run_hooks() at pfil_run_hooks+0xa2
ip_input() at ip_input+0x34e
netisr_dispatch_src() at netisr_dispatch_src+0x7b
ether_demux() at ether_demux+0x169
ether_input() at ether_input+0x191
em_rxeof() at em_rxeof+0x265
em_handle_que() at em_handle_que+0x48
taskqueue_run_locked() at taskqueue_run_locked+0x85
taskqueue_thread_loop() at taskqueue_thread_loop+0x4e
fork_exit() at fork_exit+0x11f
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xffffff8023bbfd00, rbp = 0 ---
db:0:kdb.enter.default> ps
pid ppid pgrp uid state wmesg wchan cmd
63589 38729 260 0 RL rrdtool
50106 47835 47835 0 S accept 0xffffff001b051b06 php
61321 48269 48269 0 S accept 0xffffff001b4b285e php
84769 1 84769 0 Ss accept 0xffffff001b051dae qstats
38721 36507 38721 0 S+ ttyin 0xffffff00027a40a8 tcsh
36507 35570 36507 0 S+ wait 0xffffff001c34d000 sh
35570 35103 35570 0 Ss+ wait 0xffffff001c34c000 sh
35103 8978 35103 0 Ss select 0xffffff0002d7a9c0 sshd
94023 1 48269 0 S (threaded) iperf
100207 S ucond 0xffffff001c3c3b00 iperf
100206 S accept 0xffffff0002d8b066 iperf
100090 S ucond 0xffffff0002859300 initial thread
18575 18010 260 0 S bpf 0xffffff001bc99a00 bandwidthd
18513 18010 260 0 S bpf 0xffffff0004695200 bandwidthd
18198 18010 260 0 S bpf 0xffffff0004535400 bandwidthd
18010 1 260 0 S bpf 0xffffff0004554000 bandwidthd
16568 14234 16568 0 S+ ttyin 0xffffff00027a20a8 sh
14234 14160 14234 0 S+ wait 0xffffff001b3d98e0 sh
14160 1 14160 0 Ss+ wait 0xffffff00042e1000 login
12392 11992 260 0 S bpf 0xffffff00045fd400 bandwidthd
12380 11992 260 0 S bpf 0xffffff00045fe800 bandwidthd
12072 11992 260 0 S bpf 0xffffff001bc96c00 bandwidthd
11992 1 260 0 S bpf 0xffffff0004537600 bandwidthd
10433 10131 10131 0 S nanslp 0xffffffff81395ec8 minicron
10131 1 10131 0 Ss wait 0xffffff00045f1470 minicron
9988 9819 9819 0 S nanslp 0xffffffff81395ec8 minicron
9819 1 9819 0 Ss wait 0xffffff001b8cf8e0 minicron
9593 9217 9217 0 S nanslp 0xffffffff81395ec8 minicron
9217 1 9217 0 Ss wait 0xffffff00045f3000 minicron
90166 1 90166 0 Ss (threaded) ntpd
100180 S select 0xffffff0002d7c440 ntpd
67775 1 67775 0 Ss nanslp 0xffffffff81395ec8 cron
41602 1 41602 0 Ss select 0xffffff001b47d140 openvpn
38729 1 260 0 S wait 0xffffff00042e2000 sh
28018 1 28018 0 Ss select 0xffffff001b42f7c0 racoon
9041 95359 9041 0 Ss (threaded) sshlockout_pf
100174 S nanslp 0xffffffff81395ec8 sshlockout_pf
100159 S piperd 0xffffff00029c15b0 initial thread
95359 1 95359 0 Ss select 0xffffff0002b457c0 syslogd
91726 1 91726 0 Ss select 0xffffff0002a081c0 miniupnpd
90528 1 90528 0 Ss select 0xffffff001b622bc0 powerd
77758 76108 76108 0 S accept 0xffffff001b6c485e php
77482 76108 76108 0 S accept 0xffffff001b6c485e php
77244 76108 76108 0 S accept 0xffffff001b6c485e php
76108 68069 76108 0 Ss wait 0xffffff00042128e0 initial thread
75980 74864 74864 0 S accept 0xffffff001bcbcdae php
75728 74864 74864 0 S accept 0xffffff001bcbcdae php
75230 74864 74864 0 S accept 0xffffff001bcbcdae php
75171 1 75171 0 Ss (threaded) filterdns
100142 S ucond 0xffffff0002b3f600 signal-thread
100141 S ucond 0xffffff0002ae7780 filterdns
74864 68069 74864 0 Ss wait 0xffffff000280f8e0 initial thread
74856 74140 74140 0 S accept 0xffffff001bcbcb06 php
74650 74140 74140 0 S accept 0xffffff001bcbcb06 php
74560 74140 74140 0 S accept 0xffffff001bcbcb06 php
74140 68069 74140 0 Ss wait 0xffffff001b3d7470 initial thread
73617 71228 71228 0 S accept 0xffffff001b5c930e php
73405 71228 71228 0 S accept 0xffffff001b5c930e php
73259 71228 71228 0 S accept 0xffffff001b5c930e php
71907 70497 70497 0 S accept 0xffffff001b64b5b6 php
71835 70497 70497 0 S accept 0xffffff001b64b5b6 php
71637 70497 70497 0 S accept 0xffffff001b64b5b6 php
71228 68069 71228 0 Ss wait 0xffffff0002a11000 initial thread
70912 68446 68446 0 S accept 0xffffff001b64b85e php
70884 68446 68446 0 S accept 0xffffff001b64b85e php
70767 68446 68446 0 S accept 0xffffff001b64b85e php
70497 68069 70497 0 Ss wait 0xffffff001b8cc000 initial thread
68988 68782 68782 0 S nanslp 0xffffffff81395ec8 minicron
68782 1 68782 0 Ss wait 0xffffff001b8cd8e0 minicron
68446 68069 68446 0 Ss wait 0xffffff001b3a78e0 initial thread
68069 1 67999 0 S kqread 0xffffff001b316a00 lighttpd
60411 1 60411 1002 Ss select 0xffffff001b430040 dhcpd
56253 1 55998 65534 S select 0xffffff001b430140 dnsmasq
49365 1 49365 0 Ss kqread 0xffffff001ba32900 dhcpleases
48269 47461 48269 0 Ss wait 0xffffff001b3a4000 initial thread
47835 47461 47835 0 Ss wait 0xffffff00028d7000 initial thread
47461 1 47445 0 S kqread 0xffffff001b0cb100 lighttpd
33294 32773 32773 0 S piperd 0xffffff000289f000 rrdtool
32773 1 32773 0 Ss select 0xffffff0002ae6640 apinger
27977 1 27977 0 Ss select 0xffffff0002b3fdc0 inetd
25864 1 26 0 S+ piperd 0xffffff00029c1888 logger
25541 1 26 0 S+ bpf 0xffffff0002a66400 tcpdump
16626 1 16626 65 Ss select 0xffffff0002a59740 dhclient
10600 1 10600 0 Ss select 0xffffff0002b454c0 dhclient
9197 1 9197 0 Ss (threaded) sshlockout_pf
100112 S nanslp 0xffffffff81395ec8 sshlockout_pf
100070 S uwait 0xffffff000285a300 initial thread
8978 1 8978 0 Ss select 0xffffff0002b450c0 sshd
271 1 271 0 Ss select 0xffffff0002ae7d40 devd
262 260 260 0 S kqread 0xffffff0002881900 check_reload_status
260 1 260 0 Ss kqread 0xffffff000286b400 check_reload_status
68 0 0 0 SL mdwait 0xffffff0002822000 [md0]
38 0 0 0 SL (threaded) zfskern
100077 D l2arc_fe 0xffffffff8193e5c0 [l2arc_feed_thread]
100076 D arc_recl 0xffffffff8192e720 [arc_reclaim_thread]
25 0 0 0 SL sdflush 0xffffffff813dcc78 [softdepflush]
24 0 0 0 SL vlruwt 0xffffff0002842470 [vnlru]
23 0 0 0 SL syncer 0xffffffff813ba4c0 [syncer]
22 0 0 0 SL psleep 0xffffffff813b9fe8 [bufdaemon]
21 0 0 0 SL pollid 0xffffffff81394ce8 [idlepoll]
20 0 0 0 SL pgzero 0xffffffff813de74c [pagezero]
19 0 0 0 SL psleep 0xffffffff813ddae8 [vmdaemon]
18 0 0 0 SL psleep 0xffffffff813ddaac [pagedaemon]
9 0 0 0 SL ccb_scan 0xffffffff81358ce0 [xpt_thrd]
8 0 0 0 SL pftm 0xffffffff8021de60 [pfpurge]
7 0 0 0 SL waiting_ 0xffffffff813c6540 [sctp_iterator]
17 0 0 0 SL cooling 0xffffff0002771358 [acpi_cooling0]
16 0 0 0 SL tzpoll 0xffffffff8135f4f0 [acpi_thermal]
15 0 0 0 SL (threaded) usb
100038 D - 0xffffff80002b5e18 [usbus1]
100037 D - 0xffffff80002b5dc0 [usbus1]
100036 D - 0xffffff80002b5d68 [usbus1]
100035 D - 0xffffff80002b5d10 [usbus1]
100033 D - 0xffffff80002ad460 [usbus0]
100032 D - 0xffffff80002ad408 [usbus0]
100031 D - 0xffffff80002ad3b0 [usbus0]
100030 D - 0xffffff80002ad358 [usbus0]
14 0 0 0 SL - 0xffffffff81395bc4 [yarrow]
6 0 0 0 SL crypto_r 0xffffffff813dbae0 [crypto returns]
5 0 0 0 SL crypto_w 0xffffffff813dbaa0 [crypto]
4 0 0 0 SL - 0xffffffff813917e8 [g_down]
3 0 0 0 SL - 0xffffffff813917e0 [g_up]
2 0 0 0 SL - 0xffffffff813917d0 [g_event]
13 0 0 0 SL (threaded) ng_queue
100010 D sleep 0xffffffff81302830 [ng_queue1]
100009 D sleep 0xffffffff81302830 [ng_queue0]
12 0 0 0 RL (threaded) intr
100061 I [swi1: netisr 1]
100049 I [irq1: atkbd0]
100046 I [irq258: mskc0]
100041 I [irq20: atapci1]
100040 I [irq15: ata1]
100039 I [irq14: ata0]
100034 I [irq22: ehci0]
100029 I [irq21: ohci0+]
100028 I [irq9: acpi0]
100026 I [swi5: +]
100024 I [swi2: cambio]
100019 I [swi6: task queue]
100018 I [swi6: Giant taskq]
100008 I [swi3: vm]
100007 I [swi4: clock]
100006 RunQ [swi4: clock]
100005 I [swi1: netisr 0]
11 0 0 0 RL (threaded) idle
100004 CanRun [idle: cpu0]
100003 CanRun [idle: cpu1]
1 0 1 0 SLs wait 0xffffff00022a48e0 [init]
10 0 0 0 SL audit_wo 0xffffffff813dc0d0 [audit]
0 0 0 0 RLs (threaded) kernel
100140 D - 0xffffff0002a59c00 [dummynet]
100075 D - 0xffffff000286eb80 [system_taskq_1]
100074 D - 0xffffff000286eb80 [system_taskq_0]
100045 D - 0xffffff0002778680 [em1 txq]
100044 Run CPU 0 [em1 que]
100043 D - 0xffffff0002762200 [em0 txq]
100042 Run CPU 1 [em0 que]
100027 D - 0xffffff00023d7700 -
-
So I was looking through my system logs today and noticed something very strange: the first occurrence of this problem was on the 9th of November, and it has happened every few days since.
I've highlighted all the interesting entries. They are all either at 47 or 17 past the hour, or within a few seconds of that, which strikes me as a bit strange. Does pfSense have any cron entries that run on those intervals?
Edit: I've had a quick look in /etc/crontab and there's an entry that fires every 15 minutes: /etc/rc.filter_configure_sync. I'm not going to try to work out what it does, but could it be related to the crash dump above?
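For reference, this is how I spotted it; the minute fields of the entry will vary per config, so don't take my timings as gospel:

# Find any cron entries invoking the filter sync script
grep filter_configure_sync /etc/crontab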
-
filter_configure_sync would only be in cron if you're using schedules on rules, and that seems to cause crashes for some people on 2.1.x. I don't think that panic is em-related; my best guess is that it's schedule-related. The issue doesn't appear to exist in 2.2 (though I could never replicate it on 2.1.x), so I'd suggest upgrading to the most recent available snapshot (or the RC that'll be heading to the mirrors soon).
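If you want to confirm schedules are in play, you can look for them in the config. I'm assuming the usual config location and tag names here, so adjust if your install differs:

# Show any schedule definitions in the pfSense config
# (config.xml may live at /cf/conf/config.xml on some installs)
grep -A 3 '<schedules>' /conf/config.xml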
The apinger restarts likely coincide with a reconnection or renewal of a dynamic WAN interface and are unrelated.
-
I'm aware that the apinger restarts are unrelated, but they're a useful record of the times when it has panicked. I think you may have found the issue! I created a scheduled NAT rule on the 7th of November, just two days before the problems started. Is there anything I can do with my system to help you track down the issue while I still have it? I was planning to upgrade to 2.2 when the RC appeared.
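In case it's useful, this is roughly how I pulled the panic times out of the log; clog is the circular-log reader pfSense uses, and I'm matching on the apinger restart lines as a proxy for the reboots:

# Count apinger log timestamps by minute-of-hour to confirm the :17/:47 pattern
clog /var/log/system.log | grep apinger | awk '{print $3}' | cut -d: -f2 | sort | uniq -c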
-
Update on this in case anyone else has the same problem: updating to the 2.2 RC does appear to have fixed the kernel panics. The router has now been up for 6 days without a panic, which never happened before the upgrade. Thanks!