Kernel Panic
-
I have a net5501-70, a Pro 1000GT (pci), and a spare pppoe DSL line. Jim, I can set this up with openvpn if you want to test against it. I have no experience with openvpn though, so would need some guidance setting that up.
Also note that my DSL line has pretty poor sync rates right now, and occasional disconnects, which is why it's not in use :/
-
JimP, I am a little confused: do you want me to type that in after the crash happens and the db> prompt comes up? If not, what would you like from the crash?
Just at a normal command prompt. It will show the chip id of the em card.
-
I have a net5501-70, a Pro 1000GT (pci), and a spare pppoe DSL line. Jim, I can set this up with openvpn if you want to test against it. I have no experience with openvpn though, so would need some guidance setting that up.
Also note that my DSL line has pretty poor sync rates right now, and occasional disconnects, which is why it's not in use :/
I'll keep that in mind, though I was hoping to do it locally since that makes it a lot easier to track down…
-
It might be worth trying without any packages (the only one I have is OpenVPN Client Export) to see if it makes a difference.
One thing: for everyone seeing the panic on em(4) devices, the output from
pciconf -lvb
for the em card seen in the crash would be helpful.
Will try later tonight. :)
from my system
em1@pci0:3:0:0: class=0x020000 card=0x060a15d9 chip=0x10d38086 rev=0x00 hdr=0x00
class = network
subclass = ethernet
bar [10] = type Memory, range 32, base 0xfeae0000, size 131072, enabled
bar [18] = type I/O Port, range 32, base 0xec00, size 32, enabled
bar [1c] = type Memory, range 32, base 0xfeadc000, size 16384, enabled
-
From another system having problems (I cannot get to the console to see which card it is):
em0@pci0:2:0:0: class=0x020000 card=0x00008086 chip=0x10d38086 rev=0x00 hdr=0x00
class = network
subclass = ethernet
bar [10] = type Memory, range 32, base 0xfe6e0000, size 131072, enabled
bar [18] = type I/O Port, range 32, base 0x9c00, size 32, enabled
bar [1c] = type Memory, range 32, base 0xfe6dc000, size 16384, enabled
em1@pci0:3:0:0: class=0x020000 card=0x00008086 chip=0x10d38086 rev=0x00 hdr=0x00
class = network
subclass = ethernet
bar [10] = type Memory, range 32, base 0xfe7e0000, size 131072, enabled
bar [18] = type I/O Port, range 32, base 0xac00, size 32, enabled
bar [1c] = type Memory, range 32, base 0xfe7dc000, size 16384, enabled
em2@pci0:4:0:0: class=0x020000 card=0x00008086 chip=0x10d38086 rev=0x00 hdr=0x00
class = network
subclass = ethernet
bar [10] = type Memory, range 32, base 0xfe8e0000, size 131072, enabled
bar [18] = type I/O Port, range 32, base 0xbc00, size 32, enabled
bar [1c] = type Memory, range 32, base 0xfe8dc000, size 16384, enabled
em3@pci0:5:0:0: class=0x020000 card=0x00008086 chip=0x10d38086 rev=0x00 hdr=0x00
class = network
subclass = ethernet
bar [10] = type Memory, range 32, base 0xfe9e0000, size 131072, enabled
bar [18] = type I/O Port, range 32, base 0xcc00, size 32, enabled
bar [1c] = type Memory, range 32, base 0xfe9dc000, size 16384, enabled
em4@pci0:6:0:0: class=0x020000 card=0x00008086 chip=0x10d38086 rev=0x00 hdr=0x00
class = network
subclass = ethernet
bar [10] = type Memory, range 32, base 0xfeae0000, size 131072, enabled
bar [18] = type I/O Port, range 32, base 0xdc00, size 32, enabled
bar [1c] = type Memory, range 32, base 0xfeadc000, size 16384, enabled
em5@pci0:7:0:0: class=0x020000 card=0x00008086 chip=0x10d38086 rev=0x00 hdr=0x00
class = network
subclass = ethernet
bar [10] = type Memory, range 32, base 0xfebe0000, size 131072, enabled
bar [18] = type I/O Port, range 32, base 0xec00, size 32, enabled
bar [1c] = type Memory, range 32, base 0xfebdc000, size 16384, enabled
-
Chip ID is the same on both of those, and on another one I'm talking to that crashes… Mine that run fine are a different chip.
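For anyone comparing cards: the chip= field in the pciconf output packs the PCI device ID in the upper 16 bits and the vendor ID in the lower 16 bits, so chip=0x10d38086 is device 0x10d3 from vendor 0x8086 (Intel). Below is a minimal sketch for pulling those out of saved pciconf output. The function name chip_ids is made up for illustration, and it assumes you have Python available on whatever machine you paste the output into (it does not need to run on the firewall itself).

```python
import re

# Matches lines such as:
#   em0@pci0:2:0:0: class=0x020000 card=0x060a15d9 chip=0x10d38086 rev=0x00 hdr=0x00
CHIP_RE = re.compile(r"^(\w+)@pci\S*\s.*?\bchip=0x([0-9a-fA-F]{8})", re.MULTILINE)


def chip_ids(pciconf_text):
    """Return {device_name: (vendor_id, device_id)} from pciconf -l style output."""
    result = {}
    for name, chip in CHIP_RE.findall(pciconf_text):
        value = int(chip, 16)
        # Low 16 bits: vendor ID. High 16 bits: device ID.
        result[name] = (value & 0xFFFF, value >> 16)
    return result


# Two lines copied from the output posted in this thread.
sample = """em0@pci0:2:0:0: class=0x020000 card=0x060a15d9 chip=0x10d38086 rev=0x00 hdr=0x00
em1@pci0:3:0:0: class=0x020000 card=0x060a15d9 chip=0x10d38086 rev=0x00 hdr=0x00
"""

for name, (vendor, device) in chip_ids(sample).items():
    print(f"{name}: vendor=0x{vendor:04x} device=0x{device:04x}")
```

Running the same function over output from a box that does not panic makes it easy to see whether the device ID differs.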
-
em0@pci0:2:0:0: class=0x020000 card=0x060a15d9 chip=0x10d38086 rev=0x00 hdr=0x00
class = network
subclass = ethernet
bar [10] = type Memory, range 32, base 0xfeae0000, size 131072, enabled
bar [18] = type I/O Port, range 32, base 0xdc00, size 32, enabled
bar [1c] = type Memory, range 32, base 0xfeadc000, size 16384, enabled
em1@pci0:3:0:0: class=0x020000 card=0x060a15d9 chip=0x10d38086 rev=0x00 hdr=0x00
class = network
subclass = ethernet
bar [10] = type Memory, range 32, base 0xfebe0000, size 131072, enabled
bar [18] = type I/O Port, range 32, base 0xec00, size 32, enabled
bar [1c] = type Memory, range 32, base 0xfebdc000, size 16384, enabled
Interesting, my SM board has the same chip ID. My panics come around at 6-8 day intervals with no VPN use and no known factor other than time elapsed.
-
LostInIgnorance - one more test for you since you seem to hit it faster than anyone else - can you try switching your OpenVPN instance to TCP instead of UDP and see if it makes any difference? Just curious.
-
Just another "me too": panics in devd on the CARP backup host. I just got them while adding firewall aliases on the primary (no recent changes to VIP addresses). I'll report back if tomorrow's snapshot doesn't stop them.
-
vito and others with em failing, can you look at the output of this:
sysctl -a | grep 'dev.em.*fail'
and report any non-zero values, specifically for dev.em.0.mbuf_alloc_fail and dev.em.0.tx_dma_fail.
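If you have several em ports and would rather not eyeball every counter, the check can be sketched roughly as follows. This is only an illustration: nonzero_fail_counters is a made-up helper name, and it assumes you saved the sysctl output as text and have Python on some machine to run it.

```python
import re

# Matches lines such as "dev.em.0.tx_dma_fail: 0"
FAIL_RE = re.compile(r"^(dev\.em\.\d+\.[\w.]*fail\w*):[ \t]*(\d+)[ \t]*$", re.MULTILINE)


def nonzero_fail_counters(sysctl_text):
    """Return {counter_name: value} for every dev.em.*fail* counter that is not zero."""
    return {
        name: int(value)
        for name, value in FAIL_RE.findall(sysctl_text)
        if int(value) != 0
    }


# Counter names as they appear in sysctl output for the em driver.
sample = """dev.em.0.mbuf_alloc_fail: 0
dev.em.0.cluster_alloc_fail: 0
dev.em.0.tx_dma_fail: 0
dev.em.0.mac_stats.tso_ctx_fail: 0
"""

print(nonzero_fail_counters(sample))  # empty dict when every counter is zero
```

An empty result means there is nothing to report; anything else is worth posting along with the interface number.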
-
No non-zero values.
Here is the output:
dev.em.0.mbuf_alloc_fail: 0
dev.em.0.cluster_alloc_fail: 0
dev.em.0.tx_dma_fail: 0
dev.em.0.mac_stats.tso_ctx_fail: 0
dev.em.1.mbuf_alloc_fail: 0
dev.em.1.cluster_alloc_fail: 0
dev.em.1.tx_dma_fail: 0
dev.em.1.mac_stats.tso_ctx_fail: 0
-
Nope, over TCP still did this:
Kernel page fault with the following non-sleepable locks held:
exclusive sleep mutex vr1 (network driver) r = 0 (0xc3640aec) locked @ /usr/pfSensesrc/src/sys/dev/vr/if_vr.c:1675
KDB: stack backtrace:
X_db_sym_numargs(c0c4e35f,d5341a88,c092ce85,68b,0,...) at X_db_sym_numargs+0x146
kdb_backtrace(68b,0,ffffffff,c11b7554,d5341ac0,...) at kdb_backtrace+0x29
witness_display_spinlock(c0c50877,d5341ad4,4,1,0,...) at witness_display_spinlock+0x75
witness_warn(5,0,c0c823d7,c1981a94,c3556aa0,...) at witness_warn+0x20d
trap(d5341b60) at trap+0x172
alltraps(c39bf700,dedeadc0,c39bf700,c39bf700,d5341be8,...) at alltraps+0x1b
m_tag_delete_chain(c39bf700,0,c092cc2b,0,0,...) at m_tag_delete_chain+0x3f
m_pkthdr_init(c39bf700,100,0,c092cc2b,c0c3acf7,...) at m_pkthdr_init+0x8b5
uma_zfree_arg(c1981a80,c39bf700,0,c3640000,d5341c70,...) at uma_zfree_arg+0x29
m_freem(c39bf700,4,c0c3acf7,5a3,0,...) at m_freem+0x43
ucom_attach(c3640aec,0,c0c3acf7,68b,c3640aec,...) at ucom_attach+0x88f5
ucom_attach(c3640000,d5341cc8,c08d8a54,c107b5c0,c3554238,...) at ucom_attach+0xaa17
intr_event_execute_handlers(c3556aa0,c3554200,c0c4616c,533,c3554270,...) at intr_event_execute_handlers+0x125
intr_event_add_handler(c3644b60,d5341d38,c0c45ecc,344,c3556aa0,...) at intr_event_add_handler+0x42f
fork_exit(c08c1b70,c3644b60,d5341d38) at fork_exit+0xb8
fork_trampoline() at fork_trampoline+0x8
--- trap 0, eip = 0, esp = 0xd5341d70, ebp = 0 ---

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address = 0xdedeadc0
fault code = supervisor read, page not present
instruction pointer = 0x20:0xc094b038
stack pointer = 0x28:0xd5341ba0
frame pointer = 0x28:0xd5341bb0
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, def32 1, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 11 (irq5: vr1)
[thread]
Stopped at m_tag_delete+0x48: movl 0(%ecx),%eax
db> bt
Tracing pid 11 tid 64025 td 0xc358d780
m_tag_delete(c39bf700,dedeadc0,c39bf700,c39bf700,d5341be8,...) at m_tag_delete+0x48
m_tag_delete_chain(c39bf700,0,c092cc2b,0,0,...) at m_tag_delete_chain+0x3f
m_pkthdr_init(c39bf700,100,0,c092cc2b,c0c3acf7,...) at m_pkthdr_init+0x8b5
uma_zfree_arg(c1981a80,c39bf700,0,c3640000,d5341c70,...) at uma_zfree_arg+0x29
m_freem(c39bf700,4,c0c3acf7,5a3,0,...) at m_freem+0x43
ucom_attach(c3640aec,0,c0c3acf7,68b,c3640aec,...) at ucom_attach+0x88f5
ucom_attach(c3640000,d5341cc8,c08d8a54,c107b5c0,c3554238,...) at ucom_attach+0xaa17
intr_event_execute_handlers(c3556aa0,c3554200,c0c4616c,533,c3554270,...) at intr_event_execute_handlers+0x125
intr_event_add_handler(c3644b60,d5341d38,c0c45ecc,344,c3556aa0,...) at intr_event_add_handler+0x42f
fork_exit(c08c1b70,c3644b60,d5341d38) at fork_exit+0xb8
fork_trampoline() at fork_trampoline+0x8
--- trap 0, eip = 0, esp = 0xd5341d70, ebp = 0 ---
db>
[/thread]
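When several people are posting panics, it helps to compare the same few fields from each one: trap type, fault address, the process that was running, and where the debugger stopped. Here is a rough sketch for pulling those out of a pasted panic message. The helper name panic_summary and the choice of fields are my own and purely illustrative; the patterns only cover the wording seen in the trace above.

```python
import re

# Field patterns based on the wording of the panic text in this thread.
PATTERNS = {
    "trap": r"Fatal trap (\d+): ([a-z ]+?) while in kernel mode",
    "fault_address": r"fault virtual address\s*=\s*(0x[0-9a-fA-F]+)",
    "process": r"current process\s*=\s*(\d+ \([^)]*\))",
    "stopped_at": r"Stopped at\s+(\S+?):",
}


def panic_summary(text):
    """Return the fields that could be found in a pasted panic message."""
    summary = {}
    for key, pattern in PATTERNS.items():
        match = re.search(pattern, text)
        if match:
            summary[key] = " ".join(match.groups())
    return summary


# Excerpt from the panic above; works whether or not line breaks survived pasting.
excerpt = (
    "Fatal trap 12: page fault while in kernel mode cpuid = 0; apic id = 00 "
    "fault virtual address = 0xdedeadc0 fault code = supervisor read, page not present "
    "current process = 11 (irq5: vr1) "
    "Stopped at m_tag_delete+0x48: movl 0(%ecx),%eax"
)

for key, value in panic_summary(excerpt).items():
    print(f"{key}: {value}")
```

Note that in this particular trace the current process is the vr1 interrupt thread, so it comes from a vr(4) interface and not an em(4) card.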
-
It was worth trying/checking…
-
Can any of you please test with this kernel: http://files.pfsense.org/kernel.gz
Just copy it to /boot/kernel/kernel.gz on pfSense and reboot.
-
For those wanting to debug on ALIX/other embedded devices…
/etc/rc.conf_mount_rw
fetch http://pingle.org/files/kernel_wrap_Dev.gz
tar xzpf kernel_wrap_Dev.gz -C /boot/
And then reboot. It works on my ALIX.
The next snapshot after the one building now should have them in there as well, but not the one building now.
Should I use the above process, just changing the file to http://files.pfsense.org/kernel.gz?
-
That kernel is probably for full installs only though, not embedded.
-
So, not for me? Just for the old Dell P4 with the built-in em card, and everyone else with gig cards.
-
Yeah, for the em and not for the Soekris.
-
Please try with tomorrow's snapshots to see if it is fixed.
-
@ermal:
Can anyone of you please test with this kernel http://files.pfsense.org/kernel.gz
Just copy it under /boot/kernel/kernel.gz on pfsense and reboot.
ermal,
I installed the kernel and it seems to work.
I copied about 2 GB of data with no problems.