Kernel Panic
-
Last time I tried to do what PJ2 outlined at the beginning of this thread, it locked my system up and it wouldn't boot. Any help getting it installed so I can capture the panic would be greatly appreciated.
EDIT: Would actually prefer having the developer kernel on this embedded device. I am using the 5501-70; sata wd 80g; running full embedded; vr0-wan, vr1-lan, vr2-wrls (friendly wifi for visitors), vr3-dmz; packages squid, lightsquid, nut, havp, nmap, snort; anything else?
-
I just added support to the builder to make an embedded kernel with debug options. I've got a test build going on my box, if one cranks out I'll upload it somewhere this evening or tomorrow. Failing that, the main snapshots should include it from here on. Not the next snapshot, but the one after it, should have them.
-
OMG, THANKS JIMP!!! I am excited to know if it works!
-
For those wanting to debug on ALIX/other embedded devices…
/etc/rc.conf_mount_rw
fetch http://pingle.org/files/kernel_wrap_Dev.gz
tar xzpf kernel_wrap_Dev.gz -C /boot/
And then reboot. It works on my ALIX.
The next snapshot after the one building now should have them in there as well, but not the one building now.
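For anyone who would rather script it, the three commands above can be wrapped roughly like this. It is only a sketch: `run`, `install_debug_kernel` and the `DRY_RUN` switch are made-up names for illustration, not anything shipped with pfSense. It defaults to printing the commands instead of running them, so nothing is touched until the output looks right.

```shell
#!/bin/sh
# Sketch only. DRY_RUN, run and install_debug_kernel are hypothetical names.
KERNEL_URL="http://pingle.org/files/kernel_wrap_Dev.gz"
KERNEL_FILE="kernel_wrap_Dev.gz"

# Print the command when DRY_RUN is anything but 0, otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" != "0" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Same three steps as above, stopping at the first one that fails.
install_debug_kernel() {
    run /etc/rc.conf_mount_rw || return 1
    run fetch "$KERNEL_URL" || return 1
    run tar xzpf "$KERNEL_FILE" -C /boot/ || return 1
}

install_debug_kernel
```

Run it with `DRY_RUN=0` to actually perform the steps, then reboot as before.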
-
JimP, the above-mentioned commands worked great on my Soekris net5501-70 board. Thanks for that. I hope to drive down the road and use my neighbor's wifi to see if I can crash it now.
-
Grabbed the panic from the above mentioned Soekris board.
Kernel page fault with the following non-sleepable locks held:
exclusive sleep mutex vr1 (network driver) r = 0 (0xc3640aec) locked @ /usr/pfSensesrc/src/sys/dev/vr/if_vr.c:1675
KDB: stack backtrace:
X_db_sym_numargs(c0c4e35f,d5341a88,c092ce85,68b,0,...) at X_db_sym_numargs+0x146
kdb_backtrace(68b,0,ffffffff,c11b73f4,d5341ac0,...) at kdb_backtrace+0x29
witness_display_spinlock(c0c50877,d5341ad4,4,1,0,...) at witness_display_spinlock+0x75
witness_warn(5,0,c0c823d7,c1981a94,c3556aa0,...) at witness_warn+0x20d
trap(d5341b60) at trap+0x172
alltraps(c3899300,dedeadc0,c3899300,c3899300,d5341be8,...) at alltraps+0x1b
m_tag_delete_chain(c3899300,0,c092cc2b,0,0,...) at m_tag_delete_chain+0x3f
m_pkthdr_init(c3899300,100,0,c092cc2b,c0c3acf7,...) at m_pkthdr_init+0x8b5
uma_zfree_arg(c1981a80,c3899300,0,c3640000,d5341c70,...) at uma_zfree_arg+0x29
m_freem(c3899300,4,c0c3acf7,5a3,0,...) at m_freem+0x43
ucom_attach(c3640aec,0,c0c3acf7,68b,c3640aec,...) at ucom_attach+0x88f5
ucom_attach(c3640000,d5341cc8,c08d8a54,c107b5c0,c3554238,...) at ucom_attach+0xaa17
intr_event_execute_handlers(c3556aa0,c3554200,c0c4616c,533,c3554270,...) at intr_event_execute_handlers+0x125
intr_event_add_handler(c3644b60,d5341d38,c0c45ecc,344,c3556aa0,...) at intr_event_add_handler+0x42f
fork_exit(c08c1b70,c3644b60,d5341d38) at fork_exit+0xb8
fork_trampoline() at fork_trampoline+0x8
--- trap 0, eip = 0, esp = 0xd5341d70, ebp = 0 ---

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address = 0xdedeadc0
fault code = supervisor read, page not present
instruction pointer = 0x20:0xc094b038
stack pointer = 0x28:0xd5341ba0
frame pointer = 0x28:0xd5341bb0
code segment = base 0x0, limit 0xfffff, type 0x1b
             = DPL 0, pres 1, def32 1, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 11 (irq5: vr1)
[thread]
Stopped at m_tag_delete+0x48: movl 0(%ecx),%eax
db> bt
Tracing pid 11 tid 64025 td 0xc358d780
m_tag_delete(c3899300,dedeadc0,c3899300,c3899300,d5341be8,...) at m_tag_delete+0x48
m_tag_delete_chain(c3899300,0,c092cc2b,0,0,...) at m_tag_delete_chain+0x3f
m_pkthdr_init(c3899300,100,0,c092cc2b,c0c3acf7,...) at m_pkthdr_init+0x8b5
uma_zfree_arg(c1981a80,c3899300,0,c3640000,d5341c70,...) at uma_zfree_arg+0x29
m_freem(c3899300,4,c0c3acf7,5a3,0,...) at m_freem+0x43
ucom_attach(c3640aec,0,c0c3acf7,68b,c3640aec,...) at ucom_attach+0x88f5
ucom_attach(c3640000,d5341cc8,c08d8a54,c107b5c0,c3554238,...) at ucom_attach+0xaa17
intr_event_execute_handlers(c3556aa0,c3554200,c0c4616c,533,c3554270,...) at intr_event_execute_handlers+0x125
intr_event_add_handler(c3644b60,d5341d38,c0c45ecc,344,c3556aa0,...) at intr_event_add_handler+0x42f
fork_exit(c08c1b70,c3644b60,d5341d38) at fork_exit+0xb8
fork_trampoline() at fork_trampoline+0x8
--- trap 0, eip = 0, esp = 0xd5341d70, ebp = 0 ---
db>

EDIT: Removed the HAVP output so that only the panic is shown.
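If the console output is saved to a file, the handful of lines that identify the fault can be pulled out with something like the following. This is a rough sketch: `panic_summary` is a made-up helper name, and it assumes the capture has one field per line, as it appears on the console. The sample input is taken from the panic above.

```shell
#!/bin/sh
# panic_summary is a hypothetical helper: it reads a captured panic on stdin
# and keeps only the lines that identify the trap, the address and the process.
panic_summary() {
    grep -E '^(Fatal trap|fault virtual address|fault code|current process|Stopped at)'
}

panic_summary <<'EOF'
Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address = 0xdedeadc0
fault code = supervisor read, page not present
instruction pointer = 0x20:0xc094b038
current process = 11 (irq5: vr1)
Stopped at m_tag_delete+0x48: movl 0(%ecx),%eax
EOF
```

That makes it easier to compare captures from different boxes without scrolling through the full backtrace each time.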
-
Is vr1 bridged to anything?
-
Not that I am aware of. I do force all traffic through the OpenVPN interface (Force all client generated traffic through the tunnel).
-
I did notice this on the console after having to uninstall and reinstall HAVP.
lock order reversal:
1st 0xc4279df4 ufs (ufs) @ /usr/pfSensesrc/src/sys/kern/vfs_mount.c:1204
2nd 0xc474e6a0 syncer (syncer) @ /usr/pfSensesrc/src/sys/kern/vfs_subr.c:2203
KDB: stack backtrace:
X_db_sym_numargs(c0c4e35f,d6704a3c,c092ce85,c091d9bb,c0c512cb,...) at X_db_sym_numargs+0x146
kdb_backtrace(c091d9bb,c0c512cb,c3516ee8,c3517020,d6704a98,...) at kdb_backtrace+0x29
witness_display_spinlock(c0c512cb,c474e6a0,c0c5872e,c3517020,c0c585a4,...) at witness_display_spinlock+0x75
witness_checkorder(c474e6a0,9,c0c585a4,89b,0,...) at witness_checkorder+0x839
__lockmgr_args(c474e6a0,80100,c474e6bc,0,0,...) at __lockmgr_args+0x7f5
vop_stdlock(d6704bb4,3,c0c585a4,80100,c474e648,...) at vop_stdlock+0x62
VOP_LOCK1_APV(c1032b00,d6704bb4,c08d9223,c10560a0,c474e648,...) at VOP_LOCK1_APV+0xb5
_vn_lock(c474e648,80100,c0c585a4,89b,0,...) at _vn_lock+0x5e
insmntque(d6704c58,c097342e,c474e648,0,c0c57db8,...) at insmntque+0x288
vrele(c474e648,0,c0c57db8,4f9,80,...) at vrele+0x10
dounmount(c37b2000,8080000,c3b3a780,47e,fdf65b4a,...) at dounmount+0x3ce
unmount(c3b3a780,d6704cf8,c3b3a780,d6704d2c,206,...) at unmount+0x2bf
syscall(d6704d38) at syscall+0x1da
Xint0x80_syscall() at Xint0x80_syscall+0x20
--- syscall (22, FreeBSD ELF32, unmount), eip = 0x280dfa9f, esp = 0xbfbfe61c, ebp = 0xbfbfe6e8 ---
-
Don't worry about LORs, they're mostly harmless.
-
Did notice this from both panics though.
Panic from old P4 computer:
Kernel page fault with the following non-sleepable locks held: exclusive sleep mutex em0 (EM TX Lock) r = 0 (0xc2f52580) locked @ /usr/pfSensesrc/src/sys/dev/e1000/if_lem.c:1350
Panic from Soekris board:
Kernel page fault with the following non-sleepable locks held: exclusive sleep mutex vr1 (network driver) r = 0 (0xc3640aec) locked @ /usr/pfSensesrc/src/sys/dev/vr/if_vr.c:1675
Is it just coincidence?
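To line the two reports up side by side, something like this would pull out just the lock name, its description and the source location. It is a quick sketch only: `held_locks` is a made-up helper name, and it assumes the lock line has the same shape as in the two panics above.

```shell
#!/bin/sh
# held_locks is a hypothetical helper: reads panic text on stdin and prints
# "lock name|lock description|file:line" for each held-lock line it finds.
held_locks() {
    sed -n 's/.*exclusive sleep mutex \([^ ]*\) (\([^)]*\)).* locked @ \([^ ]*\).*/\1|\2|\3/p'
}

held_locks <<'EOF'
exclusive sleep mutex em0 (EM TX Lock) r = 0 (0xc2f52580) locked @ /usr/pfSensesrc/src/sys/dev/e1000/if_lem.c:1350
exclusive sleep mutex vr1 (network driver) r = 0 (0xc3640aec) locked @ /usr/pfSensesrc/src/sys/dev/vr/if_vr.c:1675
EOF
```

Each output line names the driver instance whose mutex was held and where it was taken.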
-
Hi, I've reinstalled the secondary machine, it's now running 2.0-BETA5 (amd64) built on Mon Jan 17 22:14:04 EST 2011 (the primary has 2.0-BETA5 (amd64) built on Fri Jan 21 23:51:34 EST 2011).
I disabled the sync, created all the remaining carp vips on the primary (I've 12 of them right now), and re-enabled the sync. The secondary didn't crash.
What shall I do now? I'm a bit scared of upgrading it :-) thanks
-
Just wait for the late build from today and it should be safe to upgrade.
-
ok, and which kernel should I be running? SMP or devel? thanks
-
As of snap 2.0-BETA5 (i386), built on Mon Jan 24 18:48:13 EST 2011, the fix did not work.
Maybe the fix was not in this build?
-
It should have been.
So it still crashed the exact same way, with the same panic in the same place?
-
Did notice this from both panics though.
Panic from old P4 computer:
Kernel page fault with the following non-sleepable locks held: exclusive sleep mutex em0 (EM TX Lock) r = 0 (0xc2f52580) locked @ /usr/pfSensesrc/src/sys/dev/e1000/if_lem.c:1350
Panic from Soekris board:
Kernel page fault with the following non-sleepable locks held: exclusive sleep mutex vr1 (network driver) r = 0 (0xc3640aec) locked @ /usr/pfSensesrc/src/sys/dev/vr/if_vr.c:1675
Is it just coincidence?
Hard to say for sure.
I just set up my ALIX, opened a ton of browser tabs and pushed a bunch of traffic through it. I managed to crash Chrome but not the router… I wish I could reproduce this.
-
Here is the new screenshot of the panic.
Same symptoms, and only when using OpenVPN.
-
The key is that it only happens over OpenVPN. Connect to OpenVPN and, after a successful connection, open a browser. The page should time out and you should get disconnected from the OpenVPN connection.
-
I had ~10 browser tabs open to all kinds of sites, streaming a youtube video, and was copying a file over SMB, all over OpenVPN from a client on the WAN side of my ALIX.
Perhaps it's a bug specific to the vr chip in the Soekris.