Kernel Panic
-
@ermal:
Uploaded a new kernel http://files.pfsense.org/kernel_new.gz
Be aware that you need to be updated to the latest snapshot before using this kernel; otherwise you will get hangs.
The kernel is in the quote, and the snap is the latest at this time in the update manager.
-
Just because that is true now doesn't mean that will be true in a couple hours. Timestamps are important.
-
As I asked before, since things keep going astray: is the panic I received on my Soekris board after doing this process relevant to anything, or should I be ignoring it?
Updated to pfSense-Full-Update-2.0-BETA5-i386-20110126-0422.tgz 26-Jan-2011 09:47 83M
/etc/rc.conf_mount_rw
fetch http://files.pfsense.org/kernel_new.gz
cp kernel_new.gz /boot/kernel/kernel.gz
Rebooted
Pushed traffic over OpenVPN
Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address = 0x0
fault code = supervisor read, page not present
instruction pointer = 0x20:0x0
stack pointer = 0x28:0xd553ebc0
frame pointer = 0x28:0xd553ebcc
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, def32 1, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 12 (irq5: vr1)
[thread]
Stopped at 0: *** error reading from address 0 ***
db> bt
Tracing pid 12 tid 64026 td 0xc3aa4780
m_tag_delete(c4dde600,c4de0b3d) at m_tag_delete+0x32
m_tag_delete_chain(c4dde600,0,d553ec2c,c0c94f99,c4dde600,...) at m_tag_delete_chain+0x82
mb_dtor_mbuf(c4dde600,100,0,c0a9041f,c3a92000,...) at mb_dtor_mbuf+0x25
uma_zfree_arg(c1d7b000,c4dde600,0,c3a28000,d553ec70,...) at uma_zfree_arg+0x29
m_freem(c4dde600,c3debe00,c3a92000,7f,7f,...) at m_freem+0x43
vr_txeof(0,c12c69c0,d5530008,0,0,...) at vr_txeof+0x340
vr_intr(c3a28000,0,109,dbd32e12,b15,...) at vr_intr+0x1c7
intr_event_execute_handlers(c39757f8,c3972180,c0e6ba48,52d,c39721f0,...) at intr_event_execute_handlers+0x14b
ithread_loop(c3a8e450,d553ed38,6000,f600001b,0,...) at ithread_loop+0x6b
fork_exit(c09ae910,c3a8e450,d553ed38) at fork_exit+0x91
fork_trampoline() at fork_trampoline+0x8
--- trap 0, eip = 0, esp = 0xd553ed70, ebp = 0 ---
db>
[/thread]
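For anyone repeating the kernel swap in this post, here is a minimal sketch of the same three commands wrapped in a function. Only /etc/rc.conf_mount_rw, fetch and cp come from the post; the backup copy, the DRY_RUN switch and the function names are assumptions added for illustration, not part of the posted procedure.

```shell
#!/bin/sh
# Sketch of the kernel swap described in this post.
# From the thread: conf_mount_rw, fetch, cp.
# Assumptions (not from the thread): the .bak copy, DRY_RUN, function names.

KERNEL_URL="http://files.pfsense.org/kernel_new.gz"
KERNEL_DEST="/boot/kernel/kernel.gz"

# run CMD...: print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

install_test_kernel() {
    run /etc/rc.conf_mount_rw || return 1                  # make the filesystem writable
    run cp "$KERNEL_DEST" "$KERNEL_DEST.bak" || return 1   # assumption: keep a fallback copy
    run fetch "$KERNEL_URL" || return 1                    # saves kernel_new.gz in the current directory
    run cp kernel_new.gz "$KERNEL_DEST" || return 1        # put the test kernel in place
    echo "kernel replaced; reboot to load it"
}
```

Calling install_test_kernel with DRY_RUN=1 set only prints the commands, which is a cheap way to check the paths before touching /boot.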
-
On the same link posted another kernel.
Thank you very much to everyone helping on this. LostIgnorance, it's very valuable; it's just that this is not a chat and I do not want to spam with useless comments.
-
No problem for testing!
So should I download just the kernel from the same link, without a snap update?
I am on snap
2.0-BETA5 (i386)
built on Wed Jan 26 09:44:03 EST 2011
Just want to make sure.
-
Yeah, just download the file after this post; I updated it again.
-
FYI, I just committed some patches to the pfSense repo which should fix the hangs and the panics.
All those on non full installs of i386 grab the next snapshot and test when it comes out.
-
Since the snapshot builder won't be done with its current run for a while yet, I ran off an i386 full update on my builder and uploaded it here:
http://cvs.pfsense.org/~jimp/pfSense-Full-Update-2.0-BETA4-i386-20110127-1627.tgz
If someone wants to give that a shot, have at it. The amd64 builder doesn't take quite as long; it should be up in a couple hours. The nanobsd builds probably won't be up until after midnight.
-
Jimp,
Does this include the kernel that had to be downloaded, or do we still need to do the update and then download the kernel from ermal's post?
Thanks!
-
Ermal checked the patches into the repo so this should be equivalent to that, or better.
-
Sorry to be so dense… does this mean that the update that I'm performing right now will have the discussed fixes?
-
Not if you're doing an auto update. The snapshots are still building, probably won't be done for a couple hours.
I posted a URL earlier to a build I made for i386 full installs that should have the fixes, though it didn't fix my problem (FTP still has issues…)
-
I just installed your build from the link, jimp… testing now...
Update: with the snap from your link the system did panic.
jimp, is the text dump in this build? I just went to look for it and do not see the file.
-
tested one more time and it crashed again. :(
-
Textdumps should be on in those builds… see my other post: http://forum.pfsense.org/index.php/topic,32711.0.html
I guess just wait and try the 'real' snapshot when it uploads later tonight.
-
Yep, I saw the post and checked /var/crash
Nothing in the folder.
Will the build on the snap server be different than the one you posted?
-
In theory it should be about the same, but in practice that isn't always the case.
-
quick question, I am in the freeze/hang camp. I don't get a kernel panic on the console, so when I update to the next snapshot should I expect a dump in /var/crash when the system hangs?
-
No - the hangs you can't get out of except by a power cycle. There is no crash dump from a hang, only from a panic. (This thread is really for the panics… the hangs have a separate thread)
-
ya that's what I thought, thanks.
-
Today one of the fw's (running "2.0-BETA5 (amd64) built on Thu Jan 27 19:46:43 EST 2011") managed to output something in /var/crash/ before it died.
See attached files.
-
That seems like another panic related to carp and your link going up and down.
So seems like progress, at least to those who do not have intensive carp clusters but just hangs.
I will get back when I have more info on this new issue.
-
@ermal:
That seems like another panic related to carp and your link going up and down.
So seems like progress, at least to those who do not have intensive carp clusters but just hangs.
I will get back when I have more info on this new issue.
I guess that would explain why the backup spontaneously becomes the master every now and then?
I get this on the backup:
BACKUP -> MASTER (preempting a slower master)
and this on the master:
MASTER -> BACKUP (more frequent advertisement received)
Adjusting advskew probably won't help here, I guess?
-
I have pushed some patches that should solve even the carp panic, Slaygon.
So newest snapshots should have the fix for you.
I would skip the next one coming and get the other one.
-
just tried snap
2.0-BETA5 (i386)
built on Fri Jan 28 05:30:15 EST 2011
and it crashed. :)
The only one I had any luck with is the first kernel that was posted.
Just so I am clear… do the snaps have the correct kernel, or do I still need to download it from the separate link?
-
@ermal:
I have pushed some patches that should solve even the carp panic, Slaygon.
So newest snapshots should have the fix for you.
I would skip the next one coming and get the other one.
So the "Fri Jan 28 05:30:15 EST 2011" is not the one you recommend for the carp fixes, but instead wait for the next one?
I'm currently on "Fri Jan 28 00:53:50 EST 2011".
Oh, and my backup just became the master spontaneously again.
Excellent work btw. Much appreciated!
-
yes, wait for the next one. It was just restarted to pick up the patches he pushed.
-
@vito please type a bt at that prompt next time.
-
Loaded 2.0-BETA5 (i386) built on Fri Jan 28 05:30:15 EST 2011 running on a dell Intel(R) Pentium(R) 4 CPU 2.40GHz onboard gig nic (em)
Enter an option:
Kernel page fault with the following non-sleepable locks held:
exclusive sleep mutex em0 (EM TX Lock) r = 0 (0xc2f52580) locked @ /usr/pfSensesrc/src/sys/dev/e1000/if_lem.c:1350
KDB: stack backtrace:
db_trace_self_wrapper(c0eb7ccb,ccc3ca90,c0a421c5,546,0,...) at db_trace_self_wrapper+0x26
kdb_backtrace(546,0,ffffffff,c145df04,ccc3cac8,...) at kdb_backtrace+0x29
_witness_debugger(c0eba1e3,ccc3cadc,4,1,0,...) at _witness_debugger+0x25
witness_warn(5,0,c0ef8592,2d05a8c0,c131be40,...) at witness_warn+0x20d
trap(ccc3cb68) at trap+0x19e
calltrap() at calltrap+0x6
--- trap 0xc, eip = 0xc0a61478, esp = 0xccc3cba8, ebp = 0xccc3cbb8 ---
m_tag_delete(c2fef600,dedeadc0,c2fef600,c2fef600,ccc3cbf0,...) at m_tag_delete+0x48
m_tag_delete_chain(c2fef600,0,c0e6f12d,0,c2ed9720,...) at m_tag_delete_chain+0x3f
mb_dtor_mbuf(c2fef600,100,0,c0a42c18,df,...) at mb_dtor_mbuf+0x35
uma_zfree_arg(c1d7e380,c2fef600,0,72,ccc3cc84,...) at uma_zfree_arg+0x29
m_freem(c2fef600,4,c0e6f12d,b87,c2f4e000,...) at m_freem+0x43
lem_txeof(c2f52580,0,c0e6f12d,546,c2f525bc,...) at lem_txeof+0x158
lem_handle_rxtx(c2f4e000,1,c0eb959c,4f,c2edb8d8,...) at lem_handle_rxtx+0x60
taskqueue_run(c2edb8c0,c2edb8d8,c0ea6955,0,c0eb2bfb,...) at taskqueue_run+0x103
taskqueue_thread_loop(c2f525ec,ccc3cd38,c0eaf76a,344,c131be40,...) at taskqueue_thread_loop+0x68
fork_exit(c0a3b440,c2f525ec,ccc3cd38) at fork_exit+0xb8
fork_trampoline() at fork_trampoline+0x8
--- trap 0, eip = 0, esp = 0xccc3cd70, ebp = 0 ---
Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address = 0xdedeadc0
fault code = supervisor read, page not present
instruction pointer = 0x20:0xc0a61478
stack pointer = 0x28:0xccc3cba8
frame pointer = 0x28:0xccc3cbb8
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, def32 1, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 0 (em0 taskq)
[thread]
Stopped at m_tag_delete+0x48: movl 0(%ecx),%eax
db> bt
Tracing pid 0 tid 64050 td 0xc2f4bc80
m_tag_delete(c2fef600,dedeadc0,c2fef600,c2fef600,ccc3cbf0,...) at m_tag_delete+0x48
m_tag_delete_chain(c2fef600,0,c0e6f12d,0,c2ed9720,...) at m_tag_delete_chain+0x3f
mb_dtor_mbuf(c2fef600,100,0,c0a42c18,df,...) at mb_dtor_mbuf+0x35
uma_zfree_arg(c1d7e380,c2fef600,0,72,ccc3cc84,...) at uma_zfree_arg+0x29
m_freem(c2fef600,4,c0e6f12d,b87,c2f4e000,...) at m_freem+0x43
lem_txeof(c2f52580,0,c0e6f12d,546,c2f525bc,...) at lem_txeof+0x158
lem_handle_rxtx(c2f4e000,1,c0eb959c,4f,c2edb8d8,...) at lem_handle_rxtx+0x60
taskqueue_run(c2edb8c0,c2edb8d8,c0ea6955,0,c0eb2bfb,...) at taskqueue_run+0x103
taskqueue_thread_loop(c2f525ec,ccc3cd38,c0eaf76a,344,c131be40,...) at taskqueue_thread_loop+0x68
fork_exit(c0a3b440,c2f525ec,ccc3cd38) at fork_exit+0xb8
fork_trampoline() at fork_trampoline+0x8
--- trap 0, eip = 0, esp = 0xccc3cd70, ebp = 0 ---
db>
Happened instantly when I tried to OpenVPN in
[/thread]
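Both backtraces in this thread (the vr one from the Soekris board and the em one here) pass through m_freem, uma_zfree_arg, mb_dtor_mbuf, m_tag_delete_chain and m_tag_delete. A rough sketch for checking that kind of overlap, assuming each trace has been saved to a file with one frame per line; the function names bt_frames and common_frames are made up for this example.

```shell
#!/bin/sh
# bt_frames: read a ddb backtrace (one frame per line) on stdin and print
# only the function names, in the order they appear. Lines that are not
# frames (trap markers, prompts, register dumps) are skipped.
bt_frames() {
    sed -n 's/^\([A-Za-z_][A-Za-z0-9_]*\)(.*) at [A-Za-z_][A-Za-z0-9_]*+0x[0-9a-f]*[[:space:]]*$/\1/p'
}

# common_frames FILE_A FILE_B: print the function names that occur in both
# traces, in the order they appear in FILE_A.
common_frames() {
    bt_frames < "$1" | while read -r fn; do
        # -F: fixed string, -x: whole-line match, -q: no output
        if bt_frames < "$2" | grep -Fqx "$fn"; then
            echo "$fn"
        fi
    done
}
```

With the two traces saved as, say, soekris.txt and dell.txt (hypothetical filenames), common_frames soekris.txt dell.txt lists the shared frames, which makes it easier to tell whether two reports are likely the same panic.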
-
it is a hard lock…can't type anything in.
On reboot, still do not see the crash files that jimp added.
-
it is a hard lock…can't type anything in.
On reboot, still do not see the crash files that jimp added.
As I said before, those won't be created for hard locks, only for panics/crashes.
-
Just seen this come out: "Fri Jan 28 13:06:21 EST 2011"…
Is this the one where the carp sync problems are addressed?
If so, it will take me a number of hours to get this applied and verified. The bug usually appears after 1-4 hours of active monitoring; however, I will not be able to start testing very soon. I expect about 2 to 8 hours before I can start monitoring.
If anyone could start testing sooner, that would be great!
(for those that don't know, I run two DL-type HP servers with quad Intel gbit NICs in them)
And again, great work pf team! Really appreciated!
-
If the carp patches were also pushed to the AMD version 2.0-BETA5 (amd64)
built on Fri Jan 28 13:06:21 EST 2011, then it's not fixed for me. As soon as I make any change and it syncs, the backup firewall locks or panics. It's not a hard lock though; it does bring me to a prompt, but I'm not seeing any crash files in /var/crash.
Andy
-
If the carp patches were also pushed to the AMD version 2.0-BETA5 (amd64)
built on Fri Jan 28 13:06:21 EST 2011, then it's not fixed for me. As soon as I make any change and it syncs, the backup firewall locks or panics. It's not a hard lock though; it does bring me to a prompt, but I'm not seeing any crash files in /var/crash.
Andy
It's the amd64 version I am running too.
Not too good news there, Andy. Sorry to hear.
And usually, with the carp errors, there seems to be a freeze, rather than a crash, which will render no trace dumps.
Occasionally, though it is very unusual, I get a debug prompt at which I can get a backtrace. Mostly the box just hangs.
I guess the devs would appreciate any input they can get hold of, so if you do get to a prompt and can type something in, give "bt" (as in backtrace) a go and see if you can provide any more info.
Cheerio.
-
I just tried on my amd64 vm and when I force a manual panic (sysctl debug.kdb.panic=1) I get a textdump, and no db> prompt. I'm not sure why someone on a current snapshot update would still be getting left at a db> prompt when it crashes, they should be gathering the info and rebooting on their own.
Doesn't help with the hangs, though, but the hangs are a different problem (and a different thread :-)
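A small sketch of the check being discussed: after a panic (for example one forced with sysctl debug.kdb.panic=1 as above, which will crash the box), look for a textdump in /var/crash. The textdump* filename pattern and the function name are assumptions; adjust them to whatever your build actually writes.

```shell
#!/bin/sh
# crash_dump_status [DIR]: report whether DIR (default /var/crash) contains
# any textdump files.
# Assumption: dumps are regular files whose names start with "textdump".
# Returns 0 if at least one is found, 1 if none, 2 if DIR does not exist.
crash_dump_status() {
    cds_dir="${1:-/var/crash}"
    if [ ! -d "$cds_dir" ]; then
        echo "missing: $cds_dir"
        return 2
    fi
    # Count matching regular files; strip the padding some wc versions add.
    cds_count=$(find "$cds_dir" -type f -name 'textdump*' | wc -l | tr -d '[:space:]')
    if [ "$cds_count" -gt 0 ]; then
        echo "found $cds_count textdump file(s) in $cds_dir"
        return 0
    fi
    echo "no textdump in $cds_dir"
    return 1
}
```

An empty result after a hard lock is expected, since a hang never reaches the dump code; an empty result after a real panic is the case worth reporting.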
-
Here is what I get when carp tries to sync. This is running snapshot version amd64-20110128-0938.
Thanks,
Andy
-
That snap was before the last fixes went in. Try the newest one.
-
That snap was before the last fixes went in. Try the newest one.
pfSense-Full-Update-2.0-BETA5-amd64-20110128-0938.tgz 28-Jan-2011 13:09 99M
That's the latest I see on the snapshots server. Am I looking in the wrong spot? http://snapshots.pfsense.org/FreeBSD_RELENG_8_1/amd64/pfSense_HEAD/updates/?C=M;O=D
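Since the question of whether a given snap is older or newer than a fix keeps coming up, here is a rough sketch that pulls the date stamp out of an update filename and compares two of them. It assumes the filenames follow the pfSense-Full-Update-...-YYYYMMDD-HHMM.tgz pattern seen in this thread; the function names are invented for the example, and the stamp in the filename may not match the "built on" time shown by the system.

```shell
#!/bin/sh
# snapshot_stamp FILE: print the YYYYMMDD-HHMM stamp embedded in a snapshot
# update filename; prints nothing if the name does not match the pattern.
snapshot_stamp() {
    printf '%s\n' "$1" | sed -n 's/.*-\([0-9]\{8\}-[0-9]\{4\}\)\.tgz$/\1/p'
}

# snapshot_is_newer A B: return 0 when A's stamp is later than B's,
# 1 when it is not, 2 when either name has no recognisable stamp.
snapshot_is_newer() {
    sn_a=$(snapshot_stamp "$1")
    sn_b=$(snapshot_stamp "$2")
    if [ -z "$sn_a" ] || [ -z "$sn_b" ]; then
        return 2
    fi
    # Drop the dash so each stamp becomes one 12-digit number to compare.
    [ "${sn_a%-*}${sn_a#*-}" -gt "${sn_b%-*}${sn_b#*-}" ]
}
```

For example, snapshot_is_newer applied to the 20110127-1627 build and the 20110126-0422 update mentioned earlier succeeds, while the reverse order does not.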
-
It's probably still building right now. Give it some time…
-
still problems on latest snap…
2.0-BETA5 (i386)
built on Sat Jan 29 01:09:59 EST 2011
This time, I could type bt (no hard lock).
Here is the output (sorry, it had to be a cell pic)