Kernel Panic
-
The snap from 2/3 would be the first to have the CARP panic fixes.
There is still an issue (when deleting a VIP), but the panic you're seeing has likely been fixed in the most recent snap that is up now (Thu Feb 3 00:55:19 EST 2011).
-
Hi Jimp - My read of the build logs suggested that Ermal's patch didn't get included in last night's build, even though the build was started after the commit was made.
Thu Feb 3 04:21:58 EST 2011 -|- >>> Sleeping for 86400 in between snapshot builder runs. Last known commit 847e5e8257b58906a0d12ce48275cae7162aab47
The commit listed there shows up in redmine as being a couple before Ermal's commit. Am I reading that wrong?
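For anyone who wants to check this sort of thing without eyeballing the log, here is a minimal sketch that pulls the timestamp and the "Last known commit" hash out of a builder line. The regular expression is an assumption modelled only on the single log line quoted above, and `parse_builder_line` and `BuilderStatus` are hypothetical names, not part of the pfSense builder.

```python
import re
from typing import NamedTuple, Optional


class BuilderStatus(NamedTuple):
    timestamp: str  # text before the "-|-" separator, kept as a plain string
    commit: str     # 40-character hex hash after "Last known commit"


# Assumed layout: "<timestamp> -|- <message> Last known commit <sha1>".
# This is inferred from one quoted line, not from the builder's source.
_LOG_RE = re.compile(
    r"^(?P<ts>.+?)\s+-\|-\s+.*Last known commit\s+(?P<commit>[0-9a-f]{40})\b"
)


def parse_builder_line(line: str) -> Optional[BuilderStatus]:
    """Return the timestamp and commit hash, or None if the line does not match."""
    match = _LOG_RE.search(line)
    if match is None:
        return None
    return BuilderStatus(match.group("ts"), match.group("commit"))


if __name__ == "__main__":
    quoted = (
        "Thu Feb 3 04:21:58 EST 2011 -|- >>> Sleeping for 86400 in between "
        "snapshot builder runs. Last known commit "
        "847e5e8257b58906a0d12ce48275cae7162aab47"
    )
    print(parse_builder_line(quoted))
```

The extracted hash can then be compared against the commit you are waiting for. It only tells you what the builder reported for the repository it logs, so a match or mismatch here is a hint rather than proof.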
-
I started the builder by hand after his commit.
The patches are committed in the tools repo - the tools repo isn't tracked by the builder in the function you pasted.
-
Hmmm - Just panicked, running 2.0-BETA5 (i386) built on Thu Feb 3 00:55:19 EST 2011 on both master & slave.
Updated a rule on the master, slave panicked.
I've attached images of the panic & bt.
![Carp panic.png](/public/imported_attachments/1/Carp panic.png)
![Carp bt.png](/public/imported_attachments/1/Carp bt.png)
-
The issue is fixed. The patch I committed just had a typo from a copy/paste error.
-
You should grab the next snapshot that will come out.
AFAIK it has no more such issues.
-
I just started a new build after Ermal's commit - the next new snapshot should include this fix.
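To tell which of two snapshots is newer, one option is to compare the "built on" stamps from the version banner. The sketch below is a rough helper under stated assumptions: `parse_built_on` is a hypothetical name, the pattern is modelled on the banners quoted in this thread, and the time zone abbreviation is dropped, so the result is only meaningful when both stamps use the same zone.

```python
import re
from datetime import datetime
from typing import Optional

# Assumed banner shape: "built on Thu Feb 3 18:55:08 EST 2011".
_BUILT_RE = re.compile(
    r"built on (\w{3} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2}) (\w+) (\d{4})"
)


def parse_built_on(banner: str) -> Optional[datetime]:
    """Return the build time as a naive datetime, or None if no stamp is found."""
    match = _BUILT_RE.search(banner)
    if match is None:
        return None
    stamp, _zone, year = match.groups()
    # The zone abbreviation (e.g. EST) is ignored on purpose: strptime's %Z
    # handling depends on the local platform, so it is not relied on here.
    return datetime.strptime(f"{stamp} {year}", "%a %b %d %H:%M:%S %Y")


if __name__ == "__main__":
    older = parse_built_on("2.0-BETA5 (i386) built on Thu Feb 3 00:55:19 EST 2011")
    newer = parse_built_on("2.0-BETA5 (i386) built on Thu Feb 3 18:55:08 EST 2011")
    print(older, newer, older < newer)
```

A later build time does not by itself guarantee that a particular commit made it into the image, so treat the comparison as a first check only.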
-
Thanks Ermal! I'll give the next snapshot a try when it is available and report back.
-
Running 2.0-BETA5 (i386) built on Thu Feb 3 18:55:08 EST 2011 on both master & slave.
Slave panicked as soon as I updated a rule on the master.
Attached images of both panic & bt.
![Carp Panic.png](/public/imported_attachments/1/Carp Panic.png)
![carp bt.png](/public/imported_attachments/1/carp bt.png)
-
Wrong snapshot, sorry.
This is the commit that fixes the error: https://rcs.pfsense.org/projects/pfsense-tools/repos/mainline/commits/08f1322c7d5d9fae8ef52dc356c75a59d2483263
-
Running 2.0-BETA5 (i386) built on Fri Feb 4 02:36:03 EST 2011 on both Master & Slave.
It ran longer this time before the panic. I changed my configuration a little before it happened this time.
Steps:
1 - Pulled the plug on the master WAN
(Slave became the new CARP master for all carp VIPs)
1a - Verified LAN clients were able to surf the web.
2 - Cleared XMLRPC Sync settings from the old master
3 - Set XMLRPC Sync settings on the new master (old slave)
4 - Changed DHCP settings on new master (old slave)
5 - Old master had a panic.
![Carp Panic1.png](/public/imported_attachments/1/Carp Panic1.png)
![carp bt1.png](/public/imported_attachments/1/carp bt1.png)
-
@PJ2:
2 - Cleared XMLRPC Sync settings from the old master
3 - Set XMLRPC Sync settings on the new master (old slave)
Why on earth would you do that? That isn't necessary at all, and could lead to other issues.
-
Jimp/Ermal
Just an update: I installed an updated firmware on another system about 5 days ago.
All seems good!
Thanks for all your help!
vito
-
As I was typing I wondered if that might be the response I would get. It's always amazing what crazy and unexpected things an end user can come up with. :)
This is a test environment, so I test things.
Comp1 = initial CARP master, XMLRPC master
Comp2 = initial CARP slave, XMLRPC slave
I wanted to know what would happen if Comp1 had an issue and Comp2 took over. Could I also make Comp2 become the XMLRPC master since it was acting as the CARP master? I did clear the XMLRPC settings on Comp1 first. I wouldn't have thought that reversing the roles would cause a panic.
-
When a master syncs the config to a slave it makes several alterations in the process.
Just flipping a slave to a master in the XMLRPC settings is going to cause you problems because of this.
If you really want to swap their roles, restore the master's config to the slave box and vice versa.
When a CARP slave takes over automatically - if you have configured everything right - it can function that way indefinitely until you bring the master back up; you just can't make any changes to the slave's config that you want to keep. But that is all a topic for another thread. It's not the source of the panic - it's just a Bad Thing.
-
Good to know - I'll avoid doing that in the future.
-
Ok so I waited until I thought this snapshot would be okaish… Within 10 minutes of installing this my machine crashed:
2.0-BETA5 (i386)
built on Fri Feb 4 15:47:28 EST 2011
Pentium(R) Dual-Core CPU E5400 @ 2.70GHz
[2.0-BETA5][root@fw.home]/root(1): uname -a
FreeBSD fw.home 8.1-RELEASE-p2 FreeBSD 8.1-RELEASE-p2 #1: Fri Feb 4 15:45:20 EST 2011 sullrich@FreeBSD_8.0_pfSense_2.0-snaps.pfsense.org:/usr/obj.pfSense/usr/pfSensesrc/src/sys/pfSense_SMP.8 i386
Did not do anything special on the firewall when it died.
-
Bt backtrace.
-
Sorry, I haven't had time to properly test the new snapshots until now. I loaded it up with 2.0-BETA5 (i386) built on Sun Feb 6 23:54:00 EST 2011. It crashed when trying to use OpenVPN. This is the P4 Dell machine that was described earlier.
Kernel page fault with the following non-sleepable locks held:
exclusive sleep mutex em0 (EM TX Lock) r = 0 (0xc2f52580) locked @ /usr/pfSensesrc/src/sys/dev/e1000/if_lem.c:1350
KDB: stack backtrace:
db_trace_self_wrapper(c0eb72bd,ccc3ca90,c0a41f05,546,0,...) at db_trace_self_wrapper+0x26
kdb_backtrace(546,0,ffffffff,c145d504,ccc3cac8,...) at kdb_backtrace+0x29
_witness_debugger(c0eb97d5,ccc3cadc,4,1,0,...) at _witness_debugger+0x25
witness_warn(5,0,c0ef7b9a,14,c131b440,...) at witness_warn+0x20d
trap(ccc3cb68) at trap+0x19e
calltrap() at calltrap+0x6
--- trap 0xc, eip = 0xc0a611ca, esp = 0xccc3cba8, ebp = 0xccc3cbb8 ---
m_tag_delete(c341cd00,dedeadc0,c341cd00,c341cd00,ccc3cbf0,...) at m_tag_delete+0x5a
m_tag_delete_chain(c341cd00,0,c0e6e71f,0,c2ed91e0,...) at m_tag_delete_chain+0x3f
mb_dtor_mbuf(c341cd00,100,0,c0a42958,df,...) at mb_dtor_mbuf+0x35
uma_zfree_arg(c1d7e380,c341cd00,0,1e,ccc3cc84,...) at uma_zfree_arg+0x29
m_freem(c341cd00,4,c0e6e71f,b87,c2f4e000,...) at m_freem+0x43
lem_txeof(c2f52580,0,c0e6e71f,546,c2f525bc,...) at lem_txeof+0x158
lem_handle_rxtx(c2f4e000,1,c0eb8b8e,4f,c2edb8d8,...) at lem_handle_rxtx+0x60
taskqueue_run(c2edb8c0,c2edb8d8,c0ea5f47,0,c0eb21ed,...) at taskqueue_run+0x103
taskqueue_thread_loop(c2f525ec,ccc3cd38,c0eaed5c,344,c131b440,...) at taskqueue_thread_loop+0x68
fork_exit(c0a3b180,c2f525ec,ccc3cd38) at fork_exit+0xb8
fork_trampoline() at fork_trampoline+0x8
--- trap 0, eip = 0, esp = 0xccc3cd70, ebp = 0 ---
Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address= 0xdedeadc0
fault code= supervisor read, page not present
instruction pointer= 0x20:0xc0a611ca
stack pointer = 0x28:0xccc3cba8
frame pointer = 0x28:0xccc3cbb8
code segment= base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, def32 1, gran 1
processor eflags= interrupt enabled, resume, IOPL = 0
current process= 0 (em0 taskq)
[thread]
Stopped at m_tag_delete+0x5a: movl 0(%ecx),%eax
db> bt
Tracing pid 0 tid 64050 td 0xc2f4bc80
m_tag_delete(c341cd00,dedeadc0,c341cd00,c341cd00,ccc3cbf0,...) at m_tag_delete+0x5a
m_tag_delete_chain(c341cd00,0,c0e6e71f,0,c2ed91e0,...) at m_tag_delete_chain+0x3f
mb_dtor_mbuf(c341cd00,100,0,c0a42958,df,...) at mb_dtor_mbuf+0x35
uma_zfree_arg(c1d7e380,c341cd00,0,1e,ccc3cc84,...) at uma_zfree_arg+0x29
m_freem(c341cd00,4,c0e6e71f,b87,c2f4e000,...) at m_freem+0x43
lem_txeof(c2f52580,0,c0e6e71f,546,c2f525bc,...) at lem_txeof+0x158
lem_handle_rxtx(c2f4e000,1,c0eb8b8e,4f,c2edb8d8,...) at lem_handle_rxtx+0x60
taskqueue_run(c2edb8c0,c2edb8d8,c0ea5f47,0,c0eb21ed,...) at taskqueue_run+0x103
taskqueue_thread_loop(c2f525ec,ccc3cd38,c0eaed5c,344,c131b440,...) at taskqueue_thread_loop+0x68
fork_exit(c0a3b180,c2f525ec,ccc3cd38) at fork_exit+0xb8
fork_trampoline() at fork_trampoline+0x8
--- trap 0, eip = 0, esp = 0xccc3cd70, ebp = 0 ---
db>
It is kinda nice though that the pfSense now knows it crashed and asks me if I would like to send the error info to the developers.
[quote]We have detected a crash report. Click here for more information.[/quote]
though the only thing the tab shows on the error report is this
[code]Crash report begins.
Anonymous machine information:
i386
8.1-RELEASE-p2
FreeBSD 8.1-RELEASE-p2 #1: Sun Feb 6 23:47:16 EST 2011 sullrich@FreeBSD_8.0_pfSense_2.0-snaps.pfsense.org:/usr/obj.pfSense/usr/pfSensesrc/src/sys/pfSense_Dev.8
Crash report details:
2048
[/code][/thread]
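Since pasted backtraces like the one above tend to lose their line breaks, here is a minimal sketch for pulling the call chain back out of the raw text. It assumes each frame has the shape `name(args) at name+0xOFFSET`, as seen in this dump; `extract_frames` is a hypothetical helper name, and the pattern has only been checked against the frames quoted in this post.

```python
import re
from typing import List, Tuple

# A frame looks like "m_freem(c341cd00,4,...) at m_freem+0x43".
# Requiring the closing parenthesis before "at" skips lines such as
# "Stopped at m_tag_delete+0x5a", which are not stack frames.
_FRAME_RE = re.compile(r"\)\s+at\s+([A-Za-z_][A-Za-z0-9_]*)\+0x([0-9a-fA-F]+)")


def extract_frames(dump: str) -> List[Tuple[str, int]]:
    """Return (function, offset) pairs in the order they appear in the dump."""
    return [(name, int(offset, 16)) for name, offset in _FRAME_RE.findall(dump)]


if __name__ == "__main__":
    # Three frames copied from the dump above, flattened onto one line.
    sample = (
        "m_tag_delete(c341cd00,dedeadc0,c341cd00,c341cd00,ccc3cbf0,...) "
        "at m_tag_delete+0x5a "
        "m_tag_delete_chain(c341cd00,0,c0e6e71f,0,c2ed91e0,...) "
        "at m_tag_delete_chain+0x3f "
        "mb_dtor_mbuf(c341cd00,100,0,c0a42958,df,...) at mb_dtor_mbuf+0x35"
    )
    for name, offset in extract_frames(sample):
        print(f"{name}+{offset:#x}")
```

Because it works on whitespace rather than line breaks, the same function can be fed either the flattened paste or a properly wrapped dump.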
-
I don't have access to my pfSense box(es) to get crash info at the moment, but I can say I am in the same boat as LostInConfusion - still getting panics when I try to access a pfSense box through the VPN (though in my case it's IPsec, not OpenVPN). This is using the most recent snapshot. For what it's worth, the previously mentioned CARP lock issue probably wasn't at the root of the issue I am having, since I am not using any CARP functionality whatsoever with my setup…