Kernel Panic
-
Could someone who can readily reproduce this panic give this custom firmware build a try?
http://cvs.pfsense.org/~jimp/pfSense-Full-Update-2.0-BETA5-i386-20110114-2041.tgz
It was built without a patch that does the extra mbuf operations that may be triggering the panic.
-
Bad news JimP, still crashes.
Kernel page fault with the following non-sleepable locks held:
exclusive sleep mutex em0 (EM TX Lock) r = 0 (0xc2f52580) locked @ /usr/pfSensesrc/src/sys/dev/e1000/if_lem.c:1350
KDB: stack backtrace:
X_db_sym_numargs(c0eb72fb,ccc3ca90,c0a41f25,546,0,...) at X_db_sym_numargs+0x146
kdb_backtrace(546,0,ffffffff,c145d42c,ccc3cac8,...) at kdb_backtrace+0x29
witness_display_spinlock(c0eb9813,ccc3cadc,4,1,0,...) at witness_display_spinlock+0x75
witness_warn(5,0,c0ef7bc2,14,c131b3c0,...) at witness_warn+0x20d
trap(ccc3cb68) at trap+0x19e
alltraps(c2feeb00,dedeadc0,c2feeb00,c2feeb00,ccc3cbf0,...) at alltraps+0x1b
m_tag_delete_chain(c2feeb00,0,c0e6e75d,0,c2ed9b50,...) at m_tag_delete_chain+0x3f
reallocf(c2feeb00,100,0,c0a42978,df,...) at reallocf+0x8a5
uma_zfree_arg(c1d7e380,c2feeb00,0,b5,ccc3cc84,...) at uma_zfree_arg+0x29
m_freem(c2feeb00,4,c0e6e75d,b87,c2f4e000,...) at m_freem+0x43
ed_probe_RTL80x9(c2f52580,0,c0e6e75d,546,c2f525bc,...) at 0xc06ec4d8
ed_probe_RTL80x9(c2f4e000,1,c0eb8bcc,4f,c2edb918,...) at 0xc06efea0
taskqueue_run(c2edb900,c2edb918,c0ea5f85,0,c0eb222b,...) at taskqueue_run+0x103
taskqueue_thread_loop(c2f525ec,ccc3cd38,c0eaed9a,344,c131b3c0,...) at taskqueue_thread_loop+0x68
fork_exit(c0a3b1a0,c2f525ec,ccc3cd38) at fork_exit+0xb8
fork_trampoline() at fork_trampoline+0x8
--- trap 0, eip = 0, esp = 0xccc3cd70, ebp = 0 ---
Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address = 0xdedeadc0
fault code = supervisor read, page not present
instruction pointer = 0x20:0xc0a611c8
stack pointer = 0x28:0xccc3cba8
frame pointer = 0x28:0xccc3cbb8
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, def32 1, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 0 (em0 taskq)
[thread]
Stopped at m_tag_delete+0x48: movl 0(%ecx),%eax
db>
[/thread]
-
currently running 2.0-BETA5 (i386) built on Thu Jan 13 19:33:19 EST 201
Not sure how far back this happens. In a test network:
2 machines, each with 4 Intel NICs (em0 - em3)
WAN, LAN, Opt1, Opt2 (CARP interface)
Running CARP on WAN, LAN, Opt1 interfaces.
Syncing on Opt2 interface.
Recently started getting panics on box2 when changing settings on box1.
Panic & BackTrace from box2 included below.
Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address = 0x1a4
fault code = supervisor read, page not present
instruction pointer = 0x20:0xc09ee51d
stack pointer = 0x28:0xd670aa54
frame pointer = 0x28:0xd670aa70
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, def32 1, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 253 (devd)
[thread]
Stopped at _mtx_lock_sleep+0x6d: movl 0x1a4(%ecx),%eax
db> bt
Tracing pid 253 tid 64081 td 0xc4142000
_mtx_lock_sleep(c40f16d0,c4142000,0,c0ecfc57,fd,...) at _mtx_lock_sleep+0x6d
_mtx_lock_flags(c40f16d0,0,c0ecfc57,fd,0,...) at _mtx_lock_flags+0xf7
carp6_input(c3ae5800,c0286938,c40f3a00,c0ea9fce,3,...) at carp6_input+0x9bd
ifioctl(c46a3b44,c0286938,c40f3a00,c4142000,c40cf900,...) at ifioctl+0x141e
soo_ioctl(c412ddc8,c0286938,c40f3a00,c39aa400,c4142000,...) at soo_ioctl+0x415
kern_ioctl(c4142000,f,c0286938,c40f3a00,1a3b7d0,...) at kern_ioctl+0x1fd
ioctl(c4142000,d670acf8,c0ef7af5,c0ecdaff,c41a77f8,...) at ioctl+0x134
syscall(d670ad38) at syscall+0x220
Xint0x80_syscall() at Xint0x80_syscall+0x20
--- syscall (54, FreeBSD ELF32, ioctl), eip = 0x8088357, esp = 0xbfbfe89c, ebp = 0xbfbfe908 ---
db> reboot
[/thread]
-
Out of curiosity, what type of network cards do you have in that box? Is it rl and em both? Or just em? or just rl? Or something else?
-
one em network (gig embedded on the board of an old dell p4). All network traffic is VLAN'd on that one interface.
-
OK, just checking… It looks odd to me that the backtrace references ed_probe_RTL80x9 which is a really old realtek chip, but it may just be something weird that I don't know at that level in the kernel/network stack.
We have arranged serial console access with someone who has been able to reproduce the panic so hopefully we'll have a lead on a fix early next week.
-
OK, just checking… It looks odd to me that the backtrace references ed_probe_RTL80x9 which is a really old realtek chip,
Here's an extract from the stack trace:
m_freem(c2feeb00,4,c0e6e75d,b87,c2f4e000,...) at m_freem+0x43
ed_probe_RTL80x9(c2f52580,0,c0e6e75d,546,c2f525bc,...) at 0xc06ec4d8
ed_probe_RTL80x9(c2f4e000,1,c0eb8bcc,4f,c2edb918,...) at 0xc06efea0
taskqueue_run(c2edb900,c2edb918,c0ea5f85,0,c0eb222b,...) at taskqueue_run+0x103
Note that the two ed_probe_RTL80x9 references are followed by a raw address rather than a symbol name and offset. I suspect ed_probe_RTL80x9 is merely the closest lower-valued global symbol, but it's too far away to warrant printing the PC as symbol+offset. If that is the case, you shouldn't take too much notice of the ed_probe_RTL80x9 name.
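The suspicion above can be illustrated with a minimal Python sketch of a "closest lower symbol" lookup. This is not the kernel debugger's code. The ed_probe_RTL80x9 start address and the MAX_OFFSET cutoff are invented for illustration, and whether the debugger really applies such a cutoff is only the guess made above. The m_tag_delete address is derived from the first panic (instruction pointer 0xc0a611c8, reported as m_tag_delete+0x48).

```python
import bisect

# Symbol table as (start address, name), sorted by address.
# ed_probe_RTL80x9's address is HYPOTHETICAL; m_tag_delete's is
# 0xc0a611c8 - 0x48, taken from the panic output in this thread.
SYMBOLS = [
    (0xC06E0000, "ed_probe_RTL80x9"),
    (0xC0A61180, "m_tag_delete"),
]

# ASSUMED cutoff: beyond this distance the name is treated as unreliable.
MAX_OFFSET = 0x1000


def resolve(pc):
    """Return (name, offset) for the closest symbol at or below pc, or None."""
    starts = [addr for addr, _ in SYMBOLS]
    i = bisect.bisect_right(starts, pc) - 1
    if i < 0:
        return None  # pc lies below every known symbol
    addr, name = SYMBOLS[i]
    return name, pc - addr


def describe(pc):
    """Format pc the way the trace does: symbol+offset, or name plus raw address."""
    hit = resolve(pc)
    if hit is None:
        return hex(pc)
    name, offset = hit
    if offset > MAX_OFFSET:
        return f"{name} at {pc:#x}"
    return f"{name}+{offset:#x}"


if __name__ == "__main__":
    print(describe(0xC0A611C8))  # close to its symbol
    print(describe(0xC06EC4D8))  # far from the nearest lower symbol
```

With this table, an address close to its symbol is shown as symbol+offset, while an address far above the nearest lower symbol keeps that symbol's name but is shown as a raw address, which is the pattern seen for the two ed_probe_RTL80x9 frames.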
-
We have arranged serial console access with someone who has been able to reproduce the panic so hopefully we'll have a lead on a fix early next week.
JimP, is there anything I can do to help out?
-
Not that I'm aware of. If the mbuf tag patch isn't the cause, it almost has to be the recent e1000 driver update (em, igb, etc).
-
Someone else had seen that once but so far we've been unable to replicate it so the real cause can be tracked down.
It seemed to be something in the configuration, though.
-
I am afraid to update since I haven't heard anything back. Is it still crashing or has it been fixed?
-
Nothing has changed with the drivers, but plenty of other things have been fixed, so it may be worth trying.
-
How can I get the logs (system.log is flushed on every boot?) so I can help track down the problem?
I have 4 NICs (5 interfaces), all Intel em, both PCI and motherboard-integrated.
The computer just freezes, no reboot. I have nmap and bandwidthd installed. I'm using outbound Multi-WAN, DHCP server, no VLAN, no traffic shaper, no VPN.
Everything was running well with an old snapshot. The freezes started after an upgrade a week ago. I currently have the latest snapshot installed.
The freezes are random: sometimes pfSense runs for a few minutes, sometimes for a few hours. Any FTP transfer aborts with an error (there were problems with passive FTP a week or so ago, but those were connection problems; here the transfers are aborted).
I think this could be linked to the problem if a buffer in the driver is at fault.
-
If you are seeing a freeze and not a reset/panic, then this thread isn't related. Start a new thread for that. These drivers haven't changed for several weeks now.
-
2.0-BETA5 (amd64)
built on Wed Jan 12 18:01:47 EST 2011
I just experienced my first kernel panic last night after more than 8 days of uptime. I'm using a SM X7SPA-H board with only the onboard Intel GBE (Intel 82574L Gigabit Ethernet). I'm not using OpenVPN, but both NICs have multiple VLANs on them and deal only in tagged traffic.
Is there a reasonable chance that updating to the latest snap will resolve this? I don't know that I can reproduce this panic intentionally, as it hasn't happened before and I wasn't doing anything interesting when it happened. I do have clients, but the panic happened at my lowest traffic period of the day.
-
I don't think we have done anything that would have fixed these panics, if it is driver-related.
Are you able to switch to a developer kernel so you can obtain a backtrace?
-
JimP,
I will be trying the old Dell with the GigE port today to find out whether any of the changes you mentioned above have made a difference. I haven't had the downtime lately to put it back into the system. I took that computer out and used another one with two 100 Mbit NICs, and got it running so I could remote back into the system.
So it wouldn't be worthwhile to put that old Dell back in?
-
I'll see about a backtrace. Shouldn't be a problem.
-
It may be worth trying again on a current snap. At least on today's snapshot FTP no longer freezes my router :-)
-
Hi,
I think that I'm experiencing the same problem here. I have 2 boxes running the latest beta of pfSense (2.0-BETA5 (amd64) built on Fri Jan 21 00:30:42 EST 2011), alongside another cluster running pfSense 1.2.3, and I'd like to move them to production soon (tomorrow or the day after tomorrow, actually :-D). I have sync enabled, and when I add another CARP IP on the primary box, the secondary crashes (I was able to reproduce it 4 times, the last one with a devel kernel).
This happens as soon as I create the new VIP on the primary (the sync starts), not after pressing apply.
You can see a picture of the crash + backtrace.
An interesting thing: the third time I tried (the first one with the devel kernel) I was able to create the CARP IP on the primary, and it was successfully synced to the secondary.
But in the secondary's logs I can see something that looks to me like a "soft" or "recoverable" panic.
What do you think?
thanks
Jan 21 17:37:00 check_reload_status: syncing firewall
Jan 21 17:37:00 kernel: vip1: link state changed to DOWN
Jan 21 17:37:00 kernel: em0: promiscuous mode disabled
Jan 21 17:37:00 kernel: vip2: link state changed to DOWN
Jan 21 17:37:00 kernel: em1: promiscuous mode disabled
Jan 21 17:37:00 kernel: vip3: link state changed to DOWN
Jan 21 17:37:00 kernel: em2: promiscuous mode disabled
Jan 21 17:37:00 kernel: em2_vlan70: promiscuous mode disabled
Jan 21 17:37:00 kernel: carp0: changing name to 'vip1'
Jan 21 17:37:00 kernel: em0: promiscuous mode enabled
Jan 21 17:37:00 kernel: vip1: INIT -> MASTER (preempting)
Jan 21 17:37:00 kernel: vip1: link state changed to UP
Jan 21 17:37:00 kernel: carp1: changing name to 'vip2'
Jan 21 17:37:00 kernel: em1: promiscuous mode enabled
Jan 21 17:37:00 kernel: vip2: INIT -> MASTER (preempting)
Jan 21 17:37:00 kernel: vip2: link state changed to UP
Jan 21 17:37:00 kernel: carp2: changing name to 'vip3'
Jan 21 17:37:00 kernel: em2: promiscuous mode enabled
Jan 21 17:37:00 kernel: em2_vlan70: promiscuous mode enabled
Jan 21 17:37:00 kernel: vip3: INIT -> MASTER (preempting)
Jan 21 17:37:00 kernel: vip3: link state changed to UP
Jan 21 17:37:00 php: : CARP sync not being done because of missing sync ip!
Jan 21 17:37:00 check_reload_status: syncing firewall
Jan 21 17:37:00 kernel: carp3: changing name to 'vip4'
Jan 21 17:37:00 kernel: vip4: INIT -> MASTER (preempting)
Jan 21 17:37:00 kernel: vip4: link state changed to UP
Jan 21 17:37:00 kernel: vip1: link state changed to DOWN
Jan 21 17:37:00 kernel: vip1: INIT -> MASTER (preempting)
Jan 21 17:37:00 kernel: vip1: link state changed to UP
Jan 21 17:37:00 php: : CARP sync not being done because of missing sync ip!
Jan 21 17:37:00 check_reload_status: reloading filter
Jan 21 17:37:00 kernel: vip2: link state changed to DOWN
Jan 21 17:37:00 kernel: vip2: INIT -> MASTER (preempting)
Jan 21 17:37:00 kernel: vip2: link state changed to UP
Jan 21 17:37:00 kernel: vip3: link state changed to DOWN
Jan 21 17:37:00 kernel: vip3: INIT -> MASTER (preempting)
Jan 21 17:37:00 kernel: vip3: link state changed to UP
Jan 21 17:37:00 kernel: vip4: link state changed to DOWN
Jan 21 17:37:00 kernel: vip4: INIT -> MASTER (preempting)
Jan 21 17:37:00 kernel: vip4: link state changed to UP
Jan 21 17:37:00 php: /xmlrpc.php: ROUTING: change default route to ***
Jan 21 17:37:00 php: /xmlrpc.php: Removing static route for monitor*** and adding a new route through ***
Jan 21 17:37:00 php: /xmlrpc.php: Removing static route for monitor *** and adding a new route through ***
Jan 21 17:37:00 apinger: Exiting on signal 15.
Jan 21 17:37:01 apinger: Starting Alarm Pinger, apinger(60120)
Jan 21 17:37:01 php: /xmlrpc.php: Resyncing OpenVPN instances.
Jan 21 17:37:02 kernel: vip2: MASTER -> BACKUP (more frequent advertisement received)
Jan 21 17:37:02 kernel: vip2: link state changed to DOWN
Jan 21 17:37:03 dhcpd: Internet Systems Consortium DHCP Server 4.1.1-P1
Jan 21 17:37:03 dhcpd: Copyright 2004-2010 Internet Systems Consortium.
Jan 21 17:37:03 dhcpd: All rights reserved.
Jan 21 17:37:03 dhcpd: For info, please visit https://www.isc.org/software/dhcp/
Jan 21 17:37:03 dnsmasq[51897]: exiting on receipt of SIGTERM
Jan 21 17:37:04 dnsmasq[5180]: started, version 2.55 cachesize 10000
Jan 21 17:37:04 dnsmasq[5180]: compile time options: IPv6 GNU-getopt no-DBus I18N DHCP TFTP
Jan 21 17:37:04 dnsmasq[5180]: reading /etc/resolv.conf
Jan 21 17:37:04 dnsmasq[5180]: using nameserver 8.8.8.8#53
Jan 21 17:37:04 dnsmasq[5180]: read /etc/hosts - 2 addresses
Jan 21 17:37:05 kernel: vip1: MASTER -> BACKUP (more frequent advertisement received)
Jan 21 17:37:05 kernel: vip1: link state changed to DOWN
Jan 21 17:37:05 dhcpd: Internet Systems Consortium DHCP Server 4.1.1-P1
Jan 21 17:37:05 dhcpd: Copyright 2004-2010 Internet Systems Consortium.
Jan 21 17:37:05 dhcpd: All rights reserved.
Jan 21 17:37:05 dhcpd: For info, please visit https://www.isc.org/software/dhcp/
Jan 21 17:37:06 kernel: vip3: MASTER -> BACKUP (more frequent advertisement received)
Jan 21 17:37:06 kernel: vip3: link state changed to DOWN
Jan 21 17:38:34 kernel: lock order reversal:
Jan 21 17:38:34 kernel: 1st 0xffffffff8123e520 in_ifaddr_lock (in_ifaddr_lock) @ /usr/pfSensesrc/src/sys/netinet/if_ether.c:541
Jan 21 17:38:34 kernel: 2nd 0xffffff00026d55a0 carp_if (carp_if) @ /usr/pfSensesrc/src/sys/netinet/ip_carp.c:1160
Jan 21 17:38:34 kernel: KDB: stack backtrace:
Jan 21 17:38:34 kernel: X_db_sym_numargs() at X_db_sym_numargs+0x15a
Jan 21 17:38:34 kernel: witness_display_spinlock() at witness_display_spinlock+0x9e
Jan 21 17:38:34 kernel: witness_checkorder() at witness_checkorder+0x81e
Jan 21 17:38:34 kernel: _mtx_lock_flags() at _mtx_lock_flags+0x78
Jan 21 17:38:34 kernel: carp_iamatch() at carp_iamatch+0x38
Jan 21 17:38:34 kernel: arprequest() at arprequest+0x4b8
Jan 21 17:38:34 kernel: netisr_dispatch_src() at netisr_dispatch_src+0xb8
Jan 21 17:38:34 kernel: ether_demux() at ether_demux+0x18d
Jan 21 17:38:34 kernel: ether_vlanencap() at ether_vlanencap+0x295
Jan 21 17:38:34 kernel: ed_probe_RTL80x9() at ed_probe_RTL80x9+0x7cf8
Jan 21 17:38:34 kernel: ed_probe_RTL80x9() at ed_probe_RTL80x9+0x7ff4
Jan 21 17:38:34 kernel: intr_event_execute_handlers() at intr_event_execute_handlers+0x66
Jan 21 17:38:34 kernel: intr_event_add_handler() at intr_event_add_handler+0x432
Jan 21 17:38:34 kernel: fork_exit() at fork_exit+0x12a
Jan 21 17:38:34 kernel: fork_trampoline() at fork_trampoline+0xe
Jan 21 17:38:34 kernel: --- trap 0, rip = 0, rsp = 0xffffff80000d6d30, rbp = 0 ---
Jan 21 17:38:44 kernel: vip4: MASTER -> BACKUP (more frequent advertisement received)
Jan 21 17:38:44 kernel: vip4: link state changed to DOWN
Jan 21 17:40:26 check_reload_status: syncing firewall
Jan 21 17:40:26 syslogd: exiting on signal 15
Jan 21 17:40:26 syslogd: kernel boot file is /boot/kernel/kernel
Jan 21 17:40:26 php: : CARP sync not being done because of missing sync ip!
Except for this incident I must say that it's a pleasure to work with beta 2!
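For anyone comparing logs from several boxes, the "lock order reversal" report in the log above can be pulled out of a saved log with a few lines of Python. This is only a rough sketch. The regular expression is an assumption based on the two "1st"/"2nd" lines shown in this post, not on any documented log format, and the function name find_lock_order is made up for this example.

```python
import re

# ASSUMED line shape, modelled on the two lines in the log above:
#   "<1st|2nd> <address> <lock name> (<lock type>) @ <file>:<line>"
LOR_LINE = re.compile(r"(1st|2nd) (0x[0-9a-f]+) (\S+) \((.+?)\) @ (\S+):(\d+)")


def find_lock_order(log_text):
    """Return (ordinal, lock name, source file, line number) for each match."""
    return [
        (m.group(1), m.group(3), m.group(5), int(m.group(6)))
        for m in LOR_LINE.finditer(log_text)
    ]


# Sample input copied from the log in this post.
SAMPLE = """\
Jan 21 17:38:34 kernel: lock order reversal:
Jan 21 17:38:34 kernel: 1st 0xffffffff8123e520 in_ifaddr_lock (in_ifaddr_lock) @ /usr/pfSensesrc/src/sys/netinet/if_ether.c:541
Jan 21 17:38:34 kernel: 2nd 0xffffff00026d55a0 carp_if (carp_if) @ /usr/pfSensesrc/src/sys/netinet/ip_carp.c:1160
"""

if __name__ == "__main__":
    for ordinal, lock, path, line in find_lock_order(SAMPLE):
        print(ordinal, lock, path, line)
```

Run against the sample, it lists in_ifaddr_lock as the first lock and carp_if as the second, together with the source locations where each was taken.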