Fatal Trap 12 every few days…
-
Ever since I upgraded to the latest builds, I've experienced fatal trap 12 every few days with the same config.
Can anyone see what module is causing this?
Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address = 0xc050048
fault code = supervisor read, page not present
instruction pointer = 0x20:0xc0b0c4d1
stack pointer = 0x28:0xeb07f7d8
frame pointer = 0x28:0xeb07f7fc
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, def32 1, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 12 (irq256: em0:rx 0)
[thread]
Stopped at rn_match+0x11: movl 0xc(%eax),%ebx
db> bt
Tracing pid 12 tid 64029 td 0xc4aff000
rn_match(c130852c,c588e600,c5c31388,c52a001e,eb07f8b4,...) at rn_match+0x11
pfr_match_addr(c5c18000,c52a001a,2,e072,eb07f89c,...) at pfr_match_addr+0xe0
pf_test_udp(eb07f978,eb07f974,1,c4c59600,c52b4100,...) at pf_test_udp+0x8aa
pf_test(1,c4b37400,eb07fb44,0,0,...) at pf_test+0x242f
pf_check_in(0,eb07fb44,c4b37400,1,0,...) at pf_check_in+0x46
pfil_run_hooks(c1353e60,eb07fb94,c4b37400,1,0,...) at pfil_run_hooks+0x93
ip_input(c52b4100,10,982f000,0,0,...) at ip_input+0x359
netisr_dispatch_src(1,0,c52b4100,eb07fc00,c0af9e4f,...) at netisr_dispatch_src+0x70
netisr_dispatch(1,c52b4100,c4ab0700,c4b37400) at netisr_dispatch+0x20
ether_demux(c4b37400,c52b4100,3,0,3,...) at ether_demux+0x19f
ether_input(c4b37400,c52b4100,eb07fc58,c4aff000,eb07fc4c,...) at ether_input+0x15d
em_rxeof(1,f0de766,c4b33e40,c4b25300,eb07fcc0,...) at em_rxeof+0x184
em_msix_rx(c4ab0700,0,109,496b6ed8,16eaf,...) at em_msix_rx+0x23
intr_event_execute_handlers(c498f7f8,c4b25300,c0ee391c,52d,c4b25370,...) at intr_event_execute_handlers+0xde
ithread_loop(c4b39470,eb07fd38,ffffffff,ffffffff,ffffffff,...) at ithread_loop+0x66
fork_exit(c0a11ad0,c4b39470,eb07fd38) at fork_exit+0x88
fork_trampoline() at fork_trampoline+0x8
--- trap 0, eip = 0, esp = 0xeb07fd70, ebp = 0 ---
db>
[/thread]
-
Here is vmstat -i…
# vmstat -i
interrupt                    total       rate
irq4: uart0                   1261          2
irq16: ath0 uhci3            25219         49
irq19: em5 uhci1+             7239         14
cpu0: timer                1010862       1989
irq256: em0:rx 0             22828         44
irq257: em0:tx 0              7124         14
irq258: em0:link                 1          0
irq259: em1:rx 0             17745         34
irq260: em1:tx 0              4251          8
irq261: em1:link                 3          0
irq262: em2:rx 0                43          0
irq263: em2:tx 0               490          0
irq264: em2:link                 2          0
irq265: em3:rx 0              8718         17
irq266: em3:tx 0             12026         23
irq267: em3:link                 1          0
cpu1: timer                1010875       1989
Total                      2128688       4190
I also disabled hardware checksum offload (all three options) to see if that helps with the crash. Funny this is happening now…
-
It seems like something in your local hardware that might be causing this.
Broken RAM?
-
I can see in a few days whether it happens again, but it all started after I upgraded to an August build, and it seems to happen every few days. If I can get the backtrace and it looks the same, I don't think RAM is the likely cause. Thanks for looking, though…
-
Nope, I don't think this is RAM related. The same error just happened.
Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address = 0xc050048
fault code = supervisor read, page not present
instruction pointer = 0x20:0xc0b0c4d1
stack pointer = 0x28:0xeb0ba7d8
frame pointer = 0x28:0xeb0ba7fc
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, def32 1, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 12 (irq259: em1:rx 0)
[thread]
Stopped at rn_match+0x11: movl 0xc(%eax),%ebx
db> bt
Tracing pid 12 tid 64034 td 0xc4afe280
rn_match(c130852c,c5ada300,c5c3f710,c55f381e,eb0ba8b4,...) at rn_match+0x11
pfr_match_addr(c5be4000,c55f381a,2,e072,eb0ba89c,...) at pfr_match_addr+0xe0
pf_test_udp(eb0ba978,eb0ba974,1,c4c59500,c5829500,...) at pf_test_udp+0x8aa
pf_test(1,c4b36c00,eb0bab44,0,0,...) at pf_test+0x242f
pf_check_in(0,eb0bab44,c4b36c00,1,0,...) at pf_check_in+0x46
pfil_run_hooks(c1353e60,eb0bab94,c4b36c00,1,0,...) at pfil_run_hooks+0x93
ip_input(c5829500,10,c203000,0,0,...) at ip_input+0x359
netisr_dispatch_src(1,0,c5829500,eb0bac00,c0af9e4f,...) at netisr_dispatch_src+0x70
netisr_dispatch(1,c5829500,c4b35000,c4b36c00) at netisr_dispatch+0x20
ether_demux(c4b36c00,c5829500,3,0,3,...) at ether_demux+0x19f
ether_input(c4b36c00,c5829500,eb0bac58,c4afe280,eb0bac4c,...) at ether_input+0x15d
em_rxeof(0,b7ad47a,c4b57340,c4b50080,eb0bacc0,...) at em_rxeof+0x184
em_msix_rx(c4b35000,0,109,8c57ea70,117a2,...) at em_msix_rx+0x23
intr_event_execute_handlers(c498f7f8,c4b50080,c0ee391c,52d,c4b500f0,...) at intr_event_execute_handlers+0xde
ithread_loop(c4b4ca40,eb0bad38,0,0,0,...) at ithread_loop+0x66
fork_exit(c0a11ad0,c4b4ca40,eb0bad38) at fork_exit+0x88
fork_trampoline() at fork_trampoline+0x8
--- trap 0, eip = 0, esp = 0xeb0bad70, ebp = 0 ---
[/thread]
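To check whether two backtraces really follow the same path, it can help to compare only the `function+offset` part of each frame and ignore the arguments, which differ between crashes. The snippet below is a minimal sketch of that idea, not a tool that ships with pfSense; `FRAME_RE` and `frame_path` are hypothetical names, and only the top three frames of each trace above are included to keep it short.

```python
import re

# Matches the "at symbol+0xoffset" tail of a DDB backtrace frame.
# (Hypothetical helper names; this is an illustration only.)
FRAME_RE = re.compile(r"\)\s+at\s+([A-Za-z_][A-Za-z0-9_]*\+0x[0-9a-f]+)")


def frame_path(trace):
    """Return the ordered list of 'function+offset' frames in a backtrace."""
    return FRAME_RE.findall(trace)


# Top three frames of the first crash (em0).
first = """
rn_match(c130852c,c588e600,c5c31388,c52a001e,eb07f8b4,...) at rn_match+0x11
pfr_match_addr(c5c18000,c52a001a,2,e072,eb07f89c,...) at pfr_match_addr+0xe0
pf_test_udp(eb07f978,eb07f974,1,c4c59600,c52b4100,...) at pf_test_udp+0x8aa
"""

# Top three frames of the second crash (em1).
second = """
rn_match(c130852c,c5ada300,c5c3f710,c55f381e,eb0ba8b4,...) at rn_match+0x11
pfr_match_addr(c5be4000,c55f381a,2,e072,eb0ba89c,...) at pfr_match_addr+0xe0
pf_test_udp(eb0ba978,eb0ba974,1,c4c59500,c5829500,...) at pf_test_udp+0x8aa
"""

# True when both crashes went through the same functions at the same offsets.
same_path = frame_path(first) == frame_path(second)
print(same_path)
```

Pasting the full traces into `first` and `second` works the same way, since the pattern only looks at the `at symbol+0x…` suffix of each frame.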
-
Can you check the system log to see what events appear when this happens?
-
Sorry, how am I able to view the system log file from the db> prompt?
-
After you reboot the machine!
-
Hrm, don't the system log files get overwritten on every boot?
-
My fw had a similar crash today! pfSense 2.0.
I have never experienced this before. Is there any way I can collect information from pfSense after a reboot that makes troubleshooting easier? Logs, coredumps?
I have used pfSense since Rel1 was in alpha stage, and this is the first time I had to physically reset the fw due to instability.
Please see the attached image.
-
Can you type "bt" at the prompt?
-
bt: Command not found.
I have restarted the fw though as no traffic was possible after the crash.
-
You must have an embedded platform where the kernel does not allow a "back trace".
I believe there is a way to change the kernel on the embedded platform. Someone more knowledgeable can chime in.
-
I do not have an embedded platform. I have a default x86 install: pfSense 2.0-RELEASE-pfSense (i386)
-
You can only run a back trace at the db> prompt, after a crash, not if:
I have restarted the fw though as no traffic was possible after the crash.
Steve
-
Thanks, I suspected that.
It might be good to evaluate features that capture system information the dev team can use for troubleshooting later.
I was surprised that after the boot there were no traces of the crash at all, so it was impossible to provide any hard evidence of what had happened.
I doubt that there are many firewalls that can be offline for a long time while consulting support. Most of us need to reboot and have the system back in service right away. Just my two cents.
Back to the problem at hand. Is it possible that the crash was caused by a memory issue (RAM)? I have seen instabilities on other systems caused by failing RAM.
Thanks
-
We fixed the problem of some crashes not rebooting automatically in 2.0.1/2.1, but it's an easy fix to apply yourself:
Edit /etc/ddb.conf and change
script kdb.enter.panic=textdump set; capture on; run lockinfo; show pcpu; bt; ps; alltrace; capture off; call doadump; reset
to
script kdb.enter.default=textdump set; capture on; run lockinfo; show pcpu; bt; ps; alltrace; capture off; call doadump; reset
(So just change kdb.enter.panic to kdb.enter.default)
Then run:
/sbin/ddb /etc/ddb.conf
From that point on it should collect the debug data and reboot itself automatically, and also give you a crash report notice in the GUI that you can use to upload the data to our servers (or grab it from /var/crash yourself)
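For anyone who would rather script the edit than do it by hand, the rename described above can be sketched as a small text transformation. This is only an illustration and not part of pfSense: `retarget_ddb_script` is a hypothetical helper, and it assumes the file contains the `script kdb.enter.panic=` line exactly as quoted above. It works on a string, so nothing on disk is touched until you write the result back yourself.

```python
OLD_EVENT = "kdb.enter.panic"
NEW_EVENT = "kdb.enter.default"


def retarget_ddb_script(conf_text):
    """Rename the 'script kdb.enter.panic=' entry to 'script kdb.enter.default='.

    Hypothetical helper: every other line is passed through unchanged.
    """
    out = []
    for line in conf_text.splitlines(keepends=True):
        if line.lstrip().startswith("script " + OLD_EVENT + "="):
            # Replace only the event name, keep the script body as-is.
            line = line.replace(OLD_EVENT, NEW_EVENT, 1)
        out.append(line)
    return "".join(out)


# The line quoted in the post above, used here as sample input.
sample = (
    "script kdb.enter.panic=textdump set; capture on; run lockinfo; "
    "show pcpu; bt; ps; alltrace; capture off; call doadump; reset\n"
)

updated = retarget_ddb_script(sample)
print(updated, end="")
```

To apply it for real you would read /etc/ddb.conf into a string, pass it through the function, write the result back, and then run `/sbin/ddb /etc/ddb.conf` as described above so the new script is loaded.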
From that panic it could be faulty hardware, but it's hard to say for sure. Usually if it's bad RAM the crashes would be in a different place every time, not in the exact same path. Though it could be a faulty NIC.
-
Hello,
I am also getting the same error every few days, and sometimes more than once a day. I recently upgraded from 1.2.3 to 2.0.1. However, pfSense was crashing and restarting before the upgrade too, so the problem is not associated only with the 2.0.1 release.
I am attaching the entire crash log (long) that I was able to see in the GUI. I have sent it to the pfSense team for further analysis.
Atul.
[Crash Report.txt](/public/imported_attachments/1/Crash Report.txt)
-
That crash is in code writing to the filesystem. There is very little likelihood of a problem in that code; it's been solid for years on FreeBSD.
More likely your HDD or storage media has issues, or it could be cabling/controller/DMA issues, but it's definitely storage.
-
Thanks jimp. I will change the hard disk and check again.
Out of curiosity - how did you know that this is storage related?
Atul.