PfSense crashed on Alix

jlepthien

Hi there,

today my internet suddenly seemed stuck. When I connected the console I saw this:

Fatal trap 12: page fault while in kernel mode
fault virtual address = 0x8
fault code = supervisor read, page not present
instruction pointer = 0x20:0xc08f46ce
stack pointer = 0x28:0xc23787c8
frame pointer = 0x28:0xc23787f4
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, def32 1, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 11 (irq10: vr0)

jimp

That's a new one I haven't seen.

If it happens again, get the new message and also type "bt" at that prompt to get a backtrace and paste it all here.

jlepthien

Okay. Will do that if it happens again.

jlepthien

Happened again :-(

Here is the bt…

db> bt
Tracing pid 11 tid 64026 td 0xc2456b40
rn_match(c0cd4fcc,c283f300,0,c2938000,c23fc3cc,...) at rn_match+0x4e
pfr_match_addr(c288c9b0,c27b402a,2,c23fc3b8,c23fc3b4,...) at pfr_match_addr+0x63
pf_test_tcp(c23fc4b4,c23fc4b0,1,c26c2400,c27f3600,...) at pf_test_tcp+0x4cb
pf_test(1,c24fe400,c23fc678,0,0,...) at pf_test+0x8d2
init_pf_mutex(0,c23fc678,c24fe400,1,0,...) at init_pf_mutex+0x5e6
pfil_run_hooks(c0cfd140,c23fc6c8,c24fe400,1,0,...) at pfil_run_hooks+0x7e
ip_input(c27f3600,57d5,c06fd8a0,c23fc70c,c06fd3b2,...) at ip_input+0x278
netisr_dispatch_src(1,0,c27f3600,c23fc738,c0935a54,...) at netisr_dispatch_src+0x89
netisr_dispatch(1,c27f3600,3,0,3,...) at netisr_dispatch+0x20
ng_car_q_event(c27dc700,c27859c0,c0aaf160,c2456b40,c2456d80,...) at ng_car_q_event+0x2d54
ng_rmnode_self(c2614380,5e3e,2,2,3,...) at ng_rmnode_self+0x10f
ng_snd_item(c27859c0,0,c27dcb00,0,7f63,...) at ng_snd_item+0x3ca
ng_parse_get_token(c27dca80,c27859c0,c0cf6c6c,c24566c0,4,...) at ng_parse_get_token+0xcbbf
ng_rmnode_self(c27dc980,1,c23fc8e4,c0aa9c59,0,...) at ng_rmnode_self+0x10f
ng_snd_item(c27859c0,0,c27dc780,0) at ng_snd_item+0x3ca
ng_parse_get_token(0,c23fc938,c0ac3027,c0cd128c,c2631c80,...) at ng_parse_get_token+0x2d28
ng_parse_get_token(0,c23fc948,1daf0008,28,28,...) at ng_parse_get_token+0x2e18
ng_parse_get_token(0,c093f623,c27f3600,1) at ng_parse_get_token+0x2fc3
ng_parse_get_token(0,0,c24566c0,c09433d0,c2456b40,...) at ng_parse_get_token+0x30f8
ng_parse_get_token(c27dd200,c27859c0,80000000,c23fca68,c0ac92f4,...) at ng_parse_get_token+0x548d
ng_rmnode_self(c27b400e,6,c23fca84,c0ac958e,c27b400e,...) at ng_rmnode_self+0x10f
ng_snd_item(c27859c0,0,c27dd280,0,4,...) at ng_snd_item+0x3ca
ng_parse_get_token(c27dd300,c27859c0,5e3e4fcf,6b40,ed7d0000,...) at ng_parse_get_token+0x7ad7
ng_rmnode_self(0,c27f3600,c2534e00,6488,c23fcbc8,...) at ng_rmnode_self+0x10f
ng_snd_item(c27859c0,0,c2631580,0,8864,...) at ng_snd_item+0x3ca
ng_car_q_event(c24fd800,c27f3600,c24fd800,c24fd800,c27b4008,...) at ng_car_q_event+0x27b9
ether_demux(c24fd800,c27f3600,3,0,3,...) at ether_demux+0x20e
ether_vlanencap(c24fd800,c27f3600,0,c23fcc5c,c085400c,...) at ether_vlanencap+0x43f
ucom_attach(c23fcc7c,c23f0008,c23f0028,c2450028,0,...) at ucom_attach+0x542b
ucom_attach(c24aa000,0,109,bee27646,57d5,...) at ucom_attach+0x89d7
intr_event_execute_handlers(c2436aa0,c2434600,c0b5910d,4f6,c2434670,...) at intr_event_execute_handlers+0x14b
intr_getaffinity(c24ffc90,c23fcd38,0,0,0,...) at intr_getaffinity+0x14a
fork_exit(c080dfe0,c24ffc90,c23fcd38) at fork_exit+0x90
fork_trampoline() at fork_trampoline+0x8
--- trap 0, eip = 0, esp = 0xc23fcd70, ebp = 0 ---
db>

jlepthien

Sorry I couldn't see the part before the bt. I just started up the console and everything looked good with my login prompt. After I put in the admin username I saw the db> prompt…

jlepthien

Again :-(

Fatal trap 12: page fault while in kernel mode
fault virtual address = 0x8
fault code = supervisor read, page not present
instruction pointer = 0x20:0xc08f46ce
stack pointer = 0x28:0xc23787c8
frame pointer = 0x28:0xc23787f4
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, def32 1, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 11 (irq10: vr0)

jimp

None of that looks familiar, unfortunately. There had been a few 2.0 panics over the last few months but they were all resolved, this one looks new. It seems to have crashed in two dissimilar code paths as well even though the begin and end the same. One seems to involve netgraph (PPPoE on WAN? PPTP or L2TP server active?) and the other doesn't.

Do you have anything plugged into the ALIX USB ports? It's odd that ucom is showing up there, that's the USB com port driver.

What kind of features do you have going on this setup? Any services or VPNs enabled? Packages installed? Traffic Shaping?

jlepthien

A I have a usb to serial adapter connected to the beast ;) Maybe I should disconnect it from my mac when I am not using it…

jimp

Check the output of "vmstat -i"

I wonder if your usb adapter is sharing an IRQ with a network card, which seems like a recipe for disaster on a low power device.

jlepthien

Hi!

This is the output…

interrupt total rate
irq0: clk 120917 99
irq4: uart0 501 0
irq8: rtc 154794 127
irq9: ath0 73851 61
irq10: vr0 6349 5
irq11: vr1 9958 8
irq12: ohci0 ehci0 1 0
irq14: ata0 11966 9
Total 378337 312

jimp

That looks fine, though is that with or without the USB serial adapter plugged in?

I suppose it may just be a coincidence that it's the irq process that is active when the panic happens, but it's hard to say for certain.

jlepthien

This was taken with the usb adapter inserted. But it is usb on the Macintosh side. It is still serial on the Alix…

jimp

ahh… well then that wouldn't explain the ucom then. The ALIX would only see serial, not a USB device on its end.

jlepthien

Again :-(

Build from 01/06

db> bt
Tracing pid 11 tid 64025 td 0xc2456d80
pim_input(c30a5816,c24fa000,c308cd00,0,0,…) at pim_input+0xb8c
ip_input(c308cd00,246,c24d8700,c2378bcc,c06fd9b1,...) at ip_input+0x604
netisr_dispatch_src(1,0,c308cd00,c2378c04,c08e3ecf,...) at netisr_dispatch_src+0x89
netisr_dispatch(1,c308cd00,c24fa000,c24fa000,c30a5808,...) at netisr_dispatch+0x20
ether_demux(c24fa000,c308cd00,3,0,3,...) at ether_demux+0x16f
ether_vlanencap(c24fa000,c308cd00,c2456d80,c2378c5c,c0853f81,...) at ether_vlanencap+0x43f
ucom_attach(c0d56e6d,c0cd10c0,c2378cb0,c2378c98,0,...) at ucom_attach+0x542b
ucom_attach(c24ab000,0,109,82593edb,132,...) at ucom_attach+0x89d7
intr_event_execute_handlers(c2436aa0,c2434680,c0b5910d,4f6,c24346f0,...) at intr_event_execute_handlers+0x14b
intr_getaffinity(c24f9b50,c2378d38,0,0,0,...) at intr_getaffinity+0x14a
fork_exit(c080dfe0,c24f9b50,c2378d38) at fork_exit+0x90
fork_trampoline() at fork_trampoline+0x8
--- trap 0, eip = 0, esp = 0xc2378d70, ebp = 0 ---
db>

jlepthien

And again. This is the last things I could grep:

processor eflags = interrupt enabled, resume, IOPL = 0
current process = 11 (irq10: vr0)

jlepthien

Okay. Now again. I am testing igmpproxy right now, perhaps it has something to do with it?

Fatal trap 12: page fault while in kernel mode
fault virtual address = 0x72636524
fault code = supervisor write, page not present
instruction pointer = 0x20:0xc096993c
stack pointer = 0x28:0xc2378b10
frame pointer = 0x28:0xc2378b64
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, def32 1, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 11 (irq10: vr0)

cmb

@jlepthien:

Okay. Now again. I am testing igmpproxy right now, perhaps it has something to do with it?

Possibly. Did it happen at all before you started testing it?

We'll get Ermal or someone to take a look at the backtraces when time permits.

jlepthien

Yeah, the first two happened before I think. But at least the first. All the ones from yesterday happened when I tested the igmp proxy…

jlepthien

Another one again. But I can only see stuff starting with the prompt most of the time….

db> bt
Tracing pid 11 tid 64025 td 0xc2456d80
rn_match(c0cd4fcc,c2842200,0,0,c23788a8,...) at rn_match+0x17
pfr_match_addr(c288d9b0,c31e5822,2,c2378894,c2378890,...) at pfr_match_addr+0x63
pf_test_udp(c2378990,c237898c,1,c2562c00,c278a800,...) at pf_test_udp+0x4db
pf_test(1,c24fa000,c2378b54,0,0,...) at pf_test+0xbb5
init_pf_mutex(0,c2378b54,c24fa000,1,0,...) at init_pf_mutex+0x5e6
pfil_run_hooks(c0cfd140,c2378ba4,c24fa000,1,0,...) at pfil_run_hooks+0x7e
ip_input(c278a800,246,c24d2ac0,c2378bcc,c06fd9b1,...) at ip_input+0x278
netisr_dispatch_src(1,0,c278a800,c2378c04,c08e3ecf,...) at netisr_dispatch_src+0x89
netisr_dispatch(1,c278a800,c24fa000,c24fa000,c31e5808,...) at netisr_dispatch+0x20
ether_demux(c24fa000,c278a800,3,0,3,...) at ether_demux+0x16f
ether_vlanencap(c24fa000,c278a800,c2456d80,c2378c5c,c0853f81,...) at ether_vlanencap+0x43f
ucom_attach(c0d56e6d,c0cd10c0,c2378cb0,c2378c98,0,...) at ucom_attach+0x542b
ucom_attach(c24ab000,0,109,cd9a2d5d,38ea,...) at ucom_attach+0x89d7
intr_event_execute_handlers(c2436aa0,c2434680,c0b5910d,4f6,c24346f0,...) at intr_event_execute_handlers+0x14b
intr_getaffinity(c24f9b50,c2378d38,0,0,0,...) at intr_getaffinity+0x14a
fork_exit(c080dfe0,c24f9b50,c2378d38) at fork_exit+0x90
fork_trampoline() at fork_trampoline+0x8
--- trap 0, eip = 0, esp = 0xc2378d70, ebp = 0 ---
db>

jlepthien

Do we have any info yet? Today this happened again and I really would like to know what this is. I can simply install 1.2.3 again and wait until 2.0 is out of beta, but I want to help the project. So devs, what could be the problem? Anything else I should check?