Netgate Discussion Forum

Fatal trap 12: page fault while in kernel mode when connecting PPPoE

Plus 23.09 Development Snapshots (Retired)
  • W
    w0w @cmcdonald
    last edited by w0w Sep 23, 2023, 6:32 AM Sep 23, 2023, 6:22 AM

    @cmcdonald

    Crash on the primary when leaving CARP maintenance mode.

    db:1:pfs> bt
    Tracing pid 12 tid 100063 td 0xfffffe00c498ae40
    kdb_enter() at kdb_enter+0x32/frame 0xfffffe001b1f1610
    vpanic() at vpanic+0x163/frame 0xfffffe001b1f1740
    panic() at panic+0x43/frame 0xfffffe001b1f17a0
    trap_fatal() at trap_fatal+0x40c/frame 0xfffffe001b1f1800
    trap_pfault() at trap_pfault+0x4f/frame 0xfffffe001b1f1860
    calltrap() at calltrap+0x8/frame 0xfffffe001b1f1860
    --- trap 0xc, rip = 0xffffffff80fb86d7, rsp = 0xfffffe001b1f1930, rbp = 0xfffffe001b1f19f0 ---
    pf_route() at pf_route+0x4e7/frame 0xfffffe001b1f19f0
    pf_test() at pf_test+0xd7b/frame 0xfffffe001b1f1b90
    pf_check_out() at pf_check_out+0x22/frame 0xfffffe001b1f1bb0
    pfil_mbuf_out() at pfil_mbuf_out+0x38/frame 0xfffffe001b1f1be0
    ip_output() at ip_output+0xb4a/frame 0xfffffe001b1f1ce0
    ip_forward() at ip_forward+0x3c2/frame 0xfffffe001b1f1d90
    ip_input() at ip_input+0x6e9/frame 0xfffffe001b1f1df0
    swi_net() at swi_net+0x128/frame 0xfffffe001b1f1e60
    ithread_loop() at ithread_loop+0x257/frame 0xfffffe001b1f1ef0
    fork_exit() at fork_exit+0x7f/frame 0xfffffe001b1f1f30
    fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe001b1f1f30
    --- trap 0xaf34f5ea, rip = 0x56a9887e31914e4f, rsp = 0x4506b2da7d808177, rbp = 0xf56feca9dc25ff3d ---
    db:1:pfs>  show registers
    cs                        0x20
    ds                        0x3b
    es                        0x3b
    fs                        0x13
    gs                        0x1b
    ss                        0x28
    rax                       0x12
    rcx         0xffffffff81457767
    rdx         0xfffffe001b1f1250
    rbx                      0x100
    rsp         0xfffffe001b1f1610
    rbp         0xfffffe001b1f1610
    rsi                       0x30
    rdi         0xffffffff82d40298  vt_conswindow+0x10
    r8                           0
    r9                           0
    r10                          0
    r11                          0
    r12                          0
    r13                          0
    r14         0xffffffff813db3a1
    r15         0xfffffe00c498ae40
    rip         0xffffffff80d38812  kdb_enter+0x32
    rflags                    0x86
    kdb_enter+0x32: movq    $0,0x2344ff3(%rip)
    db:1:pfs>  show pcpu
    cpuid        = 0
    dynamic pcpu = 0x111df00
    curthread    = 0xfffffe00c498ae40: pid 12 tid 100063 critnest 1 "swi1: netisr 6"
    curpcb       = 0xfffffe00c498b360
    fpcurthread  = none
    idlethread   = 0xfffffe00c48d73a0: tid 100003 "idle: cpu0"
    self         = 0xffffffff84010000
    curpmap      = 0xffffffff83021ab0
    tssp         = 0xffffffff84010384
    rsp0         = 0xfffffe001b1f2000
    kcr3         = 0x80000000c5d25003
    ucr3         = 0xffffffffffffffff
    scr3         = 0x2f5f9db5f
    gs32p        = 0xffffffff84010404
    ldt          = 0xffffffff84010444
    tss          = 0xffffffff84010434
    curvnet      = 0xfffff80001279200
    db:1:pfs>  run lockinfo
    db:2:lockinfo> show locks
    No such command; use "help" to list available commands
    db:2:lockinfo>  show alllocks
    No such command; use "help" to list available commands
    db:2:lockinfo>  show lockedvnods
    Locked vnodes
    ************************************************************************
    Fatal trap 12: page fault while in kernel mode
    cpuid = 5; apic id = 05
    
    fault virtual address	= 0x50
    
    
    
    
    
    
    fault code		= supervisor read data, page not present
    
    Fatal trap 12: page fault while in kernel mode
    Fatal trap 12: page fault while in kernel mode
    instruction pointer	= 0x20:0xffffffff80fb86d7
    cpuid = 1; 
    Fatal trap 12: page fault while in kernel mode
    apic id = 01
    cpuid = 3; stack pointer	        = 0x28:0xfffffe001b1e2930
    Fatal trap 12: page fault while in kernel mode
    
    frame pointer	        = 0x28:0xfffffe001b1e29f0
    cpuid = 0; apic id = 00
    fault virtual address	= 0x50
    fault code		= supervisor read data, page not present
    fault virtual address	= 0x50
    instruction pointer	= 0x20:0xffffffff80fb86d7
    stack pointer	        = 0x28:0xfffffe001b1f1930
    frame pointer	        = 0x28:0xfffffe001b1f19f0
    code segment		= base 0x0, limit 0xfffff, type 0x1b
    Fatal trap 12: page fault while in kernel mode
    			= DPL 0, pres 1, long 1, def32 0, gran 1
    code segment		= base 0x0, limit 0xfffff, type 0x1b
    			= DPL 0, pres 1, long 1, def32 0, gran 1
    processor eflags	= interrupt enabled, resume, IOPL = 0
    cpuid = 6; apic id = 06
    processor eflags	= interrupt enabled, resume, IOPL = 0
    current process		= 12 (swi1: netisr 6)
    rdi: fffff8028743200e rsi: fffffe00c481e82d rdx: 0000000000000000
    rcx: fffffe00c481e370  r8: fffffe001b1f1a50  r9: 0000000000000000
    rax: 0000000000000000 rbx: fffff80317e5b8b8 rbp: fffffe001b1f19f0
    r10: 0000000000000000 r11: fffffe00c481e370 r12: fffffe00c481e370
    r13: 0000000000000002 r14: fffff802f5b13840 r15: fffffe001b1f1c78
    trap number		= 12
    panic: page fault
    cpuid = 0
    time = 1695445416
    KDB: enter: panic
    
    net.isr.numthreads: 1
    net.isr.maxprot: 16
    net.isr.defaultqlimit: 256
    net.isr.maxqlimit: 10240
    net.isr.bindthreads: 0
    net.isr.maxthreads: 1
    net.isr.dispatch: hybrid
    

    When the primary had booted, I just selected Reboot in the GUI and confirmed; at the same time the secondary firewall crashed.

    db:1:pfs> bt
    Tracing pid 0 tid 100007 td 0xfffffe0020565720
    kdb_enter() at kdb_enter+0x32/frame 0xfffffe001d7da390
    vpanic() at vpanic+0x163/frame 0xfffffe001d7da4c0
    panic() at panic+0x43/frame 0xfffffe001d7da520
    trap_fatal() at trap_fatal+0x40c/frame 0xfffffe001d7da580
    trap_pfault() at trap_pfault+0x4f/frame 0xfffffe001d7da5e0
    calltrap() at calltrap+0x8/frame 0xfffffe001d7da5e0
    --- trap 0xc, rip = 0xffffffff80fb86d7, rsp = 0xfffffe001d7da6b0, rbp = 0xfffffe001d7da770 ---
    pf_route() at pf_route+0x4e7/frame 0xfffffe001d7da770
    pf_test() at pf_test+0xd7b/frame 0xfffffe001d7da910
    pf_check_out() at pf_check_out+0x22/frame 0xfffffe001d7da930
    pfil_mbuf_out() at pfil_mbuf_out+0x38/frame 0xfffffe001d7da960
    ip_output() at ip_output+0xb4a/frame 0xfffffe001d7daa60
    ip_forward() at ip_forward+0x3c2/frame 0xfffffe001d7dab10
    ip_input() at ip_input+0x6e9/frame 0xfffffe001d7dab70
    netisr_dispatch_src() at netisr_dispatch_src+0x22c/frame 0xfffffe001d7dabc0
    ether_demux() at ether_demux+0x149/frame 0xfffffe001d7dabf0
    ether_nh_input() at ether_nh_input+0x36e/frame 0xfffffe001d7dac50
    netisr_dispatch_src() at netisr_dispatch_src+0xaf/frame 0xfffffe001d7daca0
    ether_input() at ether_input+0x69/frame 0xfffffe001d7dad00
    iflib_rxeof() at iflib_rxeof+0xc46/frame 0xfffffe001d7dae00
    _task_fn_rx() at _task_fn_rx+0x72/frame 0xfffffe001d7dae40
    gtaskqueue_run_locked() at gtaskqueue_run_locked+0x14e/frame 0xfffffe001d7daec0
    gtaskqueue_thread_loop() at gtaskqueue_thread_loop+0xc2/frame 0xfffffe001d7daef0
    fork_exit() at fork_exit+0x7f/frame 0xfffffe001d7daf30
    fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe001d7daf30
    --- trap 0x5965b2cb, rip = 0x526ba4d649ac9358, rsp = 0xe81fd03ea07c40f8, rbp = 0 ---
    db:1:pfs>  show registers
    cs                        0x20
    ds                        0x3b
    es                        0x3b
    fs                        0x13
    gs                        0x1b
    ss                           0
    rax                       0x12
    rcx         0xffffffff81457767
    rdx         0xfffffe001d7d9fd0
    rbx                      0x100
    rsp         0xfffffe001d7da390
    rbp         0xfffffe001d7da390
    rsi                       0x2d
    rdi         0xffffffff82d40298  vt_conswindow+0x10
    r8                           0
    r9                           0
    r10                          0
    r11                          0
    r12                          0
    r13                          0
    r14         0xffffffff813db3a1
    r15         0xfffffe0020565720
    rip         0xffffffff80d38812  kdb_enter+0x32
    rflags                    0x82
    kdb_enter+0x32: movq    $0,0x2344ff3(%rip)
    db:1:pfs>  show pcpu
    cpuid        = 0
    dynamic pcpu = 0x122bf00
    curthread    = 0xfffffe0020565720: pid 0 tid 100007 critnest 1 "if_io_tqg_0"
    curpcb       = 0xfffffe0020565c40
    fpcurthread  = none
    idlethread   = 0xfffffe00205673a0: tid 100003 "idle: cpu0"
    self         = 0xffffffff84210000
    curpmap      = 0xffffffff83021ab0
    tssp         = 0xffffffff84210384
    rsp0         = 0xfffffe001d7db000
    kcr3         = 0x73a43000
    ucr3         = 0xffffffffffffffff
    scr3         = 0x15ba88000
    gs32p        = 0xffffffff84210404
    ldt          = 0xffffffff84210444
    tss          = 0xffffffff84210434
    curvnet      = 0xfffff80001241400
    db:1:pfs>  run lockinfo
    db:2:lockinfo> show locks
    No such command; use "help" to list available commands
    db:2:lockinfo>  show alllocks
    No such command; use "help" to list available commands
    db:2:lockinfo>  show lockedvnods
    Locked vnodes
    **************************************************************************************
    
    Fatal trap 12: page fault while in kernel mode
    cpuid = 0; apic id = 00
    fault virtual address	= 0x50
    fault code		= supervisor read data, page not present
    instruction pointer	= 0x20:0xffffffff80fb86d7
    stack pointer	        = 0x0:0xfffffe001d7da6b0
    frame pointer	        = 0x0:0xfffffe001d7da770
    code segment		= base 0x0, limit 0xfffff, type 0x1b
    			= DPL 0, pres 1, long 1, def32 0, gran 1
    processor eflags	= interrupt enabled, resume, IOPL = 0
    current process		= 0 (if_io_tqg_0)
    rdi: fffff8011bf9c80e rsi: fffffe002049d82d rdx: 0000000000000000
    rcx: fffffe002049d370  r8: fffffe001d7da7d0  r9: 0000000000000000
    rax: 0000000000000000 rbx: fffff8015b76d3b0 rbp: fffffe001d7da770
    r10: 0000000000000000 r11: fffffe002049d370 r12: fffffe002049d370
    r13: 0000000000000002 r14: fffff802da989c60 r15: fffffe001d7da9f8
    trap number		= 12
    panic: page fault
    cpuid = 0
    time = 1695450285
    KDB: enter: panic
    
    net.isr.numthreads: 1
    net.isr.maxprot: 16
    net.isr.defaultqlimit: 256
    net.isr.maxqlimit: 10240
    net.isr.bindthreads: 0
    net.isr.maxthreads: 1
    net.isr.dispatch: hybrid
    
    • W
      w0w
      last edited by Sep 23, 2023, 6:43 AM

      If it matters, I have reconfigured both nodes so that the LAGGs match exactly in order and interface names, but that did not change this behavior. So far I can't reproduce it in VMs, although one of the VMs did crash once some time ago while I was trying to replicate another pf bug. Unfortunately I did not save that crash, but it was similar: fatal trap 12, with the same two details:
      fault virtual address = 0x50
      and
      fault code = supervisor read data, page not present

      • cmcdonaldC
        cmcdonald Netgate Developer @w0w
        last edited by Sep 25, 2023, 3:57 PM

        @w0w Can you try disabling pfsync?

        Need help fast? https://www.netgate.com/support

        • W
          w0w @cmcdonald
          last edited by Sep 25, 2023, 4:23 PM

          @cmcdonald
          Last time I disabled pfsync, the crashes stopped. But I need to re-test it.

          • W
            w0w @cmcdonald
            last edited by Sep 25, 2023, 4:46 PM

            @cmcdonald
            Yes, it looks like the problem is limited to the “Synchronize states” option.

            • K
              kprovost @w0w
              last edited by Sep 27, 2023, 1:38 PM

              @w0w I've had a look at that dump, and while I think I've identified what's going wrong I do not understand how we can end up in that situation.

              It'd be interesting to get a full core dump (as opposed to these text dumps). Are you up for reproducing the problem and sharing a core dump (along with the exact version you triggered the crash on, of course)?

              Short version: add a device for a swap partition, ideally at least as large as system RAM. A USB stick should work. (Note you'll lose all data on the stick!)
              If the USB (or other) swap device is da0 do:

              gpart destroy -F da0
              gpart create -s gpt da0
              gpart add -t freebsd-swap da0
              

              Add /dev/da0p1 none swap sw 0 0 to /etc/fstab.
              Edit /etc/pfSense-ddb.conf and change the existing script kdb.enter.default line to: script kdb.enter.default=bt ; show registers ; dump ; reset

              Reboot.

              Future panics should dump a kernel core to the swap partition, which will get saved to /var/crash on the next boot. Those files (along with an exact version number of the system this happened on) should let us dig a bit deeper.
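
The steps above could be collected into a small helper that only prints what would be run, so the device name can be double-checked before anything is wiped. This is a rough sketch, not a tested pfSense procedure: the function names and the DISK variable are made up for illustration, and da0 is just the example device from the post.

```shell
#!/bin/sh
# Sketch only: PRINTS the commands and config lines from the post for a
# given disk instead of executing them. Nothing here touches a device.
# DISK and the function names are hypothetical; da0 is the post's example.

# The three gpart commands from the post, for the chosen disk.
swap_setup_commands() {
    disk="$1"
    printf 'gpart destroy -F %s\n' "$disk"
    printf 'gpart create -s gpt %s\n' "$disk"
    printf 'gpart add -t freebsd-swap %s\n' "$disk"
}

# The line to add to /etc/fstab (first partition of the chosen disk).
fstab_swap_line() {
    printf '/dev/%sp1 none swap sw 0 0\n' "$1"
}

# The replacement line for /etc/pfSense-ddb.conf, as given in the post.
ddb_script_line() {
    printf 'script kdb.enter.default=bt ; show registers ; dump ; reset\n'
}

DISK="${DISK:-da0}"
echo "# Run as root (destroys all data on ${DISK}):"
swap_setup_commands "$DISK"
echo "# Add to /etc/fstab:"
fstab_swap_line "$DISK"
echo "# Set in /etc/pfSense-ddb.conf:"
ddb_script_line
```

Running it with, for example, `DISK=da1 sh script.sh` prints the same steps for a different device. After the reboot, stock FreeBSD's `swapinfo` and `dumpon -l` should show whether the swap partition is active and which dump device is configured; that they behave the same on pfSense Plus is an assumption.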

              • cmcdonaldC
                cmcdonald Netgate Developer @w0w
                last edited by Sep 27, 2023, 2:42 PM

                @w0w What if you restrict pfsync updates to one direction only (primary to secondary, or vice versa) instead of bi-directional syncing?

                • W
                  w0w @cmcdonald
                  last edited by Sep 27, 2023, 4:21 PM

                  @cmcdonald That's what I did last time 😊
                  It looks like the crashes stopped, but it may need further testing; I'm not sure.
                  @kprovost
                  I have privately posted some links to the core dumps I created 🙄

                  • cmcdonaldC
                    cmcdonald Netgate Developer @w0w
                    last edited by Sep 27, 2023, 8:11 PM

                    @w0w Which sync path did you disable (primary to secondary, or secondary to primary)?

                    • W
                      w0w @cmcdonald
                      last edited by Sep 28, 2023, 2:15 AM

                      @cmcdonald
                      Secondary to primary.

                      • W
                        w0w
                        last edited by Oct 7, 2023, 10:30 AM

                        https://redmine.pfsense.org/issues/14804

                        Just for reference, problem solved.

                        Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.