Random panic reboot



  • Hi there,

    Seems after the last update we are having panic reboots a lot.
    Not sure how to read the dump but this is the first thing that pops up.

    <118>Welcome to pfSense 2.4.5-RELEASE (Patch 1)...
    <118>
    <118>savecore: reboot after panic: bpf_mcopy
    <118>savecore: writing core to /var/crash/textdump.tar.0


  • Netgate Administrator

    When you log back into the GUI it should give you an alert with a link for downloading the complete crash report.

    Steve



  • What should I be looking for in the log?

    Search on panic? I'm sorry for asking odd questions, I'm just not sure what to look for.


  • Netgate Administrator

    The important bits are generally at the start of the dump, everything between:
    db:0:kdb.enter.default> show pcpu
    and
    db:0:kdb.enter.default> ps

    And the output from the message buffer leading up to and including the panic text.

    If it crashed repeatedly there may be more than one crash in there. The very first part of the report shows how many are listed.
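
    For anyone who wants to pull that section out of a long report, here is a minimal Python sketch. It assumes the crash report text has already been saved or pasted into a string; the function name `extract_panic_section` and the sample text are illustrative only, and the "(process list omitted)" line is a placeholder rather than real output.

```python
PROMPT = "db:0:kdb.enter.default>"

def extract_panic_section(report_text, start_cmd="show pcpu", end_cmd="ps"):
    """Return the lines from the 'show pcpu' prompt up to, but not including, the 'ps' prompt."""
    section = []
    capturing = False
    for line in report_text.splitlines():
        stripped = line.strip()
        if stripped.startswith(PROMPT):
            # Whatever follows the prompt is the debugger command that was run.
            cmd = stripped[len(PROMPT):].strip()
            if not capturing and cmd == start_cmd:
                capturing = True
            elif capturing and cmd == end_cmd:
                break
        if capturing:
            section.append(stripped)
    return section

# Sample built from lines quoted in this thread; the last line is a placeholder.
sample_report = """\
db:0:kdb.enter.default> show pcpu
cpuid = 2
curthread = 0xfffff8001bcd6620: pid 78429 "openvpn"
db:0:kdb.enter.default> bt
panic() at panic+0x43/frame 0xfffffe1051c9a3f0
db:0:kdb.enter.default> ps
(process list omitted)
"""

for line in extract_panic_section(sample_report):
    print(line)
```

    If the report contains more than one crash, this sketch only returns the first matching section.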

    Steve



  • Thank you

    I will dig it out today. After almost 12 years of pfSense, this is the first issue we have ever had.



  • I deleted the last crash dump.

    We will wait for it to pop today.


  • Netgate Administrator

    Might be similar to this: https://forum.netgate.com/topic/147078/pfsense-reboot-kernel-panic-bpf_mcopy-v2-4-4-p3

    In which case check the mbuf values.



    The thing ran great for several hours.
    We started our normal operations back up, which involve an OpenVPN client out to a customer.
    We use pfSense for this task so we aren't installing VPN clients all over my finance department.

    Simple: you come from this source you go out this gateway. Simple.

    The system used is an old T5500 Dell workstation with 2 5680 CPUs. Rips AES like a monster.
    Steady network flow over the target OpenVPN link is about 610 megabits which is friggin awesome for us.

    Then the thing reboots. Listed below is the dump from db:0:kdb.enter.default> show pcpu to db:0:kdb.enter.default> ps

    We did change Firewall Maximum Fragment Entries from 5000 to 12000 and it still rebooted on us.

    I did have a look at the link that was provided with the bug report, and that's about what's going on here.

    Just bizarre.

    db:0:kdb.enter.default> show pcpu
    cpuid = 2
    dynamic pcpu = 0xfffffe1088694580
    curthread = 0xfffff8001bcd6620: pid 78429 "openvpn"
    curpcb = 0xfffffe1051c9acc0
    fpcurthread = 0xfffff8001bcd6620: pid 78429 "openvpn"
    idlethread = 0xfffff8000f37f000: tid 100005 "idle: cpu2"
    curpmap = 0xfffff8001b2fa138
    tssp = 0xffffffff835a33a0
    commontssp = 0xffffffff835a33a0
    rsp0 = 0xfffffe1051c9acc0
    gs32p = 0xffffffff835a9ff8
    ldt = 0xffffffff835aa038
    tss = 0xffffffff835aa028
    tlb gen = 1988154
    db:0:kdb.enter.default> bt
    Tracing pid 78429 tid 100434 td 0xfffff8001bcd6620
    kdb_enter() at kdb_enter+0x3b/frame 0xfffffe1051c9a330
    vpanic() at vpanic+0x19b/frame 0xfffffe1051c9a390
    panic() at panic+0x43/frame 0xfffffe1051c9a3f0
    bpf_buffer_append_mbuf() at bpf_buffer_append_mbuf+0x64/frame 0xfffffe1051c9a420
    catchpacket() at catchpacket+0x4b9/frame 0xfffffe1051c9a4d0
    bpf_mtap() at bpf_mtap+0x200/frame 0xfffffe1051c9a540
    oce_multiq_transmit() at oce_multiq_transmit+0x88/frame 0xfffffe1051c9a5a0
    oce_multiq_start() at oce_multiq_start+0x77/frame 0xfffffe1051c9a5d0
    ether_output_frame() at ether_output_frame+0x98/frame 0xfffffe1051c9a600
    ether_output() at ether_output+0x6d7/frame 0xfffffe1051c9a690
    ip_output() at ip_output+0x138d/frame 0xfffffe1051c9a7c0
    ip_forward() at ip_forward+0x2c3/frame 0xfffffe1051c9a860
    ip_input() at ip_input+0x724/frame 0xfffffe1051c9a8f0
    netisr_dispatch_src() at netisr_dispatch_src+0xa2/frame 0xfffffe1051c9a940
    tunwrite() at tunwrite+0x21e/frame 0xfffffe1051c9a980
    devfs_write_f() at devfs_write_f+0xda/frame 0xfffffe1051c9a9f0
    dofilewrite() at dofilewrite+0xc6/frame 0xfffffe1051c9aa40
    kern_writev() at kern_writev+0x68/frame 0xfffffe1051c9aa90
    sys_writev() at sys_writev+0x35/frame 0xfffffe1051c9aac0
    amd64_syscall() at amd64_syscall+0xa86/frame 0xfffffe1051c9abf0
    fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe1051c9abf0
    --- syscall (121, FreeBSD ELF64, sys_writev), rip = 0x8017b02ea, rsp = 0x7fffffffddb8, rbp = 0x7fffffffde00 ---
    db:0:kdb.enter.default> ps



  • Adding netstat -m.

    [2.4.5-RELEASE][root@mosh-fw.mosh.local]/root: netstat -m
    28772/16528/45300 mbufs in use (current/cache/total)
    19747/9139/28886/1000000 mbuf clusters in use (current/cache/total/max)
    19747/9095 mbuf+clusters out of packet secondary zone in use (current/cache)
    0/29/29/524288 4k (page size) jumbo clusters in use (current/cache/total/max)
    0/0/0/524288 9k jumbo clusters in use (current/cache/total/max)
    0/0/0/340483 16k jumbo clusters in use (current/cache/total/max)
    46687K/22526K/69213K bytes allocated to network (current/cache/total)
    0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
    0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
    0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
    0/0/0 requests for jumbo clusters denied (4k/9k/16k)
    27 sendfile syscalls
    13 sendfile syscalls completed without I/O request
    11 requests for I/O initiated by sendfile
    101 pages read by sendfile as part of a request
    186 pages were valid at time of a sendfile request
    0 pages were requested for read ahead by applications
    128 pages were read ahead by sendfile
    0 times sendfile encountered an already busy page
    0 requests for sfbufs denied
    0 requests for sfbufs delayed
    [2.4.5-RELEASE][root@mosh-fw.mosh.local]/root:



  • During this function :

    @stevemosher said in Random panic reboot:

    bpf_buffer_append_mbuf() at bpf_buffer_append_mbuf+0x64/frame 0xfffffe1051c9a420

    a panic is thrown.

    So, "mbuf" - whatever that is - is missing, too small, not enough, etc.



  • @stevemosher said in Random panic reboot:

    We did change Firewall Maximum Fragment Entries from 5000 to 12000 and it still rebooted on us.

    We identified that item and tried to fix it by setting max frag to 12000.
    Are we thinking that's not enough? Or is this where the mbuf issue starts?


  • Netgate Administrator

    That's not the same as the mbuf limits, but yours are set to 1M for standard-sized packets and 500K for jumbo packets, so it's unlikely you'd be exhausting that.
    The netstat output confirms that, 0 denied clusters.
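
    That check can be scripted. The following is a rough Python sketch, not an official tool: it assumes the `netstat -m` output has been captured as text, and the function name `parse_netstat_m` and the dictionary keys are made up for illustration. The two sample lines are copied from the output posted above.

```python
import re

def parse_netstat_m(output):
    """Pull mbuf cluster usage and denied-request counts out of `netstat -m` text."""
    stats = {}
    for line in output.splitlines():
        line = line.strip()
        # e.g. "19747/9139/28886/1000000 mbuf clusters in use (current/cache/total/max)"
        m = re.match(r"(\d+)/(\d+)/(\d+)/(\d+) mbuf clusters in use", line)
        if m:
            _current, _cache, total, maximum = map(int, m.groups())
            stats["clusters_total"] = total
            stats["clusters_max"] = maximum
        # e.g. "0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)"
        m = re.match(r"(\d+)/(\d+)/(\d+) requests for mbufs denied", line)
        if m:
            stats["denied"] = tuple(int(x) for x in m.groups())
    return stats

sample_output = """\
19747/9139/28886/1000000 mbuf clusters in use (current/cache/total/max)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
"""

stats = parse_netstat_m(sample_output)
usage = stats["clusters_total"] / stats["clusters_max"]
print(f"cluster usage: {usage:.1%}, denied: {stats['denied']}")
```

    With the numbers from this box, total clusters are under 3% of the configured maximum and nothing was denied, which fits the reading that mbuf exhaustion is unlikely here.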

    It does look very similar to the crash on that other thread. Not much more info there though.

    What interface types do you have on that box? Can you switch them out?

    Steve



    Current card is a Broadcom/HP 10G card. OCE0/OCE1

    Trying to see if we can grab an intel x550t2 card and see if that changes anything.

    I do agree it seems to be more BSD-related.



  • We went ahead and picked up an Intel X550T2 card. Should be here next week.

    Stay tuned.

    Above all -- Thank you.


  • Netgate Administrator

    Ah, nice. Yeah, looking at that, the 'oce_multiq' call very close to the panic seems suspicious and is also in the panic on the other thread.
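
    As a rough illustration of that kind of check, the Python sketch below lists the function names in a backtrace and reports how far below panic() any frame with a given driver prefix sits. The helper names `frame_functions` and `frames_below_panic` are hypothetical, and the regular expression only assumes the frame format seen in the backtrace posted in this thread.

```python
import re

# Matches lines like "panic() at panic+0x43/frame 0xfffffe1051c9a3f0"
FRAME_RE = re.compile(r"^(\w+)\(\) at \w+\+0x[0-9a-f]+/frame 0x[0-9a-f]+$")

def frame_functions(bt_text):
    """Return function names from a backtrace, innermost frame first."""
    funcs = []
    for line in bt_text.splitlines():
        m = FRAME_RE.match(line.strip())
        if m:
            funcs.append(m.group(1))
    return funcs

def frames_below_panic(funcs, prefix):
    """Return (depth, name) pairs for frames after panic() whose name starts with prefix."""
    start = funcs.index("panic") + 1 if "panic" in funcs else 0
    return [(i - start + 1, name)
            for i, name in enumerate(funcs)
            if i >= start and name.startswith(prefix)]

# Frames copied from the backtrace earlier in this thread.
sample_bt = """\
kdb_enter() at kdb_enter+0x3b/frame 0xfffffe1051c9a330
vpanic() at vpanic+0x19b/frame 0xfffffe1051c9a390
panic() at panic+0x43/frame 0xfffffe1051c9a3f0
bpf_buffer_append_mbuf() at bpf_buffer_append_mbuf+0x64/frame 0xfffffe1051c9a420
catchpacket() at catchpacket+0x4b9/frame 0xfffffe1051c9a4d0
bpf_mtap() at bpf_mtap+0x200/frame 0xfffffe1051c9a540
oce_multiq_transmit() at oce_multiq_transmit+0x88/frame 0xfffffe1051c9a5a0
oce_multiq_start() at oce_multiq_start+0x77/frame 0xfffffe1051c9a5d0
ether_output_frame() at ether_output_frame+0x98/frame 0xfffffe1051c9a600
"""

print(frames_below_panic(frame_functions(sample_bt), "oce_"))
```

    A driver frame a few calls below the panic is only a hint, not proof, but it is a cheap way to compare two crash reports.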

    Steve



  • Follow up.

    The interfaces keep flapping or at least stating they are up or down repeatedly on console.

    The damn thing has run for 8 days solid now, no reboots.
    The Intel card still has yet to show up.


  • Netgate Administrator

    Hmm, seems suspicious though. Maybe that older card is actually suspect if it keeps losing link. Or something else in the connection.



  • We did swap out the card a couple weeks ago thinking that was it.

    We even tried a whole other PC for the foundation.

    Only common thing is the OS and 1 certain make/model HP card.

    Come on Intel card.



  • The Intel X550T2 is installed. We'll beat on it and see how it goes.



  • Pretty convinced our issues are gone with the new card.

    Hell, we even get better speeds in and out.

    Thank you to all on this board who help us out.

