Random panic reboot

stevemosher

Hi there,

It seems that since the last update we are getting a lot of panic reboots.
I'm not sure how to read the dump, but this is the first thing that pops up:

<118>Welcome to pfSense 2.4.5-RELEASE (Patch 1)...
<118>
<118>savecore: reboot after panic: bpf_mcopy
<118>savecore: writing core to /var/crash/textdump.tar.0

stephenw10

When you log back into the GUI it should give you an alert with a link for downloading the complete crash report.

Steve

stevemosher

What should I be looking for in the log?

Should I search for "panic"? Sorry for the odd questions; I'm just not sure what to look for.

stephenw10

The important bits are generally at the start of the dump, everything between:
db:0:kdb.enter.default> show pcpu
and
db:0:kdb.enter.default> ps

And the output from the message buffer leading up to and including the panic text.

If it crashed repeatedly, there may be more than one crash in there. The very first part of the report shows how many are listed.
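For anyone who would rather script this than scroll through the report, here is a minimal Python sketch. It assumes the crash report has been saved as a plain text file; the helper names extract_ddb_sections and find_panic_lines are hypothetical, and the marker strings are the two ddb prompts quoted above. Because a report can hold more than one crash, it returns one section per show pcpu/ps pair.

```python
from __future__ import annotations

# The two ddb prompts that bracket the useful part of each crash.
START = "db:0:kdb.enter.default> show pcpu"
END = "db:0:kdb.enter.default> ps"


def extract_ddb_sections(report: str) -> list[list[str]]:
    """Return one list of lines per crash, from the 'show pcpu' prompt
    up to (but not including) the next 'ps' prompt."""
    sections: list[list[str]] = []
    current: list[str] | None = None
    for line in report.splitlines():
        stripped = line.strip()
        if stripped == START:
            current = [line]  # start a new crash section
        elif stripped == END and current is not None:
            sections.append(current)  # close the section at the 'ps' prompt
            current = None
        elif current is not None:
            current.append(line)
    return sections


def find_panic_lines(report: str) -> list[str]:
    """Return lines carrying panic text, such as 'panic: ...' from the
    message buffer or savecore's 'reboot after panic: ...'."""
    return [line for line in report.splitlines() if "panic:" in line]
```

Usage would be something like reading the saved report with open(path).read() and passing the text to both functions; the path is whatever you saved the download as.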

Steve

stevemosher

Thank you.

I will dig it out today. After almost 12 years of pfSense, this is the first issue we have ever had.

stevemosher

I deleted the last crash dump.

We will wait for it to pop up again today.

stephenw10

Might be similar to this: https://forum.netgate.com/topic/147078/pfsense-reboot-kernel-panic-bpf_mcopy-v2-4-4-p3

In which case, check the mbuf values.

stevemosher

For several hours the thing ran great.
Then we resumed our normal operations, which involve an OpenVPN client connection out to a customer.
We use pfSense for this task so that we aren't installing VPN clients all over my finance department.

Simple: if you come from this source, you go out this gateway.

The system is an old Dell T5500 workstation with two 5680 CPUs. It rips through AES like a monster.
Steady throughput over the target OpenVPN link is about 610 Mbit/s, which is friggin awesome for us.

Then the thing reboots. Below is the dump from "db:0:kdb.enter.default> show pcpu" to "db:0:kdb.enter.default> ps":

We did change Firewall Maximum Fragment Entries from 5000 to 12000 and it still rebooted on us.

I did have a look at the link that was provided with the bug report, and that's about what's going on here.

Just bizarre.

db:0:kdb.enter.default> show pcpu
cpuid = 2
dynamic pcpu = 0xfffffe1088694580
curthread = 0xfffff8001bcd6620: pid 78429 "openvpn"
curpcb = 0xfffffe1051c9acc0
fpcurthread = 0xfffff8001bcd6620: pid 78429 "openvpn"
idlethread = 0xfffff8000f37f000: tid 100005 "idle: cpu2"
curpmap = 0xfffff8001b2fa138
tssp = 0xffffffff835a33a0
commontssp = 0xffffffff835a33a0
rsp0 = 0xfffffe1051c9acc0
gs32p = 0xffffffff835a9ff8
ldt = 0xffffffff835aa038
tss = 0xffffffff835aa028
tlb gen = 1988154
db:0:kdb.enter.default> bt
Tracing pid 78429 tid 100434 td 0xfffff8001bcd6620
kdb_enter() at kdb_enter+0x3b/frame 0xfffffe1051c9a330
vpanic() at vpanic+0x19b/frame 0xfffffe1051c9a390
panic() at panic+0x43/frame 0xfffffe1051c9a3f0
bpf_buffer_append_mbuf() at bpf_buffer_append_mbuf+0x64/frame 0xfffffe1051c9a420
catchpacket() at catchpacket+0x4b9/frame 0xfffffe1051c9a4d0
bpf_mtap() at bpf_mtap+0x200/frame 0xfffffe1051c9a540
oce_multiq_transmit() at oce_multiq_transmit+0x88/frame 0xfffffe1051c9a5a0
oce_multiq_start() at oce_multiq_start+0x77/frame 0xfffffe1051c9a5d0
ether_output_frame() at ether_output_frame+0x98/frame 0xfffffe1051c9a600
ether_output() at ether_output+0x6d7/frame 0xfffffe1051c9a690
ip_output() at ip_output+0x138d/frame 0xfffffe1051c9a7c0
ip_forward() at ip_forward+0x2c3/frame 0xfffffe1051c9a860
ip_input() at ip_input+0x724/frame 0xfffffe1051c9a8f0
netisr_dispatch_src() at netisr_dispatch_src+0xa2/frame 0xfffffe1051c9a940
tunwrite() at tunwrite+0x21e/frame 0xfffffe1051c9a980
devfs_write_f() at devfs_write_f+0xda/frame 0xfffffe1051c9a9f0
dofilewrite() at dofilewrite+0xc6/frame 0xfffffe1051c9aa40
kern_writev() at kern_writev+0x68/frame 0xfffffe1051c9aa90
sys_writev() at sys_writev+0x35/frame 0xfffffe1051c9aac0
amd64_syscall() at amd64_syscall+0xa86/frame 0xfffffe1051c9abf0
fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe1051c9abf0
--- syscall (121, FreeBSD ELF64, sys_writev), rip = 0x8017b02ea, rsp = 0x7fffffffddb8, rbp = 0x7fffffffde00 ---
db:0:kdb.enter.default> ps

stevemosher

Adding netstat -m.

[2.4.5-RELEASE][root@mosh-fw.mosh.local]/root: netstat -m
28772/16528/45300 mbufs in use (current/cache/total)
19747/9139/28886/1000000 mbuf clusters in use (current/cache/total/max)
19747/9095 mbuf+clusters out of packet secondary zone in use (current/cache)
0/29/29/524288 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/524288 9k jumbo clusters in use (current/cache/total/max)
0/0/0/340483 16k jumbo clusters in use (current/cache/total/max)
46687K/22526K/69213K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
27 sendfile syscalls
13 sendfile syscalls completed without I/O request
11 requests for I/O initiated by sendfile
101 pages read by sendfile as part of a request
186 pages were valid at time of a sendfile request
0 pages were requested for read ahead by applications
128 pages were read ahead by sendfile
0 times sendfile encountered an already busy page
0 requests for sfbufs denied
0 requests for sfbufs delayed
[2.4.5-RELEASE][root@mosh-fw.mosh.local]/root:

Gertjan

During this function:

@stevemosher said in Random panic reboot:

bpf_buffer_append_mbuf() at bpf_buffer_append_mbuf+0x64/frame 0xfffffe1051c9a420

a panic is thrown.

So "mbufs", whatever those are, are missing, too small, not enough, etc. etc. etc.

stevemosher

@stevemosher said in Random panic reboot:

We did change Firewall Maximum Fragment Entries from 5000 to 12000 and it still rebooted on us.

We identified that item and tried to fix it by setting max frag to 12000.
Are we thinking that's not enough? Or is this where the mbuf issue starts?

stephenw10

That's not the same as the mbuf limits, but yours are set to 1M clusters for standard-sized packets and about 500K for jumbo packets, so it's unlikely you'd be exhausting those.
The netstat output confirms that: 0 requests for clusters denied.
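If you want to run that same check against netstat -m output in a script, here is a minimal Python sketch. It assumes the line format shown in the netstat -m output posted above; parse_netstat_m and looks_exhausted are hypothetical helper names, and the 90% threshold is an arbitrary assumption for illustration, not a pfSense or FreeBSD default.

```python
from __future__ import annotations

import re

# e.g. "19747/9139/28886/1000000 mbuf clusters in use (current/cache/total/max)"
USAGE_RE = re.compile(
    r"^(\d+)/(\d+)/(\d+)/(\d+) (.+?) in use \(current/cache/total/max\)$"
)
# e.g. "0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)"
DENIED_RE = re.compile(r"^(\d+)/(\d+)/(\d+) requests for (.+?) denied\b")


def parse_netstat_m(text: str) -> tuple[dict[str, tuple[int, int, int]], dict[str, int]]:
    """Return ({zone: (current, total, max)}, {category: total denied})."""
    usage: dict[str, tuple[int, int, int]] = {}
    denied: dict[str, int] = {}
    for raw in text.splitlines():
        line = raw.strip()
        m = USAGE_RE.match(line)
        if m:
            current, _cache, total, limit = (int(g) for g in m.groups()[:4])
            usage[m.group(5)] = (current, total, limit)
            continue
        m = DENIED_RE.match(line)
        if m:
            # Sum the three slash-separated denied counters for this category.
            denied[m.group(4)] = sum(int(g) for g in m.groups()[:3])
    return usage, denied


def looks_exhausted(usage, denied, threshold: float = 0.9) -> bool:
    """Hypothetical rule of thumb: any denied request at all, or any zone
    whose allocated total has reached `threshold` of its max."""
    if any(count > 0 for count in denied.values()):
        return True
    return any(limit and total / limit >= threshold for _cur, total, limit in usage.values())
```

Against the output posted above, the mbuf clusters line parses to 28886 allocated out of a 1000000 max (under 3%) with every denied counter at 0, which matches the reading that the limits are not being hit.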

It does look very similar to the crash on that other thread. There's not much more info there, though.

What interface types do you have on that box? Can you switch them out?

Steve

stevemosher

The current card is a Broadcom/HP 10G card: oce0/oce1.

We're trying to see if we can grab an Intel X550T2 card and see if that changes anything.

I do agree it seems to be more BSD-related.

stevemosher

We went ahead and picked up an Intel X550T2 card. Should be here next week.

Stay tuned.

Above all -- Thank you.

stephenw10

Ah, nice. Yeah, looking at that, the 'oce_multiq' call very close to the panic seems suspicious, and it also appears in the panic on the other thread.
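To make the "close to the panic" reading concrete: ddb prints the innermost call first, so the frames listed just below panic() are the callers that led to it, nearest first. Here is a rough Python sketch, assuming the frame-line format from the backtrace posted above. The names frame_names and frames_near_panic, the 5-frame window and the "oce_" prefix are all hypothetical choices for illustration; this only lists which driver functions sit near the panic, it does not prove which one is at fault.

```python
from __future__ import annotations

import re

# ddb frame lines look like:
# "bpf_mtap() at bpf_mtap+0x200/frame 0xfffffe1051c9a540"
FRAME_RE = re.compile(r"^(\w+)\(\) at \1\+0x[0-9a-f]+")


def frame_names(backtrace: str) -> list[str]:
    """Function names in the order ddb prints them (innermost call first)."""
    names = []
    for line in backtrace.splitlines():
        m = FRAME_RE.match(line.strip())
        if m:
            names.append(m.group(1))
    return names


def frames_near_panic(names: list[str], prefixes: tuple[str, ...], depth: int = 5) -> list[str]:
    """Return the frames within `depth` calls below panic() whose names
    start with one of `prefixes` (e.g. a driver's function prefix)."""
    if "panic" not in names:
        return []
    start = names.index("panic") + 1
    return [n for n in names[start:start + depth] if n.startswith(prefixes)]
```

With the posted trace and prefixes=("oce_",), this returns oce_multiq_transmit and oce_multiq_start: the driver's transmit path that called bpf_mtap() just before bpf_buffer_append_mbuf() panicked.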

Steve

stevemosher

Follow up.

The interfaces keep flapping, or at least reporting that they are up or down repeatedly on the console.

The damn thing has now run for 8 days solid with no reboots.
The Intel card has yet to show up.

stephenw10

Hmm, that seems suspicious though. If it keeps losing link, maybe that older card is actually faulty, or something else in the connection is.

stevemosher

We did swap out the card a couple weeks ago thinking that was it.

We even tried a whole other PC as the base system.

The only things in common are the OS and one particular make/model of HP card.

Come on Intel card.

stevemosher

The Intel X550T2 is installed. We'll beat on it and see how it goes.

stevemosher

Pretty convinced our issues are gone with the new card.

Hell, we even get better speeds in and out.

Thank you to everyone on this board who helped us out.