Netgate SG-2440 / 21.02.2-RELEASE (amd64) / smart status : dev/da0: Unknown USB bridge [0x0424:0x2240 (0x198)]

lvrmsc

Hello,

On a good old Netgate SG-2440 that recently received the 21.02.2-RELEASE upgrade, I have started getting crashes (about once a week or so): "page fault".

Now that it has happened three times, there is clearly a trend here.

I wanted to check the SMART status of the "disk" device to see if there are signs of wear.

But the page /diag_smart.php returns this when clicking any of the page buttons (Information - View and others):

smartctl 7.2 2020-12-30 r5155 [FreeBSD 12.2-STABLE amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

/dev/da0: Unknown USB bridge [0x0424:0x2240 (0x198)]
Please specify device type with the -d option.

Use smartctl -h to get a usage summary

I don't mind running the command by hand to try to get further information, but I have no idea what device type to pass to the -d option.

Also, it is probably a bug that the page doesn't pass the right option automatically.

Thanks for any ideas or pointers.

stephenw10

The eMMC, which appears as a USB device, doesn't support SMART, so you will never see any data there.
You could reinstall to an mSATA drive.

What do the crash reports look like? Near-identical backtraces point to a software issue.

Steve

lvrmsc

@stephenw10 Thanks, Stephen. Got it regarding eMMC/USB/SMART.
The crash looks like this (an extract from the crash report shown on screen). I have the full text too, but I did not download the two files (next time I will).

Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 02
fault virtual address	= 0x28
fault code		= supervisor read data, page not present
instruction pointer	= 0x20:0xffffffff80ec5851
stack pointer	        = 0x28:0xfffffe0000430590
frame pointer	        = 0x28:0xfffffe00004305b0
code segment		= base 0x0, limit 0xfffff, type 0x1b
			= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags	= interrupt enabled, resume, IOPL = 0
current process		= 12 (swi4: clock (0))
trap number		= 12
panic: page fault
cpuid = 1
time = 1620227913
KDB: enter: panic

stephenw10

We really need to see the backtrace section (> bt), and to compare it between several crashes if possible.

lvrmsc

@stephenw10 Unfortunately, I lost the full dump report of the previous occurrence, but it just happened again. Here is the kind of bt I get:

db:0:kdb.enter.default>  bt
Tracing pid 12 tid 100027 td 0xfffff8000424b000
kdb_enter() at kdb_enter+0x37/frame 0xfffffe0000430250
vpanic() at vpanic+0x197/frame 0xfffffe00004302a0
panic() at panic+0x43/frame 0xfffffe0000430300
trap_fatal() at trap_fatal+0x391/frame 0xfffffe0000430360
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe00004303b0
trap() at trap+0x286/frame 0xfffffe00004304c0
calltrap() at calltrap+0x8/frame 0xfffffe00004304c0
--- trap 0xc, rip = 0xffffffff80ec5851, rsp = 0xfffffe0000430590, rbp = 0xfffffe00004305b0 ---
ether_output_frame() at ether_output_frame+0x61/frame 0xfffffe00004305b0
ng_apply_item() at ng_apply_item+0x8c/frame 0xfffffe0000430640
ng_snd_item() at ng_snd_item+0x188/frame 0xfffffe0000430680
ng_pppoe_rcvdata() at ng_pppoe_rcvdata+0x24c/frame 0xfffffe0000430710
ng_apply_item() at ng_apply_item+0x8c/frame 0xfffffe00004307a0
ng_snd_item() at ng_snd_item+0x188/frame 0xfffffe00004307e0
ng_apply_item() at ng_apply_item+0x8c/frame 0xfffffe0000430870
ng_snd_item() at ng_snd_item+0x188/frame 0xfffffe00004308b0
ng_ppp_link_xmit() at ng_ppp_link_xmit+0x124/frame 0xfffffe0000430900
ng_apply_item() at ng_apply_item+0x8c/frame 0xfffffe0000430990
ng_snd_item() at ng_snd_item+0x188/frame 0xfffffe00004309d0
ng_apply_item() at ng_apply_item+0x8c/frame 0xfffffe0000430a60
ng_snd_item() at ng_snd_item+0x188/frame 0xfffffe0000430aa0
ng_iface_send() at ng_iface_send+0xd2/frame 0xfffffe0000430b20
ng_iface_start() at ng_iface_start+0x62/frame 0xfffffe0000430b60
cbqrestart() at cbqrestart+0x64/frame 0xfffffe0000430b90
rmc_restart() at rmc_restart+0x6f/frame 0xfffffe0000430bc0
softclock_call_cc() at softclock_call_cc+0x141/frame 0xfffffe0000430c70
softclock() at softclock+0x79/frame 0xfffffe0000430c90
ithread_loop() at ithread_loop+0x23c/frame 0xfffffe0000430cf0
fork_exit() at fork_exit+0x7e/frame 0xfffffe0000430d30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0000430d30
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---

Traffic shaper issue?

stephenw10

@lvrmsc said in Netgate SG-2440 / 21.02.2-RELEASE (amd64) / smart status : dev/da0: Unknown USB bridge [0x0424:0x2240 (0x198)]:

Traffic shaper issue?

Yes, potentially. Combined with something Netgraph is doing, PPPoE?

Really, I would wait for it to crash again and compare the backtraces. If they are close to identical, then try disabling or changing the traffic shaping.
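A minimal sketch of that comparison in Python, assuming each `bt` output has been saved to its own text file. The helper names (`frame_functions`, `compare_traces`) are hypothetical and not part of pfSense or FreeBSD. It keeps only the ordered function names from the ddb frame lines, so two crashes along the same call path match even if the offsets or frame addresses differ, and it separately checks whether the text is identical including addresses.

```python
import re

# Matches a ddb frame line such as:
#   ng_snd_item() at ng_snd_item+0x188/frame 0xfffffe0000430680
# and captures only the function name; offset and frame address are ignored.
FRAME_RE = re.compile(r"^(\w+)\(\) at \1\+0x[0-9a-f]+/frame 0x[0-9a-f]+$")


def frame_functions(trace_text):
    """Return the ordered list of function names in a ddb backtrace."""
    funcs = []
    for line in trace_text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            funcs.append(match.group(1))
    return funcs


def compare_traces(a, b):
    """Return (same_call_path, same_text) for two saved backtraces.

    same_call_path: the function sequence is identical (addresses ignored).
    same_text: the full text matches, ignoring leading/trailing whitespace.
    """
    same_call_path = frame_functions(a) == frame_functions(b)
    same_text = a.strip() == b.strip()
    return same_call_path, same_text
```

Usage, with hypothetical file names: `compare_traces(open("crash1.txt").read(), open("crash2.txt").read())`. A matching call path with differing addresses still suggests the same bug; identical text including addresses is worth double-checking to rule out looking at the same report twice.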

Steve

lvrmsc

@stephenw10 Indeed. It wasn't long before another occurrence...

db:0:kdb.enter.default>  bt
Tracing pid 12 tid 100027 td 0xfffff8000424b000
kdb_enter() at kdb_enter+0x37/frame 0xfffffe0000430250
vpanic() at vpanic+0x197/frame 0xfffffe00004302a0
panic() at panic+0x43/frame 0xfffffe0000430300
trap_fatal() at trap_fatal+0x391/frame 0xfffffe0000430360
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe00004303b0
trap() at trap+0x286/frame 0xfffffe00004304c0
calltrap() at calltrap+0x8/frame 0xfffffe00004304c0
--- trap 0xc, rip = 0xffffffff80ec5851, rsp = 0xfffffe0000430590, rbp = 0xfffffe00004305b0 ---
ether_output_frame() at ether_output_frame+0x61/frame 0xfffffe00004305b0
ng_apply_item() at ng_apply_item+0x8c/frame 0xfffffe0000430640
ng_snd_item() at ng_snd_item+0x188/frame 0xfffffe0000430680
ng_pppoe_rcvdata() at ng_pppoe_rcvdata+0x24c/frame 0xfffffe0000430710
ng_apply_item() at ng_apply_item+0x8c/frame 0xfffffe00004307a0
ng_snd_item() at ng_snd_item+0x188/frame 0xfffffe00004307e0
ng_apply_item() at ng_apply_item+0x8c/frame 0xfffffe0000430870
ng_snd_item() at ng_snd_item+0x188/frame 0xfffffe00004308b0
ng_ppp_link_xmit() at ng_ppp_link_xmit+0x124/frame 0xfffffe0000430900
ng_apply_item() at ng_apply_item+0x8c/frame 0xfffffe0000430990
ng_snd_item() at ng_snd_item+0x188/frame 0xfffffe00004309d0
ng_apply_item() at ng_apply_item+0x8c/frame 0xfffffe0000430a60
ng_snd_item() at ng_snd_item+0x188/frame 0xfffffe0000430aa0
ng_iface_send() at ng_iface_send+0xd2/frame 0xfffffe0000430b20
ng_iface_start() at ng_iface_start+0x62/frame 0xfffffe0000430b60
cbqrestart() at cbqrestart+0x64/frame 0xfffffe0000430b90
rmc_restart() at rmc_restart+0x6f/frame 0xfffffe0000430bc0
softclock_call_cc() at softclock_call_cc+0x141/frame 0xfffffe0000430c70
softclock() at softclock+0x79/frame 0xfffffe0000430c90
ithread_loop() at ithread_loop+0x23c/frame 0xfffffe0000430cf0
fork_exit() at fork_exit+0x7e/frame 0xfffffe0000430d30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0000430d30
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---

It looks practically identical. This system gets its WAN over PPPoE, and nothing has changed on that side (that we know of, at least). The box had been rock-solid for a long time until it was upgraded from the 2.4 series to the 21.02 series, and the shaper was doing a nice job before. Maybe some configuration detail doesn't carry over cleanly with the upgrade. I will turn off the shaper for a while and see how it goes.

stephenw10

Hmm, that actually looks exactly the same, even the memory addresses. Are you sure that's not the same crash?

If not, that's definitely the issue. Try disabling shaping if you can.

Steve

lvrmsc

@stephenw10 Thanks. Since the traces were so identical, I checked twice: no confusion, those were two genuinely distinct crashes/reboots.
I fully removed the good old traffic shaper right after my last post.
After more than 12 hours with no new issues, I started building a new shaper configuration. It looks stable for now.
Thanks.