Netgate SG-2440 / 21.02.2-RELEASE (amd64) / smart status : dev/da0: Unknown USB bridge [0x0424:0x2240 (0x198)]
-
Hello,
On a good old Netgate SG-2440 which recently received the 21.02.2-RELEASE upgrade, I have started to get crashes (about once a week or so) reported as "page fault".
Now that it happened three times, there clearly is a trend here.
I wanted to check the SMART status of the "disk" device to see if there are signs of wear.
But the page /diag_smart.php returns this when clicking any of the page buttons (Information - View and others):
smartctl 7.2 2020-12-30 r5155 [FreeBSD 12.2-STABLE amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
/dev/da0: Unknown USB bridge [0x0424:0x2240 (0x198)]
Please specify device type with the -d option.
Use smartctl -h to get a usage summary
I don't mind running the command by hand to try to get further information, but I have no idea what device type to pass to the -d option.
Also, it is probably a bug that the page doesn't pass the right device type automatically.
Thanks for any ideas or pointers.
-
The eMMC, which appears as a USB device, doesn't support SMART so you will never see any data there.
You could reinstall to an mSATA drive.
What do the crash reports look like? Near identical backtraces point to a software issue.
Steve
-
@stephenw10 Thanks Stephen. Got it regarding eMMC/USB/SMART.
The crash looks like this (extract from the crash report on screen). I have the full text too, but did not download the two files (I will next time).
Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 02
fault virtual address = 0x28
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff80ec5851
stack pointer = 0x28:0xfffffe0000430590
frame pointer = 0x28:0xfffffe00004305b0
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 12 (swi4: clock (0))
trap number = 12
panic: page fault
cpuid = 1
time = 1620227913
KDB: enter: panic
-
We need to see the backtrace section (the output after "> bt") really. And compare that between several crashes if possible.
-
@stephenw10 Unfortunately, I lost the full dump report of the previous occurrence, but it just happened again. Here is the kind of bt I get:
db:0:kdb.enter.default> bt
Tracing pid 12 tid 100027 td 0xfffff8000424b000
kdb_enter() at kdb_enter+0x37/frame 0xfffffe0000430250
vpanic() at vpanic+0x197/frame 0xfffffe00004302a0
panic() at panic+0x43/frame 0xfffffe0000430300
trap_fatal() at trap_fatal+0x391/frame 0xfffffe0000430360
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe00004303b0
trap() at trap+0x286/frame 0xfffffe00004304c0
calltrap() at calltrap+0x8/frame 0xfffffe00004304c0
--- trap 0xc, rip = 0xffffffff80ec5851, rsp = 0xfffffe0000430590, rbp = 0xfffffe00004305b0 ---
ether_output_frame() at ether_output_frame+0x61/frame 0xfffffe00004305b0
ng_apply_item() at ng_apply_item+0x8c/frame 0xfffffe0000430640
ng_snd_item() at ng_snd_item+0x188/frame 0xfffffe0000430680
ng_pppoe_rcvdata() at ng_pppoe_rcvdata+0x24c/frame 0xfffffe0000430710
ng_apply_item() at ng_apply_item+0x8c/frame 0xfffffe00004307a0
ng_snd_item() at ng_snd_item+0x188/frame 0xfffffe00004307e0
ng_apply_item() at ng_apply_item+0x8c/frame 0xfffffe0000430870
ng_snd_item() at ng_snd_item+0x188/frame 0xfffffe00004308b0
ng_ppp_link_xmit() at ng_ppp_link_xmit+0x124/frame 0xfffffe0000430900
ng_apply_item() at ng_apply_item+0x8c/frame 0xfffffe0000430990
ng_snd_item() at ng_snd_item+0x188/frame 0xfffffe00004309d0
ng_apply_item() at ng_apply_item+0x8c/frame 0xfffffe0000430a60
ng_snd_item() at ng_snd_item+0x188/frame 0xfffffe0000430aa0
ng_iface_send() at ng_iface_send+0xd2/frame 0xfffffe0000430b20
ng_iface_start() at ng_iface_start+0x62/frame 0xfffffe0000430b60
cbqrestart() at cbqrestart+0x64/frame 0xfffffe0000430b90
rmc_restart() at rmc_restart+0x6f/frame 0xfffffe0000430bc0
softclock_call_cc() at softclock_call_cc+0x141/frame 0xfffffe0000430c70
softclock() at softclock+0x79/frame 0xfffffe0000430c90
ithread_loop() at ithread_loop+0x23c/frame 0xfffffe0000430cf0
fork_exit() at fork_exit+0x7e/frame 0xfffffe0000430d30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0000430d30
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
Traffic shaper issue?
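For anyone reading a trace like this one, here is a rough Python sketch of how the frames can be grouped by kernel subsystem to see what is involved. The prefix table is my own hand-written assumption based on the function names in this trace (ng_* for Netgraph, cbq*/rmc_* for the ALTQ CBQ shaper); it is not an official mapping, and `classify` and `subsystems_in` are hypothetical helper names.

```python
import re

# Hand-written prefix table (an assumption, not an official mapping).
SUBSYSTEMS = [
    ("ng_", "netgraph"),
    ("cbq", "ALTQ (CBQ shaper)"),
    ("rmc_", "ALTQ (CBQ shaper)"),
    ("ether_", "Ethernet output"),
    ("softclock", "callout/timer"),
]

def classify(function_name):
    """Map a kernel function name to a subsystem label, or 'other'."""
    for prefix, label in SUBSYSTEMS:
        if function_name.startswith(prefix):
            return label
    return "other"

def subsystems_in(bt_text):
    """List the recognised subsystems in the order they first appear."""
    seen = []
    for name in re.findall(r"(\w+)\(\) at ", bt_text):
        label = classify(name)
        if label != "other" and label not in seen:
            seen.append(label)
    return seen

# A few frames copied from the backtrace in this post.
excerpt = """
ether_output_frame() at ether_output_frame+0x61/frame 0xfffffe00004305b0
ng_pppoe_rcvdata() at ng_pppoe_rcvdata+0x24c/frame 0xfffffe0000430710
ng_iface_start() at ng_iface_start+0x62/frame 0xfffffe0000430b60
cbqrestart() at cbqrestart+0x64/frame 0xfffffe0000430b90
rmc_restart() at rmc_restart+0x6f/frame 0xfffffe0000430bc0
softclock() at softclock+0x79/frame 0xfffffe0000430c90
"""

print(subsystems_in(excerpt))
```

The trace is printed innermost frame first, so the list reads from the point of the fault back towards the timer that started the call chain.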
-
@lvrmsc said in Netgate SG-2440 / 21.02.2-RELEASE (amd64) / smart status : dev/da0: Unknown USB bridge [0x0424:0x2240 (0x198)]:
Traffic shaper issue?
Yes, potentially. Combined with something Netgraph is doing, PPPoE?
Really I would wait for it to crash again and compare the backtraces. If they are close to identical then try disabling or changing the traffic shaping.
Steve
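Comparing backtraces by eye is error-prone, so here is a minimal Python sketch of one way to do it: extract the (function, offset) pairs from the ddb "bt" output and compare the two lists, ignoring the frame addresses. The regular expression assumes the line format shown in this thread; `parse_frames` and `same_call_path` are hypothetical helper names, and the second sample below is made up for illustration.

```python
import re

# Matches lines such as:
#   cbqrestart() at cbqrestart+0x64/frame 0xfffffe0000430b90
FRAME_RE = re.compile(r"(\w+)\(\) at \w+\+(0x[0-9a-f]+)/frame 0x[0-9a-f]+")

def parse_frames(bt_text):
    """Return the (function, offset) pairs found in ddb 'bt' output."""
    return [(m.group(1), m.group(2)) for m in FRAME_RE.finditer(bt_text)]

def same_call_path(bt_a, bt_b):
    """True when both traces go through the same functions at the same offsets."""
    return parse_frames(bt_a) == parse_frames(bt_b)

# Three frames copied from the backtrace posted in this thread.
crash_1 = """
ether_output_frame() at ether_output_frame+0x61/frame 0xfffffe00004305b0
ng_apply_item() at ng_apply_item+0x8c/frame 0xfffffe0000430640
cbqrestart() at cbqrestart+0x64/frame 0xfffffe0000430b90
"""

# Hypothetical second trace: same first frame, different second frame.
crash_other = """
ether_output_frame() at ether_output_frame+0x61/frame 0xfffffe00004305b0
ng_snd_item() at ng_snd_item+0x188/frame 0xfffffe0000430680
cbqrestart() at cbqrestart+0x64/frame 0xfffffe0000430b90
"""

print(same_call_path(crash_1, crash_1))      # identical call path
print(same_call_path(crash_1, crash_other))  # diverges at the second frame
```

Frame addresses are left out of the comparison on purpose, since they can differ between boots even when the code path is the same.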
-
@stephenw10 Indeed. It wasn't long before another occurrence...
db:0:kdb.enter.default> bt
Tracing pid 12 tid 100027 td 0xfffff8000424b000
kdb_enter() at kdb_enter+0x37/frame 0xfffffe0000430250
vpanic() at vpanic+0x197/frame 0xfffffe00004302a0
panic() at panic+0x43/frame 0xfffffe0000430300
trap_fatal() at trap_fatal+0x391/frame 0xfffffe0000430360
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe00004303b0
trap() at trap+0x286/frame 0xfffffe00004304c0
calltrap() at calltrap+0x8/frame 0xfffffe00004304c0
--- trap 0xc, rip = 0xffffffff80ec5851, rsp = 0xfffffe0000430590, rbp = 0xfffffe00004305b0 ---
ether_output_frame() at ether_output_frame+0x61/frame 0xfffffe00004305b0
ng_apply_item() at ng_apply_item+0x8c/frame 0xfffffe0000430640
ng_snd_item() at ng_snd_item+0x188/frame 0xfffffe0000430680
ng_pppoe_rcvdata() at ng_pppoe_rcvdata+0x24c/frame 0xfffffe0000430710
ng_apply_item() at ng_apply_item+0x8c/frame 0xfffffe00004307a0
ng_snd_item() at ng_snd_item+0x188/frame 0xfffffe00004307e0
ng_apply_item() at ng_apply_item+0x8c/frame 0xfffffe0000430870
ng_snd_item() at ng_snd_item+0x188/frame 0xfffffe00004308b0
ng_ppp_link_xmit() at ng_ppp_link_xmit+0x124/frame 0xfffffe0000430900
ng_apply_item() at ng_apply_item+0x8c/frame 0xfffffe0000430990
ng_snd_item() at ng_snd_item+0x188/frame 0xfffffe00004309d0
ng_apply_item() at ng_apply_item+0x8c/frame 0xfffffe0000430a60
ng_snd_item() at ng_snd_item+0x188/frame 0xfffffe0000430aa0
ng_iface_send() at ng_iface_send+0xd2/frame 0xfffffe0000430b20
ng_iface_start() at ng_iface_start+0x62/frame 0xfffffe0000430b60
cbqrestart() at cbqrestart+0x64/frame 0xfffffe0000430b90
rmc_restart() at rmc_restart+0x6f/frame 0xfffffe0000430bc0
softclock_call_cc() at softclock_call_cc+0x141/frame 0xfffffe0000430c70
softclock() at softclock+0x79/frame 0xfffffe0000430c90
ithread_loop() at ithread_loop+0x23c/frame 0xfffffe0000430cf0
fork_exit() at fork_exit+0x7e/frame 0xfffffe0000430d30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0000430d30
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
It looks identical to the previous one. This system gets its WAN over PPPoE and nothing changed on that side (that we know of, at least). The box had been rock-solid for a long time until it was upgraded from the 2.4 series to the 21.02 series. The shaper was doing a nice job before. Maybe some configuration detail doesn't agree with the upgrade. I will turn off the shaper for some time and see how it goes.
-
Hmm, that actually looks exactly the same, even the memory addresses. Are you sure that's not the same crash?
If it really is a separate crash, then that's definitely the issue. Try disabling shaping if you can.
Steve
-
@stephenw10 Thanks. Seeing that the traces were so identical, I checked twice: no confusion, those were two truly distinct crashes/reboots.
I fully removed the good old traffic shaper right after my last post.
Seeing no new issues for more than 12 hours, I started rebuilding a new shaper configuration. It looks stable for now.
Thanks.