Crash Report after running traceroute
-
I tried running traceroute from the pf console on a fresh install - not in production - just testing - and each of the 3 times the system hung on the first hop - then panicked, crashed and rebooted. I repeated the process and the results were the same.
The machine is an older Dell - i5 with 8 GB RAM and an SSD - running the ill-fated Realtek NICs. Seems to do everything else fairly well. But this was a bit of a surprise. Even in production there will be fewer than 15 nodes on the entire LAN. See attached.
M
Crash_Report_12_30_24.txt -
I guess I should add that this is the 2.7.2-RELEASE - in mostly the default config with only the WAN/LAN ports configured. Nothing else has been added.
-
Backtrace:
db:0:kdb.enter.default> bt
Tracing pid 0 tid 100082 td 0xfffffe008c6da1e0
kdb_enter() at kdb_enter+0x32/frame 0xfffffe0082c90a10
vpanic() at vpanic+0x163/frame 0xfffffe0082c90b40
panic() at panic+0x43/frame 0xfffffe0082c90ba0
trap_fatal() at trap_fatal+0x40c/frame 0xfffffe0082c90c00
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe0082c90c60
calltrap() at calltrap+0x8/frame 0xfffffe0082c90c60
--- trap 0xc, rip = 0xffffffff80e24c30, rsp = 0xfffffe0082c90d30, rbp = 0xfffffe0082c90d80 ---
ether_input() at ether_input+0x50/frame 0xfffffe0082c90d80
re_rxeof() at re_rxeof+0x2c0/frame 0xfffffe0082c90e00
re_int_task_8125() at re_int_task_8125+0xba/frame 0xfffffe0082c90e40
taskqueue_run_locked() at taskqueue_run_locked+0x182/frame 0xfffffe0082c90ec0
taskqueue_thread_loop() at taskqueue_thread_loop+0xc2/frame 0xfffffe0082c90ef0
fork_exit() at fork_exit+0x7f/frame 0xfffffe0082c90f30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0082c90f30
--- trap 0, rip = 0, rsp = 0, rbp = 0xfffffe0082a901e8 ---
??() at 0/frame 0xfffffe0082a901e8
??() at 0xfffffe0082a90258/frame 0xfffffe0082a90f78
??() at 0xfffffe0082a901e8/frame 0xfffffe0082a90f08
??() at 0xfffffe0082a90f78
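For anyone reading a trace like this, the frame printed right after the `--- trap 0xc ...` marker is the function that was executing when the fault hit, and the frames below it are its callers. A minimal Python sketch of pulling that out of the pasted text follows. The sample string is trimmed from the backtrace above, and the helper name `frames_after_trap` is made up for illustration, not part of any pfSense or FreeBSD tool.

```python
import re

# Sample trimmed from the backtrace above; whitespace between frames does not matter.
BACKTRACE = (
    "trap_pfault() at trap_pfault+0x4f/frame 0xfffffe0082c90c60 "
    "calltrap() at calltrap+0x8/frame 0xfffffe0082c90c60 "
    "--- trap 0xc, rip = 0xffffffff80e24c30, rsp = 0xfffffe0082c90d30, "
    "rbp = 0xfffffe0082c90d80 --- "
    "ether_input() at ether_input+0x50/frame 0xfffffe0082c90d80 "
    "re_rxeof() at re_rxeof+0x2c0/frame 0xfffffe0082c90e00 "
    "re_int_task_8125() at re_int_task_8125+0xba/frame 0xfffffe0082c90e40 "
    "taskqueue_run_locked() at taskqueue_run_locked+0x182/frame 0xfffffe0082c90ec0"
)

# One stack frame: "func() at func+0xOFF/frame 0xADDR"
FRAME_RE = re.compile(r"(\w+)\(\) at (\w+)\+(0x[0-9a-f]+)/frame (0x[0-9a-f]+)")
# The trap marker: "--- trap 0xc, rip = 0x..."
TRAP_RE = re.compile(r"--- trap (0x[0-9a-f]+|\d+), rip = (0x[0-9a-f]+|\d+)")


def frames_after_trap(text):
    """Return (trap_number, [(function, offset), ...]) for frames after the first trap marker."""
    trap = TRAP_RE.search(text)
    if trap is None:
        return None, []
    tail = text[trap.end():]
    frames = [(m.group(1), int(m.group(3), 16)) for m in FRAME_RE.finditer(tail)]
    return int(trap.group(1), 0), frames


trap_no, frames = frames_after_trap(BACKTRACE)
func, off = frames[0]
print(f"trap {trap_no}: faulted in {func}+{off:#x}")
for func, off in frames[1:]:
    print(f"  called from {func}+{off:#x}")
```

On this trace it reports the fault in `ether_input`, reached from `re_rxeof` and `re_int_task_8125`, i.e. the receive path of the Realtek driver.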
Panic:
Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 02
fault virtual address = 0x10007
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff80e24c30
stack pointer = 0x28:0xfffffe0082cb8d30
frame pointer = 0x28:0xfffffe0082cb8d80
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 0 (re1 taskq)
rdi: 0000000000000000 rsi: fffffe008c6daac0 rdx: 0000000000000001
rcx: 0000000000000001 r8: 00000000ffffff31 r9: 0000000000000080
rax: 0000000000000000 rbx: 000000000000ffff rbp: fffffe0082cb8d80
r10: 0000000000000001 r11: fffff80001622000 r12: 0000000000008803
r13: 000000000000ffff r14: fffffe008c6daac0 r15: 0000000000000000
trap number = 12
panic: page fault
cpuid = 1
time = 1735585591
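The fields in the panic message that matter most when triaging are the faulting address, the kernel thread that was running, and the timestamp (seconds since the Unix epoch). Here is a rough Python sketch for extracting those three. The sample is trimmed from the panic text above, and `summarize_panic` is a hypothetical helper name used only for this example.

```python
import re
from datetime import datetime, timezone

# Sample trimmed from the panic message above.
PANIC = (
    "Fatal trap 12: page fault while in kernel mode cpuid = 1; apic id = 02 "
    "fault virtual address = 0x10007 "
    "fault code = supervisor read data, page not present "
    "instruction pointer = 0x20:0xffffffff80e24c30 "
    "current process = 0 (re1 taskq) "
    "trap number = 12 panic: page fault cpuid = 1 time = 1735585591"
)


def summarize_panic(text):
    """Extract the fault address, the running kernel thread and the crash time (UTC)."""
    fault_addr = re.search(r"fault virtual address\s*=\s*(0x[0-9a-f]+)", text)
    thread = re.search(r"current process\s*=\s*\d+ \(([^)]+)\)", text)
    stamp = re.search(r"\btime\s*=\s*(\d+)", text)
    return {
        "fault_address": int(fault_addr.group(1), 16),
        "thread": thread.group(1),
        "time_utc": datetime.fromtimestamp(int(stamp.group(1)), tz=timezone.utc),
    }


summary = summarize_panic(PANIC)
print(hex(summary["fault_address"]), summary["thread"], summary["time_utc"].isoformat())
```

Here the running thread is `re1 taskq`, which is why the second Realtek interface gets the attention in the replies.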
What NIC exactly is that?
-
@stephenw10
From dmesg.boot below. The 2.5GbE is a Trendnet TEG-25GECTX - the other is the on-board.
pci1: <ACPI PCI bus> on pcib2
re0: <Realtek PCIe 2.5GbE Family Controller> port 0xe000-0xe0ff mem 0xf7d00000-0xf7d0ffff,0xf7d10000-0xf7d13fff at device 0.0 on pci1
re0: Using Memory Mapping!
re0: Using 1 MSI-X message
re0: ASPM disabled
re0: version:1.98.00
re0: Ethernet address: 78:2d:7e:1e:a3:26
This product is covered by one or more of the following patents: US6,570,884, US6,115,776, and US6,327,625.
re0: Ethernet address: 78:2d:7e:1e:a3:26
pcib3: <ACPI PCI-PCI bridge> at device 28.3 on pci0
pci2: <ACPI PCI bus> on pcib3
re1: <Realtek PCIe GbE Family Controller> port 0xd000-0xd0ff mem 0xf7c00000-0xf7c00fff,0xf0000000-0xf0003fff at device 0.0 on pci2
re1: Using Memory Mapping!
re1: Using 1 MSI-X message
re1: ASPM disabled
re1: version:1.98.00
re1: Ethernet address: b0:83:fe:ab:1a:90
pciconf -lv below:
re0@pci0:2:0:0: class=0x020000 rev=0x00 hdr=0x00 vendor=0x10ec device=0x8125 subvendor=0x10ec subdevice=0x0123
    vendor = 'Realtek Semiconductor Co., Ltd.'
    device = 'RTL8125 2.5GbE Controller'
    class = network
    subclass = ethernet
re1@pci0:3:0:0: class=0x020000 rev=0x0c hdr=0x00 vendor=0x10ec device=0x8168 subvendor=0x1028 subdevice=0x0612
    vendor = 'Realtek Semiconductor Co., Ltd.'
    device = 'RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller'
    class = network
    subclass = ethernet
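When answering a "what NIC exactly is that?" question, the useful parts of `pciconf -lv` are the interface name, the PCI vendor/device IDs and the device description. A small Python sketch that pulls those out of pasted output is shown below. The sample is copied from the output above, and the function name `parse_pciconf` is an assumption for illustration only.

```python
import re

# Sample copied from the pciconf -lv output above (line breaks do not matter).
PCICONF = (
    "re0@pci0:2:0:0: class=0x020000 rev=0x00 hdr=0x00 vendor=0x10ec device=0x8125 "
    "subvendor=0x10ec subdevice=0x0123 "
    "vendor = 'Realtek Semiconductor Co., Ltd.' device = 'RTL8125 2.5GbE Controller' "
    "class = network subclass = ethernet "
    "re1@pci0:3:0:0: class=0x020000 rev=0x0c hdr=0x00 vendor=0x10ec device=0x8168 "
    "subvendor=0x1028 subdevice=0x0612 "
    "vendor = 'Realtek Semiconductor Co., Ltd.' "
    "device = 'RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller' "
    "class = network subclass = ethernet"
)

# name@pciX:... then the numeric vendor=/device= pair, then the quoted "device = '...'" text.
ENTRY_RE = re.compile(
    r"(\w+)@pci\S+\s+.*?vendor=(0x[0-9a-f]+)\s+device=(0x[0-9a-f]+)"
    r".*?device\s+=\s+'([^']+)'",
    re.S,
)


def parse_pciconf(text):
    """Return one dict per device with its name, numeric IDs and description."""
    return [
        {
            "name": m.group(1),
            "vendor_id": int(m.group(2), 16),
            "device_id": int(m.group(3), 16),
            "description": m.group(4),
        }
        for m in ENTRY_RE.finditer(text)
    ]


nics = parse_pciconf(PCICONF)
for nic in nics:
    print(f"{nic['name']}: {nic['vendor_id']:#06x}:{nic['device_id']:#06x} {nic['description']}")
```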
-
@stephenw10
I forgot to mention that the behavior was the same from the GUI. Clicked on traceroute - same thing happened. Tried it from the console - identical behavior.
M -
Looks like it's re1 giving the issue. Is that assigned as WAN?
If you traceroute to something on the LAN using the other NIC does it fail?
If you traceroute using ICMP instead of UDP does it still panic?
I would normally suggest trying the alternative Realtek driver, but I assume you're already using it to get 8125 support.
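The questions above boil down to a small test matrix: each interface, a WAN-side and a LAN-side target, and UDP versus ICMP probes. A hedged Python sketch that only prints the commands (it does not run them, since running them is what triggers the panic) could look like this. The interface names and targets are placeholders to adjust for the box in question, and `build_matrix` is a made-up helper name.

```python
import shlex
from itertools import product


def build_matrix(interfaces, targets, max_hops=5):
    """Build traceroute argument lists for every interface/target/probe-type combination."""
    cases = []
    for iface, target, use_icmp in product(interfaces, targets, (False, True)):
        # -n: no reverse DNS, -m: cap the hop count, -i: interface used for the probe source address
        argv = ["traceroute", "-n", "-m", str(max_hops), "-i", iface]
        if use_icmp:
            argv.append("-I")  # ICMP echo probes instead of the default UDP datagrams
        argv.append(target)
        cases.append(argv)
    return cases


# Placeholder values: adjust interface names and targets to the actual WAN/LAN assignment.
matrix = build_matrix(["re0", "re1"], ["8.8.8.8", "192.168.1.11"])
for argv in matrix:
    print(shlex.join(argv))
```

Run the printed commands one at a time, from the console, at a time when a reboot is acceptable, and note which combinations panic.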
-
@stephenw10
Yes - I updated the driver before any of this happened.
I'll have to test this tomorrow or on the weekend while everything is closed. The system is now in production.
I will report back - as this is a bit odd. We suspected that it might be a loop on the LAN, given that it happened while in a test environment while connected to a couple of switches instead of directly to the WAN link - but that is just conjecture at this point.
I'll definitely post here once I can test. -
@stephenw10 said in Crash Report after running traceroute:
Looks like it's re1 giving the issue. Is that assigned as WAN?
re1 is currently the LAN link - but it might have been WAN when this happened.
I'll test and report back. -
@stephenw10
WAN - traceroute -i re0 8.8.8.8 - crash on the first hop
LAN - traceroute -i re1 8.8.8.8 - crash on the first hop
traceroute -1 re0 192.168.1.11 - crash on the first hop
traceroute -1 re0 192.168.1.11 - went 6 hops without crashing - but discovered zero routes to a machine sitting on the LAN.
traceroute to 192.168.1.11 (192.168.1.11), 64 hops max, 40 byte packets
1 * * *
2 * * *
3 * * *
4 * * *
5 * * *
6 * *^C
traceroute from GUI - crash - presumably on the first hop
The system is now inline and in production with basic WAN/LAN ip assignments.
M -
Hi
What kind of driver are you using?
The site only mentions support for Windows: https://www.trendnet.com/support/TEG-25GECTX
-
@Konstanti
I'm using the updated drivers described here.
The rest of the system seems stable - VLANS working, etc... -
The re driver supports RealTek RTL8139C+, RTL8169, RTL816xS, RTL811xS, RTL8168, RTL810xE and RTL8111 based Fast Ethernet and Gigabit Ethernet adapters.
Your adapter is not on this list (I suspect the problem is with the device driver):
device = 'RTL8125 2.5GbE Controller'
-
@Konstanti
The behavior is the same with both NICs - which of course happen to both be Realtek.
Live and learn I guess.
If I end up building another box - I'll keep an eye out for it in the future.
M -
The in-kernel re driver doesn't support the RTL8125, but the alternative kmod driver does.
@studeoQ said in Crash Report after running traceroute:
traceroute -1 re0 192.168.1.11 - crash on the first hop
traceroute -1 re0 192.168.1.11 - went 6 hops without crashing - but discovered zero routes to a machine sitting on the LAN.
I assume one of those is a typo? I expect the second test there should show re1, because testing to the LAN from the LAN IP removes re0 from the route. That would confirm it's actually the re0 driver causing the problem. In which case you could try going back to the in-kernel driver, but that will stop the 2.5G NIC working.
-
@stephenw10
Yes - a typo - and I couldn't edit - too much time had passed.
Right now - the "fix" is - don't run traceroute...
Everything else is working as expected - so tempted to leave it alone for now. -
@studeoQ said in Crash Report after running traceroute:
Everything else is working as expected - so tempted to leave it alone for now.
Your initial plan was the best, as 're' will come back to bite you :
@studeoQ said in Crash Report after running traceroute:
If I end up building another box
No need to rebuild. Go visit the BIOS and do what all 're' NICs merit: disable them.
Then slide in a dual (quad?) Intel NIC, and you'll be good. -
Here, perhaps, I agree.
It's unknown which source code this driver was built from, or for which version of FreeBSD. If there are already problems that lead to a system failure, then it is better to abandon this idea. With Intel network cards, everything has been functioning for years without problems. -
@Konstanti
It's a limitation of the box I built - an SFF PC with very little room to add options (a single PCIe x1 slot) - so finding the right NIC might be the next and only option. If not - then a new box will be in order. I was pretty impatient when it came to building the first one. I'll be more careful in the future. -
Do you see any issues if you run a traceroute from a client behind pfSense through it?
-
@stephenw10
Other than being a little slower than I think is normal - no - there aren't any obvious issues.
It can be run from nodes residing behind it.
I seriously doubt I would be running it on the device for any reason other than testing.
I just thought a system-wide failure was a little odd.
If anyone knows of a PCIe x1 dual NIC that might fit the bill - I'm all ears.
Speed tests are normal and I'm getting the expected symmetrical speeds - so it can't be "that broken".
But - I'm not a big fan of loose ends and known broken stuff just lingering around - so - I'd feel better knowing it wasn't an issue.