Failed 23.01 upgrade of 7100 pair
I have a pair of 7100s set up with CARP/HA, with ETH8 as the SYNC interface.
I upgraded the secondary first with no issues.
But after the primary was upgraded, ETH8 shows "no carrier" and the primary reboots frequently.
The textdump.tar/panic.txt file just displays "page fault"
The last trap in textdump.tar/msgbuf.txt is:
Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 10
fault virtual address = 0x0
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff813187ba
stack pointer = 0x0:0xfffffe0130908560
frame pointer = 0x0:0xfffffe0130908560
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 31445 (nginx)
rdi: fffff80142664114 rsi: 0 rdx: 41f
rcx: 41f r8: 1 r9: 9
rax: fffff80142664114 rbx: fffff801c8606900 rbp: fffffe0130908560
r10: 53f562b3 r11: 4c r12: fffff80077efc100
r13: 0 r14: fffff80141b6e400 r15: 1
trap number = 12
panic: page fault
cpuid = 2
time = 1678898975
KDB: enter: panic
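For anyone comparing several of these dumps, here is a minimal Python sketch that pulls the key fields out of the last "Fatal trap" block in a msgbuf.txt. The function name `parse_trap` is made up for illustration, and it assumes the `key = value` layout shown above; it is no substitute for posting the full crash dump.

```python
import re

# Matches "key = value" lines such as "fault virtual address = 0x0".
# Register lines ("rdi: ...") and continuation lines ("= DPL 0, ...") do not match.
FIELD_RE = re.compile(r"^(?P<key>[a-z ]+?)\s*=\s*(?P<value>.+)$")


def parse_trap(text):
    """Return the fields of the last 'Fatal trap' block found in text.

    Hypothetical helper: assumes the msgbuf.txt layout shown in this thread.
    """
    fields = {}
    in_trap = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Fatal trap"):
            # A new trap starts; discard fields from any earlier one.
            in_trap = True
            fields = {"trap": line}
            continue
        if not in_trap:
            continue
        if line.startswith("panic:"):
            # The panic line closes the block.
            fields["panic"] = line.split(":", 1)[1].strip()
            in_trap = False
            continue
        match = FIELD_RE.match(line)
        if match:
            fields[match.group("key")] = match.group("value")
    return fields
```

Reading the file into a string and calling `parse_trap` on it gives a dict with entries such as `fault virtual address` and `current process`, which makes it easier to see whether every reboot faults in the same process.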
After reverting the primary to 22.05, ETH8 is active again.
Any ideas on what is the matter with the primary?
There's not enough information there to determine what happened. We need to see all of the files from the crash dump, not just that part.
I created a support ticket and was told that in 23.01 MDI/MDI-X was turned off for the ETH ports.
Other than that, I was told to re-image the device and restore the config afterwards.
I will need to schedule this, but I'll update both this thread and, of course, the case.
The MDI-X issue wouldn't cause a panic. It would cause a port to have no link when it needs a different type of cable (straight-through vs. crossover patch cables).
Reinstalling may or may not help with a panic, but without the full crash dump it's hard to speculate further.
The full crash report was on your ticket and it shows you are hitting this:
Either of the two workarounds shown there should prevent it.
The patch to disable sendfile is in the recommended patches list in the System Patches package, so that would be the easiest way to apply it.
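To double-check the result after applying the patch, a rough Python sketch like the one below can list the `sendfile` settings found in an nginx config. It assumes the workaround shows up as an nginx `sendfile on|off;` directive on its own line; that assumption, the function name `sendfile_settings`, and wherever you read the config text from are not taken from this thread.

```python
import re

# Matches a standalone "sendfile on;" or "sendfile off;" directive line.
# Commented-out lines and "sendfile_max_chunk" do not match.
SENDFILE_RE = re.compile(r"^[ \t]*sendfile[ \t]+(on|off)[ \t]*;", re.MULTILINE)


def sendfile_settings(conf_text):
    """Return every sendfile value ('on' or 'off') in file order.

    Hypothetical helper: only sees directives that start a line, and does
    not track which nginx context (http, server, location) they belong to.
    """
    return SENDFILE_RE.findall(conf_text)
```

An empty list means no explicit directive was found, in which case nginx falls back to its default of `sendfile off`. Any `on` in the list suggests the workaround has not taken effect for that block.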
Thanks @jimp and @stephenw10 !
I can confirm that the sendfile patch resolves the panic.
Now, is there a way to re-enable MDI-X for the ETH ports?
If not, what was the reason for removing that functionality?
It should not have been removed. It's a bug that has been fixed for 23.05, but fixing it in 23.01 would require a rebuild.
OK, I'll run sync over a VLAN until 23.05 arrives.
Again, thanks for your help!
That will work. Or just swap the link out for a crossover cable.