Watchguard XTM-2520 IGB load fails with checksum error
-
I recently found myself in possession of a retired Watchguard XTM-2520. It was a pull from a client's datacenter, actually one of two from an HA pair. Not wanting to keep paying the price for ongoing Watchguard subscriptions, I decided to install pfSense on it. That's where things got messy.
As soon as I booted pfSense's installer, it crashed loading the IGB driver for the Intel 1gig nics.

igb0: <Intel(R) PRO/1000 Network Connection, Version - 2.5.3-k> mem 0xf7c60000-0xf7c7ffff,0xf7c8c000-0xf7c8ffff irq 19 at device 0.0 on pci7
igb0: Using MSIX interrupts with 9 vectors
igb0: The EEPROM Checksum Is Not Valid
Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 02
fault virtual address   = 0x370
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff8059d91f
stack pointer           = 0x28:0xffffffff825a4300
frame pointer           = 0x28:0xffffffff825a4340
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 0 (swapper)
[ thread pid 0 tid 100000 ]
Stopped at igb_detach+0x1f: cmpq $0,0x370(%r15)
This box has twelve 1gig nics and four 10gigs. The ix driver had loaded successfully and enumerated the four 10gig nics; the failure was only on the 1gigs. To keep this month-plus-long story short, I beat my brains out trying to find a way around this, searching every little bit of the error message for clues.
Ultimately the real issue was the EEPROM checksum. These cards are built with the I350-AM4 chip. Apparently the Watchguard OS and drivers either do not check the checksum on the nic's EEPROM or ignore it, since this device was fully functional at the time of replacement. The FreeBSD drivers, which Intel maintains, obviously don't ignore it. So how does one fix a bad checksum on a proprietary nic? It is not as simple as pulling the card, plugging it into a handy bench machine, and flashing firmware. The PCI-e card edge connector is on the back edge of the card and would be an awkward fit in anything other than the Watchguard motherboard and chassis.
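For reference, here is a minimal sketch of the checksum rule, assuming it matches what the open-source Intel e1000/igb driver code does: the 16-bit words at offsets 0x00 through 0x3F (checksum word included) must add up to 0xBABA, modulo 2^16. The function names below are made up for illustration. On multi-port parts such as the I350, the driver appears to repeat this check for each port's own section of the EEPROM, and a vendor tool may update more than this one word, so treat this as an explanation of the error message rather than a replacement for the tool.

```python
# Sketch only: models the checksum rule used by the open-source Intel
# e1000/igb drivers (an assumption; not taken from any vendor tool).
NVM_SUM = 0xBABA          # required 16-bit sum of words 0x00..0x3F
NVM_CHECKSUM_REG = 0x3F   # offset of the checksum word itself


def checksum_is_valid(words):
    """Return True if words 0x00..0x3F sum to 0xBABA (mod 2**16)."""
    total = sum(words[:NVM_CHECKSUM_REG + 1]) & 0xFFFF
    return total == NVM_SUM


def fix_checksum(words):
    """Return a copy of `words` with word 0x3F recomputed."""
    fixed = list(words)
    partial = sum(fixed[:NVM_CHECKSUM_REG]) & 0xFFFF
    fixed[NVM_CHECKSUM_REG] = (NVM_SUM - partial) & 0xFFFF
    return fixed


if __name__ == "__main__":
    # Hypothetical EEPROM image: 63 data words plus a stale checksum word.
    image = [0x1234] * 63 + [0x0000]
    print(checksum_is_valid(image))                # stale checksum
    print(checksum_is_valid(fix_checksum(image)))  # after recomputing
```

A driver that enforces this rule refuses to attach when the sum is wrong, which is consistent with the "EEPROM Checksum Is Not Valid" message in the boot log.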
I pulled down the latest Intel PREBOOT release, direct from Intel, and made a FreeDOS bootable CF card. I used the ctty command to redirect console output to the serial COM1 port (that took some time to find), copied the PREBOOT software onto the CF card, and booted up good old DOS. This got me a serial console with DOS and the Intel tools to flash new firmware, which I expected would write a new checksum as a side effect of the update. No joy.
Bootutil tells me "FLASH Not Present":
Option ROM area in the flash is not supported for this device on port 6

Port Network Address Location Series  WOL Flash Firmware                Version
==== =============== ======== ======= === ============================= =======
   1 00907F1F3ABC      3:00.0 10GbE   N/A FLASH Not Present
   2 00907F1F3ABD      3:00.1 10GbE   N/A FLASH Not Present
   3 00907F1F3ABE      4:00.0 10GbE   N/A FLASH Not Present
   4 00907F1F3ABF      4:00.1 10GbE   N/A FLASH Not Present
   5 00A0C9000000      6:00.1 Gigabit N/A FLASH Not Present
   6 00907F1F3AB5      7:00.0 Gigabit NO  FLASH Not Present
   7 00907F1F3AB7      7:00.1 Gigabit NO  FLASH Not Present
   8 00907F1F3AB9      7:00.2 Gigabit NO  FLASH Not Present
   9 00907F1F3ABB      7:00.3 Gigabit NO  FLASH Not Present
  10 00907F1F3AB4      8:00.0 Gigabit NO  FLASH Not Present
  11 00907F1F3AB6      8:00.1 Gigabit NO  FLASH Not Present
  12 00907F1F3AB8      8:00.2 Gigabit NO  FLASH Not Present
  13 00907F1F3ABA      8:00.3 Gigabit NO  FLASH Not Present
I did more searching and brain bashing.
I came across some old discussions about an Intel utility called eeupdate. Eeupdate is used to make changes to the low level settings in the EEPROM. I couldn't find any official release of the tool from Intel. I did some more searching and found a copy hiding on a public ftp server.
With eeupdate now on my CF card, I booted up FreeDOS again, and this time I had the tools I needed. I tested the tool by checking the checksum of one of my 1gig interfaces.
C:\INTEL2~1.2\APPS\TOOLS>eeupdate /nic=9 /test

Using: Intel (R) PRO Network Connections SDK v2.25.8
EEUPDATE v5.25.08.01
Copyright (C) 1995 - 2015 Intel Corporation
Intel (R) Confidential and not for general distribution.

NIC Bus Dev Fun Vendor-Device  Branding string
=== === === === ============= =================================================
  1   3  00  00   8086-10FB   Intel(R) 82599 10 Gigabit Dual Port Network Conn
  2   3  00  01   8086-10FB   Intel(R) 82599 10 Gigabit Dual Port Network Conn
  3   4  00  00   8086-10FB   Intel(R) 82599 10 Gigabit Dual Port Network Conn
  4   4  00  01   8086-10FB   Intel(R) 82599 10 Gigabit Dual Port Network Conn
  5   6  00  01   8086-0436   DH8900CC Series Gigabit Silicon Default
  6   7  00  00   8086-1521   Intel(R) I350 Gigabit Network Connection
  7   7  00  01   8086-1521   Intel(R) I350 Gigabit Network Connection
  8   7  00  02   8086-1521   Intel(R) I350 Gigabit Network Connection
  9   7  00  03   8086-1521   Intel(R) I350 Gigabit Network Connection

  9: EEPROM test failed: Incorrect Checksum
Then I told it to update the checksum.
C:\INTEL2~1.2\APPS\TOOLS>eeupdate /nic=9 /CALCCHKSUM

Using: Intel (R) PRO Network Connections SDK v2.25.8
EEUPDATE v5.25.08.01
Copyright (C) 1995 - 2015 Intel Corporation
Intel (R) Confidential and not for general distribution.

NIC Bus Dev Fun Vendor-Device  Branding string
=== === === === ============= =================================================
  1   3  00  00   8086-10FB   Intel(R) 82599 10 Gigabit Dual Port Network Conn
  2   3  00  01   8086-10FB   Intel(R) 82599 10 Gigabit Dual Port Network Conn
  3   4  00  00   8086-10FB   Intel(R) 82599 10 Gigabit Dual Port Network Conn
  4   4  00  01   8086-10FB   Intel(R) 82599 10 Gigabit Dual Port Network Conn
  5   6  00  01   8086-0436   DH8900CC Series Gigabit Silicon Default
  6   7  00  00   8086-1521   Intel(R) I350 Gigabit Network Connection
  7   7  00  01   8086-1521   Intel(R) I350 Gigabit Network Connection
  8   7  00  02   8086-1521   Intel(R) I350 Gigabit Network Connection
  9   7  00  03   8086-1521   Intel(R) I350 Gigabit Network Connection

  9: Updating Checksum and CRCs...Done.
C:\INTEL2~1.2\APPS\TOOLS>eeupdate /nic=9 /TEST

Using: Intel (R) PRO Network Connections SDK v2.25.8
EEUPDATE v5.25.08.01
Copyright (C) 1995 - 2015 Intel Corporation
Intel (R) Confidential and not for general distribution.

NIC Bus Dev Fun Vendor-Device  Branding string
=== === === === ============= =================================================
  1   3  00  00   8086-10FB   Intel(R) 82599 10 Gigabit Dual Port Network Conn
  2   3  00  01   8086-10FB   Intel(R) 82599 10 Gigabit Dual Port Network Conn
  3   4  00  00   8086-10FB   Intel(R) 82599 10 Gigabit Dual Port Network Conn
  4   4  00  01   8086-10FB   Intel(R) 82599 10 Gigabit Dual Port Network Conn
  5   6  00  01   8086-0436   DH8900CC Series Gigabit Silicon Default
  6   7  00  00   8086-1521   Intel(R) I350 Gigabit Network Connection
  7   7  00  01   8086-1521   Intel(R) I350 Gigabit Network Connection
  8   7  00  02   8086-1521   Intel(R) I350 Gigabit Network Connection
  9   7  00  03   8086-1521   Intel(R) I350 Gigabit Network Connection

  9: EEPROM test passed.
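The same test, recalculate, test sequence applies to every I350 port. As a rough sketch of how the command list could be generated on another machine rather than typed by hand, the Python below picks out the rows with device ID 8086-1521 from a pasted eeupdate listing and prints the three commands for each. The helper names are made up for illustration, and the only eeupdate switches used are the ones shown in the output above. The listing in this post stops at NIC 9, so the remaining I350 ports would have to come from a full listing.

```python
import re

# Device listing copied from the eeupdate output above (it stops at NIC 9).
LISTING = """\
NIC Bus Dev Fun Vendor-Device  Branding string
=== === === === ============= =================================================
  1   3  00  00   8086-10FB   Intel(R) 82599 10 Gigabit Dual Port Network Conn
  2   3  00  01   8086-10FB   Intel(R) 82599 10 Gigabit Dual Port Network Conn
  3   4  00  00   8086-10FB   Intel(R) 82599 10 Gigabit Dual Port Network Conn
  4   4  00  01   8086-10FB   Intel(R) 82599 10 Gigabit Dual Port Network Conn
  5   6  00  01   8086-0436   DH8900CC Series Gigabit Silicon Default
  6   7  00  00   8086-1521   Intel(R) I350 Gigabit Network Connection
  7   7  00  01   8086-1521   Intel(R) I350 Gigabit Network Connection
  8   7  00  02   8086-1521   Intel(R) I350 Gigabit Network Connection
  9   7  00  03   8086-1521   Intel(R) I350 Gigabit Network Connection
"""

# One table row: NIC index, bus, dev, fun, then "8086-<device id>".
ROW = re.compile(r"^\s*(\d+)\s+\d+\s+\d+\s+\d+\s+8086-([0-9A-Fa-f]{4})\s")


def nics_with_device(listing, device_id="1521"):
    """Return the NIC indexes whose PCI device ID matches (I350 = 1521)."""
    nics = []
    for line in listing.splitlines():
        match = ROW.match(line)
        if match and match.group(2).lower() == device_id.lower():
            nics.append(int(match.group(1)))
    return nics


def batch_lines(nics):
    """Build the test / recalculate / re-test commands for each NIC."""
    lines = []
    for nic in nics:
        lines.append(f"eeupdate /nic={nic} /TEST")
        lines.append(f"eeupdate /nic={nic} /CALCCHKSUM")
        lines.append(f"eeupdate /nic={nic} /TEST")
    return lines


if __name__ == "__main__":
    print("\n".join(batch_lines(nics_with_device(LISTING))))
```

With the listing above this selects NICs 6 through 9. Reviewing the generated commands before running them is worthwhile, since /CALCCHKSUM writes to the EEPROM.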
With all the interfaces updated with valid checksums, I booted up pfSense. It sailed right through device load and enumerated all 12 of the 1gig ports and the 4 10gig. pfSense was up and running and ready to be configured.
Valid interfaces are:

ix0    00:90:7f:1f:3a:bc (down) Intel(R) PRO/10GbE PCI-Express Network Driver,
ix1    00:90:7f:1f:3a:bd (down) Intel(R) PRO/10GbE PCI-Express Network Driver,
ix2    00:90:7f:1f:3a:be (down) Intel(R) PRO/10GbE PCI-Express Network Driver,
ix3    00:90:7f:1f:3a:bf (down) Intel(R) PRO/10GbE PCI-Express Network Driver,
igb0   00:90:7f:1f:3a:b5 (down) Intel(R) PRO/1000 Network Connection, Version
igb1   00:90:7f:1f:3a:b7 (down) Intel(R) PRO/1000 Network Connection, Version
igb2   00:90:7f:1f:3a:b9 (down) Intel(R) PRO/1000 Network Connection, Version
igb3   00:90:7f:1f:3a:bb (down) Intel(R) PRO/1000 Network Connection, Version
igb4   00:90:7f:1f:3a:b4 (down) Intel(R) PRO/1000 Network Connection, Version
igb5   00:90:7f:1f:3a:b6 (down) Intel(R) PRO/1000 Network Connection, Version
igb6   00:90:7f:1f:3a:b8 (down) Intel(R) PRO/1000 Network Connection, Version
igb7   00:90:7f:1f:3a:ba (down) Intel(R) PRO/1000 Network Connection, Version
igb8   00:90:7f:1f:3a:b0 (down) Intel(R) PRO/1000 Network Connection, Version
igb9   00:90:7f:1f:3a:b1 (down) Intel(R) PRO/1000 Network Connection, Version
igb10  00:90:7f:1f:3a:b2 (down) Intel(R) PRO/1000 Network Connection, Version
igb11  00:90:7f:1f:3a:b3 (down) Intel(R) PRO/1000 Network Connection, Version
I apologize for the length of this post. I promise that if I had documented all the missteps, it would be a much longer story. If anyone ever runs into this issue, I hope my documenting this fix keeps them from losing as much time as I did figuring it out. I have intentionally left out the location of the eeupdate tool, as I suspect it is not supposed to be out in the wild. I'll trust that if I could find it, so can others if they are in need.
-
I bet that was satisfying.
More fun than when it just works first time! Nice work.
What CPU do those have?
Steve
-
Yeah - the struggle certainly made the win very enjoyable. I'm not sure about fun. If it had just worked, I'm sure I would have been very satisfied with getting it up and running quickly.
As for the internals - these things are beasts:
CPU Type           Intel(R) Xeon(R) CPU E3-1275 V2 @ 3.50GHz
                   8 CPUs: 1 package(s) x 4 core(s) x 2 hardware threads
                   AES-NI CPU Crypto: Yes (active)
Hardware crypto    AES-CBC,AES-XTS,AES-GCM,AES-ICM

CPU: Intel(R) Xeon(R) CPU E3-1275 V2 @ 3.50GHz (3492.14-MHz K8-class CPU)
  Origin="GenuineIntel"  Id=0x306a9  Family=0x6  Model=0x3a  Stepping=9
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0x7fbae3ff<SSE3,PCLMULQDQ,DTES64,MON,DS_CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,TSCDLT,AESNI,XSAVE,OSXSAVE,AVX,F16C,RDRAND>
  AMD Features=0x28100800<SYSCALL,NX,RDTSCP,LM>
  AMD Features2=0x1<LAHF>
  Structured Extended Features=0x281<FSGSBASE,SMEP,ERMS>
  XSAVE Features=0x1<XSAVEOPT>
  VT-x: PAT,HLT,MTF,PAUSE,EPT,UG,VPID
  TSC: P-state invariant, performance statistics
real memory  = 34359738368 (32768 MB)
avail memory = 33194868736 (31657 MB)
I'm replacing an ER-8 that can't keep up with a 1gig cable internet feed. I expect this will have no issues even if I turn on full UTM.
Guy
-
Ooo nice. That looks like overkill, yes.
Steve