WatchGuard XTM-2520: igb driver load fails with EEPROM checksum error



  • I recently found myself in possession of a retired WatchGuard XTM-2520. It was pulled from a client's datacenter, one of two units from an HA pair. Not wanting to keep paying for ongoing WatchGuard subscriptions, I decided to install pfSense on it. That's where things got messy.
    As soon as I booted the pfSense installer, it crashed while loading the igb driver for the Intel 1 Gb NICs.

    igb0: <Intel(R) PRO/1000 Network Connection, Version - 2.5.3-k> mem 0xf7c60000-0xf7c7ffff,0xf7c8c000-0xf7c8ffff irq 19 at device 0.0 on pci7
    igb0: Using MSIX interrupts with 9 vectors
    igb0: The EEPROM Checksum Is Not Valid
    
    Fatal trap 12: page fault while in kernel mode
    cpuid = 2; apic id = 02
    fault virtual address   = 0x370
    fault code              = supervisor read data, page not present
    instruction pointer     = 0x20:0xffffffff8059d91f
    stack pointer           = 0x28:0xffffffff825a4300
    frame pointer           = 0x28:0xffffffff825a4340
    code segment            = base 0x0, limit 0xfffff, type 0x1b
                            = DPL 0, pres 1, long 1, def32 0, gran 1
    processor eflags        = interrupt enabled, resume, IOPL = 0
    current process         = 0 (swapper)
    [ thread pid 0 tid 100000 ]
    Stopped at      igb_detach+0x1f:        cmpq    $0,0x370(%r15)
    

    This box has twelve 1 Gb NICs and four 10 Gb NICs. The ix driver had loaded successfully and enumerated the four 10 Gb NICs; only the 1 Gb ports were failing. To keep this month-plus-long story short, I beat my brains out trying to find a way around this, searching every fragment of the error message for clues.

    Ultimately the real issue was the EEPROM checksum. The page fault in igb_detach appears to be a side effect: the driver rejects the bad checksum and then crashes while tearing itself down. These cards are built on the Intel i350-AM4 chip. Apparently the WatchGuard OS and drivers either don't check the checksum on the NIC's EEPROM or ignore it; this unit was fully functional when it was replaced. The FreeBSD drivers, which Intel maintains, obviously don't ignore it. So how does one fix a bad checksum on a proprietary NIC? It is not as simple as pulling the card, plugging it into a handy bench machine, and flashing firmware: the PCIe card-edge connector is on the back edge of the card, so it would be an awkward fit in anything other than the WatchGuard motherboard and chassis.
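
    For anyone wondering what "checksum" means here: as far as I can tell from Intel's shared e1000/igb driver code (used by both FreeBSD and Linux), the first 64 16-bit words of the NVM (0x00 through 0x3F) must add up to 0xBABA modulo 2^16, with word 0x3F holding whatever value makes the sum come out right. The sketch below illustrates that rule in Python, assuming the EEPROM has already been dumped into a list of 16-bit words; `nvm_checksum_valid` and `image` are hypothetical names, not part of any Intel tool.

```python
# Minimal sketch of the e1000/igb NVM checksum rule.
# Assumption: constants mirror NVM_CHECKSUM_REG and NVM_SUM from Intel's shared driver code.
NVM_CHECKSUM_REG = 0x3F  # word offset of the checksum word
NVM_SUM = 0xBABA         # required 16-bit sum of words 0x00..0x3F


def nvm_checksum_valid(words, offset=0):
    """Return True if the 64 words starting at `offset` sum to NVM_SUM (mod 2**16).

    `words` is a hypothetical list of 16-bit EEPROM words, e.g. from a dump.
    """
    total = 0
    for i in range(offset, offset + NVM_CHECKSUM_REG + 1):
        total = (total + words[i]) & 0xFFFF  # 16-bit wraparound, like a C u16
    return total == NVM_SUM


if __name__ == "__main__":
    image = [0x0000] * 0x40
    image[NVM_CHECKSUM_REG] = NVM_SUM   # all-zero data plus 0xBABA sums correctly
    print(nvm_checksum_valid(image))    # True
    image[0x10] = 0x1234                # change any other word...
    print(nvm_checksum_valid(image))    # ...and the check fails: False
```

    Any single changed word in the section breaks the sum, which is why a card can work fine under firmware that skips the check and still be rejected by a driver that enforces it.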

    I pulled down the latest Intel PREBOOT release directly from Intel and made a FreeDOS-bootable CF card. I used the ctty command to redirect the console to the COM1 serial port (that took some time to find), copied the PREBOOT software onto the CF card, and booted good old DOS. That gave me a DOS serial console and the Intel tools to flash new firmware, which would have written a new checksum as a side effect of the update. No joy.

    Bootutil reported "FLASH Not Present" for every port:

    Option ROM area in the flash is not supported for this device on port 6
    
    Port Network Address Location Series  WOL Flash Firmware                Version
    ==== =============== ======== ======= === ============================= =======
      1   00907F1F3ABC     3:00.0 10GbE   N/A FLASH Not Present
      2   00907F1F3ABD     3:00.1 10GbE   N/A FLASH Not Present
      3   00907F1F3ABE     4:00.0 10GbE   N/A FLASH Not Present
      4   00907F1F3ABF     4:00.1 10GbE   N/A FLASH Not Present
      5   00A0C9000000     6:00.1 Gigabit N/A FLASH Not Present
      6   00907F1F3AB5     7:00.0 Gigabit NO  FLASH Not Present
      7   00907F1F3AB7     7:00.1 Gigabit NO  FLASH Not Present
      8   00907F1F3AB9     7:00.2 Gigabit NO  FLASH Not Present
      9   00907F1F3ABB     7:00.3 Gigabit NO  FLASH Not Present
     10   00907F1F3AB4     8:00.0 Gigabit NO  FLASH Not Present
     11   00907F1F3AB6     8:00.1 Gigabit NO  FLASH Not Present
     12   00907F1F3AB8     8:00.2 Gigabit NO  FLASH Not Present
     13   00907F1F3ABA     8:00.3 Gigabit NO  FLASH Not Present
    

    I did more searching and brain bashing.

    I came across some old discussions about an Intel utility called eeupdate, which is used to change low-level settings in the EEPROM. I couldn't find any official release of the tool from Intel, but after more searching I found a copy sitting on a public FTP server.

    With eeupdate on my CF card, I booted FreeDOS again, and this time I had the tools I needed. I tested the tool by checking the checksum on one of my 1 Gb interfaces.

    C:\INTEL2~1.2\APPS\TOOLS>eeupdate /nic=9 /test
    
    Using: Intel (R) PRO Network Connections SDK v2.25.8
    EEUPDATE v5.25.08.01
    Copyright (C) 1995 - 2015 Intel Corporation
    Intel (R) Confidential and not for general distribution.
    
    
    NIC Bus Dev Fun Vendor-Device  Branding string
    === === === === ============= =================================================
      1   3  00  00   8086-10FB    Intel(R) 82599 10 Gigabit Dual Port Network Conn
      2   3  00  01   8086-10FB    Intel(R) 82599 10 Gigabit Dual Port Network Conn
      3   4  00  00   8086-10FB    Intel(R) 82599 10 Gigabit Dual Port Network Conn
      4   4  00  01   8086-10FB    Intel(R) 82599 10 Gigabit Dual Port Network Conn
      5   6  00  01   8086-0436    DH8900CC Series Gigabit Silicon Default
      6   7  00  00   8086-1521    Intel(R) I350 Gigabit Network Connection
      7   7  00  01   8086-1521    Intel(R) I350 Gigabit Network Connection
      8   7  00  02   8086-1521    Intel(R) I350 Gigabit Network Connection
      9   7  00  03   8086-1521    Intel(R) I350 Gigabit Network Connection
    
     9: EEPROM test failed: Incorrect Checksum
    

    Then I told it to update the checksum and re-ran the test.

    C:\INTEL2~1.2\APPS\TOOLS>eeupdate /nic=9 /CALCCHKSUM
    
    Using: Intel (R) PRO Network Connections SDK v2.25.8
    EEUPDATE v5.25.08.01
    Copyright (C) 1995 - 2015 Intel Corporation
    Intel (R) Confidential and not for general distribution.
    
    
    NIC Bus Dev Fun Vendor-Device  Branding string
    === === === === ============= =================================================
      1   3  00  00   8086-10FB    Intel(R) 82599 10 Gigabit Dual Port Network Conn
      2   3  00  01   8086-10FB    Intel(R) 82599 10 Gigabit Dual Port Network Conn
      3   4  00  00   8086-10FB    Intel(R) 82599 10 Gigabit Dual Port Network Conn
      4   4  00  01   8086-10FB    Intel(R) 82599 10 Gigabit Dual Port Network Conn
      5   6  00  01   8086-0436    DH8900CC Series Gigabit Silicon Default
      6   7  00  00   8086-1521    Intel(R) I350 Gigabit Network Connection
      7   7  00  01   8086-1521    Intel(R) I350 Gigabit Network Connection
      8   7  00  02   8086-1521    Intel(R) I350 Gigabit Network Connection
      9   7  00  03   8086-1521    Intel(R) I350 Gigabit Network Connection
    
     9:  Updating Checksum and CRCs...Done.
    
    C:\INTEL2~1.2\APPS\TOOLS>eeupdate /nic=9 /TEST
    
    Using: Intel (R) PRO Network Connections SDK v2.25.8
    EEUPDATE v5.25.08.01
    Copyright (C) 1995 - 2015 Intel Corporation
    Intel (R) Confidential and not for general distribution.
    
    
    NIC Bus Dev Fun Vendor-Device  Branding string
    === === === === ============= =================================================
      1   3  00  00   8086-10FB    Intel(R) 82599 10 Gigabit Dual Port Network Conn
      2   3  00  01   8086-10FB    Intel(R) 82599 10 Gigabit Dual Port Network Conn
      3   4  00  00   8086-10FB    Intel(R) 82599 10 Gigabit Dual Port Network Conn
      4   4  00  01   8086-10FB    Intel(R) 82599 10 Gigabit Dual Port Network Conn
      5   6  00  01   8086-0436    DH8900CC Series Gigabit Silicon Default
      6   7  00  00   8086-1521    Intel(R) I350 Gigabit Network Connection
      7   7  00  01   8086-1521    Intel(R) I350 Gigabit Network Connection
      8   7  00  02   8086-1521    Intel(R) I350 Gigabit Network Connection
      9   7  00  03   8086-1521    Intel(R) I350 Gigabit Network Connection
    
     9: EEPROM test passed.
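
    Whatever eeupdate does internally with /CALCCHKSUM, the result has to satisfy the same rule: in the shared driver code, the checksum word is set to 0xBABA minus the 16-bit sum of words 0x00 through 0x3E, and the other 63 words are left alone. A minimal Python sketch of that arithmetic, assuming a hypothetical in-memory list of EEPROM words (this is not eeupdate's actual implementation):

```python
# Sketch of recomputing an e1000/igb NVM checksum word.
# Assumption: constants mirror NVM_CHECKSUM_REG and NVM_SUM from Intel's shared driver code.
NVM_CHECKSUM_REG = 0x3F  # word offset of the checksum word
NVM_SUM = 0xBABA         # required 16-bit sum of words 0x00..0x3F


def compute_checksum_word(words, offset=0):
    """Return the value word `offset + 0x3F` must hold for the section to be valid."""
    # Sum words offset+0x00 .. offset+0x3E; the checksum word itself is excluded.
    total = sum(words[offset:offset + NVM_CHECKSUM_REG]) & 0xFFFF
    return (NVM_SUM - total) & 0xFFFF


def fix_checksum(words, offset=0):
    """Overwrite only the checksum word in place; the data words are untouched."""
    words[offset + NVM_CHECKSUM_REG] = compute_checksum_word(words, offset)


if __name__ == "__main__":
    image = list(range(0x40))        # hypothetical data: words 0..63
    image[NVM_CHECKSUM_REG] = 0      # deliberately bad checksum
    fix_checksum(image)
    print(hex(image[NVM_CHECKSUM_REG]))         # 0xb319
    print((sum(image) & 0xFFFF) == NVM_SUM)     # True
```

    Because only the checksum word changes, the MAC address and other settings stored in the EEPROM survive the fix, which matches the interfaces coming up with their original WatchGuard MAC addresses.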
    

    With all the interfaces updated with valid checksums, I booted pfSense. It sailed right through device probing and enumerated all twelve 1 Gb ports and all four 10 Gb ports. pfSense was up and running and ready to be configured.
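
    For the four-port i350 specifically, Intel's shared driver code (e1000_validate_nvm_checksum_i350, as far as I can tell) checks the NVM as four separate 64-word sections, one per LAN function, at word offsets 0x000, 0x080, 0x0C0 and 0x100 (the NVM_82580_LAN_FUNC_OFFSET macro). A minimal Python sketch of that per-port check, assuming a full NVM word dump in a hypothetical list; `lan_func_offset` and `i350_bad_ports` are illustrative names, not a real API:

```python
# Sketch of the per-port NVM checksum check used for the 82580/i350 family.
# Assumption: offsets follow NVM_82580_LAN_FUNC_OFFSET from Intel's shared driver code.
NVM_CHECKSUM_REG = 0x3F  # checksum word, relative to each section's start
NVM_SUM = 0xBABA         # required 16-bit sum of each 64-word section


def lan_func_offset(func):
    """Word offset of a port's NVM section: 0x000, 0x080, 0x0C0, 0x100 for ports 0-3."""
    return (0x40 + 0x40 * func) if func else 0


def section_valid(words, offset):
    """True if words offset..offset+0x3F sum to NVM_SUM modulo 2**16."""
    return (sum(words[offset:offset + NVM_CHECKSUM_REG + 1]) & 0xFFFF) == NVM_SUM


def i350_bad_ports(words, ports=4):
    """Return the LAN function numbers whose NVM section fails the checksum."""
    return [f for f in range(ports) if not section_valid(words, lan_func_offset(f))]


if __name__ == "__main__":
    image = [0] * 0x140  # hypothetical dump, large enough for all four sections
    for f in range(4):
        image[lan_func_offset(f) + NVM_CHECKSUM_REG] = NVM_SUM  # make every section valid
    image[lan_func_offset(3) + 1] = 0x0001  # corrupt port 3 only
    print(i350_bad_ports(image))            # [3]
```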

    Valid interfaces are:
    
    ix0     00:90:7f:1f:3a:bc (down) Intel(R) PRO/10GbE PCI-Express Network Driver,
    ix1     00:90:7f:1f:3a:bd (down) Intel(R) PRO/10GbE PCI-Express Network Driver,
    ix2     00:90:7f:1f:3a:be (down) Intel(R) PRO/10GbE PCI-Express Network Driver,
    ix3     00:90:7f:1f:3a:bf (down) Intel(R) PRO/10GbE PCI-Express Network Driver,
    igb0    00:90:7f:1f:3a:b5 (down) Intel(R) PRO/1000 Network Connection, Version
    igb1    00:90:7f:1f:3a:b7 (down) Intel(R) PRO/1000 Network Connection, Version
    igb2    00:90:7f:1f:3a:b9 (down) Intel(R) PRO/1000 Network Connection, Version
    igb3    00:90:7f:1f:3a:bb (down) Intel(R) PRO/1000 Network Connection, Version
    igb4    00:90:7f:1f:3a:b4 (down) Intel(R) PRO/1000 Network Connection, Version
    igb5    00:90:7f:1f:3a:b6 (down) Intel(R) PRO/1000 Network Connection, Version
    igb6    00:90:7f:1f:3a:b8 (down) Intel(R) PRO/1000 Network Connection, Version
    igb7    00:90:7f:1f:3a:ba (down) Intel(R) PRO/1000 Network Connection, Version
    igb8    00:90:7f:1f:3a:b0 (down) Intel(R) PRO/1000 Network Connection, Version
    igb9    00:90:7f:1f:3a:b1 (down) Intel(R) PRO/1000 Network Connection, Version
    igb10   00:90:7f:1f:3a:b2 (down) Intel(R) PRO/1000 Network Connection, Version
    igb11   00:90:7f:1f:3a:b3 (down) Intel(R) PRO/1000 Network Connection, Version
    

    I apologize for the length of this post; I promise that if I documented all the missteps, it would be a much longer story. If anyone ever runs into this issue, I hope this write-up keeps them from losing as much time as I did figuring it out. I have intentionally left out the location of the eeupdate tool, as I suspect it is not supposed to be out in the wild. I'll trust that if I could find it, others in need can too.


  • Netgate Administrator

    I bet that was satisfying. 😀

    More fun than when it just works the first time! Nice work.

    What CPU do those have?

    Steve



  • Yeah, the struggle certainly made the win very enjoyable. I'm not sure about "fun", though; if it had just worked, I'm sure I would have been very satisfied to get it up and running quickly.

    As for the internals, these things are beasts:

    CPU Type 	Intel(R) Xeon(R) CPU E3-1275 V2 @ 3.50GHz
    8 CPUs: 1 package(s) x 4 core(s) x 2 hardware threads
    AES-NI CPU Crypto: Yes (active) 
    Hardware crypto 	AES-CBC,AES-XTS,AES-GCM,AES-ICM
    
    CPU: Intel(R) Xeon(R) CPU E3-1275 V2 @ 3.50GHz (3492.14-MHz K8-class CPU)
      Origin="GenuineIntel"  Id=0x306a9  Family=0x6  Model=0x3a  Stepping=9
      Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
      Features2=0x7fbae3ff<SSE3,PCLMULQDQ,DTES64,MON,DS_CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,TSCDLT,AESNI,XSAVE,OSXSAVE,AVX,F16C,RDRAND>
      AMD Features=0x28100800<SYSCALL,NX,RDTSCP,LM>
      AMD Features2=0x1<LAHF>
      Structured Extended Features=0x281<FSGSBASE,SMEP,ERMS>
      XSAVE Features=0x1<XSAVEOPT>
      VT-x: PAT,HLT,MTF,PAUSE,EPT,UG,VPID
      TSC: P-state invariant, performance statistics
    real memory  = 34359738368 (32768 MB)
    avail memory = 33194868736 (31657 MB)
    

    I'm replacing an ER-8 that can't keep up with a 1 Gb cable internet feed. I expect this box to have no issues even with full UTM turned on.

    Guy


  • Netgate Administrator

    Ooo nice. That looks like overkill, yes. 😉

    Steve

