Netgate Discussion Forum

    Intel I350-T4 Errors

    Hardware
    18 Posts 2 Posters 834 Views
      stephenw10 Netgate Administrator

      Hmm, that sure looks like a problem then!

      Is there anything in the sysctl stats showing the error type? Check the output of: sysctl dev.igb.0

        bigjme93

        Annoyingly, I had to restart the VM to check the firmware versions, so I will wait for it to crash again and re-test. That is likely to be later today; whenever it happens, I will grab the output and see what's going on.

        Is there anything specific I should be looking for in that output?

          stephenw10 Netgate Administrator

          There are a number of sysctls for different errors like:

          dev.igb.0.watchdog_timeouts: 0
          dev.igb.0.rx_overruns: 0
          ...
          dev.igb.0.dropped: 0
          ...
          dev.igb.0.mac_stats.coll_ext_errs: 0
          dev.igb.0.mac_stats.alignment_errs: 0
          dev.igb.0.mac_stats.crc_errs: 0
          dev.igb.0.mac_stats.recv_errs: 0
          dev.igb.0.mac_stats.recv_jabber: 0
          dev.igb.0.mac_stats.recv_oversize: 0
          dev.igb.0.mac_stats.recv_fragmented: 0
          dev.igb.0.mac_stats.recv_undersize: 0
          dev.igb.0.mac_stats.recv_no_buff: 0
          dev.igb.0.mac_stats.recv_length_errors: 0
          dev.igb.0.mac_stats.missed_packets: 0
          dev.igb.0.mac_stats.defer_count: 0
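
For anyone who would rather scan that output than read it by eye, here is a minimal sketch in Python. It assumes the `sysctl dev.igb.0` output has been saved as text in the `name: value` form shown above. The function name `nonzero_error_counters` and the keyword list are illustrative choices, not part of pfSense or the igb driver.

```python
import re

# Substrings that mark a counter as error-related. This list is an assumption
# drawn from the counter names quoted in the post, not an official igb list.
ERROR_HINTS = ("err", "timeout", "overrun", "dropped", "missed",
               "no_buff", "jabber", "oversize", "undersize", "fragmented")

# Matches lines such as "dev.igb.0.mac_stats.crc_errs: 0".
LINE_RE = re.compile(r"^\s*(dev\.igb\.\d+\.[\w.]+):\s*(-?\d+)\s*$")


def nonzero_error_counters(sysctl_text):
    """Return {counter_name: value} for error-like counters that are not zero."""
    found = {}
    for line in sysctl_text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue  # skip non-numeric or unrelated lines such as "..."
        name, value = match.group(1), int(match.group(2))
        if value != 0 and any(hint in name for hint in ERROR_HINTS):
            found[name] = value
    return found


if __name__ == "__main__":
    # Sample text in the format from the post; the non-zero value is made up.
    sample = """\
dev.igb.0.watchdog_timeouts: 0
dev.igb.0.rx_overruns: 0
dev.igb.0.mac_stats.crc_errs: 12
dev.igb.0.mac_stats.missed_packets: 0
"""
    for name, value in nonzero_error_counters(sample).items():
        print(f"{name} = {value}")
```

Counters without an error-like name, such as `defer_count`, are deliberately ignored, so a busy but healthy port should produce no output.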
          
            bigjme93

            A quick update for anyone keeping an eye on this: as per usual, you ask for help and the system refuses to play ball when you need it to.

            After days of repeatedly failing after 6 to 9 hours, the system is currently at 24 hours with no errors.
            The host has not been rebooted, nothing has been re-seated, and no configs have been altered.

            The only thing I did yesterday when checking the firmware version was boot the VM into a live Ubuntu, turn on "FLASHENABLE" (it was turned off, which hid the version), and reflash the PXE firmware to the latest Intel one. I did not flash the NVM as I was not sure how to do that.

            All of this was done in the same VM, just booting an Ubuntu ISO, with no passthrough changes, so the setup was identical. Once done, I shut down the VM, removed the ISO, and started it again, so there were no host changes.

            In theory, I am not using PXE, so that firmware should make no difference, but right now this is the longest it's managed, and I have run 200GB of data through it.

            Hopefully this fixes it, but I'm unsure why it would if it does.

            Regards,
            Jamie

              stephenw10 Netgate Administrator

              Hmm, hard to imagine a PXE firmware having any effect but....

                bigjme93

                I completely agree

                The only thing I did consider was whether enabling flashing made a difference.
                This is more of an old KVM thing I have had to do in the past, but sometimes it uses a shadow ROM when passing through devices so that they reset correctly.

                I used to have to do this for GPUs back on Ubuntu, but I'm unsure how TrueNAS does this.
                If they are trying a shadow ROM, and the device is blocking flashing and detecting some kind of ROM change, I wonder if that could cause it?

                It's a massive stretch at this point, but I know the I340 I had is much less picky with firmware than the I350s.

                  bigjme93

                  OK, so I thought it was going well, but sadly it crashed last night. I have left the VM up in case I need to run further commands, but because of this I have no easy way to copy the command outputs, so apologies in advance for the screenshots.

                  [9 screenshots of command output; images not included]

                    stephenw10 Netgate Administrator

                    Hmm, those values for interrupts and mac_stats look bogus, as if the driver or firmware is either not returning real data or returning bad data. That 'feels' like the firmware is failing.

                    Does it show bogus values for all 4 ports?

                    Does it show real data after a reboot? Does it require a full power cycle?
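
One way to answer the "all 4 ports?" question mechanically is sketched below in Python. It rests on an assumption that the posts above do not confirm: that "bogus" means counters reading as all ones (0xffffffff or its 64-bit equivalent), which is what a register read typically returns when a device stops responding. The function name, the input layout, and the sample values are all hypothetical.

```python
# Values returned when every bit of a 32-bit or 64-bit read is set.
ALL_ONES = {0xFFFFFFFF, 0xFFFFFFFFFFFFFFFF}


def ports_with_all_ones(stats_by_port):
    """Map each port to the sorted counter names whose value is all ones.

    stats_by_port: {port_name: {counter_name: int_value}} (hypothetical layout).
    Ports with no suspicious counters are omitted from the result.
    """
    suspicious = {}
    for port, counters in stats_by_port.items():
        names = sorted(n for n, v in counters.items() if v in ALL_ONES)
        if names:
            suspicious[port] = names
    return suspicious


if __name__ == "__main__":
    # Made-up values for illustration only; not taken from the screenshots.
    stats = {
        "igb0": {"crc_errs": 0xFFFFFFFF, "missed_packets": 0xFFFFFFFF},
        "igb1": {"crc_errs": 0, "missed_packets": 0},
    }
    print(ports_with_all_ones(stats))
```

If every port shows up in the result at once, that would point towards something shared by the whole card rather than a single port.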

                      bigjme93

                      [4 screenshots; images not included]

                      Yep, bogus values on all 4 ports, and they all start at the exact same time as far as I can tell.
                      That makes me think it's less likely to be a driver, but I may be wrong.

                      Simply restarting the VM fixes the issue, and it goes back to normal until it happens again.
                      To my knowledge, even a VM reboot does trigger the device to reset, so it would make sense that this fixes it if it was firmware.

                      I have left the machine up with the NIC broken just in case it's worth me running some commands to see if anything can be ruled out.
                      For example, rebinding or resetting the igb driver to see if it magically fixes itself without a device reset.

                      It's not something that I am familiar with, so I'm hoping for a little guidance from the experts here, in the hope that the debugging may help someone else in a similar situation down the line.

                        stephenw10 Netgate Administrator

                        Hmm, seems like a driver problem if a reboot fixes it. If it was firmware I'd expect it to require a full power cycle. Unless the driver resets the card somehow (and is still able to do so).

                        I'm not really sure what we could do here though. There is a kmod igb driver but we don't build it, so it would be tricky to test.

                        The i350-T4 is a pretty common NIC. Seeing two fail the same way seems unlikely. It seems more likely to be either a problem with the virtualisation or something low level with the hardware it's installed in.

                        Can you try running it without the hardware passthrough? That would at least confirm the hardware compatibility.

                          bigjme93

                          I could potentially try leaving the NIC on the host and adding the ports as virtual NICs, if that's what you mean?

                          Edit: For the sake of sanity, I have left the NIC on the host and passed each port through to the VM using virtio drivers, so they show up as vtnet. I guess if this passes then we know it's got to be an issue with the passthrough to pfSense?

                          As you say, it's a common NIC, so it should not be an issue within pfSense itself.

                            stephenw10 Netgate Administrator

                            Yup exactly that. If it's some low level hardware issue it will still fail.

                            Yes, I doubt this is a pfSense/FreeBSD issue directly because i350 is commonly used. At one time that was probably the recommended choice.

                              bigjme93

                              Thanks for all the replies and quick help with this, it's really appreciated!

                              I will wait and see what happens in this instance and will report back as soon as I hit an issue or it starts hitting long uptime counts.

                                bigjme93

                                Time for an update, and sadly not a good one. I know this is getting a little off topic for pfSense directly, but it may still help others.

                                Less than 48 hours in, the same issue has happened on the host. Spot the quad NIC:
                                [4 screenshots; images not included]

                                enp4s0f0 and f1 are my Intel 10Gb NICs, but they use ixgbe. I tried reloading igb and had no luck.
                                Looking through dmesg, I get the following:

                                [168683.759683] igb 0000:05:00.3 enp5s0f3: malformed Tx packet detected and dropped, LVMMC:0xffffffff
                                [168685.039818] igb 0000:05:00.0 enp5s0f0: malformed Tx packet detected and dropped, LVMMC:0xffffffff
                                [168685.807653] igb 0000:05:00.1 enp5s0f1: malformed Tx packet detected and dropped, LVMMC:0xffffffff
                                [168687.087693] igb 0000:05:00.2 enp5s0f2: malformed Tx packet detected and dropped, LVMMC:0xffffffff
                                [168687.855687] igb 0000:05:00.3 enp5s0f3: malformed Tx packet detected and dropped, LVMMC:0xffffffff
                                [168688.879965] igb 0000:05:00.0 enp5s0f0: malformed Tx packet detected and dropped, LVMMC:0xffffffff
                                [168689.903630] igb 0000:05:00.1 enp5s0f1: malformed Tx packet detected and dropped, LVMMC:0xffffffff
                                [168690.927699] igb 0000:05:00.2 enp5s0f2: malformed Tx packet detected and dropped, LVMMC:0xffffffff
                                [168691.951708] igb 0000:05:00.3 enp5s0f3: malformed Tx packet detected and dropped, LVMMC:0xffffffff
                                [168692.975871] igb 0000:05:00.0 enp5s0f0: malformed Tx packet detected and dropped, LVMMC:0xffffffff
                                [168693.999640] igb 0000:05:00.1 enp5s0f1: malformed Tx packet detected and dropped, LVMMC:0xffffffff
                                [168695.023693] igb 0000:05:00.2 enp5s0f2: malformed Tx packet detected and dropped, LVMMC:0xffffffff
                                [168695.791663] igb 0000:05:00.3 enp5s0f3: malformed Tx packet detected and dropped, LVMMC:0xffffffff
                                [168696.815772] igb 0000:05:00.0 enp5s0f0: malformed Tx packet detected and dropped, LVMMC:0xffffffff
                                [168697.839653] igb 0000:05:00.1 enp5s0f1: malformed Tx packet detected and dropped, LVMMC:0xffffffff
                                [168698.863696] igb 0000:05:00.2 enp5s0f2: malformed Tx packet detected and dropped, LVMMC:0xffffffff
                                [168699.887665] igb 0000:05:00.3 enp5s0f3: malformed Tx packet detected and dropped, LVMMC:0xffffffff
                                [168700.437650] igb 0000:05:00.3: removed PHC on enp5s0f3
                                [168700.911656] igb 0000:05:00.0 enp5s0f0: malformed Tx packet detected and dropped, LVMMC:0xffffffff
                                [168701.935637] igb 0000:05:00.1 enp5s0f1: malformed Tx packet detected and dropped, LVMMC:0xffffffff
                                [168702.179403] igb 0000:05:00.3 enp5s0f3 (unregistering): left allmulticast mode
                                [168702.179406] igb 0000:05:00.3 enp5s0f3 (unregistering): left promiscuous mode
                                [168702.468717] igb 0000:05:00.2: removed PHC on enp5s0f2
                                [168702.959649] igb 0000:05:00.2 enp5s0f2: malformed Tx packet detected and dropped, LVMMC:0xffffffff
                                [168704.702443] igb 0000:05:00.2 enp5s0f2 (unregistering): left allmulticast mode
                                [168704.702447] igb 0000:05:00.2 enp5s0f2 (unregistering): left promiscuous mode
                                [168704.952707] igb 0000:05:00.1: removed PHC on enp5s0f1
                                [168705.007729] igb 0000:05:00.0 enp5s0f0: malformed Tx packet detected and dropped, LVMMC:0xffffffff
                                [168705.775759] igb 0000:05:00.1 enp5s0f1: malformed Tx packet detected and dropped, LVMMC:0xffffffff
                                [168707.776784] igb 0000:05:00.0: removed PHC on enp5s0f0
                                [168708.847740] igb 0000:05:00.0 enp5s0f0: malformed Tx packet detected and dropped, LVMMC:0xffffffff
                                [168710.590634] igb 0000:05:00.0 enp5s0f0 (unregistering): left allmulticast mode
                                [168710.590637] igb 0000:05:00.0 enp5s0f0 (unregistering): left promiscuous mode
                                [168712.799086] igb: Intel(R) Gigabit Ethernet Network Driver
                                [168712.799089] igb: Copyright (c) 2007-2014 Intel Corporation.
                                [168712.799121] igb 0000:05:00.0: enabling device (0000 -> 0002)
                                [168712.799290] igb 0000:05:00.0 0000:05:00.0 (uninitialized): PCIe link lost
                                [168712.799522] igb: Failed to read reg 0x18!
                                [168712.799610] WARNING: CPU: 2 PID: 1106121 at drivers/net/ethernet/intel/igb/igb_main.c:745 igb_rd32+0x7c/0x90 [igb]
                                [168712.799706] Modules linked in: igb(E+) vhost_net(E) tun(E) vhost(E) vhost_iotlb(E) macvtap(E) macvlan(E) tap(E) xt_nat(E) xt_tcpudp(E) veth(E) xt_conntrack(E) nft_chain_nat(E) xt_MASQUERADE(E) nf_nat(E) nf_conntrack_netlink(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) xfrm_user(E) xt_addrtype(E) nft_compat(E) nf_tables(E) libcrc32c(E) crc32c_generic(E) nfnetlink(E) br_netfilter(E) bridge(E) stp(E) llc(E) nvme_fabrics(E) overlay(E) sunrpc(E) binfmt_misc(E) ntb_netdev(E) ntb_transport(E) ntb_split(E) ntb(E) ioatdma(E) ib_core(E) edac_mce_amd(E) kvm_amd(E) kvm(E) irqbypass(E) ghash_clmulni_intel(E) sha512_ssse3(E) sha256_ssse3(E) sha1_ssse3(E) amdgpu(E) aesni_intel(E) crypto_simd(E) cryptd(E) snd_hda_codec_hdmi(E) rapl(E) drm_exec(E) amdxcp(E) drm_buddy(E) gpu_sched(E) drm_suballoc_helper(E) drm_display_helper(E) snd_hda_intel(E) cec(E) snd_usb_audio(E) snd_intel_dspcfg(E) snd_usbmidi_lib(E) rc_core(E) snd_hda_codec(E) snd_rawmidi(E) snd_seq_device(E) snd_hda_core(E) mc(E) snd_hwdep(E) drm_ttm_helper(E)
                                [168712.799722]  snd_pcm(E) wmi_bmof(E) ttm(E) snd_timer(E) sp5100_tco(E) pcspkr(E) drm_kms_helper(E) snd(E) k10temp(E) watchdog(E) soundcore(E) ccp(E) cdc_mbim(E) joydev(E) cdc_wdm(E) sg(E) button(E) evdev(E) loop(E) drm(E) efi_pstore(E) configfs(E) ip_tables(E) x_tables(E) autofs4(E) zfs(POE) spl(OE) efivarfs(E) sr_mod(E) cdrom(E) hid_generic(E) cdc_ncm(E) cdc_ether(E) usbnet(E) usbhid(E) uas(E) hid(E) usb_storage(E) mii(E) sd_mod(E) ahci(E) nvme(E) ahciem(E) xhci_pci(E) libahci(E) xhci_hcd(E) ixgbe(E) nvme_core(E) libata(E) xfrm_algo(E) t10_pi(E) crc32_pclmul(E) mdio_devres(E) usbcore(E) scsi_mod(E) crc32c_intel(E) crc64_rocksoft(E) i2c_piix4(E) usb_common(E) scsi_common(E) crc64(E) crc_t10dif(E) i2c_algo_bit(E) libphy(E) dca(E) crct10dif_generic(E) mdio(E) crct10dif_pclmul(E) crct10dif_common(E) video(E) wmi(E) gpio_amdpt(E) gpio_generic(E) [last unloaded: igb(E)]
                                [168712.800964] RIP: 0010:igb_rd32+0x7c/0x90 [igb]
                                [168712.803497]  ? igb_rd32+0x7c/0x90 [igb]
                                [168712.803880]  ? igb_rd32+0x7c/0x90 [igb]
                                [168712.805033]  ? igb_rd32+0x7c/0x90 [igb]
                                [168712.805228]  ? igb_rd32+0x7c/0x90 [igb]
                                [168712.805422]  igb_get_invariants_82575+0xa6/0xf00 [igb]
                                [168712.805620]  igb_probe+0x3be/0x1520 [igb]
                                [168712.807796]  ? __pfx_igb_init_module+0x10/0x10 [igb]
                                [168713.136674] igb 0000:05:00.0: PHY reset is blocked due to SOL/IDER session.
                                [168714.866546] igb 0000:05:00.0: The NVM Checksum Is Not Valid
                                [168715.129000] igb: probe of 0000:05:00.0 failed with error -5
                                [168715.129479] igb 0000:05:00.1: enabling device (0000 -> 0002)
                                [168715.129619] igb 0000:05:00.1 0000:05:00.1 (uninitialized): PCIe link lost
                                [168715.130133] igb: Failed to read reg 0x18!
                                [168715.130381] WARNING: CPU: 10 PID: 1106121 at drivers/net/ethernet/intel/igb/igb_main.c:745 igb_rd32+0x7c/0x90 [igb]
                                [168715.130642] Modules linked in: igb(E+)
                                [168715.130922]  snd_timer(E) sp5100_tco(E) pcspkr(E) drm_kms_helper(E) snd(E) k10temp(E) watchdog(E) soundcore(E) ccp(E) cdc_mbim(E) joydev(E) cdc_wdm(E) sg(E) button(E) evdev(E) loop(E) drm(E) efi_pstore(E) configfs(E) ip_tables(E) x_tables(E) autofs4(E) zfs(POE) spl(OE) efivarfs(E) sr_mod(E) cdrom(E) hid_generic(E) cdc_ncm(E) cdc_ether(E) usbnet(E) usbhid(E) uas(E) hid(E) usb_storage(E) mii(E) sd_mod(E) ahci(E) nvme(E) ahciem(E) xhci_pci(E) libahci(E) xhci_hcd(E) ixgbe(E) nvme_core(E) libata(E) xfrm_algo(E) t10_pi(E) crc32_pclmul(E) mdio_devres(E) usbcore(E) scsi_mod(E) crc32c_intel(E) crc64_rocksoft(E) i2c_piix4(E) usb_common(E) scsi_common(E) crc64(E) crc_t10dif(E) i2c_algo_bit(E) libphy(E) dca(E) crct10dif_generic(E) mdio(E) crct10dif_pclmul(E) crct10dif_common(E) video(E) wmi(E) gpio_amdpt(E) gpio_generic(E) [last unloaded: igb(E)]
                                [168715.133023] RIP: 0010:igb_rd32+0x7c/0x90 [igb]
                                [168715.137255]  ? igb_rd32+0x7c/0x90 [igb]
                                [168715.137508]  ? igb_rd32+0x7c/0x90 [igb]
                                [168715.138247]  ? igb_rd32+0x7c/0x90 [igb]
                                [168715.138250]  ? igb_rd32+0x7c/0x90 [igb]
                                [168715.139694]  igb_get_invariants_82575+0xa6/0xf00 [igb]
                                [168715.139700]  igb_probe+0x3be/0x1520 [igb]
                                [168715.142610]  ? __pfx_igb_init_module+0x10/0x10 [igb]
                                [168715.472667] igb 0000:05:00.1: PHY reset is blocked due to SOL/IDER session.
                                [168717.202621] igb 0000:05:00.1: The NVM Checksum Is Not Valid
                                [168717.400883] igb: probe of 0000:05:00.1 failed with error -5
                                [168717.401370] igb 0000:05:00.2: enabling device (0000 -> 0002)
                                [168717.401509] igb 0000:05:00.2 0000:05:00.2 (uninitialized): PCIe link lost
                                [168717.402106] igb: Failed to read reg 0x18!
                                [168717.402357] WARNING: CPU: 2 PID: 1106121 at drivers/net/ethernet/intel/igb/igb_main.c:745 igb_rd32+0x7c/0x90 [igb]
                                [168717.402619] Modules linked in: igb(E+) vhost_net(E) tun(E) vhost(E) vhost_iotlb(E) macvtap(E) macvlan(E) tap(E) xt_nat(E) xt_tcpudp(E) veth(E) xt_conntrack(E) nft_chain_nat(E) xt_MASQUERADE(E) nf_nat(E) nf_conntrack_netlink(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) xfrm_user(E) xt_addrtype(E) nft_compat(E) nf_tables(E) libcrc32c(E) crc32c_generic(E) nfnetlink(E) br_netfilter(E) bridge(E) stp(E) llc(E) nvme_fabrics(E) overlay(E) sunrpc(E) binfmt_misc(E) ntb_netdev(E) ntb_transport(E) ntb_split(E) ntb(E) ioatdma(E) ib_core(E) edac_mce_amd(E) kvm_amd(E) kvm(E) irqbypass(E) ghash_clmulni_intel(E) sha512_ssse3(E) sha256_ssse3(E) sha1_ssse3(E) amdgpu(E) aesni_intel(E) crypto_simd(E) cryptd(E) snd_hda_codec_hdmi(E) rapl(E) drm_exec(E) amdxcp(E) drm_buddy(E) gpu_sched(E) drm_suballoc_helper(E) drm_display_helper(E) snd_hda_intel(E) cec(E) snd_usb_audio(E) snd_intel_dspcfg(E) snd_usbmidi_lib(E) rc_core(E) snd_hda_codec(E) snd_rawmidi(E) snd_seq_device(E) snd_hda_core(E) mc(E) snd_hwdep(E) drm_ttm_helper(E)
                                [168717.402644]  snd_pcm(E) wmi_bmof(E) ttm(E) snd_timer(E) sp5100_tco(E) pcspkr(E) drm_kms_helper(E) snd(E) k10temp(E) watchdog(E) soundcore(E) ccp(E) cdc_mbim(E) joydev(E) cdc_wdm(E) sg(E) button(E) evdev(E) loop(E) drm(E) efi_pstore(E) configfs(E) ip_tables(E) x_tables(E) autofs4(E) zfs(POE) spl(OE) efivarfs(E) sr_mod(E) cdrom(E) hid_generic(E) cdc_ncm(E) cdc_ether(E) usbnet(E) usbhid(E) uas(E) hid(E) usb_storage(E) mii(E) sd_mod(E) ahci(E) nvme(E) ahciem(E) xhci_pci(E) libahci(E) xhci_hcd(E) ixgbe(E) nvme_core(E) libata(E) xfrm_algo(E) t10_pi(E) crc32_pclmul(E) mdio_devres(E) usbcore(E) scsi_mod(E) crc32c_intel(E) crc64_rocksoft(E) i2c_piix4(E) usb_common(E) scsi_common(E) crc64(E) crc_t10dif(E) i2c_algo_bit(E) libphy(E) dca(E) crct10dif_generic(E) mdio(E) crct10dif_pclmul(E) crct10dif_common(E) video(E) wmi(E) gpio_amdpt(E) gpio_generic(E) [last unloaded: igb(E)]
                                [168717.405253] RIP: 0010:igb_rd32+0x7c/0x90 [igb]
                                [168717.408900]  ? igb_rd32+0x7c/0x90 [igb]
                                [168717.409393]  ? igb_rd32+0x7c/0x90 [igb]
                                [168717.410845]  ? igb_rd32+0x7c/0x90 [igb]
                                [168717.411088]  ? igb_rd32+0x7c/0x90 [igb]
                                [168717.411330]  igb_get_invariants_82575+0xa6/0xf00 [igb]
                                [168717.411577]  igb_probe+0x3be/0x1520 [igb]
                                [168717.414240]  ? __pfx_igb_init_module+0x10/0x10 [igb]
                                [168717.744687] igb 0000:05:00.2: PHY reset is blocked due to SOL/IDER session.
                                [168719.474678] igb 0000:05:00.2: The NVM Checksum Is Not Valid
                                [168719.664847] igb: probe of 0000:05:00.2 failed with error -5
                                [168719.665336] igb 0000:05:00.3: enabling device (0000 -> 0002)
                                [168719.665475] igb 0000:05:00.3 0000:05:00.3 (uninitialized): PCIe link lost
                                [168719.666111] igb: Failed to read reg 0x18!
                                [168719.666361] WARNING: CPU: 10 PID: 1106121 at drivers/net/ethernet/intel/igb/igb_main.c:745 igb_rd32+0x7c/0x90 [igb]
                                [168719.666623] Modules linked in: igb(E+) vhost_net(E) tun(E) vhost(E) vhost_iotlb(E) macvtap(E) macvlan(E) tap(E) xt_nat(E) xt_tcpudp(E) veth(E) xt_conntrack(E) nft_chain_nat(E) xt_MASQUERADE(E) nf_nat(E) nf_conntrack_netlink(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) xfrm_user(E) xt_addrtype(E) nft_compat(E) nf_tables(E) libcrc32c(E) crc32c_generic(E) nfnetlink(E) br_netfilter(E) bridge(E) stp(E) llc(E) nvme_fabrics(E) overlay(E) sunrpc(E) binfmt_misc(E) ntb_netdev(E) ntb_transport(E) ntb_split(E) ntb(E) ioatdma(E) ib_core(E) edac_mce_amd(E) kvm_amd(E) kvm(E) irqbypass(E) ghash_clmulni_intel(E) sha512_ssse3(E) sha256_ssse3(E) sha1_ssse3(E) amdgpu(E) aesni_intel(E) crypto_simd(E) cryptd(E) snd_hda_codec_hdmi(E) rapl(E) drm_exec(E) amdxcp(E) drm_buddy(E) gpu_sched(E) drm_suballoc_helper(E) drm_display_helper(E) snd_hda_intel(E) cec(E) snd_usb_audio(E) snd_intel_dspcfg(E) snd_usbmidi_lib(E) rc_core(E) snd_hda_codec(E) snd_rawmidi(E) snd_seq_device(E) snd_hda_core(E) mc(E) snd_hwdep(E) drm_ttm_helper(E)
                                [168719.666648]  snd_pcm(E) wmi_bmof(E) ttm(E) snd_timer(E) sp5100_tco(E) pcspkr(E) drm_kms_helper(E) snd(E) k10temp(E) watchdog(E) soundcore(E) ccp(E) cdc_mbim(E) joydev(E) cdc_wdm(E) sg(E) button(E) evdev(E) loop(E) drm(E) efi_pstore(E) configfs(E) ip_tables(E) x_tables(E) autofs4(E) zfs(POE) spl(OE) efivarfs(E) sr_mod(E) cdrom(E) hid_generic(E) cdc_ncm(E) cdc_ether(E) usbnet(E) usbhid(E) uas(E) hid(E) usb_storage(E) mii(E) sd_mod(E) ahci(E) nvme(E) ahciem(E) xhci_pci(E) libahci(E) xhci_hcd(E) ixgbe(E) nvme_core(E) libata(E) xfrm_algo(E) t10_pi(E) crc32_pclmul(E) mdio_devres(E) usbcore(E) scsi_mod(E) crc32c_intel(E) crc64_rocksoft(E) i2c_piix4(E) usb_common(E) scsi_common(E) crc64(E) crc_t10dif(E) i2c_algo_bit(E) libphy(E) dca(E) crct10dif_generic(E) mdio(E) crct10dif_pclmul(E) crct10dif_common(E) video(E) wmi(E) gpio_amdpt(E) gpio_generic(E) [last unloaded: igb(E)]
                                [168719.669264] RIP: 0010:igb_rd32+0x7c/0x90 [igb]
                                [168719.672935]  ? igb_rd32+0x7c/0x90 [igb]
                                [168719.673429]  ? igb_rd32+0x7c/0x90 [igb]
                                [168719.674884]  ? igb_rd32+0x7c/0x90 [igb]
                                [168719.675127]  ? igb_rd32+0x7c/0x90 [igb]
                                [168719.675369]  igb_get_invariants_82575+0xa6/0xf00 [igb]
                                [168719.675616]  igb_probe+0x3be/0x1520 [igb]
                                [168719.678284]  ? __pfx_igb_init_module+0x10/0x10 [igb]
                                [168720.008929] igb 0000:05:00.3: PHY reset is blocked due to SOL/IDER session.
                                [168721.738903] igb 0000:05:00.3: The NVM Checksum Is Not Valid
                                [168721.916849] igb: probe of 0000:05:00.3 failed with error -5
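
To make a log like this easier to skim, the following Python sketch tallies the messages per PCI function. It is a hedged illustration only: the function name `summarize_igb_dmesg` and the regular expressions are assumptions written against the exact message wording shown above, and other kernel versions may word these lines differently.

```python
import re
from collections import Counter

# Patterns written against the message wording in the log above (assumption:
# other kernel versions may phrase these lines differently).
MALFORMED_RE = re.compile(
    r"igb (\S+) \S+: malformed Tx packet detected and dropped, "
    r"LVMMC:(0x[0-9a-fA-F]+)")
LINK_LOST_RE = re.compile(r"igb (\S+) .*PCIe link lost")
PROBE_FAIL_RE = re.compile(r"igb: probe of (\S+) failed with error (-?\d+)")


def summarize_igb_dmesg(text):
    """Tally igb failure messages per PCI address from dmesg text."""
    malformed = Counter()
    all_ones = Counter()  # malformed-Tx lines whose LVMMC value is 0xffffffff
    link_lost = set()
    probe_failed = {}
    for line in text.splitlines():
        m = MALFORMED_RE.search(line)
        if m:
            malformed[m.group(1)] += 1
            if int(m.group(2), 16) == 0xFFFFFFFF:
                all_ones[m.group(1)] += 1
            continue
        m = LINK_LOST_RE.search(line)
        if m:
            link_lost.add(m.group(1))
            continue
        m = PROBE_FAIL_RE.search(line)
        if m:
            probe_failed[m.group(1)] = int(m.group(2))
    return {
        "malformed_tx": dict(malformed),
        "lvmmc_all_ones": dict(all_ones),
        "link_lost": sorted(link_lost),
        "probe_failed": probe_failed,
    }


if __name__ == "__main__":
    # Three lines copied from the log above.
    sample = (
        "[168683.759683] igb 0000:05:00.3 enp5s0f3: malformed Tx packet "
        "detected and dropped, LVMMC:0xffffffff\n"
        "[168712.799290] igb 0000:05:00.0 0000:05:00.0 (uninitialized): "
        "PCIe link lost\n"
        "[168715.129000] igb: probe of 0000:05:00.0 failed with error -5\n"
    )
    print(summarize_igb_dmesg(sample))
```

An all-ones register value alongside "PCIe link lost" on every function is usually a sign that reads to the device are no longer completing, which would fit the card dropping off the bus as a whole rather than one port misbehaving.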
                                

                                My last port of call now is to pull the NIC and try it in a different PCIe slot.
                                The current slot is a gen 3 1x slot, so it should be fine, but I will try it in a gen 5 16x slot on the same system just in case the slot is causing issues. I doubt it, but it is indeed looking like some kind of hardware issue now.

                                Regards,
                                Jamie

                                  bigjme93

                                  For anyone interested in the exciting conclusion... it worked fine in the 16x slot for 2 weeks and is still in there now.
                                  I put an I340-T4 in the 1x slot at the same time and left that running, and that has been perfectly fine as well.

                                  It seems to be an incompatibility between the 1x slot and the I350 specifically, but I'm not sure why. In either case, the issue seems to be resolved.

                                  It may be something specific to AM5 and the I350 in the 1x slot, or just the I350 and the 1x slot alone, but if anyone else tries the same for some reason, at least you know what symptoms manifest and what the cause was.

                                  Thanks again to those who helped and commented.

                                  Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.