Netgate Discussion Forum

    Intel I350-T4 Errors

    Hardware
    18 Posts 2 Posters 837 Views
    • bigjme93

      Annoyingly, I had to restart the VM to try and check the firmware versions, so I will wait for it to crash again and re-test. It's likely to be later today, but if not, I will grab the output and see what's going on.

      Is there anything specific I should be looking for in that output?

      • stephenw10, Netgate Administrator

        There are a number of sysctls for different errors like:

        dev.igb.0.watchdog_timeouts: 0
        dev.igb.0.rx_overruns: 0
        ...
        dev.igb.0.dropped: 0
        ...
        dev.igb.0.mac_stats.coll_ext_errs: 0
        dev.igb.0.mac_stats.alignment_errs: 0
        dev.igb.0.mac_stats.crc_errs: 0
        dev.igb.0.mac_stats.recv_errs: 0
        dev.igb.0.mac_stats.recv_jabber: 0
        dev.igb.0.mac_stats.recv_oversize: 0
        dev.igb.0.mac_stats.recv_fragmented: 0
        dev.igb.0.mac_stats.recv_undersize: 0
        dev.igb.0.mac_stats.recv_no_buff: 0
        dev.igb.0.mac_stats.recv_length_errors: 0
        dev.igb.0.mac_stats.missed_packets: 0
        dev.igb.0.mac_stats.defer_count: 0
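
For anyone who would rather not eyeball the counters by hand, here is a minimal sketch that scans saved `sysctl dev.igb.0` output and reports any of the error counters listed above that are not zero. The function name and the sample values are made up for illustration; only the counter names come from the list above.

```python
import re

# Last component of each error counter named in the post above.
ERROR_COUNTERS = {
    "watchdog_timeouts", "rx_overruns", "dropped", "coll_ext_errs",
    "alignment_errs", "crc_errs", "recv_errs", "recv_jabber",
    "recv_oversize", "recv_fragmented", "recv_undersize", "recv_no_buff",
    "recv_length_errors", "missed_packets", "defer_count",
}

# Matches lines such as "dev.igb.0.mac_stats.crc_errs: 0".
LINE_RE = re.compile(r"^\s*(dev\.igb\.\d+\.[\w.]+):\s*(-?\d+)\s*$")


def nonzero_error_counters(sysctl_text):
    """Return {sysctl name: value} for listed error counters that are non-zero."""
    found = {}
    for line in sysctl_text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue  # skip "..." and anything that is not a numeric sysctl
        name, value = match.group(1), int(match.group(2))
        if value != 0 and name.rsplit(".", 1)[-1] in ERROR_COUNTERS:
            found[name] = value
    return found


# Hypothetical sample: the values here are invented, not taken from the thread.
sample = """dev.igb.0.watchdog_timeouts: 0
dev.igb.0.mac_stats.crc_errs: 3
dev.igb.0.mac_stats.missed_packets: 12
"""
for name, value in nonzero_error_counters(sample).items():
    print(name, value)
```

Running it against output captured before and after a failure would make it easier to see which counters actually moved.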
        
        • bigjme93

          A quick update for anyone keeping an eye on this: as per usual, you ask for help and the system refuses to play ball when you need it to.

          After days of repeatedly failing after 6 to 9 hours, the system is currently at 24 hours with no errors.
          The host has not been rebooted, the NIC has not been re-seated or anything else like that, and no configs have been altered.

          The only thing I did yesterday when checking the firmware version was boot it into a live Ubuntu, turn on "FLASHENABLE" (it was turned off, which hid the version), and reflash the PXE firmware to the latest Intel one. I did not flash the NVM as I was not sure how to do that.

          All of this was done in the same VM, just booting an Ubuntu ISO, with no passthrough changes, so it was identical. Once done, I shut down the VM, removed the ISO, and started it again, so there were no host changes.

          In theory, I am not using PXE, so that firmware should make no difference, but right now this is the longest it's managed, and I have run 200GB of data through it.

          Hopefully this fixes it, but I'm unsure why it would if it does.

          Regards,
          Jamie

          • stephenw10, Netgate Administrator

            Hmm, hard to imagine a PXE firmware having any effect but....

            • bigjme93

              I completely agree

              The only consideration I did have was whether enabling flashing made a difference.
              This is more of an old KVM thing I have had to do in the past, but sometimes it runs a shadow ROM to pass through devices so they reset correctly.

              I used to have to do this for GPUs back on Ubuntu, but I'm unsure how TrueNAS does this.
              If they are trying a shadow ROM and the device is blocking flashing and detecting some kind of ROM change, I wonder if that could cause it?

              It's a massive stretch at this point, but I know the I340 I had is much less picky with firmware than the I350s.

              • bigjme93

                OK, so I thought it was going well, but sadly it crashed last night. I have left the VM up in case I need to run further commands, but because of this I have no easy way to copy the command outputs, so apologies in advance for the screenshots.

                [9 screenshots of the command output]

                • stephenw10, Netgate Administrator

                  Hmm, those values for interrupts and mac_stats look bogus. Like the driver or firmware are not returning real data or are returning bad data. That 'feels' like the firmware is failing.

                  Does it show bogus values for all 4 ports?

                  Does it show real data after a reboot? Does it require a full power cycle?

                  • bigjme93

                    [4 screenshots]

                    Yep, bogus values on all 4 ports, and they all start at the exact same time as far as I can tell.
                    That makes me think it's less likely to be a driver, but I may be wrong.

                    Simply restarting the VM fixes the issue, and it will go back to normal until it happens again.
                    To my knowledge, even a VM reboot does trigger the device to reset, so it would make sense that this fixes it if it was firmware.

                    I have left the machine up with the NIC broken just in case it's worth me running some commands to see if anything can be ruled out.
                    For example, rebinding/resetting the igb driver etc. to see if it magically fixes itself without a device reset.

                    It's not something that I am familiar with, so I'm hoping for a little guidance from the experts here, in the hope that the debugging may help someone else in a similar situation down the line.

                    • stephenw10, Netgate Administrator

                      Hmm, seems like a driver problem if a reboot fixes it. If it was firmware I'd expect it to require a full power cycle. Unless the driver resets the card somehow (and is still able to do so).

                      I'm not really sure what we could do here though. There is a kmod igb driver but we don't build it, so it would be tricky to test.

                      The i350-T4 is a pretty common NIC. Seeing two fail the same way seems unlikely. It seems likely to be either a problem with the virtualisation or something low level with the hardware it's installed in.

                      Can you try running it without the hardware pass-through? That would at least confirm the hardware compatibility.

                      • bigjme93

                        I could potentially try leaving the NIC on the host and adding the ports as virtual NICs, if that's what you mean?

                        Edit: For the sake of sanity, I have left the NIC on the host and passed each port through to the VM using virtio drivers, so they show up as vtnet. I guess if this passes then we know it's got to be an issue with the passthrough to pfSense?

                        As you say, it's a common NIC, so it should not be an issue within pfSense itself.

                        • stephenw10, Netgate Administrator

                          Yup exactly that. If it's some low level hardware issue it will still fail.

                          Yes, I doubt this is a pfSense/FreeBSD issue directly because i350 is commonly used. At one time that was probably the recommended choice.

                          • bigjme93

                            Thanks for all the replies and quick help with this, it's really appreciated!

                            I will wait and see what happens in this instance and will report back as soon as I hit an issue or it starts hitting long uptime counts.

                            • bigjme93

                              Time for an update, sadly not a good one. I know this is getting a little off topic for pfSense directly, but it may still help others.

                              Less than 48 hours in, the same issue has happened on the host. Spot the quad NIC:
                              [4 screenshots]

                              enp4s0f0 and enp4s0f1 are my Intel 10Gb NICs, but they use ixgbe. I tried reloading igb and had no luck.
                              Looking through dmesg, I get the following:

                              [168683.759683] igb 0000:05:00.3 enp5s0f3: malformed Tx packet detected and dropped, LVMMC:0xffffffff
                              [168685.039818] igb 0000:05:00.0 enp5s0f0: malformed Tx packet detected and dropped, LVMMC:0xffffffff
                              [168685.807653] igb 0000:05:00.1 enp5s0f1: malformed Tx packet detected and dropped, LVMMC:0xffffffff
                              [168687.087693] igb 0000:05:00.2 enp5s0f2: malformed Tx packet detected and dropped, LVMMC:0xffffffff
                              [168687.855687] igb 0000:05:00.3 enp5s0f3: malformed Tx packet detected and dropped, LVMMC:0xffffffff
                              [168688.879965] igb 0000:05:00.0 enp5s0f0: malformed Tx packet detected and dropped, LVMMC:0xffffffff
                              [168689.903630] igb 0000:05:00.1 enp5s0f1: malformed Tx packet detected and dropped, LVMMC:0xffffffff
                              [168690.927699] igb 0000:05:00.2 enp5s0f2: malformed Tx packet detected and dropped, LVMMC:0xffffffff
                              [168691.951708] igb 0000:05:00.3 enp5s0f3: malformed Tx packet detected and dropped, LVMMC:0xffffffff
                              [168692.975871] igb 0000:05:00.0 enp5s0f0: malformed Tx packet detected and dropped, LVMMC:0xffffffff
                              [168693.999640] igb 0000:05:00.1 enp5s0f1: malformed Tx packet detected and dropped, LVMMC:0xffffffff
                              [168695.023693] igb 0000:05:00.2 enp5s0f2: malformed Tx packet detected and dropped, LVMMC:0xffffffff
                              [168695.791663] igb 0000:05:00.3 enp5s0f3: malformed Tx packet detected and dropped, LVMMC:0xffffffff
                              [168696.815772] igb 0000:05:00.0 enp5s0f0: malformed Tx packet detected and dropped, LVMMC:0xffffffff
                              [168697.839653] igb 0000:05:00.1 enp5s0f1: malformed Tx packet detected and dropped, LVMMC:0xffffffff
                              [168698.863696] igb 0000:05:00.2 enp5s0f2: malformed Tx packet detected and dropped, LVMMC:0xffffffff
                              [168699.887665] igb 0000:05:00.3 enp5s0f3: malformed Tx packet detected and dropped, LVMMC:0xffffffff
                              [168700.437650] igb 0000:05:00.3: removed PHC on enp5s0f3
                              [168700.911656] igb 0000:05:00.0 enp5s0f0: malformed Tx packet detected and dropped, LVMMC:0xffffffff
                              [168701.935637] igb 0000:05:00.1 enp5s0f1: malformed Tx packet detected and dropped, LVMMC:0xffffffff
                              [168702.179403] igb 0000:05:00.3 enp5s0f3 (unregistering): left allmulticast mode
                              [168702.179406] igb 0000:05:00.3 enp5s0f3 (unregistering): left promiscuous mode
                              [168702.468717] igb 0000:05:00.2: removed PHC on enp5s0f2
                              [168702.959649] igb 0000:05:00.2 enp5s0f2: malformed Tx packet detected and dropped, LVMMC:0xffffffff
                              [168704.702443] igb 0000:05:00.2 enp5s0f2 (unregistering): left allmulticast mode
                              [168704.702447] igb 0000:05:00.2 enp5s0f2 (unregistering): left promiscuous mode
                              [168704.952707] igb 0000:05:00.1: removed PHC on enp5s0f1
                              [168705.007729] igb 0000:05:00.0 enp5s0f0: malformed Tx packet detected and dropped, LVMMC:0xffffffff
                              [168705.775759] igb 0000:05:00.1 enp5s0f1: malformed Tx packet detected and dropped, LVMMC:0xffffffff
                              [168707.776784] igb 0000:05:00.0: removed PHC on enp5s0f0
                              [168708.847740] igb 0000:05:00.0 enp5s0f0: malformed Tx packet detected and dropped, LVMMC:0xffffffff
                              [168710.590634] igb 0000:05:00.0 enp5s0f0 (unregistering): left allmulticast mode
                              [168710.590637] igb 0000:05:00.0 enp5s0f0 (unregistering): left promiscuous mode
                              [168712.799086] igb: Intel(R) Gigabit Ethernet Network Driver
                              [168712.799089] igb: Copyright (c) 2007-2014 Intel Corporation.
                              [168712.799121] igb 0000:05:00.0: enabling device (0000 -> 0002)
                              [168712.799290] igb 0000:05:00.0 0000:05:00.0 (uninitialized): PCIe link lost
                              [168712.799522] igb: Failed to read reg 0x18!
                              [168712.799610] WARNING: CPU: 2 PID: 1106121 at drivers/net/ethernet/intel/igb/igb_main.c:745 igb_rd32+0x7c/0x90 [igb]
                              [168712.799706] Modules linked in: igb(E+) vhost_net(E) tun(E) vhost(E) vhost_iotlb(E) macvtap(E) macvlan(E) tap(E) xt_nat(E) xt_tcpudp(E) veth(E) xt_conntrack(E) nft_chain_nat(E) xt_MASQUERADE(E) nf_nat(E) nf_conntrack_netlink(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) xfrm_user(E) xt_addrtype(E) nft_compat(E) nf_tables(E) libcrc32c(E) crc32c_generic(E) nfnetlink(E) br_netfilter(E) bridge(E) stp(E) llc(E) nvme_fabrics(E) overlay(E) sunrpc(E) binfmt_misc(E) ntb_netdev(E) ntb_transport(E) ntb_split(E) ntb(E) ioatdma(E) ib_core(E) edac_mce_amd(E) kvm_amd(E) kvm(E) irqbypass(E) ghash_clmulni_intel(E) sha512_ssse3(E) sha256_ssse3(E) sha1_ssse3(E) amdgpu(E) aesni_intel(E) crypto_simd(E) cryptd(E) snd_hda_codec_hdmi(E) rapl(E) drm_exec(E) amdxcp(E) drm_buddy(E) gpu_sched(E) drm_suballoc_helper(E) drm_display_helper(E) snd_hda_intel(E) cec(E) snd_usb_audio(E) snd_intel_dspcfg(E) snd_usbmidi_lib(E) rc_core(E) snd_hda_codec(E) snd_rawmidi(E) snd_seq_device(E) snd_hda_core(E) mc(E) snd_hwdep(E) drm_ttm_helper(E)
                              [168712.799722]  snd_pcm(E) wmi_bmof(E) ttm(E) snd_timer(E) sp5100_tco(E) pcspkr(E) drm_kms_helper(E) snd(E) k10temp(E) watchdog(E) soundcore(E) ccp(E) cdc_mbim(E) joydev(E) cdc_wdm(E) sg(E) button(E) evdev(E) loop(E) drm(E) efi_pstore(E) configfs(E) ip_tables(E) x_tables(E) autofs4(E) zfs(POE) spl(OE) efivarfs(E) sr_mod(E) cdrom(E) hid_generic(E) cdc_ncm(E) cdc_ether(E) usbnet(E) usbhid(E) uas(E) hid(E) usb_storage(E) mii(E) sd_mod(E) ahci(E) nvme(E) ahciem(E) xhci_pci(E) libahci(E) xhci_hcd(E) ixgbe(E) nvme_core(E) libata(E) xfrm_algo(E) t10_pi(E) crc32_pclmul(E) mdio_devres(E) usbcore(E) scsi_mod(E) crc32c_intel(E) crc64_rocksoft(E) i2c_piix4(E) usb_common(E) scsi_common(E) crc64(E) crc_t10dif(E) i2c_algo_bit(E) libphy(E) dca(E) crct10dif_generic(E) mdio(E) crct10dif_pclmul(E) crct10dif_common(E) video(E) wmi(E) gpio_amdpt(E) gpio_generic(E) [last unloaded: igb(E)]
                              [168712.800964] RIP: 0010:igb_rd32+0x7c/0x90 [igb]
                              [168712.803497]  ? igb_rd32+0x7c/0x90 [igb]
                              [168712.803880]  ? igb_rd32+0x7c/0x90 [igb]
                              [168712.805033]  ? igb_rd32+0x7c/0x90 [igb]
                              [168712.805228]  ? igb_rd32+0x7c/0x90 [igb]
                              [168712.805422]  igb_get_invariants_82575+0xa6/0xf00 [igb]
                              [168712.805620]  igb_probe+0x3be/0x1520 [igb]
                              [168712.807796]  ? __pfx_igb_init_module+0x10/0x10 [igb]
                              [168713.136674] igb 0000:05:00.0: PHY reset is blocked due to SOL/IDER session.
                              [168714.866546] igb 0000:05:00.0: The NVM Checksum Is Not Valid
                              [168715.129000] igb: probe of 0000:05:00.0 failed with error -5
                              [168715.129479] igb 0000:05:00.1: enabling device (0000 -> 0002)
                              [168715.129619] igb 0000:05:00.1 0000:05:00.1 (uninitialized): PCIe link lost
                              [168715.130133] igb: Failed to read reg 0x18!
                              [168715.130381] WARNING: CPU: 10 PID: 1106121 at drivers/net/ethernet/intel/igb/igb_main.c:745 igb_rd32+0x7c/0x90 [igb]
                              [168715.130642] Modules linked in: igb(E+)
                              [168715.130922]  snd_timer(E) sp5100_tco(E) pcspkr(E) drm_kms_helper(E) snd(E) k10temp(E) watchdog(E) soundcore(E) ccp(E) cdc_mbim(E) joydev(E) cdc_wdm(E) sg(E) button(E) evdev(E) loop(E) drm(E) efi_pstore(E) configfs(E) ip_tables(E) x_tables(E) autofs4(E) zfs(POE) spl(OE) efivarfs(E) sr_mod(E) cdrom(E) hid_generic(E) cdc_ncm(E) cdc_ether(E) usbnet(E) usbhid(E) uas(E) hid(E) usb_storage(E) mii(E) sd_mod(E) ahci(E) nvme(E) ahciem(E) xhci_pci(E) libahci(E) xhci_hcd(E) ixgbe(E) nvme_core(E) libata(E) xfrm_algo(E) t10_pi(E) crc32_pclmul(E) mdio_devres(E) usbcore(E) scsi_mod(E) crc32c_intel(E) crc64_rocksoft(E) i2c_piix4(E) usb_common(E) scsi_common(E) crc64(E) crc_t10dif(E) i2c_algo_bit(E) libphy(E) dca(E) crct10dif_generic(E) mdio(E) crct10dif_pclmul(E) crct10dif_common(E) video(E) wmi(E) gpio_amdpt(E) gpio_generic(E) [last unloaded: igb(E)]
                              [168715.133023] RIP: 0010:igb_rd32+0x7c/0x90 [igb]
                              [168715.137255]  ? igb_rd32+0x7c/0x90 [igb]
                              [168715.137508]  ? igb_rd32+0x7c/0x90 [igb]
                              [168715.138247]  ? igb_rd32+0x7c/0x90 [igb]
                              [168715.138250]  ? igb_rd32+0x7c/0x90 [igb]
                              [168715.139694]  igb_get_invariants_82575+0xa6/0xf00 [igb]
                              [168715.139700]  igb_probe+0x3be/0x1520 [igb]
                              [168715.142610]  ? __pfx_igb_init_module+0x10/0x10 [igb]
                              [168715.472667] igb 0000:05:00.1: PHY reset is blocked due to SOL/IDER session.
                              [168717.202621] igb 0000:05:00.1: The NVM Checksum Is Not Valid
                              [168717.400883] igb: probe of 0000:05:00.1 failed with error -5
                              [168717.401370] igb 0000:05:00.2: enabling device (0000 -> 0002)
                              [168717.401509] igb 0000:05:00.2 0000:05:00.2 (uninitialized): PCIe link lost
                              [168717.402106] igb: Failed to read reg 0x18!
                              [168717.402357] WARNING: CPU: 2 PID: 1106121 at drivers/net/ethernet/intel/igb/igb_main.c:745 igb_rd32+0x7c/0x90 [igb]
                              [168717.402619] Modules linked in: igb(E+) vhost_net(E) tun(E) vhost(E) vhost_iotlb(E) macvtap(E) macvlan(E) tap(E) xt_nat(E) xt_tcpudp(E) veth(E) xt_conntrack(E) nft_chain_nat(E) xt_MASQUERADE(E) nf_nat(E) nf_conntrack_netlink(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) xfrm_user(E) xt_addrtype(E) nft_compat(E) nf_tables(E) libcrc32c(E) crc32c_generic(E) nfnetlink(E) br_netfilter(E) bridge(E) stp(E) llc(E) nvme_fabrics(E) overlay(E) sunrpc(E) binfmt_misc(E) ntb_netdev(E) ntb_transport(E) ntb_split(E) ntb(E) ioatdma(E) ib_core(E) edac_mce_amd(E) kvm_amd(E) kvm(E) irqbypass(E) ghash_clmulni_intel(E) sha512_ssse3(E) sha256_ssse3(E) sha1_ssse3(E) amdgpu(E) aesni_intel(E) crypto_simd(E) cryptd(E) snd_hda_codec_hdmi(E) rapl(E) drm_exec(E) amdxcp(E) drm_buddy(E) gpu_sched(E) drm_suballoc_helper(E) drm_display_helper(E) snd_hda_intel(E) cec(E) snd_usb_audio(E) snd_intel_dspcfg(E) snd_usbmidi_lib(E) rc_core(E) snd_hda_codec(E) snd_rawmidi(E) snd_seq_device(E) snd_hda_core(E) mc(E) snd_hwdep(E) drm_ttm_helper(E)
                              [168717.402644]  snd_pcm(E) wmi_bmof(E) ttm(E) snd_timer(E) sp5100_tco(E) pcspkr(E) drm_kms_helper(E) snd(E) k10temp(E) watchdog(E) soundcore(E) ccp(E) cdc_mbim(E) joydev(E) cdc_wdm(E) sg(E) button(E) evdev(E) loop(E) drm(E) efi_pstore(E) configfs(E) ip_tables(E) x_tables(E) autofs4(E) zfs(POE) spl(OE) efivarfs(E) sr_mod(E) cdrom(E) hid_generic(E) cdc_ncm(E) cdc_ether(E) usbnet(E) usbhid(E) uas(E) hid(E) usb_storage(E) mii(E) sd_mod(E) ahci(E) nvme(E) ahciem(E) xhci_pci(E) libahci(E) xhci_hcd(E) ixgbe(E) nvme_core(E) libata(E) xfrm_algo(E) t10_pi(E) crc32_pclmul(E) mdio_devres(E) usbcore(E) scsi_mod(E) crc32c_intel(E) crc64_rocksoft(E) i2c_piix4(E) usb_common(E) scsi_common(E) crc64(E) crc_t10dif(E) i2c_algo_bit(E) libphy(E) dca(E) crct10dif_generic(E) mdio(E) crct10dif_pclmul(E) crct10dif_common(E) video(E) wmi(E) gpio_amdpt(E) gpio_generic(E) [last unloaded: igb(E)]
                              [168717.405253] RIP: 0010:igb_rd32+0x7c/0x90 [igb]
                              [168717.408900]  ? igb_rd32+0x7c/0x90 [igb]
                              [168717.409393]  ? igb_rd32+0x7c/0x90 [igb]
                              [168717.410845]  ? igb_rd32+0x7c/0x90 [igb]
                              [168717.411088]  ? igb_rd32+0x7c/0x90 [igb]
                              [168717.411330]  igb_get_invariants_82575+0xa6/0xf00 [igb]
                              [168717.411577]  igb_probe+0x3be/0x1520 [igb]
                              [168717.414240]  ? __pfx_igb_init_module+0x10/0x10 [igb]
                              [168717.744687] igb 0000:05:00.2: PHY reset is blocked due to SOL/IDER session.
                              [168719.474678] igb 0000:05:00.2: The NVM Checksum Is Not Valid
                              [168719.664847] igb: probe of 0000:05:00.2 failed with error -5
                              [168719.665336] igb 0000:05:00.3: enabling device (0000 -> 0002)
                              [168719.665475] igb 0000:05:00.3 0000:05:00.3 (uninitialized): PCIe link lost
                              [168719.666111] igb: Failed to read reg 0x18!
                              [168719.666361] WARNING: CPU: 10 PID: 1106121 at drivers/net/ethernet/intel/igb/igb_main.c:745 igb_rd32+0x7c/0x90 [igb]
                              [168719.666623] Modules linked in: igb(E+) vhost_net(E) tun(E) vhost(E) vhost_iotlb(E) macvtap(E) macvlan(E) tap(E) xt_nat(E) xt_tcpudp(E) veth(E) xt_conntrack(E) nft_chain_nat(E) xt_MASQUERADE(E) nf_nat(E) nf_conntrack_netlink(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) xfrm_user(E) xt_addrtype(E) nft_compat(E) nf_tables(E) libcrc32c(E) crc32c_generic(E) nfnetlink(E) br_netfilter(E) bridge(E) stp(E) llc(E) nvme_fabrics(E) overlay(E) sunrpc(E) binfmt_misc(E) ntb_netdev(E) ntb_transport(E) ntb_split(E) ntb(E) ioatdma(E) ib_core(E) edac_mce_amd(E) kvm_amd(E) kvm(E) irqbypass(E) ghash_clmulni_intel(E) sha512_ssse3(E) sha256_ssse3(E) sha1_ssse3(E) amdgpu(E) aesni_intel(E) crypto_simd(E) cryptd(E) snd_hda_codec_hdmi(E) rapl(E) drm_exec(E) amdxcp(E) drm_buddy(E) gpu_sched(E) drm_suballoc_helper(E) drm_display_helper(E) snd_hda_intel(E) cec(E) snd_usb_audio(E) snd_intel_dspcfg(E) snd_usbmidi_lib(E) rc_core(E) snd_hda_codec(E) snd_rawmidi(E) snd_seq_device(E) snd_hda_core(E) mc(E) snd_hwdep(E) drm_ttm_helper(E)
                              [168719.666648]  snd_pcm(E) wmi_bmof(E) ttm(E) snd_timer(E) sp5100_tco(E) pcspkr(E) drm_kms_helper(E) snd(E) k10temp(E) watchdog(E) soundcore(E) ccp(E) cdc_mbim(E) joydev(E) cdc_wdm(E) sg(E) button(E) evdev(E) loop(E) drm(E) efi_pstore(E) configfs(E) ip_tables(E) x_tables(E) autofs4(E) zfs(POE) spl(OE) efivarfs(E) sr_mod(E) cdrom(E) hid_generic(E) cdc_ncm(E) cdc_ether(E) usbnet(E) usbhid(E) uas(E) hid(E) usb_storage(E) mii(E) sd_mod(E) ahci(E) nvme(E) ahciem(E) xhci_pci(E) libahci(E) xhci_hcd(E) ixgbe(E) nvme_core(E) libata(E) xfrm_algo(E) t10_pi(E) crc32_pclmul(E) mdio_devres(E) usbcore(E) scsi_mod(E) crc32c_intel(E) crc64_rocksoft(E) i2c_piix4(E) usb_common(E) scsi_common(E) crc64(E) crc_t10dif(E) i2c_algo_bit(E) libphy(E) dca(E) crct10dif_generic(E) mdio(E) crct10dif_pclmul(E) crct10dif_common(E) video(E) wmi(E) gpio_amdpt(E) gpio_generic(E) [last unloaded: igb(E)]
                              [168719.669264] RIP: 0010:igb_rd32+0x7c/0x90 [igb]
                              [168719.672935]  ? igb_rd32+0x7c/0x90 [igb]
                              [168719.673429]  ? igb_rd32+0x7c/0x90 [igb]
                              [168719.674884]  ? igb_rd32+0x7c/0x90 [igb]
                              [168719.675127]  ? igb_rd32+0x7c/0x90 [igb]
                              [168719.675369]  igb_get_invariants_82575+0xa6/0xf00 [igb]
                              [168719.675616]  igb_probe+0x3be/0x1520 [igb]
                              [168719.678284]  ? __pfx_igb_init_module+0x10/0x10 [igb]
                              [168720.008929] igb 0000:05:00.3: PHY reset is blocked due to SOL/IDER session.
                              [168721.738903] igb 0000:05:00.3: The NVM Checksum Is Not Valid
                              [168721.916849] igb: probe of 0000:05:00.3 failed with error -5
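
As a rough aid for reading a log like the one above, here is a small sketch that counts the "malformed Tx packet" lines per PCI address, notes how many carried an all-ones LVMMC value, and collects the addresses that reported "PCIe link lost". A register read of 0xffffffff is typically what comes back when a PCIe device has stopped responding, which would be consistent with the link-lost messages, but treat that reading as an assumption rather than a diagnosis. The function name is made up for illustration.

```python
import re
from collections import Counter

# "igb 0000:05:00.3 enp5s0f3: malformed Tx packet detected and dropped, LVMMC:0xffffffff"
TX_RE = re.compile(
    r"igb (\S+) (\S+): malformed Tx packet detected and dropped, "
    r"LVMMC:(0x[0-9a-fA-F]+)"
)
# "igb 0000:05:00.0 0000:05:00.0 (uninitialized): PCIe link lost"
LOST_RE = re.compile(r"igb (\S+) .*PCIe link lost")


def summarise(dmesg_text):
    """Return (malformed counts, all-ones counts, addresses with link lost)."""
    malformed = Counter()
    all_ones = Counter()
    link_lost = set()
    for line in dmesg_text.splitlines():
        match = TX_RE.search(line)
        if match:
            address, _ifname, lvmmc = match.groups()
            malformed[address] += 1
            if int(lvmmc, 16) == 0xFFFFFFFF:
                all_ones[address] += 1
            continue
        match = LOST_RE.search(line)
        if match:
            link_lost.add(match.group(1))
    return malformed, all_ones, link_lost
```

Feeding it the text of `dmesg` (for example read from a saved file) gives a per-port summary without scrolling through the repeated module lists.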
                              

                              My last port of call now is to pull the NIC and try it in a different PCIe slot.
                              The current slot is a gen 3 1x slot, so it should be fine, but I will try it in a gen 5 16x slot on the same system just in case it's causing issues. I doubt it, but it is indeed looking like some kind of hardware issue now.
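
When comparing slots on a Linux host, it can help to see what link the card actually negotiated. The sketch below reads the standard sysfs link attributes for a PCI device and reports whether the negotiated width is below the maximum the device advertises. The I350-T4 is, as far as I know, an x4 card, so a narrower link would be expected in a 1x slot; that alone does not show the slot is at fault. The function names are made up for illustration, and the address `0000:05:00.0` is the one from the dmesg output above.

```python
from pathlib import Path

LINK_ATTRS = (
    "current_link_speed",
    "current_link_width",
    "max_link_speed",
    "max_link_width",
)


def read_link_info(device_dir):
    """Read PCIe link attributes from a sysfs device directory.

    Attributes that are missing come back as None.
    """
    info = {}
    for attr in LINK_ATTRS:
        path = Path(device_dir) / attr
        info[attr] = path.read_text().strip() if path.exists() else None
    return info


def is_width_limited(info):
    """True if the negotiated width is below the device maximum, None if unknown."""
    current = info.get("current_link_width")
    maximum = info.get("max_link_width")
    if current is None or maximum is None:
        return None
    return int(current) < int(maximum)
```

For example, `read_link_info("/sys/bus/pci/devices/0000:05:00.0")` could be called before and after moving the card to compare the negotiated speed and width in each slot.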

                              Regards,
                              Jamie

                              • bigjme93

                                For anyone interested in the exciting conclusion: it worked fine in the 16x slot for 2 weeks and is still in there now.
                                I put an I340-T4 in the 1x slot at the same time and left that running, and that has been perfectly fine as well.

                                It seems to be an incompatibility between the 1x slot and the I350 specifically, but I'm not sure why. In either case, the issue seems to be resolved.

                                It may be something specific to AM5 and the I350 in the 1x slot, or just the I350 and the 1x slot alone, but if anyone else tries the same for some reason, at least you know what symptoms manifest and what the cause was.

                                Thanks again to those who helped and commented.

                                Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.