Netgate Discussion Forum

    Intel X710-T4L issue - WARNING: queue <num> appears to be hung!

    • L Offline
      Louis89
      last edited by

      I've been struggling with an Intel X710-T4L adapter for a few months now and I'm out of ideas. Any help would be much appreciated. Thanks in advance!

      System configuration
      • pfSense 24.03
      • pfSense is a QEMU guest with 16 cores assigned, running on Proxmox 8.2.4. I've passed through the entire X710-T4L device to the guest.
      • X710-T4L firmware v9.5
      • ixl driver v1.14.2
      • Two ports are in use on the X710-T4L:
        ixl0 - WAN
        ixl1 - MODEM
        I am one of those AT&T customers who bypass the provider's router gateway. I use an authbridge setup between ixl0 (ONT) and ixl1 (ISP modem).
        I had this exact same authbridge setup on a bare metal machine using an Intel I350 adapter running for years without any issues.
      Problem description and logs

      Every so often ixl0 stops passing packets and the system log fills with messages like "ixl0: WARNING: queue 0 appears to be hung!". No hung queue messages appear for the other ixl ports (ixl1 is the only other port in use). The only other error or warning I can find is the gateway alarm for packet loss on WAN (ixl0), shortly before the hung queue messages start. No log entries mention ixl at all before the "queue appears to be hung!" messages appear.
      The failures seem random. The system stays stable for at least 1 day and sometimes as long as 30 days; the most recent stable period lasted 13 days before ixl0 gave out. As far as I can tell, the system is not under any special load when ixl0 quits, and it happens at random times of day.
      Bringing the interface down and up with ifconfig has no effect. So far, only a full reboot returns the system to normal.
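      To see which interfaces and queues are affected when this happens, the warnings can be tallied from a saved copy of the system log. The following is only a minimal sketch: the function name `count_hung_queues` is hypothetical, and the pattern assumes the message text is exactly as quoted above. The sample lines are copied from this post and are not a real log capture.

```python
import re
from collections import Counter

# Matches the warning text quoted in the post, e.g.
# "ixl0: WARNING: queue 0 appears to be hung!"
HUNG_RE = re.compile(r"\b(ixl\d+): WARNING: queue (\d+) appears to be hung!")


def count_hung_queues(lines):
    """Return a Counter keyed by (interface, queue number)."""
    hits = Counter()
    for line in lines:
        match = HUNG_RE.search(line)
        if match:
            hits[(match.group(1), int(match.group(2)))] += 1
    return hits


if __name__ == "__main__":
    # Illustrative input only: message text copied from this thread.
    sample = [
        "ixl0: WARNING: queue 0 appears to be hung!",
        "ixl0: WARNING: queue 0 appears to be hung!",
        "ixl0: Allocating 8 queues for PF LAN VSI; 8 queues active",
    ]
    for (iface, queue), count in sorted(count_hung_queues(sample).items()):
        print(f"{iface} queue {queue}: {count} warning(s)")
```

      Feeding it the lines of an exported log would show whether the warnings are limited to ixl0 and to a single queue, or spread across several.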

      Relevant boot logs for ixl...
      This one appeared after I installed the latest version of the ixl driver. I'm assuming there is nothing to worry about here.

      Jul 17 13:40:31	kernel		Module pci/ixl failed to register: 17
      Jul 17 13:40:31	kernel		module_register: cannot register pci/ixl from kernel; already loaded from if_ixl.ko
      

      The SR-IOV init failure seems odd, but I don't need that feature so I ignore it. The 8 queues also seem odd, since this machine has been given 16 cores. Odder still, the ixl driver version that ships with pfSense 24.03 assigns 16 queues, while the latest Intel driver assigns 8. Regardless, performance seems better on the latest driver, so I'm not too worried about this either.
      The same logs repeat for ixl0 - ixl3.

      Jul 17 13:40:31	kernel		ixl0: The device is not iWARP enabled
      Jul 17 13:40:31	kernel		ixl0: Failed to initialize SR-IOV (error=2)
      Jul 17 13:40:31	kernel		ixl0: PCI Express Bus: Speed 8.0GT/s Width x8
      Jul 17 13:40:31	kernel		ixl0: Ethernet address: f0:b2:b9:0d:61:64
      Jul 17 13:40:31	kernel		ixl0: Allocating 8 queues for PF LAN VSI; 8 queues active
      Jul 17 13:40:31	kernel		ixl0: Using MSI-X interrupts with 9 vectors
      Jul 17 13:40:31	kernel		ixl0: PF-ID[0]: VFs 32, MSI-X 129, VF MSI-X 5, QPs 384, MDIO shared
      Jul 17 13:40:31	kernel		ixl0: fw 9.150.77492 api 1.15 nvm 9.50 etid 8000f160 oem 1.270.0
      Jul 17 13:40:31	kernel		ixl0: using 1024 tx descriptors and 1024 rx descriptors
      Jul 17 13:40:31	kernel		ixl0: <Intel(R) Ethernet Connection 700 Series PF Driver, Version - 1.14.2> mem 0x380003000000-0x380003ffffff,0x380004018000-0x38000401ffff irq 16 at device 0.0 on pci1
      

      All other mentions of ixl in the boot log appear to be normal operations.
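      To compare the queue allocation between driver versions without reading the boot log by eye, the "Allocating ... queues" line can be parsed and checked against the number of cores given to the VM. This is a rough sketch: the helper names `queue_allocation` and `describe` are hypothetical, and the pattern assumes the message format shown in the log above. It only reports the numbers and makes no claim about which queue count is correct.

```python
import re

# Matches the boot message shown in the post, e.g.
# "ixl0: Allocating 8 queues for PF LAN VSI; 8 queues active"
QUEUE_RE = re.compile(
    r"\b(ixl\d+): Allocating (\d+) queues for PF LAN VSI; (\d+) queues active"
)


def queue_allocation(lines):
    """Map interface name -> (allocated queues, active queues)."""
    result = {}
    for line in lines:
        match = QUEUE_RE.search(line)
        if match:
            result[match.group(1)] = (int(match.group(2)), int(match.group(3)))
    return result


def describe(iface, active, vcpus):
    """One-line summary comparing active queues with the vCPU count."""
    relation = "matches" if active == vcpus else "differs from"
    return f"{iface}: {active} queues active, {relation} {vcpus} vCPUs"


if __name__ == "__main__":
    boot_lines = ["ixl0: Allocating 8 queues for PF LAN VSI; 8 queues active"]
    vcpus = 16  # cores assigned to the pfSense guest, per the post
    for iface, (_allocated, active) in sorted(queue_allocation(boot_lines).items()):
        print(describe(iface, active, vcpus))
```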

      Attempts to fix
      • Flashed latest firmware v9.5
      • Installed latest ixl driver v1.14.2
      • Installed the same latest driver on the hypervisor just for the heck of it
      • Under System > Advanced > Networking, I've tested with all hardware offload settings enabled and disabled. No change in behavior.
      • Tried with hardware flow control completely disabled and full flow control enabled. No change in behavior.
      • I've been through the following posts that describe similar symptoms, but these issues seem to already be fixed and I don't see the same logs other than the "queue appears to be hung!" messages.
        -Intel X710 troubles
        -Issues with an Intel x710 and pfsense 2.4.5-p1
        -Bug 221919 - ixl: TX queue hang when using TSO and having a high and mixed network load
      • stephenw10S Offline
        stephenw10 Netgate Administrator
        last edited by

        Reboot the VM or reboot the host?

        Does it do it if you don't pass the hardware through and just use vmxnet?

        Steve

        • L Offline
          Louis89 @stephenw10
          last edited by

          Good question. Rebooting the VM alone is not enough; I've been rebooting the Proxmox host to resolve the issue. Thankfully, the other VMs on this Proxmox host are not a high priority. A reboot of the pfSense VM alone results in aq_errors while the adapter is initializing during boot. It never passes packets from the start, though there are no hung queue errors. Unfortunately, I only tried this once and didn't collect adequate logs to share. The tenants who depend on this system are quite annoyed when it fails at inopportune times, so the priority in the moment was to restore service as quickly as possible. Since this happens randomly, it may be a while before the issue returns and I can try rebooting the pfSense VM alone again and collect the relevant logs to share.
          Noticing this behavior is what led me to try installing the latest drivers for the adapter on Proxmox too. Even though the whole device is being passed through to the VM, this made it smell like a virtualization issue.

          I have not tried vmxnet. I think that is a VMware ESXi-only thing. Proxmox uses VirtIO (vtnet). Please correct me if I am wrong. I did test vtnet before putting this system into service and found performance noticeably worse: lower throughput, higher latency, and significantly higher host resource usage. I was also concerned the requirements of the authbridge wouldn't play nicely with vtnet, though I didn't test that.

          Thanks, Louis

          • stephenw10S Offline
            stephenw10 Netgate Administrator
            last edited by

            Yes, sorry I meant vtnet. Though Proxmox can also present NICs as vmxnet.

            I assume the card did not stop responding when using the Proxmox driver though?

            Is it linked at 10G?

            • L Offline
              Louis89 @stephenw10
              last edited by

              @stephenw10

              I assume the card did not stop responding when using the Proxmox driver though?

              Correct, though it's not confirmation of anything. Since this issue takes days or sometimes weeks to manifest, I may not have worked with vtnet long enough to see any issue.

              Is it linked at 10G?

              WAN (ixl0) is a 1G link, but while testing the X710 with vtnet I was using a 10G link.

              My thoughts on using vtnet... I spent about a month trying various hardware configurations and tuning options to see what was easiest to manage and provided the best performance. I tested through pfSense from a machine on LAN to a machine on WAN, where each machine had 100G links into the switch and pfSense's X710 was attached to that switch with 10G links (WAN and LAN networks partitioned on the switch with VLANs). With vtnet I couldn't push more than 5G through pfSense, with 2-3ms latency. From what I've read it should have done better than that, but I couldn't figure out why it didn't. Passthrough with all hardware offloading enabled provided the best results: full 10G, <=1ms latency, and ~25% less CPU usage on the host relative to vtnet. I also like that passthrough is a somewhat simpler configuration. Unfortunately, testing for long-term stability wasn't practical before putting the system into production.
              I could try switching to vtnet long term, but I would prefer to see passthrough work correctly. I went with the X710 instead of a cheaper 1G adapter for the option to upgrade the current 1G link someday.

              • stephenw10S Offline
                stephenw10 Netgate Administrator
                last edited by stephenw10

                Mmm, I would certainly expect best performance with hardware pass-through.

                However, you might consider adding a 1G NIC for the WAN if that's what it links at. That queue hung error is specific to the ixl driver, so if your link is only 1G anyway you could just use a different one.

                I assume you're not seeing the malicious driver event logs? https://redmine.pfsense.org/issues/13003

                • L Offline
                  Louis89 @stephenw10
                  last edited by

                  @stephenw10

                  However, you might consider adding a 1G NIC for the WAN if that's what it links at. That queue hung error is specific to the ixl driver, so if your link is only 1G anyway you could just use a different one.

                  I considered swapping the x710 with an i350 since I had igb working well for years. But there are a few things that complicate this idea for me...

                  1. It feels a bit like kicking the can down the road. It would be nice if I could get ixl working reliably.
                  2. I was previously using an i350 on a bare metal install of pfSense, but I've never tested it long term in a Proxmox VM. I might find some other issue with that setup too.
                  3. To set up and test the authbridge, I need to enable Ethernet filtering, which is only available with a pfSense+ license. Swapping the network controller will change the NDI and require the current license to be transferred, but I've already reconfigured the network controllers once and had to transfer the license then. Since the license can only be transferred once, I would have to get another license too. Alternatively, I could switch to CE and configure Ethernet filtering manually; it sounds like only the GUI components are missing from CE, so that would just be annoying extra work.

                  I assume you're not seeing the malicious driver event logs? https://redmine.pfsense.org/issues/13003

                  Correct. I don't see any relevant logs before or after the hung queue messages start.

                  I'll wait until this happens again, try rebooting the VM alone and collect the relevant logs in a post here. Hopefully that will shed a little more light on what's going on.
                  Thanks for your consideration so far.

                  • L Offline
                    Louis89
                    last edited by

                    Failed again, but in a new way. This time it was not the usual hung queue messages with an unresponsive ixl adapter in pfSense. Instead, the entire Proxmox machine crashed. The crash reports from Proxmox and pfSense both indicate it was caused by an issue with the X710 adapter.
                    This is the first time in 6 months of testing I've seen the problematic X710 adapter cause a crash at the hypervisor level.

                    The crash logs from Proxmox and pfSense follow. Neither system logged any sign of trouble before the crash. pfSense reports a crash dump time of 11:22:11, and Proxmox reports 11:23:07. Does that indicate the issue started at the VM level and spread to the hypervisor?
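                    For reference, the gap between those two timestamps can be worked out as below. This is a small sketch with a hypothetical helper name, `seconds_between`, and it assumes both clocks are synchronized and report the same time zone, which I have not verified.

```python
from datetime import datetime


def seconds_between(earlier, later, fmt="%H:%M:%S"):
    """Seconds from `earlier` to `later`, both given as same-day clock times."""
    delta = datetime.strptime(later, fmt) - datetime.strptime(earlier, fmt)
    return delta.total_seconds()


if __name__ == "__main__":
    # pfSense crash dump time, then Proxmox crash dump time, as reported above.
    print(seconds_between("11:22:11", "11:23:07"))  # 56.0
```

                    If the clocks do agree, the pfSense dump precedes the Proxmox warnings by 56 seconds. That ordering fits the guest failing first, but it does not prove it.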

                    Proxmox crash report:
                    pci 0000:01:00 is the X710 adapter I've been having issues with.

                    Jul 22 11:23:07 pve kernel: vfio-pci 0000:01:00.3: can't update enabled VF BAR1 [??? 0x00000000 flags 0x0]
                    Jul 22 11:23:07 pve kernel: WARNING: CPU: 32 PID: 6028 at drivers/pci/iov.c:966 pci_iov_update_resource+0x144/0x150
                    Jul 22 11:23:07 pve kernel: Modules linked in: vfio_pci vfio_pci_core vfio_iommu_type1 vfio iommufd ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_filter ip6_tables iptable_filter softdog nf_tables bonding tls iavf sunrpc binfmt_misc nfnetlink_log nfnetlink intel_rapl_msr intel_rapl_common amd64_edac edac_mce_amd kvm_amd ipmi_ssif kvm irqbypass crct10dif_pclmul polyval_clmulni polyval_generic ghash_clmulni_intel sha256_ssse3 sha1_ssse3 aesni_intel crypto_simd cryptd ib_uverbs dax_hmem cxl_acpi acpi_ipmi rapl ast ipmi_si pcspkr cxl_core ib_core ipmi_devintf i2c_algo_bit ccp k10temp ipmi_msghandler joydev input_leds mac_hid vhost_net vhost vhost_iotlb tap efi_pstore dmi_sysfs ip_tables x_tables autofs4 zfs(PO) spl(O) btrfs blake2b_generic xor raid6_pq libcrc32c rndis_host cdc_ether usbnet mii hid_generic usbmouse usbhid hid xhci_pci ice(OE) xhci_pci_renesas nvme crc32_pclmul xhci_hcd gnss nvme_core bnxt_en i2c_piix4 i40e(OE) nvme_auth
                    Jul 22 11:23:07 pve kernel: CPU: 32 PID: 6028 Comm: kvm Tainted: P           OE      6.8.8-2-pve #1
                    Jul 22 11:23:07 pve kernel: Hardware name: Supermicro AS -2015SV-WTNRT/H13SVW-NT, BIOS 1.1b 12/20/2023
                    Jul 22 11:23:07 pve kernel: RIP: 0010:pci_iov_update_resource+0x144/0x150
                    Jul 22 11:23:07 pve kernel: Code: 8b b3 c8 00 00 00 48 8d bb c8 00 00 00 e8 04 8c 1c 00 4d 89 e0 44 89 e9 4c 89 f2 48 89 c6 48 c7 c7 d0 43 40 b9 e8 fc b2 7d ff <0f> 0b e9 4a ff ff ff e8 00 e4 82 00 90 90 90 90 90 90 90 90 90 90
                    Jul 22 11:23:07 pve kernel: RSP: 0018:ff72c6bbc51a7920 EFLAGS: 00010246
                    Jul 22 11:23:07 pve kernel: RAX: 0000000000000000 RBX: ff1f06c7c6735000 RCX: 0000000000000000
                    Jul 22 11:23:07 pve kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
                    Jul 22 11:23:07 pve kernel: RBP: ff72c6bbc51a7960 R08: 0000000000000000 R09: 0000000000000000
                    Jul 22 11:23:07 pve kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ff1f06c7c67355d0
                    Jul 22 11:23:07 pve kernel: R13: 0000000000000001 R14: ff1f06c7c64ecee0 R15: ff1f06c85b0f8000
                    Jul 22 11:23:07 pve kernel: FS:  00007f9b48ded480(0000) GS:ff1f07258a800000(0000) knlGS:0000000000000000
                    Jul 22 11:23:07 pve kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
                    Jul 22 11:23:07 pve kernel: CR2: 00005763393242c8 CR3: 00000002029f8004 CR4: 0000000000f71ef0
                    Jul 22 11:23:07 pve kernel: PKRU: 55555554
                    Jul 22 11:23:07 pve kernel: Call Trace:
                    Jul 22 11:23:07 pve kernel:  <TASK>
                    Jul 22 11:23:07 pve kernel:  ? show_regs+0x6d/0x80
                    Jul 22 11:23:07 pve kernel:  ? __warn+0x89/0x160
                    Jul 22 11:23:07 pve kernel:  ? pci_iov_update_resource+0x144/0x150
                    Jul 22 11:23:07 pve kernel:  ? report_bug+0x17e/0x1b0
                    Jul 22 11:23:07 pve kernel:  ? handle_bug+0x46/0x90
                    Jul 22 11:23:07 pve kernel:  ? exc_invalid_op+0x18/0x80
                    Jul 22 11:23:07 pve kernel:  ? asm_exc_invalid_op+0x1b/0x20
                    Jul 22 11:23:07 pve kernel:  ? pci_iov_update_resource+0x144/0x150
                    Jul 22 11:23:07 pve kernel:  pci_update_resource+0x27/0x50
                    Jul 22 11:23:07 pve kernel:  pci_restore_iov_state+0xb4/0x150
                    Jul 22 11:23:07 pve kernel:  pci_restore_state.part.0+0x204/0x3a0
                    Jul 22 11:23:07 pve kernel:  pci_dev_restore+0x58/0x80
                    Jul 22 11:23:07 pve kernel:  pci_try_reset_function+0x6a/0xa0
                    Jul 22 11:23:07 pve kernel:  vfio_pci_core_ioctl+0x7bc/0xe80 [vfio_pci_core]
                    Jul 22 11:23:07 pve kernel:  ? kvm_vm_ioctl_irq_line+0x27/0x60 [kvm]
                    Jul 22 11:23:07 pve kernel:  vfio_device_fops_unl_ioctl+0xa8/0x850 [vfio]
                    Jul 22 11:23:07 pve kernel:  ? __pfx_kvm_set_pic_irq+0x10/0x10 [kvm]
                    Jul 22 11:23:07 pve kernel:  __x64_sys_ioctl+0xa0/0xf0
                    Jul 22 11:23:07 pve kernel:  x64_sys_call+0xa68/0x24b0
                    Jul 22 11:23:07 pve kernel:  do_syscall_64+0x81/0x170
                    Jul 22 11:23:07 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
                    Jul 22 11:23:07 pve kernel:  ? __mod_memcg_lruvec_state+0x87/0x140
                    Jul 22 11:23:07 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
                    Jul 22 11:23:07 pve kernel:  ? release_pages+0x152/0x4c0
                    Jul 22 11:23:07 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
                    Jul 22 11:23:07 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
                    Jul 22 11:23:07 pve kernel:  ? __mod_memcg_lruvec_state+0x87/0x140
                    Jul 22 11:23:07 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
                    Jul 22 11:23:07 pve kernel:  ? __mod_lruvec_state+0x36/0x50
                    Jul 22 11:23:07 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
                    Jul 22 11:23:07 pve kernel:  ? __lruvec_stat_mod_folio+0x70/0xc0
                    Jul 22 11:23:07 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
                    Jul 22 11:23:07 pve kernel:  ? set_ptes.constprop.0+0x2b/0xb0
                    Jul 22 11:23:07 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
                    Jul 22 11:23:07 pve kernel:  ? do_anonymous_page+0x3a8/0x740
                    Jul 22 11:23:07 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
                    Jul 22 11:23:07 pve kernel:  ? __pte_offset_map+0x1c/0x1b0
                    Jul 22 11:23:07 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
                    Jul 22 11:23:07 pve kernel:  ? __handle_mm_fault+0xc32/0xee0
                    Jul 22 11:23:07 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
                    Jul 22 11:23:07 pve kernel:  ? __count_memcg_events+0x6f/0xe0
                    Jul 22 11:23:07 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
                    Jul 22 11:23:07 pve kernel:  ? count_memcg_events.constprop.0+0x2a/0x50
                    Jul 22 11:23:07 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
                    Jul 22 11:23:07 pve kernel:  ? handle_mm_fault+0xad/0x380
                    Jul 22 11:23:07 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
                    Jul 22 11:23:07 pve kernel:  ? do_user_addr_fault+0x343/0x6b0
                    Jul 22 11:23:07 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
                    Jul 22 11:23:07 pve kernel:  ? irqentry_exit_to_user_mode+0x7e/0x260
                    Jul 22 11:23:07 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
                    Jul 22 11:23:07 pve kernel:  ? irqentry_exit+0x43/0x50
                    Jul 22 11:23:07 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
                    Jul 22 11:23:07 pve kernel:  ? exc_page_fault+0x94/0x1b0
                    Jul 22 11:23:07 pve kernel:  entry_SYSCALL_64_after_hwframe+0x78/0x80
                    Jul 22 11:23:07 pve kernel: RIP: 0033:0x7f9b4bb6fc5b
                    Jul 22 11:23:07 pve kernel: Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 1c 48 8b 44 24 18 64 48 2b 04 25 28 00 00
                    Jul 22 11:23:07 pve kernel: RSP: 002b:00007ffce3c437f0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
                    Jul 22 11:23:07 pve kernel: RAX: ffffffffffffffda RBX: 0000562efb229c70 RCX: 00007f9b4bb6fc5b
                    Jul 22 11:23:07 pve kernel: RDX: 0000000000000000 RSI: 0000000000003b6f RDI: 0000000000000051
                    Jul 22 11:23:07 pve kernel: RBP: 0000562efb229cf4 R08: 0000000000000000 R09: 0000000000000000
                    Jul 22 11:23:07 pve kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 0000562ef985a630
                    Jul 22 11:23:07 pve kernel: R13: 0000562ef74d3e10 R14: 0000562ef96ef790 R15: 00000000000000c8
                    Jul 22 11:23:07 pve kernel:  </TASK>
                    Jul 22 11:23:07 pve kernel: ---[ end trace 0000000000000000 ]---
                    Jul 22 11:23:07 pve kernel: ------------[ cut here ]------------
                    Jul 22 11:23:07 pve kernel: vfio-pci 0000:01:00.3: can't update enabled VF BAR2 [??? 0x00000000 flags 0x0]
                    Jul 22 11:23:07 pve kernel: WARNING: CPU: 32 PID: 6028 at drivers/pci/iov.c:966 pci_iov_update_resource+0x144/0x150
                    Jul 22 11:23:07 pve kernel: Modules linked in: vfio_pci vfio_pci_core vfio_iommu_type1 vfio iommufd ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_filter ip6_tables iptable_filter softdog nf_tables bonding tls iavf sunrpc binfmt_misc nfnetlink_log nfnetlink intel_rapl_msr intel_rapl_common amd64_edac edac_mce_amd kvm_amd ipmi_ssif kvm irqbypass crct10dif_pclmul polyval_clmulni polyval_generic ghash_clmulni_intel sha256_ssse3 sha1_ssse3 aesni_intel crypto_simd cryptd ib_uverbs dax_hmem cxl_acpi acpi_ipmi rapl ast ipmi_si pcspkr cxl_core ib_core ipmi_devintf i2c_algo_bit ccp k10temp ipmi_msghandler joydev input_leds mac_hid vhost_net vhost vhost_iotlb tap efi_pstore dmi_sysfs ip_tables x_tables autofs4 zfs(PO) spl(O) btrfs blake2b_generic xor raid6_pq libcrc32c rndis_host cdc_ether usbnet mii hid_generic usbmouse usbhid hid xhci_pci ice(OE) xhci_pci_renesas nvme crc32_pclmul xhci_hcd gnss nvme_core bnxt_en i2c_piix4 i40e(OE) nvme_auth
                    Jul 22 11:23:07 pve kernel: CPU: 32 PID: 6028 Comm: kvm Tainted: P        W  OE      6.8.8-2-pve #1
                    Jul 22 11:23:07 pve kernel: Hardware name: Supermicro AS -2015SV-WTNRT/H13SVW-NT, BIOS 1.1b 12/20/2023
                    Jul 22 11:23:07 pve kernel: RIP: 0010:pci_iov_update_resource+0x144/0x150
                    Jul 22 11:23:07 pve kernel: Code: 8b b3 c8 00 00 00 48 8d bb c8 00 00 00 e8 04 8c 1c 00 4d 89 e0 44 89 e9 4c 89 f2 48 89 c6 48 c7 c7 d0 43 40 b9 e8 fc b2 7d ff <0f> 0b e9 4a ff ff ff e8 00 e4 82 00 90 90 90 90 90 90 90 90 90 90
                    Jul 22 11:23:07 pve kernel: RSP: 0018:ff72c6bbc51a7920 EFLAGS: 00010246
                    Jul 22 11:23:07 pve kernel: RAX: 0000000000000000 RBX: ff1f06c7c6735000 RCX: 0000000000000000
                    Jul 22 11:23:07 pve kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
                    Jul 22 11:23:07 pve kernel: RBP: ff72c6bbc51a7960 R08: 0000000000000000 R09: 0000000000000000
                    Jul 22 11:23:07 pve kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ff1f06c7c6735610
                    Jul 22 11:23:07 pve kernel: R13: 0000000000000002 R14: ff1f06c7c64ecee0 R15: ff1f06c85b0f8000
                    Jul 22 11:23:07 pve kernel: FS:  00007f9b48ded480(0000) GS:ff1f07258a800000(0000) knlGS:0000000000000000
                    Jul 22 11:23:07 pve kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
                    Jul 22 11:23:07 pve kernel: CR2: 00005763393242c8 CR3: 00000002029f8004 CR4: 0000000000f71ef0
                    Jul 22 11:23:07 pve kernel: PKRU: 55555554
                    Jul 22 11:23:07 pve kernel: Call Trace:
                    Jul 22 11:23:07 pve kernel:  <TASK>
                    Jul 22 11:23:07 pve kernel:  ? show_regs+0x6d/0x80
                    Jul 22 11:23:07 pve kernel:  ? __warn+0x89/0x160
                    Jul 22 11:23:07 pve kernel:  ? pci_iov_update_resource+0x144/0x150
                    Jul 22 11:23:07 pve kernel:  ? report_bug+0x17e/0x1b0
                    Jul 22 11:23:07 pve kernel:  ? handle_bug+0x46/0x90
                    Jul 22 11:23:07 pve kernel:  ? exc_invalid_op+0x18/0x80
                    Jul 22 11:23:07 pve kernel:  ? asm_exc_invalid_op+0x1b/0x20
                    Jul 22 11:23:07 pve kernel:  ? pci_iov_update_resource+0x144/0x150
                    Jul 22 11:23:07 pve kernel:  pci_update_resource+0x27/0x50
                    Jul 22 11:23:07 pve kernel:  pci_restore_iov_state+0xb4/0x150
                    Jul 22 11:23:07 pve kernel:  pci_restore_state.part.0+0x204/0x3a0
                    Jul 22 11:23:07 pve kernel:  pci_dev_restore+0x58/0x80
                    Jul 22 11:23:07 pve kernel:  pci_try_reset_function+0x6a/0xa0
                    Jul 22 11:23:07 pve kernel:  vfio_pci_core_ioctl+0x7bc/0xe80 [vfio_pci_core]
                    Jul 22 11:23:07 pve kernel:  ? kvm_vm_ioctl_irq_line+0x27/0x60 [kvm]
                    Jul 22 11:23:07 pve kernel:  vfio_device_fops_unl_ioctl+0xa8/0x850 [vfio]
                    Jul 22 11:23:07 pve kernel:  ? __pfx_kvm_set_pic_irq+0x10/0x10 [kvm]
                    Jul 22 11:23:07 pve kernel:  __x64_sys_ioctl+0xa0/0xf0
                    Jul 22 11:23:07 pve kernel:  x64_sys_call+0xa68/0x24b0
                    Jul 22 11:23:07 pve kernel:  do_syscall_64+0x81/0x170
                    Jul 22 11:23:07 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
                    Jul 22 11:23:07 pve kernel:  ? __mod_memcg_lruvec_state+0x87/0x140
                    Jul 22 11:23:07 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
                    Jul 22 11:23:07 pve kernel:  ? release_pages+0x152/0x4c0
                    Jul 22 11:23:07 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
                    Jul 22 11:23:07 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
                    Jul 22 11:23:07 pve kernel:  ? __mod_memcg_lruvec_state+0x87/0x140
                    Jul 22 11:23:07 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
                    Jul 22 11:23:07 pve kernel:  ? __mod_lruvec_state+0x36/0x50
                    Jul 22 11:23:07 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
                    Jul 22 11:23:07 pve kernel:  ? __lruvec_stat_mod_folio+0x70/0xc0
                    Jul 22 11:23:07 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
                    Jul 22 11:23:07 pve kernel:  ? set_ptes.constprop.0+0x2b/0xb0
                    Jul 22 11:23:07 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
                    Jul 22 11:23:07 pve kernel:  ? do_anonymous_page+0x3a8/0x740
                    Jul 22 11:23:07 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
                    Jul 22 11:23:07 pve kernel:  ? __pte_offset_map+0x1c/0x1b0
                    Jul 22 11:23:07 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
                    Jul 22 11:23:07 pve kernel:  ? __handle_mm_fault+0xc32/0xee0
                    Jul 22 11:23:07 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
                    Jul 22 11:23:07 pve kernel:  ? __count_memcg_events+0x6f/0xe0
                    Jul 22 11:23:07 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
                    Jul 22 11:23:07 pve kernel:  ? count_memcg_events.constprop.0+0x2a/0x50
                    Jul 22 11:23:07 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
                    Jul 22 11:23:07 pve kernel:  ? handle_mm_fault+0xad/0x380
                    Jul 22 11:23:07 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
                    Jul 22 11:23:07 pve kernel:  ? do_user_addr_fault+0x343/0x6b0
                    Jul 22 11:23:07 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
                    Jul 22 11:23:07 pve kernel:  ? irqentry_exit_to_user_mode+0x7e/0x260
                    Jul 22 11:23:07 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
                    Jul 22 11:23:07 pve kernel:  ? irqentry_exit+0x43/0x50
                    Jul 22 11:23:07 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
                    Jul 22 11:23:07 pve kernel:  ? exc_page_fault+0x94/0x1b0
                    Jul 22 11:23:07 pve kernel:  entry_SYSCALL_64_after_hwframe+0x78/0x80
                    Jul 22 11:23:07 pve kernel: RIP: 0033:0x7f9b4bb6fc5b
                    Jul 22 11:23:07 pve kernel: Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 1c 48 8b 44 24 18 64 48 2b 04 25 28 00 00
                    Jul 22 11:23:07 pve kernel: RSP: 002b:00007ffce3c437f0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
                    Jul 22 11:23:07 pve kernel: RAX: ffffffffffffffda RBX: 0000562efb229c70 RCX: 00007f9b4bb6fc5b
                    Jul 22 11:23:07 pve kernel: RDX: 0000000000000000 RSI: 0000000000003b6f RDI: 0000000000000051
                    Jul 22 11:23:07 pve kernel: RBP: 0000562efb229cf4 R08: 0000000000000000 R09: 0000000000000000
                    Jul 22 11:23:07 pve kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 0000562ef985a630
                    Jul 22 11:23:07 pve kernel: R13: 0000562ef74d3e10 R14: 0000562ef96ef790 R15: 00000000000000c8
                    Jul 22 11:23:07 pve kernel:  </TASK>
                    Jul 22 11:23:07 pve kernel: ---[ end trace 0000000000000000 ]---
                    Jul 22 11:23:08 pve kernel: ------------[ cut here ]------------
                    Jul 22 11:23:08 pve kernel: vfio-pci 0000:01:00.3: can't update enabled VF BAR4 [??? 0x00000000 flags 0x0]
                    Jul 22 11:23:08 pve kernel: WARNING: CPU: 32 PID: 6028 at drivers/pci/iov.c:966 pci_iov_update_resource+0x144/0x150
                    Jul 22 11:23:08 pve kernel: Modules linked in: vfio_pci vfio_pci_core vfio_iommu_type1 vfio iommufd ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_filter ip6_tables iptable_filter softdog nf_tables bonding tls iavf sunrpc binfmt_misc nfnetlink_log nfnetlink intel_rapl_msr intel_rapl_common amd64_edac edac_mce_amd kvm_amd ipmi_ssif kvm irqbypass crct10dif_pclmul polyval_clmulni polyval_generic ghash_clmulni_intel sha256_ssse3 sha1_ssse3 aesni_intel crypto_simd cryptd ib_uverbs dax_hmem cxl_acpi acpi_ipmi rapl ast ipmi_si pcspkr cxl_core ib_core ipmi_devintf i2c_algo_bit ccp k10temp ipmi_msghandler joydev input_leds mac_hid vhost_net vhost vhost_iotlb tap efi_pstore dmi_sysfs ip_tables x_tables autofs4 zfs(PO) spl(O) btrfs blake2b_generic xor raid6_pq libcrc32c rndis_host cdc_ether usbnet mii hid_generic usbmouse usbhid hid xhci_pci ice(OE) xhci_pci_renesas nvme crc32_pclmul xhci_hcd gnss nvme_core bnxt_en i2c_piix4 i40e(OE) nvme_auth
                    Jul 22 11:23:08 pve kernel: CPU: 32 PID: 6028 Comm: kvm Tainted: P        W  OE      6.8.8-2-pve #1
                    Jul 22 11:23:08 pve kernel: Hardware name: Supermicro AS -2015SV-WTNRT/H13SVW-NT, BIOS 1.1b 12/20/2023
                    Jul 22 11:23:08 pve kernel: RIP: 0010:pci_iov_update_resource+0x144/0x150
                    Jul 22 11:23:08 pve kernel: Code: 8b b3 c8 00 00 00 48 8d bb c8 00 00 00 e8 04 8c 1c 00 4d 89 e0 44 89 e9 4c 89 f2 48 89 c6 48 c7 c7 d0 43 40 b9 e8 fc b2 7d ff <0f> 0b e9 4a ff ff ff e8 00 e4 82 00 90 90 90 90 90 90 90 90 90 90
                    Jul 22 11:23:08 pve kernel: RSP: 0018:ff72c6bbc51a7920 EFLAGS: 00010246
                    Jul 22 11:23:08 pve kernel: RAX: 0000000000000000 RBX: ff1f06c7c6735000 RCX: 0000000000000000
                    Jul 22 11:23:08 pve kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
                    Jul 22 11:23:08 pve kernel: RBP: ff72c6bbc51a7960 R08: 0000000000000000 R09: 0000000000000000
                    Jul 22 11:23:08 pve kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ff1f06c7c6735690
                    Jul 22 11:23:08 pve kernel: R13: 0000000000000004 R14: ff1f06c7c64ecee0 R15: ff1f06c85b0f8000
                    Jul 22 11:23:08 pve kernel: FS:  00007f9b48ded480(0000) GS:ff1f07258a800000(0000) knlGS:0000000000000000
                    Jul 22 11:23:08 pve kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
                    Jul 22 11:23:08 pve kernel: CR2: 00005763393242c8 CR3: 00000002029f8004 CR4: 0000000000f71ef0
                    Jul 22 11:23:08 pve kernel: PKRU: 55555554
                    Jul 22 11:23:08 pve kernel: Call Trace:
                    Jul 22 11:23:08 pve kernel:  <TASK>
                    Jul 22 11:23:08 pve kernel:  ? show_regs+0x6d/0x80
                    Jul 22 11:23:08 pve kernel:  ? __warn+0x89/0x160
                    Jul 22 11:23:08 pve kernel:  ? pci_iov_update_resource+0x144/0x150
                    Jul 22 11:23:08 pve kernel:  ? report_bug+0x17e/0x1b0
                    Jul 22 11:23:08 pve kernel:  ? handle_bug+0x46/0x90
                    Jul 22 11:23:08 pve kernel:  ? exc_invalid_op+0x18/0x80
                    Jul 22 11:23:08 pve kernel:  ? asm_exc_invalid_op+0x1b/0x20
                    Jul 22 11:23:08 pve kernel:  ? pci_iov_update_resource+0x144/0x150
                    Jul 22 11:23:08 pve kernel:  pci_update_resource+0x27/0x50
                    Jul 22 11:23:08 pve kernel:  pci_restore_iov_state+0xb4/0x150
                    Jul 22 11:23:08 pve kernel:  pci_restore_state.part.0+0x204/0x3a0
                    Jul 22 11:23:08 pve kernel:  pci_dev_restore+0x58/0x80
                    Jul 22 11:23:08 pve kernel:  pci_try_reset_function+0x6a/0xa0
                    Jul 22 11:23:08 pve kernel:  vfio_pci_core_ioctl+0x7bc/0xe80 [vfio_pci_core]
                    Jul 22 11:23:08 pve kernel:  ? kvm_vm_ioctl_irq_line+0x27/0x60 [kvm]
                    Jul 22 11:23:08 pve kernel:  vfio_device_fops_unl_ioctl+0xa8/0x850 [vfio]
                    Jul 22 11:23:08 pve kernel:  ? __pfx_kvm_set_pic_irq+0x10/0x10 [kvm]
                    Jul 22 11:23:08 pve kernel:  __x64_sys_ioctl+0xa0/0xf0
                    Jul 22 11:23:08 pve kernel:  x64_sys_call+0xa68/0x24b0
                    Jul 22 11:23:08 pve kernel:  do_syscall_64+0x81/0x170
                    Jul 22 11:23:08 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
                    Jul 22 11:23:08 pve kernel:  ? __mod_memcg_lruvec_state+0x87/0x140
                    Jul 22 11:23:08 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
                    Jul 22 11:23:08 pve kernel:  ? release_pages+0x152/0x4c0
                    Jul 22 11:23:08 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
                    Jul 22 11:23:08 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
                    Jul 22 11:23:08 pve kernel:  ? __mod_memcg_lruvec_state+0x87/0x140
                    Jul 22 11:23:08 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
                    Jul 22 11:23:08 pve kernel:  ? __mod_lruvec_state+0x36/0x50
                    Jul 22 11:23:08 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
                    Jul 22 11:23:08 pve kernel:  ? __lruvec_stat_mod_folio+0x70/0xc0
                    Jul 22 11:23:08 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
                    Jul 22 11:23:08 pve kernel:  ? set_ptes.constprop.0+0x2b/0xb0
                    Jul 22 11:23:08 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
                    Jul 22 11:23:08 pve kernel:  ? do_anonymous_page+0x3a8/0x740
                    Jul 22 11:23:08 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
                    Jul 22 11:23:08 pve kernel:  ? __pte_offset_map+0x1c/0x1b0
                    Jul 22 11:23:08 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
                    Jul 22 11:23:08 pve kernel:  ? __handle_mm_fault+0xc32/0xee0
                    Jul 22 11:23:08 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
                    Jul 22 11:23:08 pve kernel:  ? __count_memcg_events+0x6f/0xe0
                    Jul 22 11:23:08 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
                    Jul 22 11:23:08 pve kernel:  ? count_memcg_events.constprop.0+0x2a/0x50
                    Jul 22 11:23:08 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
                    Jul 22 11:23:08 pve kernel:  ? handle_mm_fault+0xad/0x380
                    Jul 22 11:23:08 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
                    Jul 22 11:23:08 pve kernel:  ? do_user_addr_fault+0x343/0x6b0
                    Jul 22 11:23:08 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
                    Jul 22 11:23:08 pve kernel:  ? irqentry_exit_to_user_mode+0x7e/0x260
                    Jul 22 11:23:08 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
                    Jul 22 11:23:08 pve kernel:  ? irqentry_exit+0x43/0x50
                    Jul 22 11:23:08 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
                    Jul 22 11:23:08 pve kernel:  ? exc_page_fault+0x94/0x1b0
                    Jul 22 11:23:08 pve kernel:  entry_SYSCALL_64_after_hwframe+0x78/0x80
                    Jul 22 11:23:08 pve kernel: RIP: 0033:0x7f9b4bb6fc5b
                    Jul 22 11:23:08 pve kernel: Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 1c 48 8b 44 24 18 64 48 2b 04 25 28 00 00
                    Jul 22 11:23:08 pve kernel: RSP: 002b:00007ffce3c437f0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
                    Jul 22 11:23:08 pve kernel: RAX: ffffffffffffffda RBX: 0000562efb229c70 RCX: 00007f9b4bb6fc5b
                    Jul 22 11:23:08 pve kernel: RDX: 0000000000000000 RSI: 0000000000003b6f RDI: 0000000000000051
                    Jul 22 11:23:08 pve kernel: RBP: 0000562efb229cf4 R08: 0000000000000000 R09: 0000000000000000
                    Jul 22 11:23:08 pve kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 0000562ef985a630
                    Jul 22 11:23:08 pve kernel: R13: 0000562ef74d3e10 R14: 0000562ef96ef790 R15: 00000000000000c8
                    Jul 22 11:23:08 pve kernel:  </TASK>
                    Jul 22 11:23:08 pve kernel: ---[ end trace 0000000000000000 ]---
                    Jul 22 11:23:08 pve kernel: ------------[ cut here ]------------
                    Jul 22 11:23:08 pve kernel: vfio-pci 0000:01:00.3: can't update enabled VF BAR5 [??? 0x00000000 flags 0x0]
                    Jul 22 11:23:08 pve kernel: WARNING: CPU: 32 PID: 6028 at drivers/pci/iov.c:966 pci_iov_update_resource+0x144/0x150
                    Jul 22 11:23:08 pve kernel: Modules linked in: vfio_pci vfio_pci_core vfio_iommu_type1 vfio iommufd ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_filter ip6_tables iptable_filter softdog nf_tables bonding tls iavf sunrpc binfmt_misc nfnetlink_log nfnetlink intel_rapl_msr intel_rapl_common amd64_edac edac_mce_amd kvm_amd ipmi_ssif kvm irqbypass crct10dif_pclmul polyval_clmulni polyval_generic ghash_clmulni_intel sha256_ssse3 sha1_ssse3 aesni_intel crypto_simd cryptd ib_uverbs dax_hmem cxl_acpi acpi_ipmi rapl ast ipmi_si pcspkr cxl_core ib_core ipmi_devintf i2c_algo_bit ccp k10temp ipmi_msghandler joydev input_leds mac_hid vhost_net vhost vhost_iotlb tap efi_pstore dmi_sysfs ip_tables x_tables autofs4 zfs(PO) spl(O) btrfs blake2b_generic xor raid6_pq libcrc32c rndis_host cdc_ether usbnet mii hid_generic usbmouse usbhid hid xhci_pci ice(OE) xhci_pci_renesas nvme crc32_pclmul xhci_hcd gnss nvme_core bnxt_en i2c_piix4 i40e(OE) nvme_auth
                    Jul 22 11:23:08 pve kernel: CPU: 32 PID: 6028 Comm: kvm Tainted: P        W  OE      6.8.8-2-pve #1
                    Jul 22 11:23:08 pve kernel: Hardware name: Supermicro AS -2015SV-WTNRT/H13SVW-NT, BIOS 1.1b 12/20/2023
                    Jul 22 11:23:08 pve kernel: RIP: 0010:pci_iov_update_resource+0x144/0x150
                    Jul 22 11:23:08 pve kernel: Code: 8b b3 c8 00 00 00 48 8d bb c8 00 00 00 e8 04 8c 1c 00 4d 89 e0 44 89 e9 4c 89 f2 48 89 c6 48 c7 c7 d0 43 40 b9 e8 fc b2 7d ff <0f> 0b e9 4a ff ff ff e8 00 e4 82 00 90 90 90 90 90 90 90 90 90 90
                    Jul 22 11:23:08 pve kernel: RSP: 0018:ff72c6bbc51a7920 EFLAGS: 00010246
                    Jul 22 11:23:08 pve kernel: RAX: 0000000000000000 RBX: ff1f06c7c6735000 RCX: 0000000000000000
                    Jul 22 11:23:08 pve kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
                    Jul 22 11:23:08 pve kernel: RBP: ff72c6bbc51a7960 R08: 0000000000000000 R09: 0000000000000000
                    Jul 22 11:23:08 pve kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ff1f06c7c67356d0
                    Jul 22 11:23:08 pve kernel: R13: 0000000000000005 R14: ff1f06c7c64ecee0 R15: ff1f06c85b0f8000
                    Jul 22 11:23:08 pve kernel: FS:  00007f9b48ded480(0000) GS:ff1f07258a800000(0000) knlGS:0000000000000000
                    Jul 22 11:23:08 pve kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
                    Jul 22 11:23:08 pve kernel: CR2: 00005763393242c8 CR3: 00000002029f8004 CR4: 0000000000f71ef0
                    Jul 22 11:23:08 pve kernel: PKRU: 55555554
                    Jul 22 11:23:08 pve kernel: Call Trace:
                    Jul 22 11:23:08 pve kernel:  <TASK>
                    Jul 22 11:23:08 pve kernel:  ? show_regs+0x6d/0x80
                    Jul 22 11:23:08 pve kernel:  ? __warn+0x89/0x160
                    Jul 22 11:23:08 pve kernel:  ? pci_iov_update_resource+0x144/0x150
                    Jul 22 11:23:08 pve kernel:  ? report_bug+0x17e/0x1b0
                    Jul 22 11:23:08 pve kernel:  ? handle_bug+0x46/0x90
                    Jul 22 11:23:08 pve kernel:  ? exc_invalid_op+0x18/0x80
                    Jul 22 11:23:08 pve kernel:  ? asm_exc_invalid_op+0x1b/0x20
                    Jul 22 11:23:08 pve kernel:  ? pci_iov_update_resource+0x144/0x150
                    Jul 22 11:23:08 pve kernel:  pci_update_resource+0x27/0x50
                    Jul 22 11:23:08 pve kernel:  pci_restore_iov_state+0xb4/0x150
                    Jul 22 11:23:08 pve kernel:  pci_restore_state.part.0+0x204/0x3a0
                    Jul 22 11:23:08 pve kernel:  pci_dev_restore+0x58/0x80
                    Jul 22 11:23:08 pve kernel:  pci_try_reset_function+0x6a/0xa0
                    Jul 22 11:23:08 pve kernel:  vfio_pci_core_ioctl+0x7bc/0xe80 [vfio_pci_core]
                    Jul 22 11:23:08 pve kernel:  ? kvm_vm_ioctl_irq_line+0x27/0x60 [kvm]
                    Jul 22 11:23:08 pve kernel:  vfio_device_fops_unl_ioctl+0xa8/0x850 [vfio]
                    Jul 22 11:23:08 pve kernel:  ? __pfx_kvm_set_pic_irq+0x10/0x10 [kvm]
                    Jul 22 11:23:08 pve kernel:  __x64_sys_ioctl+0xa0/0xf0
                    Jul 22 11:23:08 pve kernel:  x64_sys_call+0xa68/0x24b0
                    Jul 22 11:23:08 pve kernel:  do_syscall_64+0x81/0x170
                    Jul 22 11:23:08 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
                    Jul 22 11:23:08 pve kernel:  ? __mod_memcg_lruvec_state+0x87/0x140
                    Jul 22 11:23:08 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
                    Jul 22 11:23:08 pve kernel:  ? release_pages+0x152/0x4c0
                    Jul 22 11:23:08 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
                    Jul 22 11:23:08 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
                    Jul 22 11:23:08 pve kernel:  ? __mod_memcg_lruvec_state+0x87/0x140
                    Jul 22 11:23:08 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
                    Jul 22 11:23:08 pve kernel:  ? __mod_lruvec_state+0x36/0x50
                    Jul 22 11:23:08 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
                    Jul 22 11:23:08 pve kernel:  ? __lruvec_stat_mod_folio+0x70/0xc0
                    Jul 22 11:23:08 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
                    Jul 22 11:23:08 pve kernel:  ? set_ptes.constprop.0+0x2b/0xb0
                    Jul 22 11:23:08 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
                    Jul 22 11:23:08 pve kernel:  ? do_anonymous_page+0x3a8/0x740
                    Jul 22 11:23:08 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
                    Jul 22 11:23:08 pve kernel:  ? __pte_offset_map+0x1c/0x1b0
                    Jul 22 11:23:08 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
                    Jul 22 11:23:08 pve kernel:  ? __handle_mm_fault+0xc32/0xee0
                    Jul 22 11:23:08 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
                    Jul 22 11:23:08 pve kernel:  ? __count_memcg_events+0x6f/0xe0
                    Jul 22 11:23:08 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
                    Jul 22 11:23:08 pve kernel:  ? count_memcg_events.constprop.0+0x2a/0x50
                    Jul 22 11:23:08 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
                    Jul 22 11:23:08 pve kernel:  ? handle_mm_fault+0xad/0x380
                    Jul 22 11:23:08 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
                    Jul 22 11:23:08 pve kernel:  ? do_user_addr_fault+0x343/0x6b0
                    Jul 22 11:23:08 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
                    Jul 22 11:23:08 pve kernel:  ? irqentry_exit_to_user_mode+0x7e/0x260
                    Jul 22 11:23:08 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
                    Jul 22 11:23:08 pve kernel:  ? irqentry_exit+0x43/0x50
                    Jul 22 11:23:08 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
                    Jul 22 11:23:08 pve kernel:  ? exc_page_fault+0x94/0x1b0
                    Jul 22 11:23:08 pve kernel:  entry_SYSCALL_64_after_hwframe+0x78/0x80
                    Jul 22 11:23:08 pve kernel: RIP: 0033:0x7f9b4bb6fc5b
                    Jul 22 11:23:08 pve kernel: Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 1c 48 8b 44 24 18 64 48 2b 04 25 28 00 00
                    Jul 22 11:23:08 pve kernel: RSP: 002b:00007ffce3c437f0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
                    Jul 22 11:23:08 pve kernel: RAX: ffffffffffffffda RBX: 0000562efb229c70 RCX: 00007f9b4bb6fc5b
                    Jul 22 11:23:08 pve kernel: RDX: 0000000000000000 RSI: 0000000000003b6f RDI: 0000000000000051
                    Jul 22 11:23:08 pve kernel: RBP: 0000562efb229cf4 R08: 0000000000000000 R09: 0000000000000000
                    Jul 22 11:23:08 pve kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 0000562ef985a630
                    Jul 22 11:23:08 pve kernel: R13: 0000562ef74d3e10 R14: 0000562ef96ef790 R15: 00000000000000c8
                    Jul 22 11:23:08 pve kernel:  </TASK>
                    Jul 22 11:23:08 pve kernel: ---[ end trace 0000000000000000 ]---
                    Jul 22 11:23:10 pve pvestatd[5932]: VM 100 qmp command failed - VM 100 qmp command 'query-proxmox-support' failed - got timeout
                    Jul 22 11:23:10 pve pvestatd[5932]: status update time (8.072 seconds)
                    Jul 22 11:23:16 pve kernel: vfio-pci 0000:01:00.2: vfio_bar_restore: reset recovery - restoring BARs
                    

                    pfSense crash report:
                    msgbuf.txt

                    ixl0: Reset Requested! (EMPR)
                    ixl0: ECC Error detected!
                    ixl0: HMC Error detected!
                    ixl0: INFO 0xffffffff
                    ixl0: DATA 0x00000000
                    ixl0: Rebuilding driver state...
                    
                    
                    Fatal trap 12: page fault while in kernel mode
                    cpuid = 11; apic id = 0b
                    fault virtual address	= 0x458
                    fault code		= supervisor read data, page not present
                    instruction pointer	= 0x20:0xffffffff80ccd060
                    stack pointer	        = 0x28:0xfffffe00dbb34d90
                    frame pointer	        = 0x28:0xfffffe00dbb34e10
                    code segment		= base 0x0, limit 0xfffff, type 0x1b
                    			= DPL 0, pres 1, long 1, def32 0, gran 1
                    processor eflags	= interrupt enabled, resume, IOPL = 0
                    current process		= 0 (ixl0 (que 2))
                    rdi: fffffe00dc1c0620 rsi: 0000000000000004 rdx: ffffffff835d784b
                    rcx: fffff800023a5740  r8: fffff800023a5c60  r9: fffffe00dbb35000
                    rax: 0000000000000000 rbx: 0000000000000000 rbp: fffffe00dbb34e10
                    r10: 00000000000001f4 r11: 0000000082859739 r12: fffffe00dbb34da8
                    r13: 0000000000000000 r14: 0000000000000000 r15: fffffe00dc1c0620
                    trap number		= 12
                    panic: page fault
                    cpuid = 11
                    time = 1721665331
                    KDB: enter: panic
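To line the panic up against the hypervisor's journal, the `time =` value in the crash report can be converted to a readable timestamp. The sketch below assumes that value is Unix epoch seconds, which is how FreeBSD panic output normally prints it. The helper name `panic_time_utc` is made up for illustration. The hypervisor's timezone is not stated in the logs, so the result is left in UTC.

```python
from datetime import datetime, timezone


def panic_time_utc(epoch_seconds: int) -> str:
    """Return an ISO 8601 UTC timestamp for a panic `time =` value.

    Assumes the value is Unix epoch seconds.
    """
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).isoformat()


if __name__ == "__main__":
    # Value copied from the msgbuf.txt excerpt above.
    print(panic_time_utc(1721665331))  # prints 2024-07-22T16:22:11+00:00
```

The hypervisor log lines are in the host's local time, so the host's UTC offset has to be applied before comparing the two.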
                    

                    ddb.txt

                    • L Offline
                      Louis89
                      last edited by

                      Another X710-induced pfSense kernel panic, but this time the hypervisor did not crash.

                      In 6 months of testing...
                      This is the first time I've observed two failures in less than 24 hours, and the first time "Malicious Driver Detection" messages have appeared in the logs.
                      This is only the second time I've observed the X710 cause a kernel panic. All previous failures only resulted in an unresponsive X710 with "queue <num> appears to be hung!" messages.
                      I've made no notable changes recently that may have caused this new failure pattern.

                      Rebooting only the VM fixed it this time.

                      There are no notable log messages on the hypervisor.
                      Relevant pfSense logs:

                      ixl0: Malicious Driver Detection event 2 on TX queue 0, pf number 0 (PF-0)
                      ixl0: Malicious Driver Detection event 1 on RX queue 3, pf number 0 (PF-0)
                      ixl0: Malicious Driver Detection event 2 on TX queue 1, pf number 0 (PF-0)
                      ixl0: Malicious Driver Detection event 1 on RX queue 0, pf number 0 (PF-0)
                      ixl0: RX Malicious Driver Detection event (unknown)
                      ixl0: Malicious Driver Detection event 1 on RX queue 2, pf number 0 (PF-0)
                      ixl0: Malicious Driver Detection event 2 on TX queue 2, pf number 0 (PF-0)
                      ixl0: Malicious Driver Detection event 1 on RX queue 2, pf number 0 (PF-0)
                      ixl0: Malicious Driver Detection event 1 on RX queue 2, pf number 0 (PF-0)
                      ixl0: Malicious Driver Detection event 1 on RX queue 2, pf number 0 (PF-0)
                      ixl0: Malicious Driver Detection event 1 on RX queue 2, pf number 0 (PF-0)
                      ixl0: TX Malicious Driver Detection event (unknown)
                      ixl0: Malicious Driver Detection event 255 on RX queue 16383, pf number 255 (PF-0)
                      ixl0: Malicious Driver Detection event 1 on RX queue 2, pf number 0 (PF-0)
                      ixl0: Malicious Driver Detection event 1 on RX queue 7, pf number 0 (PF-0)
                      ixl0: Malicious Driver Detection event 2 on TX queue 4, pf number 0 (PF-0)
                      ixl0: Malicious Driver Detection event 31 on TX queue 4095, pf number 15 (PF-0)
                      ixl0: Malicious Driver Detection event 1 on RX queue 2, pf number 0 (PF-0)
                      ixl0: Malicious Driver Detection event 1 on RX queue 2, pf number 0 (PF-0)
                      ixl0: Malicious Driver Detection event 1 on RX queue 2, pf number 0 (PF-0)
                      ixl0: Malicious Driver Detection event 255 on RX queue 16383, pf number 255 (PF-0)
                      ixl0: Malicious Driver Detection event 1 on RX queue 2, pf number 0 (PF-0)
                      ixl0: Malicious Driver Detection event 1 on RX queue 2, pf number 0 (PF-0)
                      ixl0: Malicious Driver Detection event 1 on RX queue 7, pf number 0 (PF-0)
                      ixl0: Malicious Driver Detection event 1 on RX queue 2, pf number 0 (PF-0)
                      ixl0: Malicious Driver Detection event 1 on RX queue 0, pf number 0 (PF-0)
                      ixl0: Malicious Driver Detection event 1 on RX queue 2, pf number 0 (PF-0)
                      ixl0: Malicious Driver Detection event 255 on RX queue 16383, pf number 255 (PF-0)
                      ixl0: Malicious Driver Detection event 1 on RX queue 2, pf number 0 (PF-0)
                      ixl0: Malicious Driver Detection event 1 on RX queue 2, pf number 0 (PF-0)
                      ixl0: Malicious Driver Detection event 31 on TX queue 4095, pf number 15 (PF-0)
                      ixl0: Malicious Driver Detection event 255 on RX queue 16383, pf number 255 (PF-0)
                      ixl0: Malicious Driver Detection event 31 on TX queue 4095, pf number 15 (PF-0)
                      ixl0: Malicious Driver Detection event 255 on RX queue 16383, pf number 255 (PF-0)
                      ixl0: TX Malicious Driver Detection event (unknown)
                      ixl0: Malicious Driver Detection event 1 on RX queue 2, pf number 0 (PF-0)
                      ixl0: Malicious Driver Detection event 31 on TX queue 4095, pf number 15 (PF-0)
                      ixl0: Malicious Driver Detection event 255 on RX queue 16383, pf number 255 (PF-0)
                      ixl0: RX Malicious Driver Detection event (unknown)
                      ixl0: Malicious Driver Detection event 255 on RX queue 16383, pf number 255 (PF-0)
                      ixl0: Malicious Driver Detection event 1 on RX queue 2, pf number 0 (PF-0)
                      ixl0: Malicious Driver Detection event 255 on RX queue 16383, pf number 255 (PF-0)
                      ixl0: Malicious Driver Detection event 31 on TX queue 4095, pf number 15 (PF-0)
                      ixl0: Malicious Driver Detection event 1 on RX queue 2, pf number 0 (PF-0)
                      ixl0: TX Malicious Driver Detection event (unknown)
                      ixl0: Malicious Driver Detection event 1 on RX queue 7, pf number 0 (PF-0)
                      ixl0: Malicious Driver Detection event 1 on RX queue 2, pf number 0 (PF-0)
                      ixl0: Malicious Driver Detection event 255 on RX queue 16383, pf number 255 (PF-0)
                      ixl0: TX Malicious Driver Detection event (unknown)
                      ixl0: Malicious Driver Detection event 1 on RX queue 2, pf number 0 (PF-0)
                      ixl0: Malicious Driver Detection event 1 on RX queue 7, pf number 0 (PF-0)
                      ixl0: TX Malicious Driver Detection event (unknown)
                      ixl0: Malicious Driver Detection event 255 on RX queue 16383, pf number 255 (PF-0)
                      ixl0: Malicious Driver Detection event 1 on RX queue 2, pf number 0 (PF-0)
                      ixl0: RX Malicious Driver Detection event (unknown)
                      ixl0: Malicious Driver Detection event 1 on RX queue 2, pf number 0 (PF-0)
                      ixl0: Malicious Driver Detection event 31 on TX queue 4095, pf number 15 (PF-0)
                      ixl0: Malicious Driver Detection event 255 on RX queue 16383, pf number 255 (PF-0)
                      ixl0: Malicious Driver Detection event 1 on RX queue 7, pf number 0 (PF-0)
                      ixl0: Malicious Driver Detection event 1 on RX queue 2, pf number 0 (PF-0)
                      ixl0: Malicious Driver Detection event 1 on RX queue 2, pf number 0 (PF-0)
                      ixl0: Malicious Driver Detection event 1 on RX queue 2, pf number 0 (PF-0)
                      ixl0: TX Malicious Driver Detection event (unknown)
                      ixl0: Malicious Driver Detection event 1 on RX queue 2, pf number 0 (PF-0)
                      ixl0: Malicious Driver Detection event 255 on RX queue 16383, pf number 255 (PF-0)
                      ixl0: Malicious Driver Detection event 1 on RX queue 2, pf number 0 (PF-0)
                      ixl0: Malicious Driver Detection event 255 on RX queue 16383, pf number 255 (PF-0)
                      ixl0: RX Malicious Driver Detection event (unknown)
                      ixl0: Malicious Driver Detection event 1 on RX queue 2, pf number 0 (PF-0)
                      ixl0: Malicious Driver Detection event 1 on RX queue 2, pf number 0 (PF-0)
                      ixl0: Malicious Driver Detection event 1 on RX queue 2, pf number 0 (PF-0)
                      ixl0: RX Malicious Driver Detection event (unknown)
                      ixl0: Malicious Driver Detection event 1 on RX queue 7, pf number 0 (PF-0)
                      ixl0: Malicious Driver Detection event 1 on RX queue 2, pf number 0 (PF-0)
                      ixl0: Malicious Driver Detection event 1 on RX queue 2, pf number 0 (PF-0)
                      ixl0: Malicious Driver Detection event 1 on RX queue 7, pf number 0 (PF-0)
                      ixl0: Malicious Driver Detection event 1 on RX queue 2, pf number 0 (PF-0)
                      ixl0: TX Malicious Driver Detection event (unknown)
                      ixl0: Malicious Driver Detection event 1 on RX queue 2, pf number 0 (PF-0)
                      ixl0: RX Malicious Driver Detection event (unknown)
                      ixl0: Malicious Driver Detection event 255 on RX queue 16383, pf number 255 (PF-0)
                      ixl1: TX Malicious Driver Detection event (unknown)
                      ixl1: Malicious Driver Detection event 1 on RX queue 2, pf number 0 (PF-1)
                      ixl1: TX Malicious Driver Detection event (unknown)
                      ixl1: RX Malicious Driver Detection event (unknown)
                      ixl1: Malicious Driver Detection event 255 on RX queue 16383, pf number 255 (PF-1)
                      ixl1: RX Malicious Driver Detection event (unknown)
                      ixl0: Malicious Driver Detection event 31 on TX queue 4095, pf number 15 (PF-0)
                      ixl0: Malicious Driver Detection event 255 on RX queue 16383, pf number 255 (PF-0)
                      ixl0: Malicious Driver Detection event 1 on RX queue 2, pf number 0 (PF-0)
                      ixl0: Malicious Driver Detection event 1 on RX queue 0, pf number 0 (PF-0)
                      ixl0: Malicious Driver Detection event 1 on RX queue 2, pf number 0 (PF-0)
                      ixl1: RX Malicious Driver Detection event (unknown)
                      ixl1: Malicious Driver Detection event 31 on TX queue 4095, pf number 15 (PF-1)
                      ixl1: Malicious Driver Detection event 1 on RX queue 2, pf number 0 (PF-1)
                      ixl1: RX Malicious Driver Detection event (unknown)
                      ixl1: RX Malicious Driver Detection event (unknown)
                      ixl1: Malicious Driver Detection event 31 on TX queue 4095, pf number 15 (PF-1)
                      ixl1: Malicious Driver Detection event 255 on RX queue 16383, pf number 255 (PF-1)
                      ixl1: RX Malicious Driver Detection event (unknown)
                      ixl1: TX Malicious Driver Detection event (unknown)
                      ixl1: RX Malicious Driver Detection event (unknown)
                      ixl0: Malicious Driver Detection event 255 on RX queue 16383, pf number 255 (PF-0)
                      ixl0: Reset Requested! (POR)
                      ixl0: ECC Error detected!
                      ixl0: HMC Error detected!
                      ixl0: INFO 0xffffffff
                      ixl0: DATA 0xffffffff
                      ixl0: PCI Exception detected!
                      ixl0: Reset Requested! (POR)
                      ixl0: ECC Error detected!
                      ixl1: RX Malicious Driver Detection event (unknown)
                      ixl1: TX Malicious Driver Detection event (unknown)
                      ixl1: RX Malicious Driver Detection event (unknown)
                      ixl0: Rebuilding driver state...
                      ixl1: TX Malicious Driver Detection event (unknown)
                      ixl1: RX Malicious Driver Detection event (unknown)
                      ixl1: TX Malicious Driver Detection event (unknown)
                      ixl1: RX Malicious Driver Detection event (unknown)
                      ixl1: Malicious Driver Detection event 255 on RX queue 16383, pf number 255 (PF-1)
                      ixl1: Reset Requested! (POR)
                      ixl1: ECC Error detected!
                      ixl1: HMC Error detected!
                      ixl1: INFO 0xffffffff
                      ixl1: DATA 0xffffffff
                      ixl1: PCI Exception detected!
                      ixl1: TX queue 1 still enabled!
                      ixl1: TX queue 4 still enabled!
                      ixl1: TX queue 7 still enabled!
                      ixl0: capability discovery failed; status I40E_ERR_ADMIN_QUEUE_FULL, error OK
                      ixl0: ixl_get_hw_capabilities failed: 19
                      ixl0: Reload the driver to recover
                      ixl0: Admin Queue is down; resetting...
                      ixl0: capability discovery failed; status I40E_ERR_ADMIN_QUEUE_CRITICAL_ERROR, error OK
                      ixl0: init: Error retrieving HW capabilities; status code 19
                      ixl0: i40e_aq_get_vsi_params() failed, error -66 aq_error 0
                      ixl0: initialize vsi failed!!
                      ixl0: Malicious Driver Detection event 1 on RX queue 385, pf number 0 (PF-0)
                      ixl1: Rebuilding driver state...
                      ixl1: PF-ID[1]: VFs 32, MSI-X 129, VF MSI-X 5, QPs 384, MDIO shared
                      ixl1: Allocating 8 queues for PF LAN VSI; 8 queues active
                      ixl1: Rebuilding driver state done.
                      
                      
                      Fatal trap 12: page fault while in kernel mode
                      cpuid = 4; apic id = 04
                      fault virtual address	= 0x458
                      fault code		= supervisor read data, page not present
                      instruction pointer	= 0x20:0xffffffff80ccd060
                      stack pointer	        = 0x28:0xfffffe00dbb34d90
                      frame pointer	        = 0x28:0xfffffe00dbb34e10
                      code segment		= base 0x0, limit 0xfffff, type 0x1b
                      			= DPL 0, pres 1, long 1, def32 0, gran 1
                      processor eflags	= interrupt enabled, resume, IOPL = 0
                      current process		= 0 (ixl0 (que 2))
                      rdi: fffffe00dc1c0620 rsi: 0000000000000004 rdx: ffffffff835d784b
                      rcx: fffff800023c0740  r8: fffff800023c0c60  r9: fffffe00dbb35000
                      rax: 0000000000000000 rbx: 0000000000000000 rbp: fffffe00dbb34e10
                      r10: 00000000000001f4 r11: 00000000803ad992 r12: fffffe00dbb34da8
                      r13: 0000000000000000 r14: 0000000000000000 r15: fffffe00dc1c0620
                      trap number		= 12
                      panic: page fault
                      cpuid = 4
                      time = 1721704766
                      KDB: enter: panic
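The MDD log above is long enough that a tally per interface and queue is easier to read than the raw lines. This is a minimal sketch, not part of the original report. It assumes the log has been saved as text and that the detailed lines keep the exact wording shown above. The function name `tally_mdd` is hypothetical. Lines of the form "RX/TX Malicious Driver Detection event (unknown)" carry no queue number and are deliberately skipped.

```python
import re
from collections import Counter

# Matches the detailed MDD lines as they appear in the log above, e.g.
# "ixl0: Malicious Driver Detection event 1 on RX queue 2, pf number 0 (PF-0)"
MDD_RE = re.compile(
    r"(?P<ifname>ixl\d+): Malicious Driver Detection event (?P<event>\d+) "
    r"on (?P<direction>RX|TX) queue (?P<queue>\d+), pf number (?P<pf>\d+)"
)


def tally_mdd(lines):
    """Count detailed MDD lines per (interface, direction, queue, event)."""
    counts = Counter()
    for line in lines:
        match = MDD_RE.search(line)
        if match is None:
            continue  # not a detailed MDD line (includes the "(unknown)" form)
        key = (
            match["ifname"],
            match["direction"],
            int(match["queue"]),
            int(match["event"]),
        )
        counts[key] += 1
    return counts


if __name__ == "__main__":
    # A few lines copied verbatim from the log above.
    sample = [
        "ixl0: Malicious Driver Detection event 2 on TX queue 0, pf number 0 (PF-0)",
        "ixl0: Malicious Driver Detection event 1 on RX queue 3, pf number 0 (PF-0)",
        "ixl0: RX Malicious Driver Detection event (unknown)",
        "ixl0: Malicious Driver Detection event 1 on RX queue 2, pf number 0 (PF-0)",
        "ixl0: Malicious Driver Detection event 1 on RX queue 2, pf number 0 (PF-0)",
        "ixl0: Malicious Driver Detection event 255 on RX queue 16383, pf number 255 (PF-0)",
        "ixl1: Malicious Driver Detection event 31 on TX queue 4095, pf number 15 (PF-1)",
    ]
    for key, count in tally_mdd(sample).most_common():
        print(count, key)
```

To run it over a full log, pass an open file object instead of the sample list.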
                      

                      At this point I've switched to a different machine running pfSense until I have time to evaluate alternate adapters. I suspect this X710 is misbehaving either because the latest driver/firmware is still buggy, likely in combination with my virtualization and authbridge setup, or because I simply got a faulty card. Unfortunately, I don't have multiple X710s to test with.
                      If no new ideas come up, I'll try to remember to post back here in a few months, once I've settled on an adapter that seems stable long term with this setup.

                      • stephenw10S Offline
                        stephenw10 Netgate Administrator
                        last edited by

                        Hmm, painful!

                        It's not a setup I can test here. Potentially it could be a bad NIC.

                        If you have no choice, we can make exceptions for transferring the NDI, for example if you have to replace a NIC because of hardware failure.

                        • L Offline
                          Louis89
                          last edited by

                          Quick follow-up: I swapped the X710-T4L for an X550-T2 a little over 3 months ago and the system has been rock solid ever since. No problems at all. It seems it was either a bad NIC or a driver problem. Unfortunately, I'm not planning to test with a different X710 any time soon.

                          Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.