Weird Behavior with x710-da2 in 2.5.x
-
Specs:
R430 w/ESXi 7u3
Intel X710-DA2 on PCIe passthrough
New 2.5.2 pfSense install w/vm-tools package and VLANs (wan/lan) on ixl.
When I boot the pfSense VM I get random ping times and performance issues (see imgur below). If I check OR uncheck "Disable Hardware Checksum Offload" and click save without rebooting, it works: ping times normalize and I get the expected performance. If I reboot the router I get sporadic ping times again. In 2.4.5 I get no performance issues out of the box, and "Disable Hardware Checksum Offload" is unchecked by default.
Here is a screen recording demonstrating it working in 2.4.5 and how it's broken in 2.5.2: https://imgur.com/a/dksfYI2
-
Sounds like it's setting/unsetting an option on the NIC.
Run ifconfig -vvv against the NIC before and after the save. What is changing?
Steve
-
Thanks for the reply.
Saving modified the options on the main interface (ixl0), adding:
<RXCSUM,TXCSUM,RXCSUM_IPV6,TXCSUM_IPV6>
And then it added the same options to the VLANs:
options=600003<RXCSUM,TXCSUM,RXCSUM_IPV6,TXCSUM_IPV6>
I'm going to start researching these options, but I'm not sure what they mean or whether it's the main interface or the VLANs that matter.
Full output - Before:
ixl0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
    options=8100b8<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWFILTER>
    ether f8:f2:1e:87:a6:81
    inet6 fe80::faf2:1eff:fe87:a681%ixl0 prefixlen 64 scopeid 0x1
    media: Ethernet autoselect (10Gbase-SR <full-duplex>)
    status: active
    nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
    plugged: SFP/SFP+/SFP28 10G Base-SR (LC)
    vendor: FS PN: SFP-10G-T SN: F2030602793 DATE: 2021-06-29
    module temperature: 43.93 C Voltage: 3.30 Volts
    RX: 0.00 mW (-inf dBm) TX: 0.00 mW (-inf dBm)
    SFF8472 DUMP (0xA0 0..127 range):
    03 04 07 10 00 00 00 40 00 0C 00 06 67 00 00 00
    03 01 00 00 46 53 20 20 20 20 20 20 20 20 20 20
    20 20 20 20 00 00 1B 21 53 46 50 2D 31 30 47 2D
    54 20 20 20 20 20 20 20 41 20 20 20 03 52 00 85
    00 1A 00 00 46 32 30 33 30 36 30 32 37 39 33 20
    20 20 20 20 32 31 30 36 32 39 20 20 68 90 01 6D
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
enc0: flags=0<> metric 0 mtu 1536
    groups: enc
    nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
    options=680003<RXCSUM,TXCSUM,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
    inet6 ::1 prefixlen 128
    inet6 fe80::1%lo0 prefixlen 64 scopeid 0x3
    inet 127.0.0.1 netmask 0xff000000
    groups: lo
    nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
pflog0: flags=100<PROMISC> metric 0 mtu 33160
    groups: pflog
pfsync0: flags=0<> metric 0 mtu 1500
    groups: pfsync
ixl0.10: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
    ether f8:f2:1e:87:a6:81
    inet6 fe80::faf2:1eff:fe87:a681%ixl0.10 prefixlen 64 scopeid 0x6
    inet 10.42.0.1 netmask 0xffffff00 broadcast 10.42.0.255
    groups: vlan
    vlan: 10 vlanpcp: 0 parent interface: ixl0
    media: Ethernet autoselect (10Gbase-SR <full-duplex>)
    status: active
    nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
ixl0.999: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
    ether f8:f2:1e:87:a6:81
    inet6 fe80::faf2:1eff:fe87:a681%ixl0.999 prefixlen 64 scopeid 0x7
    inet 192.168.1.47 netmask 0xffffff00 broadcast 192.168.1.255
    groups: vlan
    vlan: 999 vlanpcp: 0 parent interface: ixl0
    media: Ethernet autoselect (10Gbase-SR <full-duplex>)
    status: active
    nd6 options=23<PERFORMNUD,ACCEPT_RTADV,AUTO_LINKLOCAL>
After:
ixl0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
    options=e100bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWFILTER,RXCSUM_IPV6,TXCSUM_IPV6>
    ether f8:f2:1e:87:a6:81
    inet6 fe80::faf2:1eff:fe87:a681%ixl0 prefixlen 64 scopeid 0x1
    media: Ethernet autoselect (10Gbase-SR <full-duplex>)
    status: active
    nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
    plugged: SFP/SFP+/SFP28 10G Base-SR (LC)
    vendor: FS PN: SFP-10G-T SN: F2030602793 DATE: 2021-06-29
    module temperature: 44.77 C Voltage: 3.30 Volts
    RX: 0.00 mW (-inf dBm) TX: 0.00 mW (-inf dBm)
    SFF8472 DUMP (0xA0 0..127 range):
    03 04 07 10 00 00 00 40 00 0C 00 06 67 00 00 00
    03 01 00 00 46 53 20 20 20 20 20 20 20 20 20 20
    20 20 20 20 00 00 1B 21 53 46 50 2D 31 30 47 2D
    54 20 20 20 20 20 20 20 41 20 20 20 03 52 00 85
    00 1A 00 00 46 32 30 33 30 36 30 32 37 39 33 20
    20 20 20 20 32 31 30 36 32 39 20 20 68 90 01 6D
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
enc0: flags=0<> metric 0 mtu 1536
    groups: enc
    nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
    options=680003<RXCSUM,TXCSUM,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
    inet6 ::1 prefixlen 128
    inet6 fe80::1%lo0 prefixlen 64 scopeid 0x3
    inet 127.0.0.1 netmask 0xff000000
    groups: lo
    nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
pflog0: flags=100<PROMISC> metric 0 mtu 33160
    groups: pflog
pfsync0: flags=0<> metric 0 mtu 1500
    groups: pfsync
ixl0.10: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
    options=600003<RXCSUM,TXCSUM,RXCSUM_IPV6,TXCSUM_IPV6>
    ether f8:f2:1e:87:a6:81
    inet6 fe80::faf2:1eff:fe87:a681%ixl0.10 prefixlen 64 scopeid 0x6
    inet 10.42.0.1 netmask 0xffffff00 broadcast 10.42.0.255
    groups: vlan
    vlan: 10 vlanpcp: 0 parent interface: ixl0
    media: Ethernet autoselect (10Gbase-SR <full-duplex>)
    status: active
    nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
ixl0.999: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
    options=600003<RXCSUM,TXCSUM,RXCSUM_IPV6,TXCSUM_IPV6>
    ether f8:f2:1e:87:a6:81
    inet6 fe80::faf2:1eff:fe87:a681%ixl0.999 prefixlen 64 scopeid 0x7
    inet 192.168.1.47 netmask 0xffffff00 broadcast 192.168.1.255
    groups: vlan
    vlan: 999 vlanpcp: 0 parent interface: ixl0
    media: Ethernet autoselect (10Gbase-SR <full-duplex>)
    status: active
    nd6 options=23<PERFORMNUD,ACCEPT_RTADV,AUTO_LINKLOCAL>
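For anyone comparing the two captures, here is a minimal Python sketch (not from the thread; parse_options is a hypothetical helper) that pulls the options= field out of the ixl0 line in each state and reports which capability names changed. It assumes the flag list is printed inside angle brackets exactly as in the output above, and it deliberately skips the separate "nd6 options=" field.

```python
import re

def parse_options(text):
    """Return (hex value, set of flag names) for the first interface
    'options=...' field in text, ignoring the 'nd6 options=' field."""
    m = re.search(r"(?<!nd6 )\boptions=([0-9a-f]+)<([^>]*)>", text)
    if m is None:
        return 0, set()
    names = set(m.group(2).split(",")) if m.group(2) else set()
    return int(m.group(1), 16), names

# The ixl0 options fields copied from the Before and After output.
before = "options=8100b8<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWFILTER>"
after = ("options=e100bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,"
         "VLAN_HWCSUM,VLAN_HWFILTER,RXCSUM_IPV6,TXCSUM_IPV6>")

before_value, before_names = parse_options(before)
after_value, after_names = parse_options(after)

added = after_names - before_names      # flags present only after the save
removed = before_names - after_names    # flags present only before the save
changed_bits = before_value ^ after_value

print("added:", sorted(added))
print("removed:", sorted(removed))
print("changed bits:", hex(changed_bits))
```

The XOR of the two hex values comes out to 600003, which is the same value that shows up on ixl0.10 and ixl0.999 after the save, so the only difference between the two states is the four checksum offload flags.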
-
Hmm, but rebooting after having enabled it comes up the same?
And then disabling it again corrects the ping latency?
Steve
-
Oops, forgot the reboot. Uhh yeah, this is super weird.
So after the reboot, when the latency is bad, the output is the same as before the reboot:
ixl0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
    options=e100bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWFILTER,RXCSUM_IPV6,TXCSUM_IPV6>
    ether f8:f2:1e:87:a6:81
    inet6 fe80::faf2:1eff:fe87:a681%ixl0 prefixlen 64 scopeid 0x1
    media: Ethernet autoselect (10Gbase-SR <full-duplex>)
    status: active
    nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
    plugged: SFP/SFP+/SFP28 10G Base-SR (LC)
    vendor: FS PN: SFP-10G-T SN: F2030602793 DATE: 2021-06-29
    module temperature: 44.77 C Voltage: 3.30 Volts
    RX: 0.00 mW (-inf dBm) TX: 0.00 mW (-inf dBm)
    SFF8472 DUMP (0xA0 0..127 range):
    03 04 07 10 00 00 00 40 00 0C 00 06 67 00 00 00
    03 01 00 00 46 53 20 20 20 20 20 20 20 20 20 20
    20 20 20 20 00 00 1B 21 53 46 50 2D 31 30 47 2D
    54 20 20 20 20 20 20 20 41 20 20 20 03 52 00 85
    00 1A 00 00 46 32 30 33 30 36 30 32 37 39 33 20
    20 20 20 20 32 31 30 36 32 39 20 20 68 90 01 6D
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
enc0: flags=0<> metric 0 mtu 1536
    groups: enc
    nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
    options=680003<RXCSUM,TXCSUM,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
    inet6 ::1 prefixlen 128
    inet6 fe80::1%lo0 prefixlen 64 scopeid 0x3
    inet 127.0.0.1 netmask 0xff000000
    groups: lo
    nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
pflog0: flags=100<PROMISC> metric 0 mtu 33160
    groups: pflog
pfsync0: flags=0<> metric 0 mtu 1500
    groups: pfsync
ixl0.10: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
    options=600003<RXCSUM,TXCSUM,RXCSUM_IPV6,TXCSUM_IPV6>
    ether f8:f2:1e:87:a6:81
    inet6 fe80::faf2:1eff:fe87:a681%ixl0.10 prefixlen 64 scopeid 0x6
    inet 10.42.0.1 netmask 0xffffff00 broadcast 10.42.0.255
    groups: vlan
    vlan: 10 vlanpcp: 0 parent interface: ixl0
    media: Ethernet autoselect (10Gbase-SR <full-duplex>)
    status: active
    nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
ixl0.999: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
    options=600003<RXCSUM,TXCSUM,RXCSUM_IPV6,TXCSUM_IPV6>
    ether f8:f2:1e:87:a6:81
    inet6 fe80::faf2:1eff:fe87:a681%ixl0.999 prefixlen 64 scopeid 0x7
    inet 192.168.1.47 netmask 0xffffff00 broadcast 192.168.1.255
    groups: vlan
    vlan: 999 vlanpcp: 0 parent interface: ixl0
    media: Ethernet autoselect (10Gbase-SR <full-duplex>)
    status: active
    nd6 options=23<PERFORMNUD,ACCEPT_RTADV,AUTO_LINKLOCAL>
And then I check the box again, and it reverts back to the original state but fixes the latency issues....
ixl0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
    options=8100b8<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWFILTER>
    ether f8:f2:1e:87:a6:81
    inet6 fe80::faf2:1eff:fe87:a681%ixl0 prefixlen 64 scopeid 0x1
    media: Ethernet autoselect (10Gbase-SR <full-duplex>)
    status: active
    nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
    plugged: SFP/SFP+/SFP28 10G Base-SR (LC)
    vendor: FS PN: SFP-10G-T SN: F2030602793 DATE: 2021-06-29
    module temperature: 44.77 C Voltage: 3.30 Volts
    RX: 0.00 mW (-inf dBm) TX: 0.00 mW (-inf dBm)
    SFF8472 DUMP (0xA0 0..127 range):
    03 04 07 10 00 00 00 40 00 0C 00 06 67 00 00 00
    03 01 00 00 46 53 20 20 20 20 20 20 20 20 20 20
    20 20 20 20 00 00 1B 21 53 46 50 2D 31 30 47 2D
    54 20 20 20 20 20 20 20 41 20 20 20 03 52 00 85
    00 1A 00 00 46 32 30 33 30 36 30 32 37 39 33 20
    20 20 20 20 32 31 30 36 32 39 20 20 68 90 01 6D
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
enc0: flags=0<> metric 0 mtu 1536
    groups: enc
    nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
    options=680003<RXCSUM,TXCSUM,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
    inet6 ::1 prefixlen 128
    inet6 fe80::1%lo0 prefixlen 64 scopeid 0x3
    inet 127.0.0.1 netmask 0xff000000
    groups: lo
    nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
pflog0: flags=100<PROMISC> metric 0 mtu 33160
    groups: pflog
pfsync0: flags=0<> metric 0 mtu 1500
    groups: pfsync
ixl0.10: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
    ether f8:f2:1e:87:a6:81
    inet6 fe80::faf2:1eff:fe87:a681%ixl0.10 prefixlen 64 scopeid 0x6
    inet 10.42.0.1 netmask 0xffffff00 broadcast 10.42.0.255
    groups: vlan
    vlan: 10 vlanpcp: 0 parent interface: ixl0
    media: Ethernet autoselect (10Gbase-SR <full-duplex>)
    status: active
    nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
ixl0.999: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
    ether f8:f2:1e:87:a6:81
    inet6 fe80::faf2:1eff:fe87:a681%ixl0.999 prefixlen 64 scopeid 0x7
    inet 192.168.1.47 netmask 0xffffff00 broadcast 192.168.1.255
    groups: vlan
    vlan: 999 vlanpcp: 0 parent interface: ixl0
    media: Ethernet autoselect (10Gbase-SR <full-duplex>)
    status: active
    nd6 options=23<PERFORMNUD,ACCEPT_RTADV,AUTO_LINKLOCAL>
-
I'd also like to note that, after further testing, everything works perfectly in pfSense 2.4.5 and I get 800/800 Mbps+. In 2.5.2 I get these latency issues out of the box, and even when I am able to temporarily 'fix' them by modifying those settings I'm only able to get ~100/100 Mbps. The VMs are the same, with the same passthrough settings.
-
Hmm, doesn't sound fixed! Check Status > Interfaces for errors.
You might try just ifconfig down, ifconfig up on the interface instead of making a change. See if that brings the latency back to normal too. Or try just resaving it in pfSense without making a change.
Steve
-
Resaving in pfSense doesn't change anything. Down/up does fix the latency issue. Still low bandwidth. No errors in Status > Interfaces. https://imgur.com/a/dYCnYJ3
-
Ah, try assigning and enabling ixl0 directly, even if you set it as type 'none'. It may not be applying the settings to VLANs only. And you may not be seeing the errors.
-
I added ixl0 as opt1 with type set to 'none' - no change, did I understand that correctly?
-
Also, after unchecking the disable hardware checksum/TCP offloading options without rebooting, I'm getting 600/400 Mbps. Not quite 2.4.5 speeds, but it's encouraging.
-
Hmm, and no errors shown on ixl0 either state?
-
No, no errors on any of the 3 interfaces. Really confusing. There is a note in the boot log saying the NVM version is newer than expected and that the driver needs to be updated, but the same note is present in 2.4.5. I'm assuming that's just the ixl driver being a bit behind.
There is another user on reddit who reported the same issue with ESXi 6.7. Works with other VMs but not 2.5.2.
There is also a note about PCIe speed, but it's in a PCIe 3.0 slot.
ixl0: <Intel(R) Ethernet Controller X710 for 10GbE SFP+ - 2.3.0-k> mem 0xe6000000-0xe6ffffff,0xe7af8000-0xe7afffff irq 19 at device 0.0 on pci4
ixl0: fw 8.84.66032 api 1.14 nvm 8.40 etid 8000af82 oem 20.5120.13
ixl0: The driver for the device detected a newer version of the NVM image than expected.
ixl0: Please install the most recent version of the network driver.
ixl0: PF-ID[1]: VFs 64, MSI-X 129, VF MSI-X 5, QPs 768, I2C
ixl0: Using 1024 TX descriptors and 1024 RX descriptors
ixl0: Using 6 RX queues 6 TX queues
ixl0: failed to allocate 7 MSI-X vectors, err: 6
ixl0: Using an MSI interrupt
ixl0: Ethernet address: f8:f2:1e:87:a6:81
ixl0: Allocating 1 queues for PF LAN VSI; 1 queues active
ixl0: PCI Express Bus: Speed 5.0GT/s Unknown
ixl0: PCI-Express bandwidth available for this device may be insufficient for optimal performance.
ixl0: Please move the device to a different PCI-e link with more lanes and/or higher transfer rate.
ixl0: SR-IOV ready
ixl0: netmap queues/slots: TX 1/1024, RX 1/1024
ixl0: Link is up, 10 Gbps Full Duplex, Requested FEC: None, Negotiated FEC: None, Autoneg: False, Flow Control: None
ixl0: link state changed to UP
-
The bus speed note is just informational; I wouldn't expect any issues because of that.
The firmware version could be a problem. I imagine the mismatch just isn't detected/shown in 2.4.5.
Can you test a 2.6 snapshot?
Steve
-
Another good suggestion, thanks. Ran 2.6.0.b.20220111.0600 and 2.7.0.a.20220115.0600: same issue, I have to down/up the interface, and there are no errors. I was able to get 800/450. No boot complaints about the network driver being out of sync.
-
Hmm, something has to be changing but it's hard to see what that could be just by down/up-ing the interface.
You could start checking the sysctl stats for ixl0 but there's a lot to wade through:
sysctl dev.ixl.0
You might be able to spot some key difference between the two states.
Steve
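Wading through two full dumps by eye is tedious, so here is a minimal Python sketch (not from the thread) for comparing them. It assumes the output of sysctl dev.ixl.0 was saved to text once in each state and that each line has the usual "name: value" form. The helper names and the OID names in the sample are made up for illustration.

```python
def parse_sysctl(text):
    """Turn 'name: value' lines into a dict; lines without ': ' are skipped."""
    values = {}
    for line in text.splitlines():
        name, sep, value = line.partition(": ")
        if sep:
            values[name.strip()] = value.strip()
    return values

def changed_oids(before_text, after_text):
    """Return {name: (before, after)} for every OID whose value differs."""
    before = parse_sysctl(before_text)
    after = parse_sysctl(after_text)
    changed = {}
    for name in sorted(set(before) | set(after)):
        if before.get(name) != after.get(name):
            changed[name] = (before.get(name), after.get(name))
    return changed

# Hypothetical sample data; real captures would come from saved sysctl output.
bad_state = "dev.ixl.0.example_setting: 1\ndev.ixl.0.example_counter: 5\n"
good_state = "dev.ixl.0.example_setting: 1\ndev.ixl.0.example_counter: 9\n"

diff = changed_oids(bad_state, good_state)
for name, (old, new) in diff.items():
    print(name, old, "->", new)
```

Packet and byte counters will always differ between two captures, so the interesting entries are the ones that look like settings or state rather than running totals.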
-
Hi @ashtonianagain - looking at the logs above:
- If the card is PCI Express 3.0, I'd expect the bus speed to be higher (e.g. 8.0GT/s), but that will depend on which slot the card is sitting in on the motherboard. It could be that the bandwidth is shared or the slot is only 2.0/2.1 capable.
- I also saw that your system defaulted to using MSI vs. MSI-X. Are you passing the card through to pfSense or going fully virtual? A couple links to check out that may help:
https://forum.netgate.com/topic/158860/pfsense-latency-spikes-in-esxi
https://forum.netgate.com/topic/157688/remove-vmware-msi-x-from-the-pci-blacklist/
In particular, this setting might help with some of the performance issues you are seeing:
hw.pci.honor_msi_blacklist=0
Hope this helps.
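To check quickly whether a given boot fell back from MSI-X to MSI, the boot messages can be scanned for the lines quoted earlier in the thread. This is a minimal Python sketch, not an official tool: interrupt_summary is a hypothetical helper, and it only recognises the exact message wording shown in this thread, which may differ on other driver versions.

```python
import re

def interrupt_summary(boot_log, nic="ixl0"):
    """Summarise the interrupt mode and active queue count for one NIC,
    based only on the message wording quoted in this thread."""
    prefix = nic + ": "
    summary = {"mode": None, "msix_failed": False, "queues_active": None}
    for raw in boot_log.splitlines():
        line = raw.strip()
        if not line.startswith(prefix):
            continue
        msg = line[len(prefix):]
        if re.search(r"failed to allocate \d+ MSI-X vectors", msg):
            summary["msix_failed"] = True
        elif msg == "Using an MSI interrupt":
            summary["mode"] = "MSI"
        else:
            m = re.search(r"(\d+) queues active", msg)
            if m:
                summary["queues_active"] = int(m.group(1))
    return summary

# Lines copied from the boot log posted above.
sample = """\
ixl0: Using 6 RX queues 6 TX queues
ixl0: failed to allocate 7 MSI-X vectors, err: 6
ixl0: Using an MSI interrupt
ixl0: Allocating 1 queues for PF LAN VSI; 1 queues active
"""
result = interrupt_summary(sample)
print(result)
```

In the log above the driver asks for 6 RX and 6 TX queues but ends up with a single active queue after the MSI-X allocation fails, so a summary like this makes it easy to compare a boot with and without the tunable set.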
-
That sysctl usually only affects vmxnet, which are on the MSI blacklist by default. But there's no harm setting it.
-
@stephenw10 said in Weird Behavior with x710-da2 in 2.5.x:
That sysctl usually only affects vmxnet, which are on the MSI blacklist by default. But there's no harm setting it.
That's a good point - I was just going by the last post in this thread, thinking it might be worth a shot:
https://forum.netgate.com/topic/157688/remove-vmware-msi-x-from-the-pci-blacklist/5
-
@tman222 said in Weird Behavior with x710-da2 in 2.5.x:
hw.pci.honor_msi_blacklist=0
THIS WORKED! Thank you guys for your help. After setting this I've been able to get normal pings and expected bandwidth performance.
Now I'm not sure if I should enable or disable all of the offloading for performance - what do you guys think?