Igb 2.4.0 causing crashes
-
@ermal:
Start by sharing what you are doing!
Hardware Specs (both boxes are identical):
- Intel E3-1245 V2 CPU (3.4GHz) w/ HT disabled
- 16GB DDR3 ECC RAM
- Intel 530 240GB SSD
- (12) Intel i350 1GbE
- (2) Intel X520 10GbE
Software Config:
- iperf tests running across ix1 (I have tried both SFP+ Direct Attach and a multimode OM3 patch cable with Intel SR optics directly between the boxes, as well as running through a Cisco Nexus 5548UP)
- Interface has a simple any/any firewall rule
- Snort is NOT running on these interfaces (though it is on others)
Tweaks in /boot/loader.conf.local:
- kern.ipc.nmbclusters="262144"
- kern.ipc.nmbjumbop="262144"
- hw.intr_storm_threshold=10000
Enabling or disabling MSI-X seems to make no difference, and neither does changing the number of interface queues (I have tried 1, 2, and 4).
Tweaks in System Tunables:
- kern.ipc.maxsockbuf=16777216
- net.inet.tcp.recvbuf_inc=524288
- net.inet.tcp.recvbuf_max=16777216
- net.inet.tcp.sendbuf_inc=16384
- net.inet.tcp.sendbuf_max=16777216
Test Results (always about 2 Gbit/s; sometimes 1.8, sometimes 2.2):
- iperf -c & -s = 2Gbit/s
- iperf -c -d & -s = sum of both directions is 2Gbit/s (typically something like 1.8 and 0.2)
- iperf -c -P2 & -s = sum of both threads is 2Gbit/s (typically something like 1.3 & 0.7)
- iperf -c -P4 & -s = sum of all threads is 2Gbit/s (typically about 0.5 on each)
All 4 cores have an idle percentage in the 40-50% range, even when running the -P4 test.
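To make the pattern in those numbers explicit, here is a minimal Python sketch that sums the per-stream rates quoted above and checks whether any test beats the single-stream figure. The rates are the "typical" values from the list; the 0.25 Gbit/s tolerance and the helper names are my own assumptions, not anything iperf reports.

```python
# Per-stream rates in Gbit/s, copied from the typical results listed above.
RESULTS = {
    "-c (single stream)": [2.0],
    "-c -d (bidirectional)": [1.8, 0.2],
    "-c -P2 (two streams)": [1.3, 0.7],
    "-c -P4 (four streams)": [0.5, 0.5, 0.5, 0.5],
}


def aggregate(rates):
    """Return the summed throughput of all streams in one test."""
    return sum(rates)


def scales_with_streams(results, baseline_key, tolerance=0.25):
    """True if any test's aggregate beats the baseline by more than `tolerance`.

    `tolerance` (Gbit/s) is a hypothetical allowance for run-to-run noise.
    """
    baseline = aggregate(results[baseline_key])
    return any(aggregate(r) > baseline + tolerance for r in results.values())


if __name__ == "__main__":
    for name, rates in RESULTS.items():
        print(f"{name}: {aggregate(rates):.1f} Gbit/s total")
    print("scales with streams:",
          scales_with_streams(RESULTS, "-c (single stream)"))
```

Every test sums to roughly the same total, which is why this looks like a shared ceiling rather than a per-stream limit.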
-
-
You are sourcing traffic from the same box?
-
I have two identical boxes. For the purpose of testing throughput (before I route all the internal traffic from my servers through them) I have them connected directly to each other.
-
Well, your results may vary here depending on the tool used.
Since there are many cores, your program may bounce between them, so I do not think you can achieve stable results that way. What I recommend for ix devices is:
hw.ixgbe.rx_process_limit=1024 # maybe higher or lower, depending on testing
hw.ixgbe.tx_process_limit=1024
hw.ixgbe.num_queues=<number of cores you have>
hw.ixgbe.txd=4096
hw.ixgbe.rxd=4096
Though these are very dependent on the workload you are trying to produce.
Also, with a single stream I am not sure you can achieve 10G with the default iperf parameters :).
Also remove this as well:
hw.intr_storm_threshold=10000
-
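On the single-stream remark: one TCP stream can have at most one window of data in flight per round trip, so its ceiling is roughly window / RTT. The sketch below works through that arithmetic. The 64 KiB window and 0.1 ms RTT are hypothetical illustration values, not measurements from these boxes and not a statement of what iperf actually defaults to.

```python
def max_single_stream_gbps(window_bytes, rtt_seconds):
    """Upper bound for one TCP stream: one window in flight per round trip."""
    return window_bytes * 8 / rtt_seconds / 1e9


def window_needed_bytes(target_gbps, rtt_seconds):
    """Window size needed to sustain `target_gbps` at the given RTT."""
    return target_gbps * 1e9 * rtt_seconds / 8


if __name__ == "__main__":
    rtt = 0.0001        # hypothetical 0.1 ms round trip on a direct link
    window = 64 * 1024  # hypothetical 64 KiB TCP window
    print(f"ceiling: {max_single_stream_gbps(window, rtt):.2f} Gbit/s")
    print(f"window for 10G: {window_needed_bytes(10, rtt):.0f} bytes")
```

If the real window or RTT differs, the ceiling moves accordingly, so measure both before drawing conclusions from this bound.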
@ermal:
Give it another shot with new snapshots.
The panics have been resolved. Let us know how it goes.
Any pointers to what the fix actually was? I didn't see anything in Redmine or the FreeBSD patches. Of course, I haven't followed through to get access to the tools again. Not sure it's worth it for a non-contributor, but I'm an active tester and curious code reader.
-
You are overthinking the fix, I think. I think the fix he is referring to is that they reverted the drivers to the older versions.
-
Actually the drivers are the latest found in FreeBSD.
The fix involved correcting the handling of the interface in FreeBSD 8, which is a bit of a mix compared to later versions.
-
@ermal:
Thanks, I'll give those a try tomorrow.
It's not so much the single stream performance I'm worried about. It's more the fact that 2 or 4 threads produce the exact same throughput in aggregate but it doesn't appear that I'm CPU bound.
-
Also try disabling AIM (adaptive interrupt moderation), since that might limit your throughput as well.
-
I added:
hw.ix.rx_process_limit=1024
hw.ix.tx_process_limit=1024
hw.ix.txd=4096
hw.ix.rxd=4096
For a single thread this made zero difference; I still see just about 2 Gbit/s. With 4 threads it now hits somewhere between 3.3 and 4.0 Gbit/s (very inconsistent). Single-threaded bidirectional tests (-c -d & -s) hit about 3 Gbit/s, and dual-threaded bidirectional tests (-c -d -P2 & -s) hit around 4 Gbit/s. For some reason, trying to use 4 threads on a bidirectional test makes iperf segfault, so I can't try that.
Reverting hw.intr_storm_threshold to the default of 1000 made no difference (I changed this in FreeNAS to get past ~2.5Gbit/s, if memory serves, and assumed the same would be required here since it's mentioned in the pfSense Wiki Docs).
Disabling AIM by setting dev.ix.0.enable_aim and dev.ix.1.enable_aim to "0" also had no impact.
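For anyone wanting to reproduce the same settings on both boxes, here is a small Python sketch that prints them in a consistent form. It assumes the hw.ix.* values go into /boot/loader.conf.local using the quoted name="value" style shown earlier in the thread, and that the two ix ports are units 0 and 1; the helper names are hypothetical.

```python
# Values taken from the post above.
IX_TUNABLES = {
    "hw.ix.rx_process_limit": 1024,
    "hw.ix.tx_process_limit": 1024,
    "hw.ix.txd": 4096,
    "hw.ix.rxd": 4096,
}


def loader_conf_lines(tunables):
    """Format tunables as name="value" lines (loader.conf.local style)."""
    return [f'{name}="{value}"' for name, value in tunables.items()]


def aim_sysctl_lines(unit_numbers, enabled=False):
    """Format one dev.ix.<unit>.enable_aim line per interface unit."""
    flag = 1 if enabled else 0
    return [f"dev.ix.{unit}.enable_aim={flag}" for unit in unit_numbers]


if __name__ == "__main__":
    print("\n".join(loader_conf_lines(IX_TUNABLES)))
    print("\n".join(aim_sysctl_lines([0, 1])))
```

The script only prints text; it does not touch the running system, so the output still has to be pasted into the loader file and System Tunables by hand.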