Igb 2.4.0 causing crashes
-
Can confirm crash with i350 card, iperf < 100 Mbps traffic to another host
2.1.1-PRERELEASE (amd64)
built on Tue Feb 11 22:10:25 EST 2014
FreeBSD 8.3-RELEASE-p14
default config options, 1 interface defined and in use (igb0)
Platform IBM x3650m3
Submitted a crash report via gui
i350 card is stable to port saturation on all ports under FreeBSD 10
-
I haven't been able to reproduce the lockup again doing iperf tests on 2.1.1 PRERELEASE, so I am unsure whether it was related to this issue. I might have a different issue, or not :). I don't have a box with 1.2.2 PRERELEASE using an igb driver in any kind of real environment yet.
I did just notice a commit to pfsense-tools, though, that seems to indicate they are going back to the old drivers. I am not 100% sure that is what the commit means, as I don't know the internal build stuff, but it looks like it to me.
https://github.com/pfsense/pfsense-tools/commit/fde16db5dd82641544017d2a2b2b1e04d5332ec4
builder_scripts/conf/patchlist/patches.RELENG_8_3:
"Disable the ndrivers from head they seem to break things more than help in general"
-~~inet_head.tgz~
-~sys/conf~files.8.3.diff~
EDIT: I didn't check the 2.1.1 forum to notice the sticky… it has been reverted.
https://forum.pfsense.org/index.php/topic,72763.0.html
-
Give it another shot with new snapshots.
The panics have been resolved and let us know.
-
EDIT: I just realized you were probably not talking about my lockup as that seemed to be a different issue…
I never was able to reproduce this specific lockup (not crash). The only crash issue I have is related to disabling carp on the master while under load, which happens even with pfsense 2.0.3. It happens on two identical hardware installs. Since it happens on 2.0.3 too, I didn't bring it up here. The crash is with the reverted igb drivers (like in current snapshots) and not the backported drivers, which were pulled back out somewhat recently.
https://forum.pfsense.org/index.php?topic=72965.0
-
@ermal:
Give it another shot with new snapshots.
The panics have been resolved and let us know.
Is it in the current snapshots? I can install Friday and give it a test. Maybe Thursday.
-
Yes it is in the latest ones.
-
@ermal:
Yes it is in the latest ones.
I'm not getting any snapshots newer than what I'm on (Fri Mar 7 18:35:38 EST 2014).
-
Yes, correct.
Snapshots will be back online soon, as jimp posted here: https://forum.pfsense.org/index.php?topic=72763.msg401986#msg401986
-
@ermal:
Yes it is in the latest ones.
I'm not getting any snapshots newer than what I'm on (Fri Mar 7 18:35:38 EST 2014).
Mar 12 snapshots are available
-
So the good news is that it's not crashing any more.
The bad news is that I still seem to be hitting a pretty hard wall at ~2.1Gbit/s across 10Gb ix interfaces.
-
You need to do tuning for that.
It depends on traffic amount you are generating, what you are using to generate traffic etc…
-
@ermal:
You need to do tuning for that.
It depends on traffic amount you are generating, what you are using to generate traffic etc…
I've applied the same tweaks I had done to my (now defunct) FreeNAS servers with no luck. Those boxes had slower CPUs and were able to hit ~5-6Gbit/s between each other. Testing is with iperf.
If you have any specific tweaks in mind I'll definitely give them a go.
-
Start by sharing what you are doing!
-
You might want to check whether you are CPU-bound or whether something else is the issue.
`top -SH` usually gives an idea of where the CPU time goes.
-
@ermal:
Start by sharing what you are doing!
Hardware Specs (both boxes are identical):
- Intel E3-1245 V2 CPU (3.4GHz) w/ HT disabled
- 16GB DDR3 ECC RAM
- Intel 530 240GB SSD
- (12) Intel i350 1GbE
- (2) Intel X520 10GbE
Software Config:
- iperf tests running across ix1 (have tried both SFP+ Direct Attach and an OM3 multimode patch with Intel SR optics directly between boxes, as well as running through a Cisco Nexus 5548UP)
- Interface has simple any/any firewall rule
- Snort is NOT running on these interfaces (though it is on others)
Tweaks in /boot/loader.conf.local:
- kern.ipc.nmbclusters="262144"
- kern.ipc.nmbjumbop="262144"
- hw.intr_storm_threshold=10000
Setting MSI-X on or off seems to make no difference, and neither does setting the number of interface queues (have tried 1, 2, and 4).
Tweaks in System Tunables:
- kern.ipc.maxsockbuf=16777216
- net.inet.tcp.recvbuf_inc=524288
- net.inet.tcp.recvbuf_max=16777216
- net.inet.tcp.sendbuf_inc=16384
- net.inet.tcp.sendbuf_max=16777216
Test Results (always roughly 2 Gbit/s, sometimes 1.8, sometimes 2.2):
- iperf -c & -s = 2Gbit/s
- iperf -c -d & -s = sum of both directions is 2Gbit/s (typically something like 1.8 and 0.2)
- iperf -c -P2 & -s = sum of both threads is 2Gbit/s (typically something like 1.3 & 0.7)
- iperf -c -P4 & -s = sum of all threads is 2Gbit/s (typically about 0.5 on each)
All 4 cores have an idle percentage in the 40-50% range even when running the -P4 test.
-
You are sourcing traffic from the same box?
-
I have two identical boxes. For the purpose of testing throughput (before I route all the internal traffic from my servers through them) I have them connected directly to each other.
-
Well, your results may vary depending on the tool used.
Since there are many cores, your program may bounce between them, so I do not think you can get stable results that way.
What I recommend for ix devices is:
hw.ixgbe.rx_process_limit=1024 #maybe higher or lower depends on testing
hw.ixgbe.tx_process_limit=1024
hw.ixgbe.num_queues=#ofcores you have
hw.ixgbe.txd=4096
hw.ixgbe.rxd=4096
Though these are very dependent on the workload you are trying to produce.
Also, with a single stream I am not sure you can reach 10G with iperf's default parameters :).
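As a rough sanity check on the single-stream point, here is a minimal Python sketch of the bandwidth-delay arithmetic: one TCP stream cannot move more than one window of data per round trip. The RTT and window values are illustrative assumptions, not measurements from these boxes, and the function names are made up for the example.

```python
def max_tcp_throughput_bps(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound for one TCP stream: at most one window in flight per RTT."""
    return window_bytes * 8 / rtt_seconds


def window_needed_bytes(target_bps: float, rtt_seconds: float) -> float:
    """Bandwidth-delay product: bytes that must be in flight to fill the link."""
    return target_bps * rtt_seconds / 8


if __name__ == "__main__":
    rtt = 0.0002        # ASSUMPTION: 0.2 ms round trip on a direct link
    window = 64 * 1024  # ASSUMPTION: 64 KiB window, for illustration only
    ceiling = max_tcp_throughput_bps(window, rtt)
    needed = window_needed_bytes(10e9, rtt)
    print(f"single-stream ceiling: {ceiling / 1e9:.2f} Gbit/s")
    print(f"window needed for 10 Gbit/s: {needed / 1024:.0f} KiB")
```

If the ceiling computed for your real RTT and window is well below 10 Gbit/s, a larger window (iperf's -w option) is one thing to try, though this arithmetic alone does not say where the limit in a given setup actually comes from.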
Also remove this as well:
hw.intr_storm_threshold=10000
-
@ermal:
Give it another shot with new snapshots.
The panics have been resolved and let us know.
Any pointers to what the fix actually was? I didn't see anything in redmine or in the FreeBSD patches. Course, I haven't followed through to get access to the tools again. Not sure it's worth it for a non-contributor, but I am an active tester and curious code reader.
-
You are overthinking the fix, I think. I think the fix he is referring to is that they reverted the drivers to the older versions.