Netgate Discussion Forum

    Igb 2.4.0 causing crashes

    2.1.1 Snapshot Feedback and Problems - RETIRED
    32 Posts 9 Posters 10.9k Views
    • N
      nastraga
      last edited by

      Can confirm the crash with an i350 card, running iperf at < 100 Mbps to another host

      2.1.1-PRERELEASE (amd64)
      built on Tue Feb 11 22:10:25 EST 2014
      FreeBSD 8.3-RELEASE-p14

      default config options, 1 interface defined and in use (igb0)

      Platform IBM x3650m3

      Submitted a crash report via gui

      i350 card is stable to port saturation on all ports under FreeBSD 10

      • A
        adam65535
        last edited by

        I haven't been able to reproduce the lockup again doing iperf tests on 2.1.1 PRERELEASE, so I am unsure whether it was related to this issue.  I might have a different issue :).  I don't have a box with 2.1.1 PRERELEASE using an igb driver in any kind of real environment yet.

        I did just notice a commit to pfsense-tools, though, that seems to indicate they are going back to the old drivers.  I am not 100% sure that is what the commit means, as I don't know the internal build process, but it looks like it to me.

        https://github.com/pfsense/pfsense-tools/commit/fde16db5dd82641544017d2a2b2b1e04d5332ec4

        builder_scripts/conf/patchlist/patches.RELENG_8_3:
        "Disable the ndrivers from head they seem to break things more than help in general"
        -~~inet_head.tgz~
        -~sys/conf~files.8.3.diff~

        EDIT: I didn't check the 2.1.1 forum to notice the sticky…  it has been reverted.
        https://forum.pfsense.org/index.php/topic,72763.0.html

        • E
          eri--
          last edited by

          Give it another shot with new snapshots.

          The panics have been resolved. Let us know how it goes.

          • A
            adam65535
            last edited by

            EDIT: I just realized you were probably not talking about my lockup as that seemed to be a different issue…

            I never was able to reproduce this specific lockup (not crash).  The only crash issue I have is related to disabling carp on the master while under load which happens even with pfsense 2.0.3.  It happens to 2 different identical hardware installs.  Since it happens on 2.0.3 too I didn't bring it up here.  The crash is with the reverted igb drivers (like in current snapshots) and not the backported drivers which were pulled back out somewhat recently.

            https://forum.pfsense.org/index.php?topic=72965.0

            • J
              jasonlitka
              last edited by

              @ermal:

              Give it another shot with new snapshots.

              The panics have been resolved. Let us know how it goes.

              Is it in the current snapshots?  I can install Friday and give it a test.  Maybe Thursday.

              I can break anything.

              • E
                eri--
                last edited by

                Yes it is in the latest ones.

                • J
                  jasonlitka
                  last edited by

                  @ermal:

                  Yes it is in the latest ones.

                  I'm not getting any snapshots newer than what I'm on (Fri Mar 7 18:35:38 EST 2014).

                  I can break anything.

                  • M
                    maverick_slo
                    last edited by

                    Yes, correct.
                    Snapshots will be soon online again as jimp posted here: https://forum.pfsense.org/index.php?topic=72763.msg401986#msg401986

                    • R
                      rbgarga Developer Netgate Administrator
                      last edited by

                      @Jason:

                      @ermal:

                      Yes it is in the latest ones.

                      I'm not getting any snapshots newer than what I'm on (Fri Mar 7 18:35:38 EST 2014).

                      Mar 12 snapshots are available

                      Renato Botelho

                      • J
                        jasonlitka
                        last edited by

                        So the good news is that it's not crashing any more.

                        The bad news is that I still seem to be hitting a pretty hard wall at ~2.1Gbit/s across 10Gb ix interfaces.

                        I can break anything.

                        • E
                          eri--
                          last edited by

                          You need to do tuning for that.
                          It depends on the amount of traffic you are generating, what you are using to generate it, etc…

                          • J
                            jasonlitka
                            last edited by

                            @ermal:

                            You need to do tuning for that.
                            It depends on the amount of traffic you are generating, what you are using to generate it, etc…

                            I've applied the same tweaks I had done to my (now defunct) FreeNAS servers with no luck.  Those boxes had slower CPUs and were able to hit ~5-6Gbit/s between each other. Testing is with iperf.

                            If you have any specific tweaks in mind I'll definitely give them a go.

                            I can break anything.

                            • E
                              eri--
                              last edited by

                              Start by sharing what you are doing!

                              • K
                                Klaws
                                last edited by

                                You might want to check whether you are CPU-bound or whether something else is the issue.

                                top -SH

                                usually gives an idea where the CPU time goes.
                                • J
                                  jasonlitka
                                  last edited by

                                  @ermal:

                                  Start by sharing what you are doing!

                                  Hardware Specs (both boxes are identical):

                                  • Intel E3-1245 V2 CPU (3.4GHz) w/ HT disabled

                                  • 16GB DDR3 ECC RAM

                                  • Intel 530 240GB SSD

                                  • (12) Intel i350 1GbE

                                  • (2) Intel X520 10GbE

                                  Software Config:

                                  • iperf tests running across ix1 (have tried both SFP+ Direct Attach and a multi-mode OM3 patch cable with Intel SR optics directly between boxes, as well as running through a Cisco Nexus 5548UP)

                                  • Interface has simple any/any firewall rule

                                  • Snort is NOT running on these interfaces (though it is on others)

                                  Tweaks in /boot/loader.conf.local:

                                  • kern.ipc.nmbclusters="262144"

                                  • kern.ipc.nmbjumbop="262144"

                                  • hw.intr_storm_threshold=10000

                                  Setting MSIX on or off seems to make no difference and neither does setting the number of interface queues (have tried 1, 2, and 4).

                                  Tweaks in System Tunables:

                                  • kern.ipc.maxsockbuf=16777216

                                  • net.inet.tcp.recvbuf_inc=524288

                                  • net.inet.tcp.recvbuf_max=16777216

                                  • net.inet.tcp.sendbuf_inc=16384

                                  • net.inet.tcp.sendbuf_max=16777216

                                  Test Results (always roughly 2 Gbit/s, sometimes 1.8, sometimes 2.2):

                                  • iperf -c & -s = 2Gbit/s

                                  • iperf -c -d & -s = sum of both directions is 2Gbit/s (typically something like 1.8 and 0.2)

                                  • iperf -c -P2 & -s = sum of both threads is 2Gbit/s (typically something like 1.3 & 0.7)

                                  • iperf -c -P4 & -s = sum of all threads is 2Gbit/s (typically +/- 0.5 on each)

                                  All 4 cores have an idle percentage in the 40-50% range even when running at the -P4 test.
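
For anyone wanting to repeat the client runs listed above, they could be scripted roughly as follows. This is only a sketch: it assumes iperf version 2 with `iperf -s` already running on the other box, and the peer address, the 30-second duration and the `iperf_matrix` helper name are all made up for illustration. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch of the four iperf (v2) client runs from the post above.
# PEER is a hypothetical address for the other box's ix1 interface.
PEER="${PEER:-192.0.2.2}"
# DRY_RUN=1 (the default) only prints the commands; set DRY_RUN=0 to run them.
DRY_RUN="${DRY_RUN:-1}"

iperf_matrix() {
    # "" = single stream, -d = both directions at once, -P N = N parallel streams
    for args in "" "-d" "-P 2" "-P 4"; do
        # -t 30 (test length in seconds) is an assumption, not from the post
        cmd="iperf -c $PEER -t 30${args:+ $args}"
        if [ "$DRY_RUN" = "1" ]; then
            echo "$cmd"
        else
            $cmd
        fi
    done
}

iperf_matrix
```

Running `top -SH` on both boxes while the real runs are in progress shows whether a single thread is pinned even when the overall idle percentage looks healthy.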

                                  I can break anything.

                                  • E
                                    eri--
                                    last edited by

                                    You are sourcing traffic from the same box?

                                    • J
                                      jasonlitka
                                      last edited by

                                      I have two identical boxes.  For the purpose of testing throughput (before I route all the internal traffic from my servers through them) I have them connected directly to each other.

                                      I can break anything.

                                      • E
                                        eri--
                                        last edited by

                                        Well, your results may vary depending on the tool used.
                                        Since there are many cores, your program may bounce from one core to another, so I do not think you can achieve stable results that way.

                                        What I recommend for ix devices is:

                                        
                                        hw.ixgbe.rx_process_limit=1024 # maybe higher or lower, depending on testing
                                        hw.ixgbe.tx_process_limit=1024
                                        hw.ixgbe.num_queues=<number of cores you have>
                                        hw.ixgbe.txd=4096
                                        hw.ixgbe.rxd=4096
                                        
                                        

                                        These are very dependent on the workload you are trying to produce, though.

                                        Also, I am not sure you can achieve 10G with a single stream using iperf's default parameters :).
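
One general reason a single TCP stream can fall short of line rate is that its throughput cannot exceed the window size divided by the round-trip time. The sketch below only does that arithmetic. The `window_limit_gbits` helper and the sample numbers are hypothetical, not measurements from this thread, and it is not offered as a diagnosis of the ~2 Gbit/s wall reported above.

```shell
#!/bin/sh
# Upper bound on single-stream TCP throughput: window / round-trip time.
# Arguments (both illustrative): window size in bytes, RTT in microseconds.
window_limit_gbits() {
    awk -v w="$1" -v r="$2" \
        'BEGIN { printf "%.2f\n", (w * 8) / (r / 1000000) / 1000000000 }'
}

# Hypothetical numbers, not taken from the posts above:
window_limit_gbits 65536 200      # 64 KiB window, 0.2 ms RTT -> prints 2.62
window_limit_gbits 16777216 200   # 16 MiB window, same RTT
```

With iperf 2 the window can be requested with `-w`, so comparing a run with a larger window against a default run is one way to see whether this bound matters on a given link.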

                                        Also remove this:
                                        hw.intr_storm_threshold=10000
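
Pulled together, the suggestions in this post might look like the fragment below. It is a sketch under a few assumptions: that the settings go in /boot/loader.conf.local, as with the tweaks posted earlier in the thread, and that `num_queues` is 4 to match the four-core CPU described there. The 1024 process limits are the starting points given above and are meant to be adjusted through testing.

```
# /boot/loader.conf.local (sketch; adjust after testing)
hw.ixgbe.rx_process_limit="1024"   # starting point, may need to go higher or lower
hw.ixgbe.tx_process_limit="1024"
hw.ixgbe.num_queues="4"            # assumption: one queue per core on a 4-core box
hw.ixgbe.txd="4096"
hw.ixgbe.rxd="4096"
# hw.intr_storm_threshold removed, as recommended above
```

Loader tunables are read at boot, so a reboot is needed before retesting.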

                                        • C
                                          charliem
                                          last edited by

                                          @ermal:

                                          Give it another shot with new snapshots.

                                          The panics have been resolved. Let us know how it goes.

                                          Any pointers to what the fix actually was?  I didn't see anything in redmine or in the FreeBSD patches.  Of course, I haven't jumped through the hoops to get access to the tools again.  I'm not sure it's worth it for someone who is not a contributor, just an active tester and curious code reader.

                                          • A
                                            adam65535
                                            last edited by

                                            You are overthinking the fix, I think.  I believe the fix he is referring to is that they reverted the drivers to the older versions.

                                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.