    10Gbe Tuning?

    • stephenw10 Netgate Administrator

      Run 'top -SH' at the console to see how the usage breaks down across the cores.
      How are the NICs connected? If they're PCI you might hit a bottleneck there.
      Try running a test through pfSense instead of using it as an end-point.
      The previous user who got greater than 600Mbps through his Atom had to make some tweaks. I forget the details, but I think he disabled some PCI power-saving options in the BIOS.
      You could try enabling IP fast-forwarding if you're not using IPsec.
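
      A rough sketch of checking and toggling that from a shell (this is the stock FreeBSD sysctl; on pfSense you would normally make it persistent via System > Advanced > System Tunables):

      # check the current setting (0 = off, 1 = on)
      sysctl net.inet.ip.fastforwarding
      # turn it on for the running system
      sysctl net.inet.ip.fastforwarding=1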

      Steve

      • dmitripr

        @stephenw10:

        Run 'top -SH' at the console to see how the usage breaks down across the cores.
        How are the NICs connected? If they're PCI you might hit a bottleneck there.
        Try running a test through pfSense instead of using it as an end-point.
        The previous user who got greater than 600Mbps through his Atom had to make some tweaks. I forget the details, but I think he disabled some PCI power-saving options in the BIOS.
        You could try enabling IP fast-forwarding if you're not using IPsec.

        Steve

        I have embedded Broadcom NICs, not PCI.

        Unfortunately I don't have enough (powerful enough) equipment to push a 1 Gbps test through the pfSense box. I have a Lenovo T440 with an i5, but as I said in my previous thread, I can't get 1 Gbps saturation via iperf on it either (it should be able to; maybe it's a Win7 issue or something). I also have a NAS, but it has a very slow processor, and a MacBook Air, but without a gigabit adapter (Wi-Fi only).

        So, using what I have: pfSense --> Lenovo, TCP window size of 128 KB:

        [ ID] Interval      Transfer    Bandwidth
        [  3]  0.0- 1.0 sec  37.6 MBytes  316 Mbits/sec
        [  3]  1.0- 2.0 sec  39.1 MBytes  328 Mbits/sec
        [  3]  2.0- 3.0 sec  38.4 MBytes  322 Mbits/sec
        [  3]  3.0- 4.0 sec  37.8 MBytes  317 Mbits/sec
        [  3]  4.0- 5.0 sec  37.1 MBytes  311 Mbits/sec
        [  3]  5.0- 6.0 sec  36.9 MBytes  309 Mbits/sec
        [  3]  6.0- 7.0 sec  37.1 MBytes  311 Mbits/sec
        [  3]  7.0- 8.0 sec  37.0 MBytes  310 Mbits/sec
        [  3]  8.0- 9.0 sec  40.0 MBytes  336 Mbits/sec
        [  3]  9.0-10.0 sec  37.9 MBytes  318 Mbits/sec
        [  3]  0.0-10.0 sec  379 MBytes  318 Mbits/sec

        I was running top -SH in another session:

        last pid: 65943;  load averages:  0.18,  0.04,  0.01    up 2+03:16:25  20:26:55
        169 processes: 10 running, 139 sleeping, 3 stopped, 17 waiting
        CPU:  0.0% user,  0.0% nice, 23.7% system, 24.9% interrupt, 51.3% idle
        Mem: 834M Active, 1198M Inact, 699M Wired, 296K Cache, 416M Buf, 1180M Free
        Swap: 8192M Total, 8192M Free

        PID USERNAME PRI NICE  SIZE    RES STATE  C  TIME  WCPU COMMAND
          11 root    171 ki31    0K    64K CPU2    2  49.9H 91.16% idle{idle: cpu2}
          11 root    171 ki31    0K    64K RUN    3  50.3H 87.50% idle{idle: cpu3}
          11 root    171 ki31    0K    64K RUN    1  50.2H 83.25% idle{idle: cpu1}
          12 root    -68    -    0K  336K CPU0    0  10:10 60.89% intr{irq18: bge1
        65943 root      76    0 13556K  2628K CPU1    1  0:08 54.88% iperf{iperf}
          11 root    171 ki31    0K    64K RUN    0  50.5H 43.55% idle{idle: cpu0}
        34264 root      64  20  619M  301M bpf    1  17:53  0.00% snort{snort}
          258 root      76  20  6908K  1404K kqread  3  15:34  0.00% check_reload_stat
          12 root    -68    -    0K  336K WAIT    0  10:05  0.00% intr{irq16: bge0
          12 root    -32    -    0K  336K RUN    0  7:13  0.00% intr{swi4: clock}
        64693 proxy    64  20  380M  364M kqread  2  3:35  0.00% squid
        28093 root      44    0  5784K  1484K select  2  1:29  0.00% apinger
          23 root      20    -    0K    16K syncer  3  0:58  0.00% syncer
            0 root    -16    0    0K  176K sched  2  0:44  0.00% kernel{swapper}
          14 root    -16    -    0K    16K -      2  0:32  0.00% yarrow
        20488 root      44    0 26272K  7532K kqread  0  0:24  0.00% lighttpd
        86216 root      76  20  8296K  1932K wait    0  0:21  0.00% sh
          12 root    -32    -    0K  336K RUN    0  0:18  0.00% intr{swi4: clock}
            8 root    -16    -    0K    16K pftm    1  0:14  0.00% pfpurge
        30278 dhcpd    44    0 15180K 10444K select  2  0:13  0.00% dhcpd

        I'm not sure what the bottleneck is here. On second thought, it doesn't look like a processor issue. Also, I already have IP fast-forwarding turned on (I do use IPsec, but have not had any issues with fast-forwarding yet).

        Thanks for any help!

        • dmitripr

          Good news. I figured out the issue. The iperf buffer length was too short (1470 bytes by default for UDP); once I increased it to 16000 bytes, things got moving much more quickly.

          Again pfSense --> Lenovo:

          [2.1.4-RELEASE]: iperf -c 192.168.1.107 -u -b 1000m -i 1 -l 16000
          ------------------------------------------------------------
          Client connecting to 192.168.1.107, UDP port 5001
          Sending 16000 byte datagrams
          UDP buffer size: 56.0 KByte (default)

          [  3] local 192.168.1.1 port 46600 connected with 192.168.1.107 port 5001
          [ ID] Interval      Transfer    Bandwidth
          [  3]  0.0- 1.0 sec  104 MBytes  872 Mbits/sec
          [  3]  1.0- 2.0 sec  105 MBytes  884 Mbits/sec
          [  3]  2.0- 3.0 sec  108 MBytes  908 Mbits/sec
          [  3]  3.0- 4.0 sec  107 MBytes  894 Mbits/sec
          [  3]  4.0- 5.0 sec  109 MBytes  914 Mbits/sec
          [  3]  5.0- 6.0 sec  109 MBytes  915 Mbits/sec
          [  3]  6.0- 7.0 sec  109 MBytes  912 Mbits/sec
          [  3]  7.0- 8.0 sec  108 MBytes  909 Mbits/sec
          [  3]  8.0- 9.0 sec  106 MBytes  890 Mbits/sec
          [  3]  9.0-10.0 sec  105 MBytes  883 Mbits/sec
          [  3]  0.0-10.0 sec  1.05 GBytes  898 Mbits/sec
          [  3] Sent 70583 datagrams

          I'm pretty much hitting the practical limit of a gigabit right there.

          But when I switch to TCP, I'm still getting ~300 Mbps.
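
          A sketch of the kind of TCP retest I plan to try next; the window size and stream count here are guesses, not settings I've confirmed help:

          # TCP run with a larger window and several parallel streams
          iperf -c 192.168.1.107 -w 512k -P 4 -i 1 -t 10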

          • stephenw10 Netgate Administrator

            Even though your NICs are on-board, they will still be connected to the chipset via either a PCI or PCIe bus. It seems unlikely that it would be PCI, but you never know; the exact NIC chip code will tell you. Clearly the CPU is not the restriction here, since all the cores are still spending most of their time idle.
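
            A quick way to check, something like this (pciconf is in the base system; the bge names are taken from your top output):

            # list devices with vendor/device strings and look at the bge entries
            pciconf -lv
            # or just the attach messages for the Broadcom NICs
            dmesg | grep bge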

            Steve

            • razzfazz

              @dmitripr:

              12 root    -68    -    0K  336K CPU0    0  10:10 60.89% intr{irq18: bge1

              The interrupt load seems pretty high for <1Gbps throughput.

              • dmitripr

                @razzfazz:

                @dmitripr:

                12 root    -68    -    0K  336K CPU0    0  10:10 60.89% intr{irq18: bge1

                The interrupt load seems pretty high for <1Gbps throughput.

                I'm sure these are not the best NICs out there. :) But considering there are 4 cores here, this is only ~15% of total CPU usage. Probably not too bad, but not great either. Intel NICs would fare better for sure.

                • Guest

                  @dmitripr:

                  I'm sure these are not the best NICs out there. :) But considering there are 4 cores here, this is only ~15% of total CPU usage. Probably not too bad, but not great either. Intel NICs would fare better for sure.

                  And Chelsio better still.

                  • q54e3w

                    A new Intel driver, v2.5.25, for X520/X540 cards was released last week - has anybody tried it yet?

                    https://downloadcenter.intel.com/Detail_Desc.aspx?DwnldID=14688&lang=eng&ProdId=3412

                    • cyruspy

                      @gonzopancho:

                      @Jason:

                      I was able to get ~8Gbit/s between two FreeNAS 9.x boxes without jumbo frames when using 4 threads.  That's pretty close to wire.

                      OK, Jason… FreeBSD won't forward at wirespeed on 10Gbps networks.

                      Since the BSDRP guy can only manage to forward (no firewall, just fast forwarding) at a pinch over 1.8Mpps (and you were doing, by my best estimate, 5.5Mpps), I'm going to assert that we still have work to do.

                      brunoc:  we're currently engaged in a 10G performance study, but yes, part of the solution will be tuning, and part of it will be the threaded pf in pfSense version 2.2.

                      Hmm, if all I need is a pair of routers running CARP and NAT with a pool of IPs, with 10GbE Intel NICs, would it make sense to go with 2.2 Alpha snapshots?

                      • Guest

                        "8Gbps" is not how we measure these things.

                        Quote PPS or go home.

                        • cyruspy

                          @gonzopancho:

                          "8Gbps" is not how we measure these things.

                          Quote PPS or go home.

                          My bad, let's say I need NAT (PAT, really) for 500 kpps.

                          • Guest

                            There is an active internal project to get the performance of 'pf' up.

                            • cyruspy

                              @gonzopancho:

                              There is an active internal project to get the performance of 'pf' up.

                              It would be nice to know a little more about that project. For the time being, how near that mark can I get with a Xeon E5520/E5620, PCIe, and a decent 10GbE Intel NIC?

                              Should I stay with 2.1.5 or venture onto 2.2 ALPHA because of the FreeBSD 10 baseline?

                              • Guest

                                I'd go 2.2-BETA, personally. There are only a couple of things left to get fixed.

                                The test harness is here:  https://github.com/gvnn3/conductor

                                (Remember, people say I don't know how to open source.)

                                • cyruspy

                                  I didn't know there was a beta already; I'll look at it. Thanks.

                                  • Guest

                                    It's not, but should be quite soon.

                                    • superbree

                                      Now that 2.2 is in beta, a few questions about 10GbE:

                                      1. Are the system tunable tweaks still necessary for the Intel ix drivers?

                                      2. Are the tweaks in /boot/loader.conf.local, as mentioned in reply #14, still needed?

                                      3. Do LRO and TSO still need to be disabled in 2.2 beta for the ix drivers?
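
                                      (For reference, what I mean by disabling them is roughly this, assuming the interfaces show up as ix0/ix1; pfSense also exposes the equivalent checkboxes under System > Advanced > Networking.)

                                      # turn off LRO and TSO on the 10G interfaces for the running system
                                      ifconfig ix0 -lro -tso
                                      ifconfig ix1 -lro -tso
                                      # verify: TSO4/LRO should no longer appear in the options line
                                      ifconfig ix0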

                                      Thank you in advance for any reply!

                                      • Guest

                                        Dude in #14 doesn't understand what he's doing.

                                        (People who "tune" TCP variables to get packet filtering / NAT throughput are wasting time.)

                                        You're getting faster IPsec (AES-GCM w/ AES-NI) with 2.2. You'll see some improvement from the threaded "pf" in FreeBSD 10(.1), upon which pfSense 2.2 is based.

                                        I've already discussed the faster version of pf here and elsewhere. There are a couple of easy improvements (good for 12-15%), and these might make it into 2.2.x. After that it gets hard; pf is a really crappy architecture for performance.

                                        In any case, these things take time, and/or money.

                                        "Patches accepted."

                                        • superbree

                                          Thank you for your reply. I completely understand what you are saying. PF and PPS ;D I am excited to see what threaded PF in 2.2 might do. I am also interested in what you have been saying with regard to a faster version of PF. Can you point me toward what you have been discussing so that I might catch up? A link or PM?

                                          Thank you,

                                          • Guest

                                            Now that we're on FreeBSD-10, we have netmap (*).

                                            ipfw over netmap exists: https://code.google.com/p/netmap-ipfw/
                                            Quoting that page, "This version reaches 7-10 Mpps for filtering".

                                            A preview of same: http://lists.freebsd.org/pipermail/freebsd-ipfw/2012-July/005176.html
                                            "A quick test with a simple ruleset (4 rules, see below) shows a processing speed of 9-10Mpps on one core."

                                            It seems obvious (to me) that plumbing pf over netmap (so pf in userspace) is something we should attempt. There is a pfSense hackathon in mid-October, and we should know more coming out of that.

                                            10G Ethernet at 64 byte packets is 14.8Mpps.  If we can do 7Mpps with pf, then for an average packet size of just over 128 bytes, we will be able to filter at line rate.
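
                                            For anyone checking the arithmetic, that 14.8Mpps figure counts the 20 bytes of preamble and inter-frame gap that every frame carries on the wire:

                                            64-byte frame + 20 bytes overhead = 84 bytes = 672 bits per packet
                                            10,000,000,000 bit/s / 672 bits ≈ 14.88 Mpps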

                                            Moreover, that's not the end of it; it's just where we're starting.

                                            But getting this work into pfSense will be more than just implementing "pf over netmap".

                                            (*) OK, I"m talking about pfSense 2.2, which technically is still in beta, and yes, netmap was in 8.3 as well, let's not discuss how old 8.3 is.
