Netgate Discussion Forum

    Solved - 10GB link 1GB speeds

    Hardware
    • Guest

      Thanks all for the help, but this hassle just isn't worth the 4 days I have put into it.

      • pfSense tuning for 10 Gbit Throughput
        The frequency of my CPU is 2.6 GHz; scaling to 3.8 GHz (Xeon E3-1275 TurboBoost) is a linear factor of 1.46 -> 5.0 Gbit/s -> 7.3 Gbit/s

      • 10Gbe Tuning?
        I set the MTU on these to 9000 yesterday and 9000 on the iperf servers I'm using and was able to saturate (9.5Gb/s) the link.  So I'm pretty sure I'm hitting just one interface.

      • 10gbe firewall using open source tools
        We're using Xeon E3 boxes (1260L) with Intel 10 GbE NICs (520 series) and pfSense 2.0.1 and it's working really well. We peak around 9000 Mbps at 55% CPU utilization.

      I don't know them personally and was not there when those tests were done,
      but I am pretty sure that with today's options like HT, SpeedStep and TurboBoost, and perhaps
      with no PPPoE on the WAN and 10 Gbit/s on the LAN, it should be possible to get nearly 10 Gbit/s out.
      But perhaps it also depends on the hardware used. If your FreeNAS is able to deliver such numbers,
      what would the sticking point be on pfSense? The packet filter, the rules, something else? I really
      don't know, but we see more and more threads about this in the forum, so perhaps one day someone
      will be able to deliver results and tips that work for everyone else too.

      • donnydavis

        So I moved my pfSense machine to one of my blades. It's not new or anything fancy, but it should have yielded better performance.
        And I was correct: the performance was 3x that of the C2758.

        [  3]  0.0- 1.0 sec  370 MBytes  3.10 Gbits/sec
        [  3]  1.0- 2.0 sec  363 MBytes  3.05 Gbits/sec
        [  3]  2.0- 3.0 sec  365 MBytes  3.06 Gbits/sec
        [  3]  3.0- 4.0 sec  366 MBytes  3.07 Gbits/sec
        [  3]  4.0- 5.0 sec  368 MBytes  3.08 Gbits/sec
        [  3]  5.0- 6.0 sec  372 MBytes  3.12 Gbits/sec
        [  3]  6.0- 7.0 sec  373 MBytes  3.13 Gbits/sec
        [  3]  7.0- 8.0 sec  373 MBytes  3.13 Gbits/sec
        [  3]  8.0- 9.0 sec  375 MBytes  3.15 Gbits/sec
        [  3]  9.0-10.0 sec  373 MBytes  3.13 Gbits/sec

        I have hyperthreading disabled and the BIOS performance level set to maximum.
        However, with this type of equipment I expected things to move about twice as fast. I have a fairly simple ruleset.
        With pf turned off, I do get performance that is closer in line with the other systems I have on my network.

        For instance, I have OpenStack routers (which are just Linux/SNAT/iptables) that run around 6.5 to 7 Gbit/s on the other blades.

        Now with multiple threads (4) I can get a little closer to my mark, at around 6 Gbit/s.
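
        Multiple threads here just means parallel iperf client streams, i.e. something along the lines of:

        iperf -c 10.0.0.2 -t 10 -P 4   # 4 parallel streams through the firewall (the address is a placeholder)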

        I know there are people out there getting near wireline speeds from their gear, I just don't know how they are doing it.

        • Guest

          Try disabling all offloading options, enabling polling, and using bigger buffers.
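
          For example, something along these lines (ix0 and the values are only examples, and polling only works if the kernel and driver support it):

          ifconfig ix0 -tso -lro -rxcsum -txcsum    # turn off the hardware offloads
          ifconfig ix0 polling                      # poll instead of taking an interrupt per packet
          sysctl kern.ipc.nmbclusters=1000000       # more mbuf clusters
          sysctl net.inet.tcp.sendbuf_max=16777216  # bigger TCP buffers (matters when iperf runs on the box itself)
          sysctl net.inet.tcp.recvbuf_max=16777216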

          • donnydavis

            It would seem that any sort of tuning actually makes it run slower. I see no marked improvement over the pfSense defaults.

            It would seem I need to get some better hardware that is more suited for the task. Good news is a 40G card is coming along with a 40G switch (6 ports).

            I am curious to see what kind of hurt I can put on this box with 40G gear.

            You can mark this thread closed, I am moving on to more important things. I will open a new one when the 40G gear gets here and I have a chance to tinker.

            • donnydavis

              So I have an update. The 40G NIC from Mellanox performs wonderfully on vanilla FreeBSD and Linux; however, I see the same performance with pfSense that I was getting with the 10G NICs. I would like to know what the differences are from the raw FreeBSD kernel.

              I really love pfSense, it makes my life so easy to do otherwise complicated stuff. But these performance issues should be addressed.

              • donnydavis

                @johnkeates:

                Before you throw it all out: try polling. This isn't always the solution, but if you are starving due to interrupts, polling might solve some of it.

                I am not familiar with it. Any good places to start?

                edit
                ifconfig mlxen0 polling

                Client connecting to ..., TCP port 5001
                TCP window size: 85.0 KByte (default)
                ------------------------------------------------------------
                [ ID] Interval      Transfer    Bandwidth
                [  3]  0.0- 1.0 sec  110 MBytes  922 Mbits/sec
                [  4]  0.0- 1.0 sec  64.6 MBytes  542 Mbits/sec
                [  5]  0.0- 1.0 sec  53.2 MBytes  447 Mbits/sec
                [SUM]  0.0- 1.0 sec  228 MBytes  1.91 Gbits/sec
                [  3]  1.0- 2.0 sec  110 MBytes  925 Mbits/sec
                [  5]  1.0- 2.0 sec  57.4 MBytes  481 Mbits/sec
                [  4]  1.0- 2.0 sec  56.5 MBytes  474 Mbits/sec
                [SUM]  1.0- 2.0 sec  224 MBytes  1.88 Gbits/sec
                [  3]  2.0- 3.0 sec  112 MBytes  936 Mbits/sec
                [  4]  2.0- 3.0 sec  54.5 MBytes  457 Mbits/sec
                [  5]  2.0- 3.0 sec  59.9 MBytes  502 Mbits/sec
                [SUM]  2.0- 3.0 sec  226 MBytes  1.90 Gbits/sec
                [  4]  3.0- 4.0 sec  52.8 MBytes  442 Mbits/sec
                [  3]  3.0- 4.0 sec  113 MBytes  948 Mbits/sec
                [  5]  3.0- 4.0 sec  62.1 MBytes  521 Mbits/sec
                [SUM]  3.0- 4.0 sec  228 MBytes  1.91 Gbits/sec

                ifconfig mlxen0 -polling
                ------------------------------------------------------------
                Client connecting to ..., TCP port 5001
                TCP window size: 85.0 KByte (default)

                [ ID] Interval      Transfer    Bandwidth
                [  3]  0.0- 1.0 sec  108 MBytes  905 Mbits/sec
                [  5]  0.0- 1.0 sec  109 MBytes  915 Mbits/sec
                [  4]  0.0- 1.0 sec  107 MBytes  898 Mbits/sec
                [SUM]  0.0- 1.0 sec  324 MBytes  2.72 Gbits/sec
                [  5]  1.0- 2.0 sec  108 MBytes  904 Mbits/sec
                [  4]  1.0- 2.0 sec  107 MBytes  898 Mbits/sec
                [  3]  1.0- 2.0 sec  107 MBytes  901 Mbits/sec
                [SUM]  1.0- 2.0 sec  322 MBytes  2.70 Gbits/sec
                [  5]  2.0- 3.0 sec  108 MBytes  910 Mbits/sec
                [  4]  2.0- 3.0 sec  107 MBytes  900 Mbits/sec
                [  3]  2.0- 3.0 sec  108 MBytes  906 Mbits/sec
                [SUM]  2.0- 3.0 sec  324 MBytes  2.72 Gbits/sec

                • Guest

                  So I have an update. The 40G NIC from Mellanox performs wonderfully on vanilla FreeBSD and Linux; however, I see the same performance with pfSense that I was getting with the 10G NICs. I would like to know what the differences are from the raw FreeBSD kernel.

                  pfSense runs pf (the packet filter) and NAT as additional steps in the packet path, and that work is not
                  being done on the plain FreeBSD and Linux systems! So if you want to compare them, that is the most likely
                  explanation, and on top of that it may also depend on the hardware used: with a Xeon E3, or a higher-clocking
                  Xeon E3 CPU (3.7 GHz, 7C/8T), you will perhaps get more throughput out of this than with a C2758-based machine.

                  I really love pfSense, it makes my life so easy to do otherwise complicated stuff. But these performance issues should be addressed.

                  Get hardware with more horsepower, or a stronger class of CPU (and RAM), and there is nothing left to address.

                  • Guest

                    To debug this a bit more, try setting up pfSense as a test with no NAT enabled. At the same time, disable pf in the advanced settings. With that done, try an iperf test again. If we're going to figure out why this is happening, we need to start excluding things.
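
                    In shell terms that boils down to something like (address and stream count are placeholders):

                    pfctl -d                        # disable pf entirely; pfctl -e turns it back on
                    iperf -c 10.0.0.2 -t 30 -P 4    # re-run the test through the box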

                    On the other hand, if you need this to work, you might be better off buying support at Netgate since they build pfSense.

                    • donnydavis

                      I agree with your point, and these are not complaints. If I wanted this to just work, I would stick with Fedora. However, I'm just trying to get to the bottom of what appears to be a pfSense-specific issue. With pfctl -d I still only get around 5 Gbit/s and high CPU/interrupt load. Are there settings that I am missing? This is a clean install with default settings.

                      On FreeBSD and Linux there is almost no CPU utilization, as it's mostly offloaded to the NIC. However, I'm not seeing this reflected in the pfSense build.

                      Thanks all for your input and time.
                      ~/D

                      • donnydavis

                        @BlueKobold:

                        So I have an update. The 40G NIC from Mellanox performs wonderfully on vanilla FreeBSD and Linux; however, I see the same performance with pfSense that I was getting with the 10G NICs. I would like to know what the differences are from the raw FreeBSD kernel.

                        pfSense runs pf (the packet filter) and NAT as additional steps in the packet path, and that work is not
                        being done on the plain FreeBSD and Linux systems! So if you want to compare them, that is the most likely
                        explanation, and on top of that it may also depend on the hardware used: with a Xeon E3, or a higher-clocking
                        Xeon E3 CPU (3.7 GHz, 7C/8T), you will perhaps get more throughput out of this than with a C2758-based machine.

                        I’m only routing packets, no NAT. Also with pf fully disabled I still get very high utilization numbers.

                        I really love pfSense, it makes my life so easy to do otherwise complicated stuff. But these performance issues should be addressed.

                        Get hardware with more horsepower, or a stronger class of CPU (and RAM), and there is nothing left to address.

                        There isn’t really a need for better equipment, it works fine with other options.

                        • Guest

                          Have you tried to run VyOS on your hardware? With basic NAT and firewalling enabled it will allow you to assess what your hardware is really capable of as a basic gateway/firewall.

                          • Guest

                            Hmm, next would probably be comparing sysctl output (I guess just getting both sysctl outputs and running a diff on them will do), and perhaps kernel/driver build configs (again, a diff should suffice).
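
                            Something along these lines on each box should do it (the file names are arbitrary):

                            sysctl -a | sort > freebsd-sysctl.txt     # on the plain FreeBSD install
                            sysctl -a | sort > pfsense-sysctl.txt     # on the pfSense install
                            diff -u freebsd-sysctl.txt pfsense-sysctl.txt | less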

                            • fwcheck

                              There are some cheap ways to increase the throughput.

                              1. Increase MTU
                              If you are lucky you can use jumbo frames throughout your environment (this can give up to a factor of 6 in throughput, assuming an MTU of 9000 (the maximum usable in VMware) instead of 1500). However, if you talk to the outside world you are likely to create a bottleneck due to the need to fragment. (A command example follows at the end of this post.)

                              2. Packet Rates
                              For high packet rates with small packets this will not help. There is a limit in the packet processing within FreeBSD which might be lower than in other network stacks; compare for example:
                              http://rhelblog.redhat.com/2015/09/29/pushing-the-limits-of-kernel-networking/
                              A good source seems to be the FreeBSD Router Project:
                              https://bsdrp.net/documentation/examples/forwarding_performance_lab_of_a_hp_proliant_dl360p_gen8_with_10-gigabit_with_10-gigabit_chelsio_t540-cr

                              They also give figures for pf.

                              3. Real World Examples
                              Remember always to measure through the device:

                              [ PC1 ] ---> [ pfSense system ] ---> [ PC2 ]

                              I can give some real-world examples: ESXi guests with 8 CPUs (2.6 GHz) allow pushing 5 Gbit/s with MTU 1500. Therefore I assume that real hardware should be able to achieve higher throughput.

                              The main problem seems to be the high interrupt rate.

                              I did some measurements on an X710 40 Gbit/s card (8 CPUs, >2 GHz) and I was able to reach throughput of around 12.3 Gbit/s.
                              As far as I have heard, with commodity hardware the limit seems to be around 26 Gbit/s:
                              https://www.ntop.org/products/packet-capture/pf_ring/pf_ring-zc-zero-copy/
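
                              For the jumbo-frame point, on pfSense/FreeBSD that is just the interface MTU, e.g. (mlxen0 is only an example name, every device in the path, switches included, has to carry the same MTU, and pfSense also has an MTU field on each interface page):

                              ifconfig mlxen0 mtu 9000
                              ping -D -s 8972 10.0.0.2   # verify the path really passes 9000-byte frames (8972 + 28 bytes of headers; the address is a placeholder)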

                              • Guest

                                @fwcheck:

                                There are some cheap ways to increase the throughput.

                                1. Increase MTU
                                If you are lucky you can use jumbo frames throughout your environment (this can give up to a factor of 6 in throughput, assuming an MTU of 9000 (the maximum usable in VMware) instead of 1500). However, if you talk to the outside world you are likely to create a bottleneck due to the need to fragment.

                                2. Packet Rates
                                For high packet rates with small packets this will not help. There is a limit in the packet processing within FreeBSD which might be lower than in other network stacks; compare for example:
                                http://rhelblog.redhat.com/2015/09/29/pushing-the-limits-of-kernel-networking/
                                A good source seems to be the FreeBSD Router Project:
                                https://bsdrp.net/documentation/examples/forwarding_performance_lab_of_a_hp_proliant_dl360p_gen8_with_10-gigabit_with_10-gigabit_chelsio_t540-cr

                                They also give figures for pf.

                                3. Real World Examples
                                Remember always to measure through the device:

                                [ PC1 ] ---> [ pfSense system ] ---> [ PC2 ]

                                I can give some real-world examples: ESXi guests with 8 CPUs (2.6 GHz) allow pushing 5 Gbit/s with MTU 1500. Therefore I assume that real hardware should be able to achieve higher throughput.

                                The main problem seems to be the high interrupt rate.

                                I did some measurements on an X710 40 Gbit/s card (8 CPUs, >2 GHz) and I was able to reach throughput of around 12.3 Gbit/s.
                                As far as I have heard, with commodity hardware the limit seems to be around 26 Gbit/s:
                                https://www.ntop.org/products/packet-capture/pf_ring/pf_ring-zc-zero-copy/

                                The 'problem' isn't in FreeBSD. He tried a plain FreeBSD install and it works fine there. It is in some difference between the settings in pfSense and FreeBSD, probably pf config, interface config, kernel config or sysctl changes.

                                • fwcheck

                                  I am not sure I understand the problem correctly.

                                  Your setup looks like this:

                                  [ System 1 (network 1) ] ---> [ Device under test ] ---> [ System 2 (network 2) ]

                                  Right?

                                  You use a FreeBSD system as router/firewall and achieve higher throughput than with pfSense?
                                  If that is the case, you should check all network settings / drivers / sysctls etc.; maybe there is a setting which is
                                  not identical.
                                  Applying the same settings should then lead to the same throughput.

                                  If you are just measuring speed via iperf3 to the pfSense system itself, hardware acceleration makes a huge difference, but it is not recommended for a system doing routing. Check the flags (LRO, TSO, etc., to name a few options which can make a big difference); changing them usually also needs a reboot to take effect.
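
                                  To see what is currently set and what the driver could do, something like this is enough (mlxen0 being the interface in question):

                                  ifconfig mlxen0 | grep options   # offload flags currently enabled (TSO4, LRO, RXCSUM, ...)
                                  ifconfig -m mlxen0               # also lists the capabilities the driver supports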

                                  • Guest

                                    The 'problem' isn't in FreeBSD. He tried a plain FreeBSD install and it works fine there. It is in some difference between the settings in pfSense and FreeBSD, probably pf config, interface config, kernel config or sysctl changes.

                                    I am pretty sure that pfSense is not just something on top of FreeBSD; since version 2.2.x it has become more and more of a
                                    special or custom build based on the original kernel, but with many, many changes.

                                    If the Netgate team or the pfSense team was able to push ~40 Gbit/s over an IPsec tunnel using an Intel QAT card, and
                                    that card comes without any network ports of its own, then in my opinion pfSense itself must be able to handle that speed too,
                                    given ports that support and/or allow that kind of throughput.

                                    • donnydavis

                                      I will pull the defaults from FreeBSD. I’m confident pfSense is fully capable of what I’m looking for. I’m just missing something.

                                      It is looking like an offload issue, as in seemingly nothing is offloaded to the NIC. I have tried 3 different cards (Intel X520, Chelsio T5, Mellanox X3 40G), all with nearly identical results. The limit of this gear with no offloads would seem to be around 4 Gbit/s.

                                      On a recent Linux kernel (Fedora 26) there is almost no CPU load, as it's all being done in the card.

                                      Thanks for the continued help and interest in this post. Yet another reason to push forward with pfSense. This is a great community.

                                      • donnydavis

                                        @fwcheck:

                                        There are some cheap ways to increase the throughput.

                                        1. Increase MTU
                                        If you are lucky you can use jumbo frames throughout your environment (this can give up to a factor of 6 in throughput, assuming an MTU of 9000 (the maximum usable in VMware) instead of 1500). However, if you talk to the outside world you are likely to create a bottleneck due to the need to fragment.

                                        2. Packet Rates
                                        For high packet rates with small packets this will not help. There is a limit in the packet processing within FreeBSD which might be lower than in other network stacks; compare for example:
                                        http://rhelblog.redhat.com/2015/09/29/pushing-the-limits-of-kernel-networking/
                                        A good source seems to be the FreeBSD Router Project:
                                        https://bsdrp.net/documentation/examples/forwarding_performance_lab_of_a_hp_proliant_dl360p_gen8_with_10-gigabit_with_10-gigabit_chelsio_t540-cr

                                        They also give figures for pf.

                                        3. Real World Examples
                                        Remember always to measure through the device:

                                        [ PC1 ] ---> [ pfSense system ] ---> [ PC2 ]

                                        I can give some real-world examples: ESXi guests with 8 CPUs (2.6 GHz) allow pushing 5 Gbit/s with MTU 1500. Therefore I assume that real hardware should be able to achieve higher throughput.

                                        The main problem seems to be the high interrupt rate.

                                        I did some measurements on an X710 40 Gbit/s card (8 CPUs, >2 GHz) and I was able to reach throughput of around 12.3 Gbit/s.
                                        As far as I have heard, with commodity hardware the limit seems to be around 26 Gbit/s:
                                        https://www.ntop.org/products/packet-capture/pf_ring/pf_ring-zc-zero-copy/

                                        From [device] <---> [device]
                                        I get wire-line speed.

                                        From [device] ---> [pfSense] ---> [device]

                                        This is where the issue resides.

                                        I would be happy with something close to half wire-line on 10G, because this device is doing more than just routing traffic. However, I am really quite a distance away from that without hitting 100% interrupt load.
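
                                        The interrupt load itself is easy to watch with the stock FreeBSD tools, something along the lines of:

                                        vmstat -i    # interrupt counters per device/queue
                                        top -SHP     # per-CPU, per-thread view; the intr kernel threads show up here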

                                        • donnydavis

                                          Here are stats from the same link on the same router using CentOS 7.4. These are with the factory defaults and no iptables rules enabled.

                                          ------------------------------------------------------------
                                          Client connecting to ..., TCP port 5001
                                          TCP window size: 85.0 KByte (default)

                                          [ ID] Interval      Transfer    Bandwidth
                                          [  5]  0.0- 1.0 sec  256 MBytes  2.15 Gbits/sec
                                          [  4]  0.0- 1.0 sec  270 MBytes  2.26 Gbits/sec
                                          [  3]  0.0- 1.0 sec  258 MBytes  2.17 Gbits/sec
                                          [  6]  0.0- 1.0 sec  327 MBytes  2.75 Gbits/sec
                                          [SUM]  0.0- 1.0 sec  1.09 GBytes  9.32 Gbits/sec
                                          [  5]  1.0- 2.0 sec  242 MBytes  2.03 Gbits/sec
                                          [  4]  1.0- 2.0 sec  251 MBytes  2.11 Gbits/sec
                                          [  3]  1.0- 2.0 sec  281 MBytes  2.36 Gbits/sec
                                          [  6]  1.0- 2.0 sec  337 MBytes  2.83 Gbits/sec
                                          [SUM]  1.0- 2.0 sec  1.09 GBytes  9.33 Gbits/sec
                                          ^C[  5]  0.0- 2.6 sec  679 MBytes  2.15 Gbits/sec
                                          [  4]  0.0- 2.6 sec  715 MBytes  2.27 Gbits/sec
                                          [  3]  0.0- 2.6 sec  718 MBytes  2.28 Gbits/sec
                                          [  6]  0.0- 2.6 sec  818 MBytes  2.60 Gbits/sec
                                          [SUM]  0.0- 2.6 sec  2.86 GBytes  9.29 Gbits/sec

                                          The CPU utilization is almost zero.

                                          (Attached screenshot: cpu-util-rtr.png, showing CPU utilization on the router.)

                                          • donnydavis

                                            And these are the default options that are turned on for the NIC in Linux:

                                            rx-checksumming: on
                                            tx-checksumming: on
                                            tx-checksum-ipv4: on
                                            tx-checksum-ipv6: on
                                            scatter-gather: on
                                            tx-scatter-gather: on
                                            tx-tcp-segmentation: on
                                            tx-tcp6-segmentation: on
                                            receive-hashing: on
                                            highdma: on [fixed]
                                            rx-vlan-filter: on [fixed]
                                            rx-vlan-stag-hw-parse: on
                                            rx-vlan-stag-filter: on [fixed]
                                            busy-poll: on [fixed]

                                            I have no idea how to translate these to BSD options. But I am thinking my issue lies here: what is offloaded for the NIC to handle.
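
                                            Roughly, the FreeBSD ifconfig counterparts would be something like the following (mlxen0 is just the example interface; scatter-gather, highdma and busy-poll have no direct user-facing toggles on FreeBSD):

                                            ifconfig mlxen0 rxcsum txcsum       # rx-checksumming / tx-checksumming (IPv4)
                                            ifconfig mlxen0 rxcsum6 txcsum6     # the IPv6 checksum offloads
                                            ifconfig mlxen0 tso4 tso6           # tx-tcp-segmentation / tx-tcp6-segmentation
                                            ifconfig mlxen0 lro                 # large receive offload on the receive side
                                            ifconfig mlxen0 vlanhwfilter        # rx-vlan-filter

                                            If I remember right, pfSense exposes the checksum/TSO/LRO ones as the "Disable hardware checksum offload / TSO / LRO" checkboxes under System > Advanced > Networking, so it is worth checking whether they are being forced off there.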
