    Fabiatech FX5625 improving throughput

    • stephenw10 Netgate Administrator

      The maximum throughput with a D525 is somewhere in the 650Mbps region, but that's with ideal test traffic. With real-world traffic and mixed packet sizes it will be lower. There may not be much that can be done here.

      What load makes up the 100% usage on one core?

      Can we see the output of top -aSH at the command line whilst you are seeing maximum throughput?
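
      If it's easier, a snapshot can be grabbed non-interactively while the traffic is flowing; the batch flags and output file below are just one way of doing it, not a required method:

      top -baSH -d 2 | tail -n 40 > /root/top-snapshot.txt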

      Steve

      • SimonB256 @stephenw10

        @stephenw10

        The 100% CPU usage only seems to happen in the early hours of the morning, always at the same time. I'll get on to it and take a look remotely tomorrow morning, and will post an update.

        • SimonB256

          @stephenw10

          After manually pushing some data through to generate this load, the main process responsible is 'intr{irq257: em0:rx0}', with similar processes for the other interfaces alongside it but not quite as high (understandably, as em0 is the WAN interface).

          Sample output:

          PID USERNAME PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
             11 root     155 ki31     0K    64K CPU1    1  23.6H  87.26% [idle{idle: cpu1}]
             12 root     -92    -     0K   832K CPU0    0 233:12  79.73% [intr{irq257: em0:rx0}]
             11 root     155 ki31     0K    64K RUN     3  23.9H  76.87% [idle{idle: cpu3}]
             11 root     155 ki31     0K    64K CPU2    2  23.2H  49.23% [idle{idle: cpu2}]
             12 root     -92    -     0K   832K WAIT    2   4:05  34.74% [intr{irq278: em5:rx0}]
              0 root     -92    -     0K   816K -       3   7:47  20.59% [kernel{em0 rxq (cpuid 0)}]
             11 root     155 ki31     0K    64K RUN     0  18.6H  13.41% [idle{idle: cpu0}]
             12 root     -92    -     0K   832K WAIT    2  51:47  11.30% [intr{irq261: em1:rx0}]
             12 root     -92    -     0K   832K WAIT    0 107:33   5.89% [intr{irq265: em2:rx0}]
              0 root     -92    -     0K   816K -       2  23:04   5.05% [kernel{dummynet}]
             12 root     -92    -     0K   832K WAIT    1  16:14   4.75% [intr{irq258: em0:tx0}]
             12 root     -92    -     0K   832K WAIT    3   0:16   4.39% [intr{irq279: em5:tx0}]
             12 root     -92    -     0K   832K WAIT    3   6:09   1.87% [intr{irq262: em1:tx0}]
             12 root     -92    -     0K   832K WAIT    2  13:49   1.40% [intr{irq269: em3:rx0}]
              0 root     -92    -     0K   816K -       1   1:41   0.75% [kernel{em5 rxq (cpuid 2)}]
             12 root     -92    -     0K   832K WAIT    1  15:43   0.58% [intr{irq266: em2:tx0}]
          74844 root      20    0  9868K  4700K CPU3    3   0:00   0.53% top -aSH
              0 root     -92    -     0K   816K -       1   2:42   0.46% [kernel{em1 rxq (cpuid 2)}]
             12 root     -92    -     0K   832K WAIT    0  11:15   0.42% [intr{irq281: em6:rx0}]
             12 root     -60    -     0K   832K WAIT    1   3:25   0.27% [intr{swi4: clock (0)}]
             12 root     -92    -     0K   832K WAIT    3   2:18   0.26% [intr{irq270: em3:tx0}]
          

          Checking things like mbuf usage et al., there appears to be plenty of room there:

          35554/14801/50355 mbufs in use (current/cache/total)
          33501/13093/46594/249500 mbuf clusters in use (current/cache/total/max)
          33501/13051 mbuf+clusters out of packet secondary zone in use (current/cache)
          0/34/34/124749 4k (page size) jumbo clusters in use (current/cache/total/max)
          0/0/0/36962 9k jumbo clusters in use (current/cache/total/max)
          0/0/0/20791 16k jumbo clusters in use (current/cache/total/max)
          75890K/30022K/105912K bytes allocated to network (current/cache/total)
          0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
          0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
          0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
          0/0/0 requests for jumbo clusters denied (4k/9k/16k)
          0 sendfile syscalls
          0 sendfile syscalls completed without I/O request
          0 requests for I/O initiated by sendfile
          0 pages read by sendfile as part of a request
          0 pages were valid at time of a sendfile request
          0 pages were requested for read ahead by applications
          0 pages were read ahead by sendfile
          0 times sendfile encountered an already busy page
          0 requests for sfbufs denied
          0 requests for sfbufs delayed
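
          That output looks like the standard FreeBSD mbuf report; if so, it can be re-checked at any time from the shell with:

          netstat -m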
          

          Current MBUF limit set as:

          [2.4.5-RELEASE][admin@firewall1.midlandcomputers.com]/root: sysctl kern.ipc.nmbclusters
          kern.ipc.nmbclusters: 249500
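
          If that limit ever did become the bottleneck it can be raised via a boot-time tunable; the value below is purely an example, not a recommendation for this box:

          # /boot/loader.conf.local (example value only)
          kern.ipc.nmbclusters="1000000"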
          
          • stephenw10 Netgate Administrator

            em uses a single receive and transmit queue so you're unlikely to exhaust the mbufs.

            What throughput were you seeing when that was taken, and between which interfaces?

            What throughput do you see without any of those loader variables, just using the em defaults?

            What output do you get from vmstat -i and sysctl net.isr?
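
            For example, both could be captured in one go while the transfer is running (the output path is just illustrative):

            vmstat -i > /root/fx5625-diag.txt
            sysctl net.isr >> /root/fx5625-diag.txt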

            Steve

            • SimonB256 @stephenw10

              Output from sysctl net.isr:

              net.isr.numthreads: 4
              net.isr.maxprot: 16
              net.isr.defaultqlimit: 256
              net.isr.maxqlimit: 10240
              net.isr.bindthreads: 0
              net.isr.maxthreads: 4
              net.isr.dispatch: direct
              

              Output from vmstat -i:

              interrupt                          total       rate
              irq18: uhci2+                     304106          3
              cpu0:timer                     108772857       1036
              cpu1:timer                      68073061        648
              cpu2:timer                       9281390         88
              cpu3:timer                      19118159        182
              irq257: em0:rx0                194215751       1850
              irq258: em0:tx0                229258370       2183
              irq259: em0:link                       1          0
              irq261: em1:rx0                 48310327        460
              irq262: em1:tx0                 82599543        787
              irq263: em1:link                       1          0
              irq265: em2:rx0                113082535       1077
              irq266: em2:tx0                193176467       1840
              irq267: em2:link                       1          0
              irq269: em3:rx0                 23497096        224
              irq270: em3:tx0                 39913436        380
              irq271: em3:link                       1          0
              irq273: em4:rx0                   157084          1
              irq274: em4:tx0                   104642          1
              irq275: em4:link                       1          0
              irq277: pcib8                          1          0
              irq278: em5:rx0                  3537702         34
              irq279: em5:tx0                  3615446         34
              irq280: em5:link                       1          0
              irq281: em6:rx0                 11959127        114
              irq282: em6:tx0                 15965140        152
              irq283: em6:link                       1          0
              irq284: em7:rx0                   421216          4
              irq285: em7:tx0                    21775          0
              irq286: em7:link                       9          0
              Total                         1165385247      11098
              

              In the example I posted above I was simply downloading large files to two hosts without bandwidth caps, where em0 is the WAN interface, and em1 & em5 are the interfaces where the hosts reside.

              I will remove what I have entered from the loader.conf, reboot and retry, but rebooting the firewall during office hours is a pain to arrange. I'll get this done this evening.
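
              For anyone following along, the usual mechanics are to comment the custom lines out of /boot/loader.conf.local and reboot; the entries shown below are purely hypothetical, since the thread doesn't record which tunables were actually set:

              # /boot/loader.conf.local - hypothetical example, commenting out custom em tunables
              #hw.em.rxd="4096"
              #hw.em.txd="4096"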

              • stephenw10 Netgate Administrator

                You might try setting:
                net.isr.bindthreads=1

                The core affinity might give you better distribution.
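
                net.isr.bindthreads is read at boot, so the usual way to apply it would be a line in /boot/loader.conf.local followed by a reboot:

                # /boot/loader.conf.local
                net.isr.bindthreads="1"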

                • SimonB256 @stephenw10

                  Hi,

                  I've set that and rebooted, and will test over the weekend.

                  I might be going completely down the wrong track here, but would net.isr.direct=1 possibly also help?

                  • stephenw10 Netgate Administrator

                    @SimonB256 said in Fabiatech FX5625 improving throughput:

                    net.isr.direct

                    That doesn't exist in FreeBSD after 9 (pfSense 2.4.5 is built on FreeBSD 11.3); that's what net.isr.dispatch=direct does now.
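
                    To check (or, hypothetically, change) the current policy at runtime:

                    sysctl net.isr.dispatch              # show the current policy (direct here)
                    # sysctl net.isr.dispatch=deferred   # runtime-writable; illustration only, not a recommendation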

                    Steve

                    • SimonB256

                      Just to update, it appears that I am now getting better throughput after adding net.isr.bindthreads=1.

                      Thank you for your help.

                      • stephenw10 Netgate Administrator

                        Ah, good to hear. What sort of improvement are you seeing?

                        • SimonB256

                          In terms of throughput I'm only seeing a 15-20Mbps increase (so we're up to 470Mbps), but we're seeing far less packet loss at the top end of these speeds.

                          Looking further at the kind of traffic we're handling, we're talking around 600-700 flows at any given time (according to ntop, which I have running elsewhere in the network), and around 15k-20k states listed on the firewall itself.

                          So I imagine that, for this small device, handling a reasonable number of small connections at any one time might explain why we aren't getting the 600Mbps+ theoretical maximum.
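
                          Those state counts can be confirmed on the firewall itself; for instance:

                          pfctl -si | grep -A3 "State Table"   # summary including current entries
                          pfctl -ss | wc -l                    # roughly one line per state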

                          • stephenw10 Netgate Administrator

                            Yes, that seems reasonable. You would only see >600Mbps with all full-size packets.

                            Steve
