Netgate Discussion Forum

    Fabiatech FX5625 improving throughput

    SimonB256

      I have a Fabiatech FX5625 on a 500Mbps leased line, and to be honest it's struggling with anything over 450Mbps; we are seeing about 10% packet loss at rates above 470Mbps.

      Can anyone suggest where I should be looking to further tune its performance?

      The system is running pfSense 2.4.5_1.

      It has 8 interfaces configured as:
      1 x WAN
      7 x OPT/LAN
      1 x for pfsync to a backup unit

      There is no NAT involved.
      There is some traffic shaping, but only via limiters/queues.
      Rules are generally only on the WAN (still fewer than 100 rules), with fewer than 10 on each interface.

      The only package installed is bandwidthd, but I can remove it if it would help.

      We monitor the device via SNMP, and we can see that it's not loaded in terms of:
      Memory - 0 swap usage, 2.5GB free of 4GB
      CPU - Atom D525; core 0 sits at 15-20% and peaks at 100%, while the three remaining cores sit at around 5% and peak at 40%
      Disk IO
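
      For reference, the same areas can be spot-checked directly from the console with standard FreeBSD tools (just a rough sketch, nothing here is required):

      top -aSH       # per-thread CPU usage, including the NIC interrupt threads
      vmstat -i      # interrupt counts and rates per device
      netstat -m     # mbuf and cluster usage
      swapinfo       # swap usage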

      More resource info: graphs of context switches, interrupts, and load.

      Details of the hardware:
      (output of pciconf -lv)

      hostb0@pci0:0:0:0:	class=0x060000 card=0xa0008086 chip=0xa0008086 rev=0x02 hdr=0x00
          vendor     = 'Intel Corporation'
          device     = 'Atom Processor D4xx/D5xx/N4xx/N5xx DMI Bridge'
          class      = bridge
          subclass   = HOST-PCI
      vgapci0@pci0:0:2:0:	class=0x030000 card=0xa0018086 chip=0xa0018086 rev=0x02 hdr=0x00
          vendor     = 'Intel Corporation'
          device     = 'Atom Processor D4xx/D5xx/N4xx/N5xx Integrated Graphics Controller'
          class      = display
          subclass   = VGA
      vgapci1@pci0:0:2:1:	class=0x038000 card=0xa0018086 chip=0xa0028086 rev=0x02 hdr=0x00
          vendor     = 'Intel Corporation'
          device     = 'Atom Processor D4xx/D5xx/N4xx/N5xx Integrated Graphics Controller'
          class      = display
      pcib1@pci0:0:28:0:	class=0x060400 card=0x283f8086 chip=0x283f8086 rev=0x04 hdr=0x01
          vendor     = 'Intel Corporation'
          device     = '82801H (ICH8 Family) PCI Express Port 1'
          class      = bridge
          subclass   = PCI-PCI
      pcib2@pci0:0:28:1:	class=0x060400 card=0x28418086 chip=0x28418086 rev=0x04 hdr=0x01
          vendor     = 'Intel Corporation'
          device     = '82801H (ICH8 Family) PCI Express Port 2'
          class      = bridge
          subclass   = PCI-PCI
      pcib3@pci0:0:28:2:	class=0x060400 card=0x28438086 chip=0x28438086 rev=0x04 hdr=0x01
          vendor     = 'Intel Corporation'
          device     = '82801H (ICH8 Family) PCI Express Port 3'
          class      = bridge
          subclass   = PCI-PCI
      pcib4@pci0:0:28:3:	class=0x060400 card=0x28458086 chip=0x28458086 rev=0x04 hdr=0x01
          vendor     = 'Intel Corporation'
          device     = '82801H (ICH8 Family) PCI Express Port 4'
          class      = bridge
          subclass   = PCI-PCI
      pcib5@pci0:0:28:4:	class=0x060400 card=0x28478086 chip=0x28478086 rev=0x04 hdr=0x01
          vendor     = 'Intel Corporation'
          device     = '82801H (ICH8 Family) PCI Express Port 5'
          class      = bridge
          subclass   = PCI-PCI
      pcib6@pci0:0:28:5:	class=0x060400 card=0x28498086 chip=0x28498086 rev=0x04 hdr=0x01
          vendor     = 'Intel Corporation'
          device     = '82801H (ICH8 Family) PCI Express Port 6'
          class      = bridge
          subclass   = PCI-PCI
      uhci0@pci0:0:29:0:	class=0x0c0300 card=0x28308086 chip=0x28308086 rev=0x04 hdr=0x00
          vendor     = 'Intel Corporation'
          device     = '82801H (ICH8 Family) USB UHCI Controller'
          class      = serial bus
          subclass   = USB
      uhci1@pci0:0:29:1:	class=0x0c0300 card=0x28318086 chip=0x28318086 rev=0x04 hdr=0x00
          vendor     = 'Intel Corporation'
          device     = '82801H (ICH8 Family) USB UHCI Controller'
          class      = serial bus
          subclass   = USB
      uhci2@pci0:0:29:2:	class=0x0c0300 card=0x28328086 chip=0x28328086 rev=0x04 hdr=0x00
          vendor     = 'Intel Corporation'
          device     = '82801H (ICH8 Family) USB UHCI Controller'
          class      = serial bus
          subclass   = USB
      uhci3@pci0:0:29:3:	class=0x0c0300 card=0x28338086 chip=0x28338086 rev=0x04 hdr=0x00
          vendor     = 'Intel Corporation'
          device     = '82801H (ICH8 Family) USB UHCI Controller'
          class      = serial bus
          subclass   = USB
      ehci0@pci0:0:29:7:	class=0x0c0320 card=0x28368086 chip=0x28368086 rev=0x04 hdr=0x00
          vendor     = 'Intel Corporation'
          device     = '82801H (ICH8 Family) USB2 EHCI Controller'
          class      = serial bus
          subclass   = USB
      pcib12@pci0:0:30:0:	class=0x060401 card=0x24488086 chip=0x24488086 rev=0xf4 hdr=0x01
          vendor     = 'Intel Corporation'
          device     = '82801 Mobile PCI Bridge'
          class      = bridge
          subclass   = PCI-PCI
      isab0@pci0:0:31:0:	class=0x060100 card=0x28158086 chip=0x28158086 rev=0x04 hdr=0x00
          vendor     = 'Intel Corporation'
          device     = '82801HM (ICH8M) LPC Interface Controller'
          class      = bridge
          subclass   = PCI-ISA
      atapci0@pci0:0:31:1:	class=0x01018a card=0x28508086 chip=0x28508086 rev=0x04 hdr=0x00
          vendor     = 'Intel Corporation'
          device     = '82801HM/HEM (ICH8M/ICH8M-E) IDE Controller'
          class      = mass storage
          subclass   = ATA
      atapci1@pci0:0:31:2:	class=0x01018f card=0x28288086 chip=0x28288086 rev=0x04 hdr=0x00
          vendor     = 'Intel Corporation'
          device     = '82801HM/HEM (ICH8M/ICH8M-E) SATA Controller [IDE mode]'
          class      = mass storage
          subclass   = ATA
      none0@pci0:0:31:3:	class=0x0c0500 card=0x283e8086 chip=0x283e8086 rev=0x04 hdr=0x00
          vendor     = 'Intel Corporation'
          device     = '82801H (ICH8 Family) SMBus Controller'
          class      = serial bus
          subclass   = SMBus
      em0@pci0:1:0:0:	class=0x020000 card=0x00008086 chip=0x10d38086 rev=0x00 hdr=0x00
          vendor     = 'Intel Corporation'
          device     = '82574L Gigabit Network Connection'
          class      = network
          subclass   = ethernet
      em1@pci0:2:0:0:	class=0x020000 card=0x00008086 chip=0x10d38086 rev=0x00 hdr=0x00
          vendor     = 'Intel Corporation'
          device     = '82574L Gigabit Network Connection'
          class      = network
          subclass   = ethernet
      em2@pci0:3:0:0:	class=0x020000 card=0x00008086 chip=0x10d38086 rev=0x00 hdr=0x00
          vendor     = 'Intel Corporation'
          device     = '82574L Gigabit Network Connection'
          class      = network
          subclass   = ethernet
      em3@pci0:4:0:0:	class=0x020000 card=0x00008086 chip=0x10d38086 rev=0x00 hdr=0x00
          vendor     = 'Intel Corporation'
          device     = '82574L Gigabit Network Connection'
          class      = network
          subclass   = ethernet
      em4@pci0:5:0:0:	class=0x020000 card=0x00008086 chip=0x10d38086 rev=0x00 hdr=0x00
          vendor     = 'Intel Corporation'
          device     = '82574L Gigabit Network Connection'
          class      = network
          subclass   = ethernet
      pcib7@pci0:6:0:0:	class=0x060400 card=0x850510b5 chip=0x850510b5 rev=0xaa hdr=0x01
          vendor     = 'PLX Technology, Inc.'
          device     = 'PEX 8505 5-lane, 5-port PCI Express Switch'
          class      = bridge
          subclass   = PCI-PCI
      pcib8@pci0:7:1:0:	class=0x060400 card=0x850510b5 chip=0x850510b5 rev=0xaa hdr=0x01
          vendor     = 'PLX Technology, Inc.'
          device     = 'PEX 8505 5-lane, 5-port PCI Express Switch'
          class      = bridge
          subclass   = PCI-PCI
      pcib9@pci0:7:2:0:	class=0x060400 card=0x850510b5 chip=0x850510b5 rev=0xaa hdr=0x01
          vendor     = 'PLX Technology, Inc.'
          device     = 'PEX 8505 5-lane, 5-port PCI Express Switch'
          class      = bridge
          subclass   = PCI-PCI
      pcib10@pci0:7:3:0:	class=0x060400 card=0x850510b5 chip=0x850510b5 rev=0xaa hdr=0x01
          vendor     = 'PLX Technology, Inc.'
          device     = 'PEX 8505 5-lane, 5-port PCI Express Switch'
          class      = bridge
          subclass   = PCI-PCI
      pcib11@pci0:7:4:0:	class=0x060400 card=0x850510b5 chip=0x850510b5 rev=0xaa hdr=0x01
          vendor     = 'PLX Technology, Inc.'
          device     = 'PEX 8505 5-lane, 5-port PCI Express Switch'
          class      = bridge
          subclass   = PCI-PCI
      em5@pci0:9:0:0:	class=0x020000 card=0x00008086 chip=0x10d38086 rev=0x00 hdr=0x00
          vendor     = 'Intel Corporation'
          device     = '82574L Gigabit Network Connection'
          class      = network
          subclass   = ethernet
      em6@pci0:10:0:0:	class=0x020000 card=0x00008086 chip=0x10d38086 rev=0x00 hdr=0x00
          vendor     = 'Intel Corporation'
          device     = '82574L Gigabit Network Connection'
          class      = network
          subclass   = ethernet
      em7@pci0:11:0:0:	class=0x020000 card=0x00008086 chip=0x10d38086 rev=0x00 hdr=0x00
          vendor     = 'Intel Corporation'
          device     = '82574L Gigabit Network Connection'
          class      = network
          subclass   = ethernet
      
      

      Current loader.conf:

      # Intel wireless firmware license acknowledgements
      legal.intel_wpi.license_ack=1
      legal.intel_ipw.license_ack=1
      # Larger listen queue and a higher interrupt-storm detection threshold
      kern.ipc.somaxconn="4096"
      hw.intr_storm_threshold="5000"
      # em(4) NIC tuning: flow control off, larger descriptor rings, interrupt moderation delays
      hw.em.fc_setting="0"
      hw.em.rxd="4096"
      hw.em.txd="4096"
      hw.em.tx_int_delay="512"
      hw.em.rx_int_delay="512"
      hw.em.tx_abs_int_delay="1024"
      hw.em.rx_abs_int_delay="1024"
      # Short boot menu delay, USB packet capture (usbdump) support disabled, larger pf ioctl request limit
      autoboot_delay="3"
      hw.usb.no_pf="1"
      net.pf.request_maxcount="2000000"
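
      As a quick sanity check (just how I'd read the values back, using nothing beyond the stock FreeBSD tools), boot-time tunables like these can be confirmed after a reboot from the kernel environment and sysctl tree:

      kenv hw.em.rxd                  # should print 4096 if the loader picked it up
      sysctl kern.ipc.somaxconn       # runtime view of the listen-queue setting
      sysctl hw.intr_storm_threshold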
      
      stephenw10 (Netgate Administrator)

        The maximum throughput with a D525 is somewhere in the 650Mbps region, but that's with ideal test traffic. With real-world traffic and mixed packet sizes it will be lower. There may not be that much that can be done here.

        What load makes up the 100% usage on one core?

        Can we see the output of top -aSH at the command line whilst you are seeing maximum throughput?

        Steve

        SimonB256

          @stephenw10

          The 100% CPU usage only seems to happen in the early hours of the morning, always at the same time. I'll try to get onto it and take a look remotely tomorrow morning and will post an update.

          SimonB256

            @stephenw10

            After manually chucking some data through to generate this load, the main process responsible is 'intr{irq257: em0:rx0}', with similar processes for the other interfaces alongside it but not quite as high (understandably, as em0 is the WAN interface).

            Sample output:

            PID USERNAME PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
               11 root     155 ki31     0K    64K CPU1    1  23.6H  87.26% [idle{idle: cpu1}]
               12 root     -92    -     0K   832K CPU0    0 233:12  79.73% [intr{irq257: em0:rx0}]
               11 root     155 ki31     0K    64K RUN     3  23.9H  76.87% [idle{idle: cpu3}]
               11 root     155 ki31     0K    64K CPU2    2  23.2H  49.23% [idle{idle: cpu2}]
               12 root     -92    -     0K   832K WAIT    2   4:05  34.74% [intr{irq278: em5:rx0}]
                0 root     -92    -     0K   816K -       3   7:47  20.59% [kernel{em0 rxq (cpuid 0)}]
               11 root     155 ki31     0K    64K RUN     0  18.6H  13.41% [idle{idle: cpu0}]
               12 root     -92    -     0K   832K WAIT    2  51:47  11.30% [intr{irq261: em1:rx0}]
               12 root     -92    -     0K   832K WAIT    0 107:33   5.89% [intr{irq265: em2:rx0}]
                0 root     -92    -     0K   816K -       2  23:04   5.05% [kernel{dummynet}]
               12 root     -92    -     0K   832K WAIT    1  16:14   4.75% [intr{irq258: em0:tx0}]
               12 root     -92    -     0K   832K WAIT    3   0:16   4.39% [intr{irq279: em5:tx0}]
               12 root     -92    -     0K   832K WAIT    3   6:09   1.87% [intr{irq262: em1:tx0}]
               12 root     -92    -     0K   832K WAIT    2  13:49   1.40% [intr{irq269: em3:rx0}]
                0 root     -92    -     0K   816K -       1   1:41   0.75% [kernel{em5 rxq (cpuid 2)}]
               12 root     -92    -     0K   832K WAIT    1  15:43   0.58% [intr{irq266: em2:tx0}]
            74844 root      20    0  9868K  4700K CPU3    3   0:00   0.53% top -aSH
                0 root     -92    -     0K   816K -       1   2:42   0.46% [kernel{em1 rxq (cpuid 2)}]
               12 root     -92    -     0K   832K WAIT    0  11:15   0.42% [intr{irq281: em6:rx0}]
               12 root     -60    -     0K   832K WAIT    1   3:25   0.27% [intr{swi4: clock (0)}]
               12 root     -92    -     0K   832K WAIT    3   2:18   0.26% [intr{irq270: em3:tx0}]
            

            Checking things like mbuf usage (output of netstat -m), there appears to be plenty of room there:

            35554/14801/50355 mbufs in use (current/cache/total)
            33501/13093/46594/249500 mbuf clusters in use (current/cache/total/max)
            33501/13051 mbuf+clusters out of packet secondary zone in use (current/cache)
            0/34/34/124749 4k (page size) jumbo clusters in use (current/cache/total/max)
            0/0/0/36962 9k jumbo clusters in use (current/cache/total/max)
            0/0/0/20791 16k jumbo clusters in use (current/cache/total/max)
            75890K/30022K/105912K bytes allocated to network (current/cache/total)
            0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
            0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
            0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
            0/0/0 requests for jumbo clusters denied (4k/9k/16k)
            0 sendfile syscalls
            0 sendfile syscalls completed without I/O request
            0 requests for I/O initiated by sendfile
            0 pages read by sendfile as part of a request
            0 pages were valid at time of a sendfile request
            0 pages were requested for read ahead by applications
            0 pages were read ahead by sendfile
            0 times sendfile encountered an already busy page
            0 requests for sfbufs denied
            0 requests for sfbufs delayed
            

            Current MBUF limit set as:

            [2.4.5-RELEASE][admin@firewall1.midlandcomputers.com]/root: sysctl kern.ipc.nmbclusters
            kern.ipc.nmbclusters: 249500
            
            stephenw10 (Netgate Administrator)

              em uses a single receive and transmit queue so you're unlikely to exhaust the mbufs.

              What throughput were you seeing when that was taken?
              Between which interfaces?

              What throughput do you see without any of those loader variables, just using the em defaults?

              What output do you get from vmstat -i and sysctl net.isr?

              Steve

              SimonB256

                Output from sysctl net.isr:

                net.isr.numthreads: 4
                net.isr.maxprot: 16
                net.isr.defaultqlimit: 256
                net.isr.maxqlimit: 10240
                net.isr.bindthreads: 0
                net.isr.maxthreads: 4
                net.isr.dispatch: direct
                

                Output from vmstat -i:

                interrupt                          total       rate
                irq18: uhci2+                     304106          3
                cpu0:timer                     108772857       1036
                cpu1:timer                      68073061        648
                cpu2:timer                       9281390         88
                cpu3:timer                      19118159        182
                irq257: em0:rx0                194215751       1850
                irq258: em0:tx0                229258370       2183
                irq259: em0:link                       1          0
                irq261: em1:rx0                 48310327        460
                irq262: em1:tx0                 82599543        787
                irq263: em1:link                       1          0
                irq265: em2:rx0                113082535       1077
                irq266: em2:tx0                193176467       1840
                irq267: em2:link                       1          0
                irq269: em3:rx0                 23497096        224
                irq270: em3:tx0                 39913436        380
                irq271: em3:link                       1          0
                irq273: em4:rx0                   157084          1
                irq274: em4:tx0                   104642          1
                irq275: em4:link                       1          0
                irq277: pcib8                          1          0
                irq278: em5:rx0                  3537702         34
                irq279: em5:tx0                  3615446         34
                irq280: em5:link                       1          0
                irq281: em6:rx0                 11959127        114
                irq282: em6:tx0                 15965140        152
                irq283: em6:link                       1          0
                irq284: em7:rx0                   421216          4
                irq285: em7:tx0                    21775          0
                irq286: em7:link                       9          0
                Total                         1165385247      11098
                

                In the example I posted above I was simply downloading large files to two hosts without bandwidth caps, where em0 is the WAN interface and em1 & em5 are the interfaces the hosts sit behind.

                I will remove what I have entered from the loader.conf, reboot and retry, but rebooting the firewall during office hours is a pain to arrange. I'll get this done this evening.

                stephenw10 (Netgate Administrator)

                  You might try setting:
                  net.isr.bindthreads=1

                  The core affinity might give you better distribution.
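
                   Since net.isr.bindthreads is a boot-time tunable (read-only once the system is up), it needs to go into the loader configuration and only takes effect after a reboot. A minimal sketch, assuming the usual pfSense convention of keeping custom entries in /boot/loader.conf.local:

                   # /boot/loader.conf.local  -- boot-time tunable, requires a reboot
                   net.isr.bindthreads="1"

                   # After the reboot, confirm it was picked up:
                   # sysctl net.isr.bindthreads   (should report 1)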

                   SimonB256

                    Hi,

                    I've set that and rebooted, and will test over the weekend.

                     I might be going down completely the wrong train of thought, but would net.isr.direct=1 possibly also help?

                     stephenw10 (Netgate Administrator)

                      @SimonB256 said in Fabiatech FX5625 improving throughput:

                      net.isr.direct

                       That doesn't exist in FreeBSD after 9 (pfSense 2.4.5 is built on FreeBSD 11.3); that's what net.isr.dispatch=direct does now.
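
                       For what it's worth, net.isr.dispatch is a runtime-writable sysctl, so the dispatch policy can be inspected or experimented with without a reboot (purely illustrative; direct is already the value shown above):

                       sysctl net.isr.dispatch            # shows the current policy (direct here)
                       sysctl net.isr.dispatch=hybrid     # e.g. to try hybrid dispatch; set back with =direct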

                      Steve

                       SimonB256

                        Just to update, it appears that I am now getting better throughput after adding net.isr.bindthreads=1.

                        Thank you for your help.

                        stephenw10 (Netgate Administrator)

                          Ah, good to hear. What sort of improvement are you seeing?

                           SimonB256

                             In terms of throughput I'm only seeing a 15-20Mbps increase (so we're up to 470Mbps), but we're seeing far less packet loss at the top end of those speeds.

                             Looking further at the kind of traffic we're handling, we're talking around 600-700 flows at any given time (according to an ntop instance I have running elsewhere in the network), and around 15k-20k states listed on the firewall itself.

                             So I imagine that, for this small device, handling a fair number of small connections at any one time might explain why we wouldn't be getting the 600Mbps+ theoretical max.
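
                             For reference, the live state count can also be read straight off the firewall with the standard pf tools rather than from the dashboard, e.g.:

                             pfctl -si     # the 'current entries' line under 'State Table' is the state count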

                             stephenw10 (Netgate Administrator)

                               Yes, that seems reasonable. You would only see >600Mbps with all full-size packets.

                              Steve
