Fabiatech FX5625 improving throughput
-
I have a Fabiatech FX5625 on a 500Mbps leased line, and to be honest it's struggling with anything over 450Mbps. We are seeing about 10% packet loss at rates above 470Mbps.
Can anyone suggest where I should be looking to further tune its performance?
System is running pfSense 2.4.5_1
It has 8 configured interfaces:
1 x WAN
7 x OPT/LAN
1 x for pfsync to a backup unit
There is no NAT involved.
There is some traffic shaping, but only via limiters/queues.
Rules are generally only on the WAN (still fewer than 100 rules), with fewer than 10 on each interface.
The only package installed is bandwidthd, but I can remove it if it would help.
We monitor the device via SNMP, and we can see that it's not loaded in terms of:
Memory - 0 swap usage, 2.5GB free of 4GB
CPU - Atom D525; core 0 sits at 15-20% and peaks at 100%, the three remaining cores sit at 5% and peak at 40%
Disk IO
More resource info:
Context Switches:
Interrupts:
Load:
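For reference, the numbers behind those graphs can also be pulled at the firewall shell with standard FreeBSD tools; a minimal sketch, assuming console/SSH access:
# context switches (cs) and interrupt rate (in), sampled every second, five samples
vmstat 1 5
# per-device interrupt totals and rates
vmstat -i
# 1/5/15 minute load averages
uptime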
Details of the hardware:
(output of pciconf -lv)
hostb0@pci0:0:0:0: class=0x060000 card=0xa0008086 chip=0xa0008086 rev=0x02 hdr=0x00 vendor = 'Intel Corporation' device = 'Atom Processor D4xx/D5xx/N4xx/N5xx DMI Bridge' class = bridge subclass = HOST-PCI
vgapci0@pci0:0:2:0: class=0x030000 card=0xa0018086 chip=0xa0018086 rev=0x02 hdr=0x00 vendor = 'Intel Corporation' device = 'Atom Processor D4xx/D5xx/N4xx/N5xx Integrated Graphics Controller' class = display subclass = VGA
vgapci1@pci0:0:2:1: class=0x038000 card=0xa0018086 chip=0xa0028086 rev=0x02 hdr=0x00 vendor = 'Intel Corporation' device = 'Atom Processor D4xx/D5xx/N4xx/N5xx Integrated Graphics Controller' class = display
pcib1@pci0:0:28:0: class=0x060400 card=0x283f8086 chip=0x283f8086 rev=0x04 hdr=0x01 vendor = 'Intel Corporation' device = '82801H (ICH8 Family) PCI Express Port 1' class = bridge subclass = PCI-PCI
pcib2@pci0:0:28:1: class=0x060400 card=0x28418086 chip=0x28418086 rev=0x04 hdr=0x01 vendor = 'Intel Corporation' device = '82801H (ICH8 Family) PCI Express Port 2' class = bridge subclass = PCI-PCI
pcib3@pci0:0:28:2: class=0x060400 card=0x28438086 chip=0x28438086 rev=0x04 hdr=0x01 vendor = 'Intel Corporation' device = '82801H (ICH8 Family) PCI Express Port 3' class = bridge subclass = PCI-PCI
pcib4@pci0:0:28:3: class=0x060400 card=0x28458086 chip=0x28458086 rev=0x04 hdr=0x01 vendor = 'Intel Corporation' device = '82801H (ICH8 Family) PCI Express Port 4' class = bridge subclass = PCI-PCI
pcib5@pci0:0:28:4: class=0x060400 card=0x28478086 chip=0x28478086 rev=0x04 hdr=0x01 vendor = 'Intel Corporation' device = '82801H (ICH8 Family) PCI Express Port 5' class = bridge subclass = PCI-PCI
pcib6@pci0:0:28:5: class=0x060400 card=0x28498086 chip=0x28498086 rev=0x04 hdr=0x01 vendor = 'Intel Corporation' device = '82801H (ICH8 Family) PCI Express Port 6' class = bridge subclass = PCI-PCI
uhci0@pci0:0:29:0: class=0x0c0300 card=0x28308086 chip=0x28308086 rev=0x04 hdr=0x00 vendor = 'Intel Corporation' device = '82801H (ICH8 Family) USB UHCI Controller' class = serial bus subclass = USB
uhci1@pci0:0:29:1: class=0x0c0300 card=0x28318086 chip=0x28318086 rev=0x04 hdr=0x00 vendor = 'Intel Corporation' device = '82801H (ICH8 Family) USB UHCI Controller' class = serial bus subclass = USB
uhci2@pci0:0:29:2: class=0x0c0300 card=0x28328086 chip=0x28328086 rev=0x04 hdr=0x00 vendor = 'Intel Corporation' device = '82801H (ICH8 Family) USB UHCI Controller' class = serial bus subclass = USB
uhci3@pci0:0:29:3: class=0x0c0300 card=0x28338086 chip=0x28338086 rev=0x04 hdr=0x00 vendor = 'Intel Corporation' device = '82801H (ICH8 Family) USB UHCI Controller' class = serial bus subclass = USB
ehci0@pci0:0:29:7: class=0x0c0320 card=0x28368086 chip=0x28368086 rev=0x04 hdr=0x00 vendor = 'Intel Corporation' device = '82801H (ICH8 Family) USB2 EHCI Controller' class = serial bus subclass = USB
pcib12@pci0:0:30:0: class=0x060401 card=0x24488086 chip=0x24488086 rev=0xf4 hdr=0x01 vendor = 'Intel Corporation' device = '82801 Mobile PCI Bridge' class = bridge subclass = PCI-PCI
isab0@pci0:0:31:0: class=0x060100 card=0x28158086 chip=0x28158086 rev=0x04 hdr=0x00 vendor = 'Intel Corporation' device = '82801HM (ICH8M) LPC Interface Controller' class = bridge subclass = PCI-ISA
atapci0@pci0:0:31:1: class=0x01018a card=0x28508086 chip=0x28508086 rev=0x04 hdr=0x00 vendor = 'Intel Corporation' device = '82801HM/HEM (ICH8M/ICH8M-E) IDE Controller' class = mass storage subclass = ATA
atapci1@pci0:0:31:2: class=0x01018f card=0x28288086 chip=0x28288086 rev=0x04 hdr=0x00 vendor = 'Intel Corporation' device = '82801HM/HEM (ICH8M/ICH8M-E) SATA Controller [IDE mode]' class = mass storage subclass = ATA
none0@pci0:0:31:3: class=0x0c0500 card=0x283e8086 chip=0x283e8086 rev=0x04 hdr=0x00 vendor = 'Intel Corporation' device = '82801H (ICH8 Family) SMBus Controller' class = serial bus subclass = SMBus
em0@pci0:1:0:0: class=0x020000 card=0x00008086 chip=0x10d38086 rev=0x00 hdr=0x00 vendor = 'Intel Corporation' device = '82574L Gigabit Network Connection' class = network subclass = ethernet
em1@pci0:2:0:0: class=0x020000 card=0x00008086 chip=0x10d38086 rev=0x00 hdr=0x00 vendor = 'Intel Corporation' device = '82574L Gigabit Network Connection' class = network subclass = ethernet
em2@pci0:3:0:0: class=0x020000 card=0x00008086 chip=0x10d38086 rev=0x00 hdr=0x00 vendor = 'Intel Corporation' device = '82574L Gigabit Network Connection' class = network subclass = ethernet
em3@pci0:4:0:0: class=0x020000 card=0x00008086 chip=0x10d38086 rev=0x00 hdr=0x00 vendor = 'Intel Corporation' device = '82574L Gigabit Network Connection' class = network subclass = ethernet
em4@pci0:5:0:0: class=0x020000 card=0x00008086 chip=0x10d38086 rev=0x00 hdr=0x00 vendor = 'Intel Corporation' device = '82574L Gigabit Network Connection' class = network subclass = ethernet
pcib7@pci0:6:0:0: class=0x060400 card=0x850510b5 chip=0x850510b5 rev=0xaa hdr=0x01 vendor = 'PLX Technology, Inc.' device = 'PEX 8505 5-lane, 5-port PCI Express Switch' class = bridge subclass = PCI-PCI
pcib8@pci0:7:1:0: class=0x060400 card=0x850510b5 chip=0x850510b5 rev=0xaa hdr=0x01 vendor = 'PLX Technology, Inc.' device = 'PEX 8505 5-lane, 5-port PCI Express Switch' class = bridge subclass = PCI-PCI
pcib9@pci0:7:2:0: class=0x060400 card=0x850510b5 chip=0x850510b5 rev=0xaa hdr=0x01 vendor = 'PLX Technology, Inc.' device = 'PEX 8505 5-lane, 5-port PCI Express Switch' class = bridge subclass = PCI-PCI
pcib10@pci0:7:3:0: class=0x060400 card=0x850510b5 chip=0x850510b5 rev=0xaa hdr=0x01 vendor = 'PLX Technology, Inc.' device = 'PEX 8505 5-lane, 5-port PCI Express Switch' class = bridge subclass = PCI-PCI
pcib11@pci0:7:4:0: class=0x060400 card=0x850510b5 chip=0x850510b5 rev=0xaa hdr=0x01 vendor = 'PLX Technology, Inc.' device = 'PEX 8505 5-lane, 5-port PCI Express Switch' class = bridge subclass = PCI-PCI
em5@pci0:9:0:0: class=0x020000 card=0x00008086 chip=0x10d38086 rev=0x00 hdr=0x00 vendor = 'Intel Corporation' device = '82574L Gigabit Network Connection' class = network subclass = ethernet
em6@pci0:10:0:0: class=0x020000 card=0x00008086 chip=0x10d38086 rev=0x00 hdr=0x00 vendor = 'Intel Corporation' device = '82574L Gigabit Network Connection' class = network subclass = ethernet
em7@pci0:11:0:0: class=0x020000 card=0x00008086 chip=0x10d38086 rev=0x00 hdr=0x00 vendor = 'Intel Corporation' device = '82574L Gigabit Network Connection' class = network subclass = ethernet
Current loader.conf:
legal.intel_wpi.license_ack=1
legal.intel_ipw.license_ack=1
kern.ipc.somaxconn="4096"
hw.intr_storm_threshold="5000"
hw.em.fc_setting="0"
hw.em.rxd="4096"
hw.em.txd="4096"
hw.em.tx_int_delay="512"
hw.em.rx_int_delay="512"
hw.em.tx_abs_int_delay="1024"
hw.em.rx_abs_int_delay="1024"
autoboot_delay="3"
hw.usb.no_pf="1"
net.pf.request_maxcount="2000000"
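As an aside, pfSense regenerates /boot/loader.conf itself, so custom tunables like these are normally kept in /boot/loader.conf.local (one setting per line) so they survive upgrades; a minimal sketch, assuming shell access:
# /boot/loader.conf.local - read at boot alongside loader.conf and left alone by pfSense
hw.em.rxd="4096"
hw.em.txd="4096"
# after a reboot, confirm which values the loader actually applied
kenv | grep hw.em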
-
The maximum throughput with a D525 is somewhere in the 650Mbps region, but that's with ideal test traffic. With real-world traffic and mixed packet sizes it will be lower. There may not be that much that can be done here.
What load makes up the 100% usage on one core?
Can we see the output of
top -aSH
at the command line whilst you are seeing maximum throughput?
Steve
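If grabbing that interactively is awkward, FreeBSD's top can also write a one-shot snapshot to a file; a minimal sketch using the -b (batch) and -d (display count) flags, with an arbitrary output filename:
# capture a single non-interactive display while the link is loaded
top -aSHb -d 1 > /root/top-sample.txt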
-
The 100% CPU usage only seems to happen in the early hours of the morning, always at the same time. I'll try to get on to it and take a look remotely tomorrow morning and will post an update.
-
After manually chucking some data through to generate this load, the main process responsible is 'intr{irq257: em0:rx0}' with similar processes for the other interfaces alongside it but not quite as high (understandably as em0 is the WAN interface).
Sample output:
PID USERNAME PRI NICE SIZE RES STATE C TIME WCPU COMMAND
11 root 155 ki31 0K 64K CPU1 1 23.6H 87.26% [idle{idle: cpu1}]
12 root -92 - 0K 832K CPU0 0 233:12 79.73% [intr{irq257: em0:rx0}]
11 root 155 ki31 0K 64K RUN 3 23.9H 76.87% [idle{idle: cpu3}]
11 root 155 ki31 0K 64K CPU2 2 23.2H 49.23% [idle{idle: cpu2}]
12 root -92 - 0K 832K WAIT 2 4:05 34.74% [intr{irq278: em5:rx0}]
0 root -92 - 0K 816K - 3 7:47 20.59% [kernel{em0 rxq (cpuid 0)}]
11 root 155 ki31 0K 64K RUN 0 18.6H 13.41% [idle{idle: cpu0}]
12 root -92 - 0K 832K WAIT 2 51:47 11.30% [intr{irq261: em1:rx0}]
12 root -92 - 0K 832K WAIT 0 107:33 5.89% [intr{irq265: em2:rx0}]
0 root -92 - 0K 816K - 2 23:04 5.05% [kernel{dummynet}]
12 root -92 - 0K 832K WAIT 1 16:14 4.75% [intr{irq258: em0:tx0}]
12 root -92 - 0K 832K WAIT 3 0:16 4.39% [intr{irq279: em5:tx0}]
12 root -92 - 0K 832K WAIT 3 6:09 1.87% [intr{irq262: em1:tx0}]
12 root -92 - 0K 832K WAIT 2 13:49 1.40% [intr{irq269: em3:rx0}]
0 root -92 - 0K 816K - 1 1:41 0.75% [kernel{em5 rxq (cpuid 2)}]
12 root -92 - 0K 832K WAIT 1 15:43 0.58% [intr{irq266: em2:tx0}]
74844 root 20 0 9868K 4700K CPU3 3 0:00 0.53% top -aSH
0 root -92 - 0K 816K - 1 2:42 0.46% [kernel{em1 rxq (cpuid 2)}]
12 root -92 - 0K 832K WAIT 0 11:15 0.42% [intr{irq281: em6:rx0}]
12 root -60 - 0K 832K WAIT 1 3:25 0.27% [intr{swi4: clock (0)}]
12 root -92 - 0K 832K WAIT 3 2:18 0.26% [intr{irq270: em3:tx0}]
Checking things like mbuf et al, and there appears to be plenty of room there:
35554/14801/50355 mbufs in use (current/cache/total)
33501/13093/46594/249500 mbuf clusters in use (current/cache/total/max)
33501/13051 mbuf+clusters out of packet secondary zone in use (current/cache)
0/34/34/124749 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/36962 9k jumbo clusters in use (current/cache/total/max)
0/0/0/20791 16k jumbo clusters in use (current/cache/total/max)
75890K/30022K/105912K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0 sendfile syscalls
0 sendfile syscalls completed without I/O request
0 requests for I/O initiated by sendfile
0 pages read by sendfile as part of a request
0 pages were valid at time of a sendfile request
0 pages were requested for read ahead by applications
0 pages were read ahead by sendfile
0 times sendfile encountered an already busy page
0 requests for sfbufs denied
0 requests for sfbufs delayed
Current MBUF limit set as:
[2.4.5-RELEASE][admin@firewall1.midlandcomputers.com]/root: sysctl kern.ipc.nmbclusters
kern.ipc.nmbclusters: 249500
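A quick way to confirm mbufs stay healthy under load is to watch the denied/delayed counters while traffic is flowing; a minimal check:
# any non-zero counters here would point at mbuf exhaustion
netstat -m | grep -E 'denied|delayed'
# current cluster limit for comparison
sysctl kern.ipc.nmbclusters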
-
em uses a single receive and transmit queue so you're unlikely to exhaust the mbufs.
What throughput were you seeing when that was taken?
Between which interfaces?
What throughput do you see without any of those loader variables, just using the em defaults?
What output do you get from
vmstat -i
and
sysctl net.isr
Steve
-
Output from sysctl net.isr:
net.isr.numthreads: 4
net.isr.maxprot: 16
net.isr.defaultqlimit: 256
net.isr.maxqlimit: 10240
net.isr.bindthreads: 0
net.isr.maxthreads: 4
net.isr.dispatch: direct
Output from vmstat -i:
interrupt total rate
irq18: uhci2+ 304106 3
cpu0:timer 108772857 1036
cpu1:timer 68073061 648
cpu2:timer 9281390 88
cpu3:timer 19118159 182
irq257: em0:rx0 194215751 1850
irq258: em0:tx0 229258370 2183
irq259: em0:link 1 0
irq261: em1:rx0 48310327 460
irq262: em1:tx0 82599543 787
irq263: em1:link 1 0
irq265: em2:rx0 113082535 1077
irq266: em2:tx0 193176467 1840
irq267: em2:link 1 0
irq269: em3:rx0 23497096 224
irq270: em3:tx0 39913436 380
irq271: em3:link 1 0
irq273: em4:rx0 157084 1
irq274: em4:tx0 104642 1
irq275: em4:link 1 0
irq277: pcib8 1 0
irq278: em5:rx0 3537702 34
irq279: em5:tx0 3615446 34
irq280: em5:link 1 0
irq281: em6:rx0 11959127 114
irq282: em6:tx0 15965140 152
irq283: em6:link 1 0
irq284: em7:rx0 421216 4
irq285: em7:tx0 21775 0
irq286: em7:link 9 0
Total 1165385247 11098
In the example I posted above I was simply downloading large files to two hosts without bandwidth caps, where em0 is the WAN interface and em1 & em5 are the interfaces where those hosts reside.
I will remove what I have entered from the loader.conf, reboot and retry, but rebooting the firewall during office hours is a pain to arrange. I'll get this done this evening.
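For the retest, something repeatable like an iperf3 run between hosts on either side of the firewall should keep the comparison fair; a minimal sketch, assuming iperf3 is installed on the two test hosts rather than run on the firewall itself, with <server-ip> as a placeholder:
# on the host behind the firewall
iperf3 -s
# on the host across the WAN; 8 parallel streams for 60 seconds
iperf3 -c <server-ip> -P 8 -t 60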
-
You might try setting:
net.isr.bindthreads=1
The core affinity might give you better distribution.
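net.isr.bindthreads is a boot-time tunable, so as far as I know it needs to go into /boot/loader.conf.local rather than being set live; a minimal sketch:
# /boot/loader.conf.local
net.isr.bindthreads="1"
# after the reboot, confirm it took effect
sysctl net.isr.bindthreads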
-
Hi,
I've set that and rebooted, and will test over the weekend.
I might be going completely down the wrong train of thought, but would
net.isr.direct=1
possibly also help?
-
@SimonB256 said in Fabiatech FX5625 improving throughput:
net.isr.direct
That doesn't exist in FreeBSD after 9 (pfSense 2.4.5 is built on 11.3), that's what
net.isr.dispatch: direct
does.
Steve
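For reference, net.isr.dispatch is a live sysctl on this FreeBSD version, so the policy can be checked, and experimented with, without a reboot; a minimal sketch, the accepted values being direct, hybrid and deferred:
# show the current dispatch policy
sysctl net.isr.dispatch
# try deferred dispatch for testing; this reverts at reboot unless made a tunable
sysctl net.isr.dispatch=deferred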
-
Just to update, it appears that I am now getting better throughput after adding
net.isr.bindthreads=1
Thank you for your help.
-
Ah, good to hear. What sort of improvement are you seeing?
-
In terms of throughput I'm only seeing a 15-20Mbps increase (so we're up to 470Mbps). But we're seeing far less packet loss at the top end of these speeds.
Looking further at the kind of traffic we're handling, we're talking around 600-700 flows at any given time (according to ntop running elsewhere in the network), and around 15k-20k states listed on the firewall itself.
So I imagine that for this small device, handling a reasonable number of small connections at any one time might explain why we aren't getting the 600Mbps+ theoretical max.
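The state figures can be cross-checked directly on the firewall with pfctl; a minimal sketch:
# summary counters, including current state table entries
pfctl -si
# configured hard limits, including the states limit
pfctl -sm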
-
Yes, that seems reasonable. You would only see >600Mbps with all full-size packets.
Steve