Traffic between 2 interfaces
-
Hi guys!
Strange problem (though I don't really think of it as a problem myself :)).
I have the latest pfSense 2.1 with 3 em NICs in it.
One is WAN, the other two serve two separate LANs. I can transfer at 50 MB/s between those interfaces, and I have also enabled HW TCP offloading and large send offloading in the advanced options.
I see no errors on file transfers or service usage between those subnets.
Now the funny thing...
See attached image, it is giving me errors.
I was digging in and found this:
dev.em.0.mac_stats.missed_packets: 27086
This number matches the error count shown in the attached image. I believe it has to do with buffers, because I'm sending a 3 GB+ file over the network...
Can I somehow eliminate these "errors"?
Thanks!
-
Missed packets just mean that some temporary resource shortage or error caused the packet to be dropped.
I would speculate that in trying to transfer a large amount of data between two Gigabit interfaces at some point your Atom runs out of cpu cycles and causes the packets to be dropped.
What does 'top -SH' show under those conditions?
Steve
-
Hi!
This is what bothers me :)
The CPU is not fully utilised, so I'm guessing PCI bandwidth?
top output:
last pid: 46950;  load averages:  0.80,  0.54,  0.31    up 0+10:20:55  20:53:42
344 processes: 3 running, 316 sleeping, 25 waiting
CPU:  6.5% user, 20.2% nice,  9.2% system, 36.7% interrupt, 27.3% idle
Mem: 965M Active, 498M Inact, 182M Wired, 300K Cache, 112M Buf, 1586M Free
Swap: 8192M Total, 8192M Free

  PID USERNAME PRI NICE   SIZE    RES STATE   C    TIME   WCPU COMMAND
   12 root     -68    -     0K   200K WAIT    0   13:23 56.49% intr{irq256: em0:rx 0}
38224 root      76   20   595M   365M bpf     1   85:36 40.87% snort{snort}
   11 root     171 ki31     0K    16K RUN     1  516:57 40.77% idle{idle: cpu1}
   11 root     171 ki31     0K    16K CPU0    0  578:41 19.87% idle{idle: cpu0}
    0 root     -68    0     0K   112K -       0    9:58 10.50% kernel{em2 taskq}
   12 root     -68    -     0K   200K WAIT    0    1:35 10.35% intr{irq259: em1:rx 0}
33610 root      63    0 76820K 23680K nanslp  1    0:08  9.08% php
39747 root      45    0 78996K 30516K accept  1    0:19  1.86% php{php}
   12 root     -68    -     0K   200K WAIT    0    0:17  1.66% intr{irq260: em1:tx 0}
   12 root     -68    -     0K   200K WAIT    0    1:02  0.98% intr{irq257: em0:tx 0}
41587 proxy     64   20   345M   338M select  1    2:04  0.00% squid
  258 root      76   20  3352K  1152K kqread  1    1:38  0.00% check_reload_status
   14 root     -16    -     0K     8K -       0    1:15  0.00% yarrow
63680 root      64   20   228M   206M select  1    1:12  0.00% clamd{clamd}
   12 root     -32    -     0K   200K WAIT    0    1:03  0.00% intr{swi4: clock}
    0 root     -16    0     0K   112K sched   0    0:37  0.00% kernel{swapper}
77111 root      44    0  3412K  1424K select  0    0:34  0.00% syslogd
Regards,
M
-
Umm OK, one more thing…
From my PC to the other subnet there are errors...
From the other subnet to my PC there are NO errors...
Huh?
Brand new switch and brand new network cards??
top output from a file copy in the other direction:
last pid:  2792;  load averages:  1.16,  0.82,  0.52    up 0+10:27:15  21:00:02
350 processes: 10 running, 318 sleeping, 1 zombie, 21 waiting
CPU:  6.0% user, 42.9% nice,  3.0% system, 48.1% interrupt,  0.0% idle
Mem: 970M Active, 500M Inact, 181M Wired, 300K Cache, 112M Buf, 1580M Free
Swap: 8192M Total, 8192M Free

  PID USERNAME PRI NICE   SIZE    RES STATE   C    TIME   WCPU COMMAND
   12 root     -68    -     0K   200K CPU0    0    2:57 74.85% intr{irq259: em1:rx 0}
   11 root     171 ki31     0K    16K RUN     1  520:09 34.47% idle{idle: cpu1}
38224 root      76   20   595M   366M RUN     1   87:43 32.37% snort{snort}
   12 root     -68    -     0K   200K RUN     0   13:58 13.57% intr{irq256: em0:rx 0}
    0 root     -68    0     0K   112K -       1   10:28 10.89% kernel{em2 taskq}
31020 root      76    0 76820K 23680K accept  1    0:05  7.86% php
   11 root     171 ki31     0K    16K RUN     0  582:16  4.30% idle{idle: cpu0}
   12 root     -68    -     0K   200K RUN     0    1:07  3.17% intr{irq257: em0:tx 0}
35122 root      46    0 76820K 28496K accept  1    0:07  2.20% php
 1038 root     116   20 70676K 11608K RUN     0    0:00  0.98% php
   12 root     -68    -     0K   200K RUN     0    0:19  0.49% intr{irq260: em1:tx 0}
41587 proxy     64   20   345M   339M select  1    2:05  0.00% squid
It seems that CPU utilisation is higher than before, but no packets are missed…
One more thing occurred to me...
Both NICs are integrated on the motherboard, if that makes any difference...
-
Are they PCI or PCIe?
Steve
-
This is the MB: http://ark.intel.com/products/56462/Intel-Desktop-Board-D2500CC
I would say PCI. In addition, one PCI Intel NIC (WAN, PPPoE) is installed in the one free slot…
-
I would check the cable; that would explain errors in one direction. A poorly terminated cable can get very noisy at the ends, especially when there is a lot of traffic moving through it. Also, if both your NICs are Gigabit you should be able to get much higher transfer rates, depending on the read and write targets. If they are SSDs this should be no problem. You state that you are getting 50MB/s; is that to say 400Mbps, or do you mean 50Mbps? I know that there is some overhead for routing, but I would think that you could get closer to 1000Mbps. I have noticed that Windows-to-Windows speeds are much faster than BSD-to-Windows or Linux-to-Windows. I would imagine the same holds for BSD-to-BSD and Linux-to-Linux.
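Since MB/s and Mbps are easy to mix up, here is a minimal sketch of the conversion being asked about. It assumes decimal units (1 MB = 10^6 bytes, 1 Mbit = 10^6 bits) and ignores protocol overhead; the function names are made up for illustration.

```python
def mbytes_to_mbits(mbytes_per_s: float) -> float:
    """Convert megabytes per second to megabits per second (8 bits per byte)."""
    return mbytes_per_s * 8


def mbits_to_mbytes(mbits_per_s: float) -> float:
    """Convert megabits per second to megabytes per second."""
    return mbits_per_s / 8


if __name__ == "__main__":
    # 50 MB/s is the file-copy rate quoted in the first post.
    print(f"50 MB/s = {mbytes_to_mbits(50):.0f} Mbit/s")
    # Raw Gigabit line rate, before any Ethernet/IP/TCP overhead.
    print(f"1000 Mbit/s = {mbits_to_mbytes(1000):.0f} MB/s")
```

So a 50 MB/s copy is roughly 40% of the raw Gigabit line rate, not 5%.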
Worst-case scenario, if your NICs are PCI, they are sharing bandwidth over the south bridge, but the legacy speed of that bus would be no less than 133MB/s, or 1064Mbps. They should be the only things using that bus. I would bet money, though, that those NICs on your board are PCIe.
-
Yes, I'm getting around 500 Mbit/s, which gives me around 60 MB/s…
I think the NICs are PCI and share 133 MB/s, so one NIC uploading at 60 MB/s and the other downloading at 60 MB/s adds up to around 120 MB/s of throughput...
Which would make sense...
The cable is brand new as well; I just tried 2 different ones.
Within the same network, from client to server and vice versa, I get 112 MB/s download or upload...
So cables are fine, switch is fine, all is fine :)
What I will try to do is:
swap the LAN iface with WAN (WAN is on another PCI card on its own bus) and then try again...
BTW, did you see the link about the MB I posted?
There is no mention of PCIe anywhere...
-
Yes, if they're both sharing the same bus then you could be hitting a limit there. I never quite got my head around what the maximum transfer speed between two NICs on a PCI bus is. :-
Even with PCIe NICs you won't see more than 6-700Mbps through an Atom.
Steve
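As a rough way to reason about the shared-bus question, here is a back-of-the-envelope sketch. It assumes a classic 32-bit, 33 MHz PCI bus (about 133 MB/s theoretical peak, shared by every device on it) and that each routed byte crosses the bus twice: once from the receiving NIC into memory and once from memory to the sending NIC. Arbitration and descriptor overhead are ignored, so real limits would be lower. All names are illustrative.

```python
PCI_BUS_WIDTH_BYTES = 4   # assumption: 32-bit PCI
PCI_CLOCK_MHZ = 33.33     # assumption: classic 33 MHz PCI clock


def pci_peak_mbytes_per_s() -> float:
    """Theoretical peak of a 32-bit/33 MHz PCI bus in MB/s."""
    return PCI_BUS_WIDTH_BYTES * PCI_CLOCK_MHZ


def bus_load_when_routing(forwarded_mbytes_per_s: float) -> float:
    """Bus traffic when both NICs share one bus: every byte crosses it twice."""
    return 2 * forwarded_mbytes_per_s


if __name__ == "__main__":
    peak = pci_peak_mbytes_per_s()
    load = bus_load_when_routing(60)  # 60 MB/s is the rate quoted in this thread
    print(f"peak {peak:.0f} MB/s, bus load {load:.0f} MB/s "
          f"({100 * load / peak:.0f}% of peak)")
```

This budget only applies to NICs that really share a PCI bus. PCIe devices each get their own link, so the same arithmetic would not limit them.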
-
The on-board NICs are PCIe, not PCI.
-
Ummm OK…
I tried also with iperf.
Let's say Computer A is on the LAN1 interface and Computer B is on the OPT1 interface.
Testing iperf - Computer A is server:
iperf -c 10.10.0.90 -t 60 -i 10 -f MB/s
------------------------------------------------------------
Client connecting to 10.10.0.90, TCP port 5001
TCP window size: 0.06 MByte (default)
------------------------------------------------------------
[156] local 172.16.16.230 port 49213 connected with 10.10.0.90 port 5001
[ ID] Interval Transfer Bandwidth
[156] 0.0-10.0 sec 559 MBytes 55.9 MBytes/sec
[156] 10.0-20.0 sec 561 MBytes 56.1 MBytes/sec
[156] 20.0-30.0 sec 561 MBytes 56.1 MBytes/sec
[156] 30.0-40.0 sec 560 MBytes 56.0 MBytes/sec
[156] 40.0-50.0 sec 560 MBytes 56.0 MBytes/sec
[156] 50.0-60.0 sec 561 MBytes 56.1 MBytes/sec
[156] 0.0-60.0 sec 3363 MBytes 56.0 MBytes/sec
Now Computer B is server:
iperf -c 172.16.16.230 -t 60 -i 10 -f MB/s
------------------------------------------------------------
Client connecting to 172.16.16.230, TCP port 5001
TCP window size: 0.06 MByte (default)
------------------------------------------------------------
[156] local 10.10.0.90 port 59164 connected with 172.16.16.230 port 5001
[ ID] Interval Transfer Bandwidth
[156] 0.0-10.0 sec 497 MBytes 49.7 MBytes/sec
[156] 10.0-20.0 sec 467 MBytes 46.7 MBytes/sec
[156] 20.0-30.0 sec 476 MBytes 47.6 MBytes/sec
[156] 30.0-40.0 sec 512 MBytes 51.2 MBytes/sec
[156] 40.0-50.0 sec 514 MBytes 51.4 MBytes/sec
[156] 50.0-60.0 sec 500 MBytes 50.0 MBytes/sec
[156] 0.0-60.0 sec 2966 MBytes 49.4 MBytes/sec
and
iperf -c 172.16.16.230 -t 60 -i 10 -f MB/s
------------------------------------------------------------
Client connecting to 172.16.16.230, TCP port 5001
TCP window size: 0.06 MByte (default)
------------------------------------------------------------
[156] local 10.10.0.90 port 59202 connected with 172.16.16.230 port 5001
[ ID] Interval Transfer Bandwidth
[156] 0.0-10.0 sec 536 MBytes 53.6 MBytes/sec
[156] 10.0-20.0 sec 506 MBytes 50.6 MBytes/sec
[156] 20.0-30.0 sec 491 MBytes 49.1 MBytes/sec
[156] 30.0-40.0 sec 455 MBytes 45.5 MBytes/sec
[156] 40.0-50.0 sec 537 MBytes 53.7 MBytes/sec
write failed: Connection reset by peer
read on server close failed: Connection reset by peer
[156] 0.0-77.4 sec 2928 MBytes 37.8 MBytes/sec
So if Computer B acts as the server I get different results and a bunch of missed packets… If Computer A is the server there are absolutely no errors...
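To compare the two directions a bit more systematically, the per-interval lines can be averaged and their spread checked. This is a minimal sketch that assumes the iperf report layout shown in this post (interval lines ending in `MBytes/sec`); the regular expression and function names are my own, and the sample values are copied from the first two runs above.

```python
import re
import statistics

# Matches interval lines such as:
# [156] 0.0-10.0 sec 559 MBytes 55.9 MBytes/sec
INTERVAL_RE = re.compile(
    r"\[\s*\d+\]\s+([\d.]+)-\s*([\d.]+)\s+sec\s+[\d.]+\s+MBytes\s+([\d.]+)\s+MBytes/sec"
)


def interval_rates(report):
    """Return per-interval bandwidths, skipping the closing summary line."""
    rows = [(float(s), float(r)) for s, _end, r in INTERVAL_RE.findall(report)]
    # The summary line also starts at 0.0; drop it when real intervals exist.
    if len(rows) > 1 and rows[-1][0] == 0.0:
        rows = rows[:-1]
    return [rate for _start, rate in rows]


def summarize(report):
    """Return (mean, max - min) of the interval bandwidths."""
    rates = interval_rates(report)
    return statistics.mean(rates), max(rates) - min(rates)


TO_COMPUTER_A = """\
[156] 0.0-10.0 sec 559 MBytes 55.9 MBytes/sec
[156] 10.0-20.0 sec 561 MBytes 56.1 MBytes/sec
[156] 20.0-30.0 sec 561 MBytes 56.1 MBytes/sec
[156] 30.0-40.0 sec 560 MBytes 56.0 MBytes/sec
[156] 40.0-50.0 sec 560 MBytes 56.0 MBytes/sec
[156] 50.0-60.0 sec 561 MBytes 56.1 MBytes/sec
[156] 0.0-60.0 sec 3363 MBytes 56.0 MBytes/sec
"""

TO_COMPUTER_B = """\
[156] 0.0-10.0 sec 497 MBytes 49.7 MBytes/sec
[156] 10.0-20.0 sec 467 MBytes 46.7 MBytes/sec
[156] 20.0-30.0 sec 476 MBytes 47.6 MBytes/sec
[156] 30.0-40.0 sec 512 MBytes 51.2 MBytes/sec
[156] 40.0-50.0 sec 514 MBytes 51.4 MBytes/sec
[156] 50.0-60.0 sec 500 MBytes 50.0 MBytes/sec
[156] 0.0-60.0 sec 2966 MBytes 49.4 MBytes/sec
"""

if __name__ == "__main__":
    for label, report in (("A is server", TO_COMPUTER_A), ("B is server", TO_COMPUTER_B)):
        mean, spread = summarize(report)
        print(f"{label}: mean {mean:.1f} MBytes/sec, spread {spread:.1f}")
```

A lower mean with a wider spread in one direction is consistent with drops on the receiving side, but on its own it does not prove where the packets are lost.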
Well, I have to admit I'm a little confused here... 2 identical NICs on the same switch, the same settings for both of them (firewall rules are different), and in one direction it is OK and in the other it is not...
I just don't get it :)
-
:)
I enabled fastforwarding; here are the results, which are pretty good now :)
Close to the Atom limit.
iperf -c 172.16.16.230 -t 60 -i 5 -f MB/s
------------------------------------------------------------
Client connecting to 172.16.16.230, TCP port 5001
TCP window size: 0.06 MByte (default)
------------------------------------------------------------
[176] local 10.10.0.60 port 58939 connected with 172.16.16.230 port 5001
[ ID] Interval Transfer Bandwidth
[176] 0.0- 5.0 sec 288 MBytes 57.6 MBytes/sec
[176] 5.0-10.0 sec 311 MBytes 62.2 MBytes/sec
[176] 10.0-15.0 sec 331 MBytes 66.3 MBytes/sec
[176] 15.0-20.0 sec 326 MBytes 65.2 MBytes/sec
[176] 20.0-25.0 sec 277 MBytes 55.4 MBytes/sec
[176] 25.0-30.0 sec 327 MBytes 65.3 MBytes/sec
[176] 30.0-35.0 sec 312 MBytes 62.4 MBytes/sec
[176] 35.0-40.0 sec 331 MBytes 66.1 MBytes/sec
[176] 40.0-45.0 sec 288 MBytes 57.7 MBytes/sec
[176] 45.0-50.0 sec 320 MBytes 64.0 MBytes/sec
[176] 50.0-55.0 sec 295 MBytes 59.1 MBytes/sec
[176] 55.0-60.0 sec 322 MBytes 64.3 MBytes/sec
[176] 0.0-60.0 sec 3728 MBytes 62.1 MBytes/sec
-
Interesting result in itself. :)
The two cards you are using for LAN and OPT1 are different though yes?
They probably have either different or differently supported hardware offloading features. One of those cards is failing to keep up under load. You could try disabling various offload settings in System: Advanced: Networking and see what happens. You would expect throughput to be reduced, but if the support is broken, as it says on that page, then it may actually speed up.
Or, since it's doing you no harm, you could just live with it. ;)
Steve
-
Hi!
Cards are both on board Intel, they are the same:
em0@pci0:2:0:0: class=0x020000 card=0x202c8086 chip=0x10d38086 rev=0x00 hdr=0x00
    cap 01[c8] = powerspec 2 supports D0 D3 current D0
    cap 05[d0] = MSI supports 1 message, 64 bit
    cap 10[e0] = PCI-Express 1 endpoint max data 128(256) link x1(x1)
    cap 11[a0] = MSI-X supports 5 messages in map 0x1c enabled
em1@pci0:1:0:0: class=0x020000 card=0x202c8086 chip=0x10d38086 rev=0x00 hdr=0x00
    cap 01[c8] = powerspec 2 supports D0 D3 current D0
    cap 05[d0] = MSI supports 1 message, 64 bit
    cap 10[e0] = PCI-Express 1 endpoint max data 128(256) link x1(x1)
    cap 11[a0] = MSI-X supports 5 messages in map 0x1c enabled
I tried different combos of offloading on/off, I even enabled device polling, and I also raised the send and receive buffers…
All gave the same results. After setting fastforwarding = 1, things are working a lot faster...
And I think that it hasn't broken anything... I will have to wait and see :)
I think I will have to live with it; as long as packets are only missed and not failed, and as long as there are no actual errors, it should be fine :)
-
Huh, well that's just odd then. Perhaps one of those cards is sharing an IRQ with something that's misbehaving slightly. They are MSI/X capable, but are they using that?
IP fastforwarding will break IPSec if you are using that. I've had it enabled on my home box for a good while now with no ill effects.
Steve
-
MSI/X: how can I verify whether it is being used?
-
Furthermore:
dmesg | grep irq
ioapic0 <Version 2.0> irqs 0-23 on motherboard
vgapci0: <VGA-compatible display> port 0x40d0-0x40d7 mem 0xd0300000-0xd03fffff irq 16 at device 2.0 on pci0
em0: <Intel(R) PRO/1000 Network Connection 7.3.2> port 0x3000-0x301f mem 0xd0220000-0xd023ffff,0xd0200000-0xd021ffff,0xd0240000-0xd0243fff irq 16 at device 0.0 on pci2
em1: <Intel(R) PRO/1000 Network Connection 7.3.2> port 0x2000-0x201f mem 0xd0120000-0xd013ffff,0xd0100000-0xd011ffff,0xd0140000-0xd0143fff irq 17 at device 0.0 on pci1
uhci0: <Intel 82801G (ICH7) USB controller USB-A> port 0x40a0-0x40bf irq 23 at device 29.0 on pci0
uhci1: <Intel 82801G (ICH7) USB controller USB-B> port 0x4080-0x409f irq 19 at device 29.1 on pci0
uhci2: <Intel 82801G (ICH7) USB controller USB-C> port 0x4060-0x407f irq 18 at device 29.2 on pci0
uhci3: <Intel 82801G (ICH7) USB controller USB-D> port 0x4040-0x405f irq 16 at device 29.3 on pci0
ehci0: <Intel 82801GB/R (ICH7) USB 2.0 controller> mem 0xd0404400-0xd04047ff irq 23 at device 29.7 on pci0
em2: <Intel(R) PRO/1000 Legacy Network Connection 1.0.4> port 0x1000-0x103f mem 0xd0020000-0xd003ffff,0xd0000000-0xd001ffff irq 20 at device 0.0 on pci3
atapci0: <Intel ICH7 SATA300 controller> port 0x40c8-0x40cf,0x40dc-0x40df,0x40c0-0x40c7,0x40d8-0x40db,0x4020-0x402f mem 0xd0404000-0xd04043ff irq 19 at device 31.2 on pci0
atrtc0: <AT realtime clock> port 0x70-0x77 irq 8 on acpi0
ppc0: <Parallel port> port 0x378-0x37f irq 7 on acpi0
uart0: <16550 or compatible> port 0x3f8-0x3ff irq 3 flags 0x10 on acpi0
uart1: <16550 or compatible> port 0x2f8-0x2ff irq 4 on acpi0
uart2: <16550 or compatible> port 0x2e8-0x2ef irq 3 on acpi0
ata0: <ATA channel> at port 0x1f0-0x1f7,0x3f6 irq 14 on isa0
ata1: <ATA channel> at port 0x170-0x177,0x376 irq 15 on isa0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
and a snippet from the above for em0 (the problematic interface):
dmesg | grep irq\ 16
vgapci0: <VGA-compatible display> port 0x40d0-0x40d7 mem 0xd0300000-0xd03fffff irq 16 at device 2.0 on pci0
em0: <Intel(R) PRO/1000 Network Connection 7.3.2> port 0x3000-0x301f mem 0xd0220000-0xd023ffff,0xd0200000-0xd021ffff,0xd0240000-0xd0243fff irq 16 at device 0.0 on pci2
uhci3: <Intel 82801G (ICH7) USB controller USB-D> port 0x4040-0x405f irq 16 at device 29.3 on pci0
and em1:
dmesg | grep irq\ 17
em1: <Intel(R) PRO/1000 Network Connection 7.3.2> port 0x2000-0x201f mem 0xd0120000-0xd013ffff,0xd0100000-0xd011ffff,0xd0140000-0xd0143fff irq 17 at device 0.0 on pci1
and em2:
dmesg | grep irq\ 20
em2: <Intel(R) PRO/1000 Legacy Network Connection 1.0.4> port 0x1000-0x103f mem 0xd0020000-0xd003ffff,0xd0000000-0xd001ffff irq 20 at device 0.0 on pci3
So em0 is sharing an IRQ with the VGA and a USB controller?
In my opinion this would explain why em0 is dropping packets…
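The same check can be done for every device at once by grouping the boot messages by IRQ number. This is a minimal sketch, assuming dmesg lines of the form `name: <description> ... irq N at device ...` as in the output above; the helper names are hypothetical, and the sample has its descriptions cleaned up and its memory ranges trimmed. Keep in mind that this is the legacy interrupt line assigned at probe time; a driver that switches to MSI or MSI-X does not necessarily use it.

```python
import re
from collections import defaultdict

# Device name at the start of a line, then the first "irq N" on that line.
DMESG_IRQ_RE = re.compile(r"^(\w+):.*?\birq (\d+)\b", re.MULTILINE)


def devices_by_irq(dmesg_text):
    """Group device names by the legacy IRQ reported in dmesg."""
    groups = defaultdict(list)
    for device, irq in DMESG_IRQ_RE.findall(dmesg_text):
        groups[int(irq)].append(device)
    return dict(groups)


def shared_irqs(dmesg_text):
    """Return only the IRQs that more than one device reports."""
    return {irq: devs for irq, devs in devices_by_irq(dmesg_text).items()
            if len(devs) > 1}


# Abbreviated sample based on the output in this post.
SAMPLE = """\
vgapci0: <VGA-compatible display> port 0x40d0-0x40d7 irq 16 at device 2.0 on pci0
em0: <Intel(R) PRO/1000 Network Connection 7.3.2> port 0x3000-0x301f irq 16 at device 0.0 on pci2
em1: <Intel(R) PRO/1000 Network Connection 7.3.2> port 0x2000-0x201f irq 17 at device 0.0 on pci1
uhci3: <Intel 82801G (ICH7) USB controller USB-D> port 0x4040-0x405f irq 16 at device 29.3 on pci0
em2: <Intel(R) PRO/1000 Legacy Network Connection 1.0.4> port 0x1000-0x103f irq 20 at device 0.0 on pci3
"""

if __name__ == "__main__":
    for irq, devices in sorted(shared_irqs(SAMPLE).items()):
        print(f"irq {irq} is shared by: {', '.join(devices)}")
```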
em1 and em2 are not sharing an IRQ :)
-
Yes that seems like a likely suspect.
On a box I have here:
[2.1-RELEASE][root@pfsense.localdomain]/root(4): cat /var/log/dmesg.boot | grep MSI
em0: Using MSIX interrupts with 3 vectors
em1: Using MSIX interrupts with 3 vectors
em2: Using MSIX interrupts with 3 vectors
em3: Using MSIX interrupts with 3 vectors
em4: Using MSIX interrupts with 3 vectors
em5: Using MSIX interrupts with 3 vectors
Yet at the same time:
[2.1-RELEASE][root@pfsense.localdomain]/root(14): cat /var/log/dmesg.boot | grep irq
em0: <Intel(R) PRO/1000 Network Connection 7.3.2> port 0x9c00-0x9c1f mem 0xfe6e0000-0xfe6fffff,0xfe6dc000-0xfe6dffff irq 16 at device 0.0 on pci2
em1: <Intel(R) PRO/1000 Network Connection 7.3.2> port 0xac00-0xac1f mem 0xfe7e0000-0xfe7fffff,0xfe7dc000-0xfe7dffff irq 17 at device 0.0 on pci3
em2: <Intel(R) PRO/1000 Network Connection 7.3.2> port 0xbc00-0xbc1f mem 0xfe8e0000-0xfe8fffff,0xfe8dc000-0xfe8dffff irq 18 at device 0.0 on pci4
em3: <Intel(R) PRO/1000 Network Connection 7.3.2> port 0xcc00-0xcc1f mem 0xfe9e0000-0xfe9fffff,0xfe9dc000-0xfe9dffff irq 19 at device 0.0 on pci5
em4: <Intel(R) PRO/1000 Network Connection 7.3.2> port 0xdc00-0xdc1f mem 0xfeae0000-0xfeafffff,0xfeadc000-0xfeadffff irq 16 at device 0.0 on pci6
em5: <Intel(R) PRO/1000 Network Connection 7.3.2> port 0xec00-0xec1f mem 0xfebe0000-0xfebfffff,0xfebdc000-0xfebdffff irq 17 at device 0.0 on pci7
I would expect to see much higher-numbered IRQs if it was really using them. Furthermore, vmstat shows:
[2.1-RELEASE][root@pfsense.localdomain]/root(21): vmstat -i
interrupt                          total       rate
irq4: uart0                          515          0
irq14: ata0                        79091          0
irq20: fxp0                       847274          1
irq23: uhci0 ehci0             199142380        235
cpu0: timer                    335017795        396
irq265: em3:rx 0                  335218          0
irq266: em3:tx 0                  334731          0
irq267: em3:link                       2          0
cpu1: timer                    335017323        396
Total                          870774329       1029
Only em3 is connected on that box.
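One way to read the `vmstat -i` listing is to split the sources by IRQ number. This sketch assumes that on this FreeBSD generation MSI and MSI-X vectors are numbered from 256 upward while legacy interrupt lines stay below that, which fits the irq265 to irq267 entries above; treat the threshold as an assumption rather than a guarantee. The function name and regular expression are made up for illustration.

```python
import re

FIRST_MSI_IRQ = 256  # assumption: MSI/MSI-X vectors start here on this system

# "irqN: <source name>   <total>   <rate>" with the name matched lazily,
# because names such as "em3:rx 0" themselves contain spaces and digits.
VMSTAT_RE = re.compile(
    r"^irq(\d+):[ \t]+(.+?)[ \t]+(\d+)[ \t]+(\d+)[ \t]*$", re.MULTILINE
)


def classify_interrupts(vmstat_text):
    """Map each irq source name to 'msi' or 'legacy' based on its number."""
    result = {}
    for irq, name, _total, _rate in VMSTAT_RE.findall(vmstat_text):
        result[name] = "msi" if int(irq) >= FIRST_MSI_IRQ else "legacy"
    return result


SAMPLE = """\
interrupt                          total       rate
irq4: uart0                          515          0
irq14: ata0                        79091          0
irq20: fxp0                       847274          1
irq23: uhci0 ehci0             199142380        235
cpu0: timer                    335017795        396
irq265: em3:rx 0                  335218          0
irq266: em3:tx 0                  334731          0
irq267: em3:link                       2          0
cpu1: timer                    335017323        396
Total                          870774329       1029
"""

if __name__ == "__main__":
    for name, kind in classify_interrupts(SAMPLE).items():
        print(f"{name:15s} {kind}")
```

If an em interface shows up only under high-numbered entries, the low IRQ printed for it in dmesg is probably not the one carrying its traffic.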
Steve
-
I guess I use MSIX as well:
em0: Using MSIX interrupts with 3 vectors
em1: Using MSIX interrupts with 3 vectors
[2.1-RELEASE][root@gateway.rasca.local]/root(3): vmstat -i
interrupt                          total       rate
irq19: uhci1+                      20856         13
irq20: em2                        712364        453
cpu0: timer                      3125735       1990
irq256: em0:rx 0                 1179348        751
irq257: em0:tx 0                 1214662        773
irq258: em0:link                    1625          1
irq259: em1:rx 0                  758263        482
irq260: em1:tx 0                  871653        555
irq261: em1:link                    2919          1
cpu1: timer                      3105724       1978
Total                           10993149       7002
So both em0 and em1 are OK.
But the IRQs are not OK in my opinion…
I have em2 as WAN, which is on FTTH 20/20, so max throughput is 40 Mbit/s.
I will move the problematic em0 to WAN and use em2 instead of em0 for LAN.
I think this move should solve my problem.
I have to try a 40 Mbit/s limiter with iperf, and if there are no errors em0 should handle my WAN just fine, right?
-
I will disable audio/USB/serial and this should free up the few IRQs I need :)