No buffer space - solution for Via board with 5 NICs
-
After banging my head against this for a couple of days, I'm hoping my notes can help someone else in the future – and maybe someone will have extra info and insights to add!!
I have a two-year-old Via C7 board (1GHz CPU) with a 4-port Soekris NIC in it. To the best of my knowledge, everything in the BIOS that makes sense to turn off (audio, video, etc.) is turned off. More info:
CPU: VIA C7 Esther+RNG+AES+AES-CTR+SHA1+SHA256+RSA (999.90-MHz 686-class CPU)
  Origin = "CentaurHauls"  Id = 0x6a9  Stepping = 9
  Features=0xa7c9bbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,CMOV,PAT,CLFLUSH,ACPI,MMX,FXSR,SSE,SSE2,TM,PBE>
  Features2=0x181<SSE3,EST,TM2>
real memory = 1005518848 (958 MB)
For the past few days I chased down weird network problems that boiled down to the LAN interface dropping packets.
This system has two WANs NOT configured for failover or sharing. Rather, the WAN and LAN ports are tied to a 10mbit (both ways) bonded DSL pipe, and the WAN2 and VOIP ports are used for VoIP on a much slower 3mbit x 768k link. Before narrowing down the symptoms we were noticing network slowness and random problems. Removing individual devices from the network seemed to fix things for a while, as did resetting the switches, so we were distracted by that before finding the real problem.
I was finally able to diagnose it definitively when I logged into pfSense over ssh and started pinging devices on my LAN. I got the "No buffer space" message while pinging. I also saw in the dmesg output that the interface was being brought up and down by the watchdog timer. CPU sits around 5-10%. Memory usage is minimal.
The fix was to move the LAN to the onboard NIC. Since making that change, all is good. My guess is that some interrupt was being overloaded and that the onboard NIC is on a different interrupt and/or behaves better.
I welcome any insights or comments!
In case it helps, here is more dmesg info about the ethernet ports:
vr0: <VIA VT6102 Rhine II 10/100BaseTX> port 0xf200-0xf2ff mem 0xfdffe000-0xfdffe0ff irq 23 at device 18.0 on pci0
vr0: Quirks: 0x0
vr0: Revision: 0x78
miibus0: <MII bus> on vr0
ukphy0: <Generic IEEE 802.3u media interface> PHY 1 on miibus0
ukphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
vr0: Ethernet address: 00:19:db:4f:3c:96
vr0: [ITHREAD]
pcib2: <PCI-PCI bridge> at device 20.0 on pci0
pci2: <PCI bus> on pcib2
sis0: <NatSemi DP8381[56] 10/100BaseTX> port 0xde00-0xdeff mem 0xfdeff000-0xfdefffff irq 17 at device 0.0 on pci2
sis0: Silicon Revision: DP83816A
miibus1: <MII bus> on sis0
ukphy1: <Generic IEEE 802.3u media interface> PHY 0 on miibus1
ukphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
sis0: Ethernet address: 00:00:24:c9:43:80
sis0: [ITHREAD]
sis1: <NatSemi DP8381[56] 10/100BaseTX> port 0xdc00-0xdcff mem 0xfdefe000-0xfdefefff irq 18 at device 1.0 on pci2
sis1: Silicon Revision: DP83816A
miibus2: <MII bus> on sis1
ukphy2: <Generic IEEE 802.3u media interface> PHY 0 on miibus2
ukphy2: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
sis1: Ethernet address: 00:00:24:c9:43:81
sis1: [ITHREAD]
sis2: <NatSemi DP8381[56] 10/100BaseTX> port 0xda00-0xdaff mem 0xfdefd000-0xfdefdfff irq 19 at device 2.0 on pci2
sis2: Silicon Revision: DP83816A
miibus3: <MII bus> on sis2
ukphy3: <Generic IEEE 802.3u media interface> PHY 0 on miibus3
ukphy3: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
sis2: Ethernet address: 00:00:24:c9:43:82
sis2: [ITHREAD]
sis3: <NatSemi DP8381[56] 10/100BaseTX> port 0xd800-0xd8ff mem 0xfdefc000-0xfdefcfff irq 16 at device 3.0 on pci2
sis3: Silicon Revision: DP83816A
miibus4: <MII bus> on sis3
ukphy4: <Generic IEEE 802.3u media interface> PHY 0 on miibus4
ukphy4: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
sis3: Ethernet address: 00:00:24:c9:43:83
sis3: [ITHREAD]
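To make the interrupt theory easier to check, here is a rough Python sketch that pulls the IRQ number for each NIC out of the dmesg lines above and reports any IRQ used by more than one of them. The function names (`irq_map`, `shared_irqs`) are made up for illustration, and the sample text keeps only the five lines that mention an IRQ, with the device descriptions trimmed.

```python
import re
from collections import defaultdict

# The five attach lines from the dmesg output above (descriptions trimmed).
DMESG = """\
vr0: port 0xf200-0xf2ff mem 0xfdffe000-0xfdffe0ff irq 23 at device 18.0 on pci0
sis0: port 0xde00-0xdeff mem 0xfdeff000-0xfdefffff irq 17 at device 0.0 on pci2
sis1: port 0xdc00-0xdcff mem 0xfdefe000-0xfdefefff irq 18 at device 1.0 on pci2
sis2: port 0xda00-0xdaff mem 0xfdefd000-0xfdefdfff irq 19 at device 2.0 on pci2
sis3: port 0xd800-0xd8ff mem 0xfdefc000-0xfdefcfff irq 16 at device 3.0 on pci2
"""

# Matches "<name>: ... irq <n> at device ..." on a single line.
ATTACH_RE = re.compile(r"^(\w+): .*\birq (\d+) at device", re.MULTILINE)


def irq_map(dmesg_text):
    """Return {device name: IRQ number} for every attach line that names an IRQ."""
    return {name: int(irq) for name, irq in ATTACH_RE.findall(dmesg_text)}


def shared_irqs(mapping):
    """Return {IRQ: [device names]} for IRQs used by more than one listed device."""
    by_irq = defaultdict(list)
    for name, irq in mapping.items():
        by_irq[irq].append(name)
    return {irq: sorted(names) for irq, names in by_irq.items() if len(names) > 1}


nics = irq_map(DMESG)
for name, irq in sorted(nics.items()):
    print(f"{name}: irq {irq}")
print("shared among these NICs:", shared_irqs(nics) or "none")
```

Going only by these lines, each of the five NICs has its own IRQ, so if an interrupt is being shared it would have to be with some other device that is not in this excerpt (USB, disk controller, etc.). On FreeBSD, `vmstat -i` lists interrupt counts per IRQ and should show whether one of them is unusually busy.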
-
What version of pfSense?
Perhaps the problem is fixed in pfSense 1.2.3-RC3 or one of the snapshot builds.
-
The latest 'production' release, 1.2.2. Non-embedded version (though running off an SSD).
-
I suggest you try 1.2.3-RC3 since the hardware support is more up to date. See http://blog.pfsense.org/?p=497 for the announcement.
I'm guessing that the "no buffer space" message comes from the kernel and indicates kernel memory exhaustion. That it happens when a sis NIC is the LAN interface and doesn't happen when vr0 is the LAN NIC suggests the sis driver might have a memory leak.
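For reference, "No buffer space available" is the text for the ENOBUFS errno. The FreeBSD send(2) manual page lists two causes for it: the system could not allocate an internal buffer, or the output queue for a network interface was full. The second one would also fit with the watchdog resets reported above, so the message by itself does not prove a leak. Below is a minimal Python sketch of how a program sending packets could tell this error apart from others. The helper names (`is_no_buffer_space`, `send_with_check`) are hypothetical and only for illustration.

```python
import errno


def is_no_buffer_space(exc):
    """True if exc carries ENOBUFS, the errno behind 'No buffer space available'."""
    return exc.errno == errno.ENOBUFS


def send_with_check(sock, payload, addr):
    """Hypothetical helper: send one datagram.

    Returns True on success, False if the kernel answered ENOBUFS,
    and re-raises any other OSError unchanged.
    """
    try:
        sock.sendto(payload, addr)
        return True
    except OSError as exc:
        if is_no_buffer_space(exc):
            return False
        raise
```

With a real UDP socket this would be called as `send_with_check(sock, b"probe", (host, port))`, where `host` and `port` are whatever LAN device you want to test against. A run of `False` results while CPU and memory look idle would point at the interface queue rather than at the machine being out of memory.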
-
Thanks for your thoughts! From what I understand that NIC driver has been around a long time … but who knows! Strange that this has worked successfully for almost 2 years. However, perhaps we've had more traffic recently and that's begun to be a problem. I still like my interrupt theory ;)
Yes, I'm looking forward to upgrading once 1.2.3 is done!
In the meantime, we're looking to upgrade the hardware. We'll bite the bullet and go for a real (albeit old) server and Intel gigabit NICs, along with fans to fail and a big UPS because it will use real power. This should give us enough margin for growth and error that we shouldn't lose hours of productivity again...