Errors on LAN interface

ctirado

Hey folks

I've been lurking in the forum but recently took the plunge and replaced my Tomato-based router with a pfSense machine. I am running:

2.2.5-RELEASE (amd64)
built on Wed Nov 04 15:49:37 CST 2015
FreeBSD 10.1-RELEASE-p24

on an Intel(R) Atom(TM) CPU D2550 @ 1.86GHz with 4 GB of RAM and a 128 GB SSD. There are two NICs, both Broadcom 57788.

The reason I am writing is that I am seeing some errors on the LAN interface (bge1). 140 errors to be exact. After a bit of reading I tried running:

sysctl -a | grep .bge.

dev.bge.1.%desc: Broadcom BCM57780 A1, ASIC rev. 0x57780001
dev.bge.1.%driver: bge
dev.bge.1.%location: slot=0 function=0 handle=_SB_.PCI0.RP03.PXSX
dev.bge.1.%pnpinfo: vendor=0x14e4 device=0x1691 subvendor=0x14e4 subdevice=0x9691 class=0x020000
dev.bge.1.%parent: pci2
dev.bge.1.forced_collapse: 0
dev.bge.1.msi: 1
dev.bge.1.forced_udpcsum: 0
dev.bge.1.stats.FramesDroppedDueToFilters: 0
dev.bge.1.stats.DmaWriteQueueFull: 0
dev.bge.1.stats.DmaWriteHighPriQueueFull: 0
dev.bge.1.stats.NoMoreRxBDs: 8
dev.bge.1.stats.InputDiscards: 132
dev.bge.1.stats.InputErrors: 0
dev.bge.1.stats.RecvThresholdHit: 0
dev.bge.1.stats.rx.ifHCInOctets: 4683947508
dev.bge.1.stats.rx.Fragments: 0
dev.bge.1.stats.rx.UnicastPkts: 27816240
dev.bge.1.stats.rx.MulticastPkts: 42
dev.bge.1.stats.rx.BroadcastPkts: 196561
dev.bge.1.stats.rx.FCSErrors: 0
dev.bge.1.stats.rx.AlignmentErrors: 0
dev.bge.1.stats.rx.xonPauseFramesReceived: 0
dev.bge.1.stats.rx.xoffPauseFramesReceived: 0
dev.bge.1.stats.rx.ControlFramesReceived: 0
dev.bge.1.stats.rx.xoffStateEntered: 0
dev.bge.1.stats.rx.FramesTooLong: 0
dev.bge.1.stats.rx.Jabbers: 0
dev.bge.1.stats.rx.UndersizePkts: 0
dev.bge.1.stats.tx.ifHCOutOctets: 72213082361
dev.bge.1.stats.tx.Collisions: 0
dev.bge.1.stats.tx.XonSent: 0
dev.bge.1.stats.tx.XoffSent: 0
dev.bge.1.stats.tx.InternalMacTransmitErrors: 0
dev.bge.1.stats.tx.SingleCollisionFrames: 0
dev.bge.1.stats.tx.MultipleCollisionFrames: 0
dev.bge.1.stats.tx.DeferredTransmissions: 0
dev.bge.1.stats.tx.ExcessiveCollisions: 0
dev.bge.1.stats.tx.LateCollisions: 0
dev.bge.1.stats.tx.UnicastPkts: 55120810
dev.bge.1.stats.tx.MulticastPkts: 3
dev.bge.1.stats.tx.BroadcastPkts: 5028
dev.bge.1.wake: 0

If I am understanding what I have read correctly, the problem seems to be that the NIC is running out of receive buffers. Would kern.ipc.maxsockbuf be the right tunable to change? It is currently at 4262144. I know that 140 errors isn't a lot, but if there is something I can do to fix it, I would love to do so. Thanks folks.
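For anyone who wants to reproduce the 140 figure from the paste above, here is a minimal Python sketch. It assumes the interface error count is the sum of NoMoreRxBDs, InputDiscards and InputErrors. That matches the numbers here (8 + 132 + 0) but is an assumption about how the driver reports errors, and parse_counters is a hypothetical helper, not part of any pfSense tooling.

```python
import re

# Excerpt of the counters pasted above (values copied from this post).
SYSCTL_TEXT = """\
dev.bge.1.stats.NoMoreRxBDs: 8
dev.bge.1.stats.InputDiscards: 132
dev.bge.1.stats.InputErrors: 0
dev.bge.1.stats.rx.FCSErrors: 0
"""


def parse_counters(text, unit=1):
    """Return {counter_name: int} for dev.bge.<unit>.stats.* lines."""
    pattern = re.compile(rf"^dev\.bge\.{unit}\.stats\.([\w.]+):\s*(\d+)\s*$")
    counters = {}
    for line in text.splitlines():
        match = pattern.match(line.strip())
        if match:
            counters[match.group(1)] = int(match.group(2))
    return counters


counters = parse_counters(SYSCTL_TEXT)

# Assumption: the reported error count is the sum of these three counters.
total = sum(counters.get(name, 0)
            for name in ("NoMoreRxBDs", "InputDiscards", "InputErrors"))
print(total)  # 140
```

The same function can be fed the full output of the sysctl command instead of the excerpt.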

Carlos

ctirado

Can anyone shed some light on how to alleviate these errors? They have now increased:

dev.bge.1.stats.NoMoreRxBDs: 10
dev.bge.1.stats.InputDiscards: 324

I have increased kern.ipc.maxsockbuf to 16777216 but that didn't seem to help things. I even replaced the switch with a different one and changed the patch cable connecting the two, even though this doesn't seem to be a layer 1 or 2 issue. Any suggestions or is this just what I can expect because of the Broadcom NIC controller in this box?
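To make the growth between the two samples explicit, here is a small Python sketch that diffs the counters. The helper name counter_deltas is hypothetical, and since the time between the two readings isn't recorded here, it only gives the increase, not a rate.

```python
def counter_deltas(before, after):
    """Per-counter increase between two samples (dicts of name -> int)."""
    return {name: after[name] - before[name]
            for name in after if name in before}


# Values copied from the first post and from this one.
first = {"NoMoreRxBDs": 8, "InputDiscards": 132}
second = {"NoMoreRxBDs": 10, "InputDiscards": 324}

deltas = counter_deltas(first, second)
print(deltas)  # {'NoMoreRxBDs': 2, 'InputDiscards': 192}
```

Dividing each delta by the elapsed time between samples would give a rate that can be compared before and after a tuning change.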

Carlos

KOM

While I can't offer specific advice for your issue, I'm thinking 140 dropped frames isn't really that big a deal, depending on your sample size of course. Others with this issue seem to have very large counts for this stat. Do you notice any performance issues? You could try to rule out hardware by reassigning the interfaces and swapping the cables, so that WAN becomes LAN and vice versa. See if the errors follow the NIC/cable or the network. How many clients are on the LAN? Is it possible the frames are being discarded because they're malformed at the source, or in transit?

ctirado

First of all, thanks for the reply KOM.

Yes, the number is quite small, especially in the context of the number of packets processed (348 packets discarded out of 56,202,561 unicast packets; the percentage is so small that it comes out in scientific notation on the Windows calculator). I was just concerned that it was a sign of a much greater issue, since users in other threads stated that any number of errors was cause for alarm.
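For reference, the arithmetic can be sketched in a few lines of Python, using only the two figures quoted above:

```python
discarded = 348          # discarded packets, as quoted above
unicast = 56_202_561     # unicast packets, as quoted above

fraction = discarded / unicast
print(f"{fraction:.2e}")         # 6.19e-06
print(f"{fraction * 100:.5f}%")  # 0.00062%
```

That is roughly six discarded packets per million.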

I haven't noticed any issues and the LAN is quite small (2 PCs, 1 Smart TV, 1 PS3 and one WAP serving 2 iPads and 2 Android devices). I do use a Powerline network to link my upstairs office to the downstairs entertainment center, where the pfSense box is. Since the discards seem to be related to running out of receive buffers on the LAN and dev.bge.1.stats.InputErrors has remained at 0, I didn't think we were looking at a bad frame issue. I will try your idea of swapping the WAN and LAN and see if the errors follow. I guess in the grand scheme of things, it's probably not something to worry about.

Thanks again for your input.

Carlos

cmb

348 out of 56+ million is safe to ignore. Not even worth looking into at that low of a rate IMO, given you have no noticeable problems.

whosmatt

FWIW I was getting similar errors on my own pfSense install. I tried a different physical interface, different switch ports and a different Ethernet cable. The problem was resolved by replacing the switch. The errors were annoying to see, but did not affect performance in any measurable way. I don't recall the error rate now, but it was ~6000 errors over about 5 weeks of uptime.

Harvy66

During most of the year, I sit around 0 errors, but around winter, the air gets dry and the errors start to go up. At one point I went to plug an Ethernet cable into my switch while I happened to have my error counts open. I got a static shock and saw the count go up. I get zapped a lot in the winter.

ljorgensen

@ctirado:

The reason I am writing is that I am seeing some errors on the LAN interface (bge1). 140 errors to be exact.

I'm seeing the same thing here, also on a bge interface. This is in a corporate environment with a lot of traffic, but the error rates are ridiculous:

[2.2.5-RELEASE][admin@pfsense-bitarkiv.kb.dk]/root: sysctl -a | grep dev.bge.0
dev.bge.0.%desc: HP Ethernet 1Gb 4-port 331i Adapter, ASIC rev. 0x5719001
dev.bge.0.%driver: bge
dev.bge.0.%location: slot=0 function=0 handle=_SB_.PCI0.PEX4.EMB1
dev.bge.0.%pnpinfo: vendor=0x14e4 device=0x1657 subvendor=0x103c subdevice=0x22be class=0x020000
dev.bge.0.%parent: pci2
dev.bge.0.forced_collapse: 0
dev.bge.0.msi: 1
dev.bge.0.forced_udpcsum: 0
dev.bge.0.stats.FramesDroppedDueToFilters: 0
dev.bge.0.stats.DmaWriteQueueFull: 0
dev.bge.0.stats.DmaWriteHighPriQueueFull: 0
dev.bge.0.stats.NoMoreRxBDs: 2812
dev.bge.0.stats.InputDiscards: 796042
dev.bge.0.stats.InputErrors: 0
dev.bge.0.stats.RecvThresholdHit: 0
dev.bge.0.stats.rx.ifHCInOctets: 350015645
dev.bge.0.stats.rx.Fragments: 0
dev.bge.0.stats.rx.UnicastPkts: 6926
dev.bge.0.stats.rx.MulticastPkts: 1882614
dev.bge.0.stats.rx.BroadcastPkts: 1494008
dev.bge.0.stats.rx.FCSErrors: 0
dev.bge.0.stats.rx.AlignmentErrors: 0
dev.bge.0.stats.rx.xonPauseFramesReceived: 0
dev.bge.0.stats.rx.xoffPauseFramesReceived: 0
dev.bge.0.stats.rx.ControlFramesReceived: 0
dev.bge.0.stats.rx.xoffStateEntered: 0
dev.bge.0.stats.rx.FramesTooLong: 0
dev.bge.0.stats.rx.Jabbers: 0
dev.bge.0.stats.rx.UndersizePkts: 0
dev.bge.0.stats.tx.ifHCOutOctets: 1245085
dev.bge.0.stats.tx.Collisions: 0
dev.bge.0.stats.tx.XonSent: 0
dev.bge.0.stats.tx.XoffSent: 0
dev.bge.0.stats.tx.InternalMacTransmitErrors: 0
dev.bge.0.stats.tx.SingleCollisionFrames: 0
dev.bge.0.stats.tx.MultipleCollisionFrames: 0
dev.bge.0.stats.tx.DeferredTransmissions: 0
dev.bge.0.stats.tx.ExcessiveCollisions: 0
dev.bge.0.stats.tx.LateCollisions: 0
dev.bge.0.stats.tx.UnicastPkts: 7036
dev.bge.0.stats.tx.MulticastPkts: 4
dev.bge.0.stats.tx.BroadcastPkts: 36

At the time of writing, the NIC has received 7,475 packets and experienced 798,854 input errors. I have tried changing the cable but the situation is the same. I don't suspect the switch since it is a relatively new HP A5120 gigabit switch and nothing else on it is reporting errors. I don't see any errors on the switch side but that doesn't tell me anything as the errors are on the receiving side of the pfSense box.

ctirado

Hmm… This could be a coincidence but you're having the exact same problem: NoMoreRxBDs showing that the NIC port has run out of buffers but with no Input, FCS or Alignment errors. That should rule out any layer 1 problems. We're also both using the same pfSense release and the same driver. Does anyone know if something changed with 2.2.5 vis-à-vis the Broadcom bge driver?

Carlos

ljorgensen

I don't think it's a driver issue.

I changed the NIC to a card that registered as "em0" and saw the exact same problem. Then I changed the VLAN the interface was placed in and the input errors stopped immediately.

I have now realised I have something very noisy in one of my VLANs, but it doesn't show up in a tcpdump. It shows as "packets discarded by the kernel", so they must be malformed in some way. Problem is, I have about 500 clients in the VLAN so it could take some time to find the culprit…
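The receive counters pasted earlier for bge0 look consistent with a noisy broadcast domain. Here is a rough Python sketch that breaks the received frames down by type, using only the three rx counters from that paste. It shows the traffic mix and says nothing about which frames were actually discarded.

```python
# rx counters copied from the dev.bge.0 output earlier in the thread.
rx = {
    "UnicastPkts": 6926,
    "MulticastPkts": 1882614,
    "BroadcastPkts": 1494008,
}

total_rx = sum(rx.values())

# Share of each frame type among all received frames.
shares = {name: count / total_rx for name, count in rx.items()}
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

With these numbers, unicast is well under 1% of what the interface received; nearly everything is multicast or broadcast.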

ctirado

Oh. Good to know. I am also using VLANs, but only on the WAN side; CenturyLink requires all WAN packets to be tagged with VLAN 201. However, I don't see pfSense having any problems with the WAN interface.

Carlos

jml

Hi, since 2.2.6 I suddenly get the same errors, and a lot of them.
Here is the sysctl -a output:

dev.bge.0.%desc: Broadcom NetXtreme Gigabit Ethernet Controller, ASIC rev. 0x004101
dev.bge.0.%driver: bge
dev.bge.0.%location: slot=0 function=0
dev.bge.0.%pnpinfo: vendor=0x14e4 device=0x1659 subvendor=0x1028 subdevice=0x01b6 class=0x020000
dev.bge.0.%parent: pci3
dev.bge.0.forced_collapse: 0
dev.bge.0.msi: 1
dev.bge.0.forced_udpcsum: 0
dev.bge.0.stats.FramesDroppedDueToFilters: 0
dev.bge.0.stats.DmaWriteQueueFull: 0
dev.bge.0.stats.DmaWriteHighPriQueueFull: 0
dev.bge.0.stats.NoMoreRxBDs: 18488
dev.bge.0.stats.InputDiscards: 816676
dev.bge.0.stats.InputErrors: 0
dev.bge.0.stats.RecvThresholdHit: 0
dev.bge.0.stats.rx.ifHCInOctets: 1479529684755
dev.bge.0.stats.rx.Fragments: 0
dev.bge.0.stats.rx.UnicastPkts: 1176430502
dev.bge.0.stats.rx.MulticastPkts: 0
dev.bge.0.stats.rx.BroadcastPkts: 125
dev.bge.0.stats.rx.FCSErrors: 0
dev.bge.0.stats.rx.AlignmentErrors: 0
dev.bge.0.stats.rx.xonPauseFramesReceived: 0
dev.bge.0.stats.rx.xoffPauseFramesReceived: 0
dev.bge.0.stats.rx.ControlFramesReceived: 0
dev.bge.0.stats.rx.xoffStateEntered: 0
dev.bge.0.stats.rx.FramesTooLong: 0
dev.bge.0.stats.rx.Jabbers: 0
dev.bge.0.stats.rx.UndersizePkts: 0
dev.bge.0.stats.tx.ifHCOutOctets: 234929844784
dev.bge.0.stats.tx.Collisions: 0
dev.bge.0.stats.tx.XonSent: 0
dev.bge.0.stats.tx.XoffSent: 0
dev.bge.0.stats.tx.InternalMacTransmitErrors: 0
dev.bge.0.stats.tx.SingleCollisionFrames: 0
dev.bge.0.stats.tx.MultipleCollisionFrames: 0
dev.bge.0.stats.tx.DeferredTransmissions: 0
dev.bge.0.stats.tx.ExcessiveCollisions: 0
dev.bge.0.stats.tx.LateCollisions: 0
dev.bge.0.stats.tx.UnicastPkts: 670015355
dev.bge.0.stats.tx.MulticastPkts: 3
dev.bge.0.stats.tx.BroadcastPkts: 124
dev.bge.1.%desc: Broadcom NetXtreme Gigabit Ethernet Controller, ASIC rev. 0x004101

The errors are reported as shown in the attached screenshot. I'm seeing a high count: "NoMoreRxBDs: 18488".

Using NetXtreme Gigabit Ethernet PCI Express (BCM5721 / bge0)

I don't use VLANs on the WAN side. WAN is directly attached to fibre-modem over UTP. With 2.2.5 I had (almost) no errors.
Any idea what can cause / solve this?

[Attachment: Selection_053.jpg]

ctirado

I know it's not fixed in the 2.3 beta. I am still getting them but nowhere near the rate you are. For my use, the percentage of discarded packets is so small (around one thousandth of a percent) that I didn't bother researching it further. Have you tried the tips on pfSense NIC tuning?

Carlos