LAN Interface "In" errors

psychosquirrel

Hello,

For the past week I have been trying to resolve an errors issue that I see only on the LAN interface. Before I made any modifications, the error count would increment by thousands. By default, "Hardware TCP Segmentation Offloading" and "Hardware Large Receive Offloading" are checked under "System > Advanced > Networking", which means they are disabled. I rebooted the system, and the errors still persisted. Then I found this doc https://doc.pfsense.org/index.php/Tuning_and_Troubleshooting_Network_Cards#Broadcom_bce.284.29_Cards and added the entries it lists for Broadcom cards. The errors now increment more slowly, but they still occur.

As of right now I am stuck. What are these errors? What is counted as an error?

In tcpdump, I noticed a great many packets with a bad checksum, mostly from the WiFi AP.

My network is set up like so:

Cable Modem --> bge0 (WAN) pfSense running on a Dell PowerEdge R200 (LAN) bge1 --> Enterasys c2g124-48p switch (8 ports enabled, 40 ports disabled) --> WiFi AP Asus RT-AC66U


LAN Interface:

Status
    up
MAC Address
    00:21:9b:fc:96:1b
IPv4 Address
    192.168.50.1
Subnet mask IPv4
    255.255.255.0
IPv6 Link Local
    fe80::221:9bff:fefc:961b%bge1
MTU
    1500
Media
    1000baseT <full-duplex>
In/out packets
    8233132/17058096 (1.75 GiB/19.30 GiB)
In/out packets (pass)
    8233132/17058096 (1.75 GiB/19.30 GiB)
In/out packets (block)
    7943/0 (565 KiB/0 B)
In/out errors
    4594/0
Collisions
    0
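
For scale, the counters above can be turned into an error ratio. This is a minimal Python sketch that uses only the numbers shown in the status output; the helper name `error_ratio` is made up for illustration, and it assumes the "In" packet counter is a fair denominator for the "In" error counter.

```python
def error_ratio(errors: int, packets: int) -> float:
    """Return errors as a fraction of packets (0.0 if there were no packets)."""
    return errors / packets if packets else 0.0


# Values copied from the LAN interface status above.
in_packets = 8233132
in_errors = 4594

ratio = error_ratio(in_errors, in_packets)
print(f"In errors: {ratio:.4%} of received packets")
```

The ratio alone does not say why frames were rejected, but it gives a baseline to compare against after each change.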


System Information:

System 	pfSense

BIOS 	Vendor: Dell Inc.
Version: 1.4.3
Release Date: Fri May 15 2009
Version 	2.4.3-RELEASE (amd64)
built on Mon Mar 26 18:02:04 CDT 2018
FreeBSD 11.1-RELEASE-p7

The system is on the latest version.
Version information updated at Sat Apr 14 12:11:02 CDT 2018  
CPU Type 	Intel(R) Xeon(R) CPU X3360 @ 2.83GHz
Current: 2000 MHz, Max: 2834 MHz
4 CPUs: 1 package(s) x 4 core(s)
AES-NI CPU Crypto: No
Kernel PTI 	Enabled
Uptime 	1 Day 13 Hours 40 Minutes 48 Seconds
Current date/time 	
Sat Apr 14 12:36:13 CDT 2018
DNS server(s) 	

    127.0.0.1
    8.8.8.8
    8.8.4.4

Last config change 	Sat Apr 14 11:49:26 CDT 2018
State table size 	
0% (193/814000) Show states
MBUF Usage 	
3% (3296/131072)
Load average 	
0.20, 0.23, 0.17
CPU usage 	
0%
Memory usage 	
3% of 8146 MiB
SWAP usage 	
0% of 3979 MiB
Disk usage:
     / 	
1% of 222GiB - ufs
     /var/run 	
4% of 3.4MiB - ufs in RAM

keen

Try changing the Ethernet cable between your firewall and the switch, and if nothing changes, try a different port on the switch.

psychosquirrel

I think I may have figured something out here. Looking at my AP (Asus RT-AC66U), the transmit errors on its interfaces match the LAN "In" errors on pfSense. Keep in mind that the AP and pfSense are not directly connected. I'm now thinking that pfSense is not at fault and is simply subject to garbage in, garbage out. To test this theory further, I wired a computer directly to my Enterasys switch, and the pfSense error count did not increment.

Below is the output of ifconfig from my AP:


br0       Link encap:Ethernet  HWaddr 60:45:CB:B0:2C:68  
          inet addr:192.168.50.3  Bcast:192.168.50.255  Mask:255.255.255.0
          UP BROADCAST RUNNING ALLMULTI MULTICAST  MTU:1500  Metric:1
          RX packets:1475234 errors:0 dropped:0 overruns:0 frame:0
          TX packets:786186 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:270674350 (258.1 MiB)  TX bytes:143958571 (137.2 MiB)

eth0      Link encap:Ethernet  HWaddr 60:45:CB:B0:2C:68  
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:45066001 errors:0 dropped:0 overruns:0 frame:0
          TX packets:15976706 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:937025919 (893.6 MiB)  TX bytes:4052578657 (3.7 GiB)
          Interrupt:179 Base address:0x4000 

eth1      Link encap:Ethernet  HWaddr 60:45:CB:B0:2C:68  
          UP BROADCAST RUNNING ALLMULTI MULTICAST  MTU:1500  Metric:1
          RX packets:100915 errors:0 dropped:0 overruns:0 frame:891588
          TX packets:452609 errors:3871 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:17667996 (16.8 MiB)  TX bytes:179308043 (171.0 MiB)
          Interrupt:163 

eth2      Link encap:Ethernet  HWaddr 60:45:CB:B0:2C:6C  
          UP BROADCAST RUNNING ALLMULTI MULTICAST  MTU:1500  Metric:1
          RX packets:16604878 errors:0 dropped:0 overruns:0 frame:6313378
          TX packets:42886378 errors:33205 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:3880400783 (3.6 GiB)  TX bytes:957843658 (913.4 MiB)
          Interrupt:169 

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING MULTICAST  MTU:16436  Metric:1
          RX packets:266157 errors:0 dropped:0 overruns:0 frame:0
          TX packets:266157 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:62906197 (59.9 MiB)  TX bytes:62906197 (59.9 MiB)

vlan1     Link encap:Ethernet  HWaddr 60:45:CB:B0:2C:68  
          UP BROADCAST RUNNING ALLMULTI MULTICAST  MTU:1500  Metric:1
          RX packets:45065999 errors:0 dropped:0 overruns:0 frame:0
          TX packets:15976706 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:56569758523 (52.6 GiB)  TX bytes:3813453606 (3.5 GiB)
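
To pick out the interfaces with non-zero TX errors from output like the above, something along these lines could be used. It is a rough Python sketch that assumes the older Linux net-tools `ifconfig` layout shown here; `tx_errors_by_interface` is a hypothetical helper name, and the sample is trimmed to three stanzas from the listing.

```python
import re


def tx_errors_by_interface(ifconfig_text: str) -> dict[str, int]:
    """Map interface name -> TX error count from net-tools style ifconfig output."""
    result = {}
    current = None
    for line in ifconfig_text.splitlines():
        # A new interface stanza starts with the name in column 0.
        header = re.match(r"^(\S+)\s+Link encap:", line)
        if header:
            current = header.group(1)
            continue
        tx = re.search(r"TX packets:\d+ errors:(\d+)", line)
        if tx and current is not None:
            result[current] = int(tx.group(1))
    return result


# Trimmed copy of the AP output above.
sample = """\
eth0      Link encap:Ethernet  HWaddr 60:45:CB:B0:2C:68
          TX packets:15976706 errors:0 dropped:0 overruns:0 carrier:0
eth1      Link encap:Ethernet  HWaddr 60:45:CB:B0:2C:68
          TX packets:452609 errors:3871 dropped:0 overruns:0 carrier:0
eth2      Link encap:Ethernet  HWaddr 60:45:CB:B0:2C:6C
          TX packets:42886378 errors:33205 dropped:0 overruns:0 carrier:0
"""

errors = tx_errors_by_interface(sample)
noisy = {name: count for name, count in errors.items() if count > 0}
print(noisy)
```

On this sample only eth1 and eth2 report TX errors; the wired-side interfaces (eth0, vlan1, br0) show none.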

stephenw10

Hmm, that's fun. ;)

I would also disable 'Hardware Checksum Offloading' if you have not done so already.

Steve

strangegopher

I used to get these errors when I enabled traffic shaping.

psychosquirrel

I found something else now regarding my LAN "In" errors.

Probing sysctl dev.bge.1 shows the following:


dev.bge.1.stats.tx.BroadcastPkts: 512
dev.bge.1.stats.tx.MulticastPkts: 3593
dev.bge.1.stats.tx.UnicastPkts: 2039687
dev.bge.1.stats.tx.LateCollisions: 0
dev.bge.1.stats.tx.ExcessiveCollisions: 0
dev.bge.1.stats.tx.DeferredTransmissions: 0
dev.bge.1.stats.tx.MultipleCollisionFrames: 0
dev.bge.1.stats.tx.SingleCollisionFrames: 0
dev.bge.1.stats.tx.InternalMacTransmitErrors: 0
dev.bge.1.stats.tx.XoffSent: 0
dev.bge.1.stats.tx.XonSent: 0
dev.bge.1.stats.tx.Collisions: 0
dev.bge.1.stats.tx.ifHCOutOctets: 2812725981
dev.bge.1.stats.rx.UndersizePkts: 0
dev.bge.1.stats.rx.Jabbers: 0
dev.bge.1.stats.rx.FramesTooLong: 0
dev.bge.1.stats.rx.xoffStateEntered: 0
dev.bge.1.stats.rx.ControlFramesReceived: 0
dev.bge.1.stats.rx.xoffPauseFramesReceived: 0
dev.bge.1.stats.rx.xonPauseFramesReceived: 0
dev.bge.1.stats.rx.AlignmentErrors: 0
dev.bge.1.stats.rx.FCSErrors: 0
dev.bge.1.stats.rx.BroadcastPkts: 795
dev.bge.1.stats.rx.MulticastPkts: 372
dev.bge.1.stats.rx.UnicastPkts: 1255895
dev.bge.1.stats.rx.Fragments: 0
dev.bge.1.stats.rx.ifHCInOctets: 440643493
dev.bge.1.stats.RecvThresholdHit: 0
dev.bge.1.stats.InputErrors: 0
dev.bge.1.stats.InputDiscards: 1924
dev.bge.1.stats.NoMoreRxBDs: 0
dev.bge.1.stats.DmaWriteHighPriQueueFull: 0
dev.bge.1.stats.DmaWriteQueueFull: 0
dev.bge.1.stats.FramesDroppedDueToFilters: 0
dev.bge.1.forced_udpcsum: 0
dev.bge.1.msi: 1
dev.bge.1.forced_collapse: 0
dev.bge.1.%parent: pci3
dev.bge.1.%pnpinfo: vendor=0x14e4 device=0x1659 subvendor=0x1028 subdevice=0x023c class=0x020000
dev.bge.1.%location: slot=0 function=0 dbsf=pci0:4:0:0
dev.bge.1.%driver: bge
dev.bge.1.%desc: Broadcom NetXtreme Gigabit Ethernet Controller, ASIC rev. 0x004201

What are InputDiscards?


dev.bge.1.stats.InputDiscards: 1924

If I connect my laptop or computer to the bge1 interface and run speed tests, the discards do not increment. However, if I connect my RT-AC66U router (in AP mode) directly to the interface and run the same speed test, the InputDiscards counter increments like crazy: 300-500 discards per test.

Is there anything I can do to reduce those discard errors, or to find out why a packet is getting discarded?
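
One way to see which counters move during a test is to save the output of sysctl dev.bge.1 before and after a speed test and compare the two. Here is a minimal Python sketch under that assumption; `parse_sysctl` and `changed_counters` are hypothetical helper names, and the "after" snapshot uses made-up values purely for illustration.

```python
def parse_sysctl(text: str) -> dict[str, int]:
    """Parse 'name: value' lines, keeping only integer-valued counters."""
    counters = {}
    for line in text.splitlines():
        name, sep, value = line.partition(": ")
        if sep and value.strip().lstrip("-").isdigit():
            counters[name.strip()] = int(value)
    return counters


def changed_counters(before: str, after: str) -> dict[str, int]:
    """Return the increase for every counter that grew between two snapshots."""
    old, new = parse_sysctl(before), parse_sysctl(after)
    return {k: new[k] - old[k] for k in new if k in old and new[k] > old[k]}


# "before" values are taken from the output above.
before = """\
dev.bge.1.stats.rx.FCSErrors: 0
dev.bge.1.stats.InputErrors: 0
dev.bge.1.stats.InputDiscards: 1924
dev.bge.1.stats.NoMoreRxBDs: 0
dev.bge.1.%driver: bge
"""

# Hypothetical "after" snapshot, for illustration only.
after = """\
dev.bge.1.stats.rx.FCSErrors: 0
dev.bge.1.stats.InputErrors: 0
dev.bge.1.stats.InputDiscards: 2324
dev.bge.1.stats.NoMoreRxBDs: 0
dev.bge.1.%driver: bge
"""

delta = changed_counters(before, after)
print(delta)
```

This only narrows down which counter is moving; it does not explain why the NIC discarded a frame.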

strangegopher

Try some of the tuning mentioned here: https://doc.pfsense.org/index.php/Tuning_and_Troubleshooting_Network_Cards
There is a section about "Packet loss with many (small) UDP packets".

You can also try virtualizing pfSense in a VMware ESXi VM to see if the errors go away.

psychosquirrel


I went through this wiki previously; it dramatically decreased the errors but did not eliminate them. I re-read it and decided to increase kern.ipc.nmbclusters to 1 million in /boot/loader.conf.local, since I have 8GB of RAM on this machine.

Here are my settings from /boot/loader.conf.local (The only item changed since last update is the kern.ipc.nmbclusters from 131072 to 1000000)


net.inet.tcp.tso=0
kern.ipc.nmbclusters="1000000"
hw.bge.tso_enable=0
hw.pci.enable_msix=0
net.isr.direct_force=1
net.isr.direct=1

I read it again and it said that the two ISR variables should go in the system tunables page, but it should still work, yes?

strangegopher

Not sure if it will work; I would add it to both places just to be sure.

I would also try disabling all the hardware offloading at the bottom of the page under "System > Advanced" on the "Networking" tab.

Edit: One thing you can try is getting a 2- or 4-port Intel gigabit NIC, if you have expansion slots.

psychosquirrel


I was thinking about doing that. I despise Broadcom devices on *nix. It brings back flashbacks of using fw-cutter and "reverse engineering" a driver suitable for my Linux kernel at the time under CentOS 5. Even then it was "flaky at best".

I think I might locate an Intel NIC and disable the onboard Broadcom stuff.

It's so strange though: the only device that has these errors is bge1 (LAN); bge0 has no issues. If I swap the interfaces so that bge0 is LAN and bge1 is WAN, that interface has the issues. I know I'm running a modern pfSense on an ancient dinosaur of a machine, but it's strange. Why is my LAN so "noisy", with all of it coming from the Asus RT-AC66U? If I connect via Ethernet to anything (the switch, the Asus RT-AC66U, even the pfSense box directly), there are no errors.

What's frustrating is that there's no ethtool or any way to diagnose what packets are being discarded and at what layer. Is it the CPU doing the discarding or what?

I have disabled all of the hardware offloading, changed a few parameters here and there, but without any tools so to speak I'm flying blind.

psychosquirrel

So I am reading about FreeBSD 11, there are so so many tunables for networking.

Here are my current tunables in System > Advanced > System Tunables


Changed Tunables:
net.inet.tcp.log_debug = 1
net.inet.tcp.tso = 0
net.isr.direct_force = 1
net.isr.direct = 1
hw.pci.enable_msix = 0
hw.pci.enable_msi = 0
hw.bge.tso_enable = 0
net.inet.icmp.icmplim = 1000
net.inet.tcp.delayed_ack = 1
net.inet.tcp.drop_synfin = 0
net.inet.tcp.syncookies = 0
net.inet.ip.fastforwarding = 1

Unchanged Tunables:
net.inet.ip.portrange.first = 1024
net.inet.tcp.blackhole = 2
net.inet.udp.blackhole = 1
net.inet.ip.random_id = 1
net.inet.ip.redirect = 1
net.inet6.ip6.redirect = 1
net.inet6.ip6.use_tempaddr = 0
net.inet6.ip6.prefer_tempaddr = 0
net.inet.tcp.recvspace = 65228
net.inet.tcp.sendspace = 65228
net.inet.udp.maxdgram = 57344
net.link.bridge.pfil_onlyip = 0
net.link.bridge.pfil_member = 1
net.link.bridge.pfil_bridge = 0
net.link.tap.user_open = 1
net.link.vlan.mtag_pcp = 1
kern.randompid = 347
net.inet.ip.intr_queue_maxlen = 1000
hw.syscons.kbd_reboot = 0
vfs.read_max = 32
kern.ipc.maxsockbuf = 4262144
net.inet.ip.process_options = 0
kern.random.harvest.mask = 351
net.route.netisr_maxqlen = 1024
net.inet.udp.checksum = 1
net.inet.icmp.reply_from_interface = 1
net.inet6.ip6.rfc6204w3 = 1
net.enc.out.ipsec_bpf_mask = 0x0001
net.enc.out.ipsec_filter_mask = 0x0001
net.enc.in.ipsec_bpf_mask = 0x0002
net.enc.in.ipsec_filter_mask = 0x0002
net.key.preferred_oldsa = 0
net.inet.carp.senderr_demotion_factor = 0
net.pfsync.carp_demotion_factor = 0
net.raw.recvspace = 65536
net.raw.sendspace = 65536
net.inet.raw.recvspace = 131072
net.inet.raw.maxdgram = 131072
kern.corefile = /root/%N.core

I have increased my "speed"; however, these settings have done nothing for the errors. I am beginning to think I'm worrying about the errors for nothing, as I'm getting 515 Mbps down over WiFi and close to 850 Mbps via Ethernet.

I have gigabit Internet over cable.

What do you all think? Am I overly concerned about the errors? I'm not seeing any indications of "serious errors" in dmesg, just these messages about missing timestamps, but that's the TCP debug logging. I was hoping to find out which packets are being dropped.


TCP: [192.168.50.30]:62732 to [192.168.50.1]:80 tcpflags 0x10<ack>; tcp_do_segment: Timestamp missing, no action
TCP: [192.168.50.30]:62735 to [192.168.50.1]:80 tcpflags 0x10<ack>; tcp_do_segment: Timestamp missing, no action
TCP: [192.168.50.30]:62734 to [192.168.50.1]:80 tcpflags 0x10<ack>; tcp_do_segment: Timestamp missing, no action
TCP: [192.168.50.30]:62730 to [192.168.50.1]:80 tcpflags 0x10<ack>; tcp_do_segment: Timestamp missing, no action
TCP: [192.168.50.30]:62732 to [192.168.50.1]:80 tcpflags 0x10<ack>; tcp_do_segment: Timestamp missing, no action
TCP: [192.168.50.30]:62735 to [192.168.50.1]:80 tcpflags 0x10<ack>; tcp_do_segment: Timestamp missing, no action
TCP: [192.168.50.30]:62734 to [192.168.50.1]:80 tcpflags 0x10<ack>; tcp_do_segment: Timestamp missing, no action
TCP: [208.123.73.93]:443 to [72.47.40.251]:44841 tcpflags 0x12<syn,ack>; tcp_do_segment: Timestamp not expected, no action
TCP: [208.123.73.93]:443 to [72.47.40.251]:28208 tcpflags 0x12<syn,ack>; tcp_do_segment: Timestamp not expected, no action
TCP: [10.0.0.1]:45674 to [192.168.50.1]:22 tcpflags 0x2<syn>; syncache_add: Received duplicate SYN, resetting timer and retransmitting SYN|ACK
TCP: [10.0.0.1]:45674 to [192.168.50.1]:22 tcpflags 0x2<syn>; syncache_add: Received duplicate SYN, resetting timer and retransmitting SYN|ACK
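
To get an overview of a long run of these debug lines, they can be grouped by the reporting function and message. This is a small Python sketch that assumes the line layout shown above; `summarize` and the regular expression are illustrative, not part of any pfSense or FreeBSD tooling.

```python
import re
from collections import Counter

# Capture the function name and message that follow the first semicolon.
LOG_RE = re.compile(r"^TCP: .*?;\s*(?P<func>\w+): (?P<msg>.+)$")


def summarize(lines):
    """Count log lines per (function, message) pair; unmatched lines are skipped."""
    counts = Counter()
    for line in lines:
        match = LOG_RE.match(line.strip())
        if match:
            counts[(match.group("func"), match.group("msg"))] += 1
    return counts


# Three lines copied from the dmesg output above.
sample = [
    "TCP: [192.168.50.30]:62732 to [192.168.50.1]:80 tcpflags 0x10<ack>; tcp_do_segment: Timestamp missing, no action",
    "TCP: [192.168.50.30]:62735 to [192.168.50.1]:80 tcpflags 0x10<ack>; tcp_do_segment: Timestamp missing, no action",
    "TCP: [10.0.0.1]:45674 to [192.168.50.1]:22 tcpflags 0x2<syn>; syncache_add: Received duplicate SYN, resetting timer and retransmitting SYN|ACK",
]

counts = summarize(sample)
for (func, msg), n in counts.most_common():
    print(f"{n:4d}  {func}: {msg}")
```

These lines come from the TCP debug log, so a summary like this describes TCP-level events rather than which frames the NIC discarded.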

Are there any other buffer tunables I can use to increase the buffer size?

Below are my state table sizes and MBUF:


State table size  0% (317/814000)
MBUF Usage 	0% (3046/1000000)
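
As a sanity check on the nmbclusters change, the two MBUF readings posted in this thread can be compared directly. This is a minimal Python sketch using only those posted numbers; `usage_percent` is a made-up helper name.

```python
def usage_percent(used: int, limit: int) -> float:
    """Return used/limit as a percentage."""
    return 100.0 * used / limit


# (used, limit) pairs copied from the two dashboard readings in this thread.
readings = {
    "nmbclusters=131072": (3296, 131072),
    "nmbclusters=1000000": (3046, 1000000),
}

for label, (used, limit) in readings.items():
    print(f"{label}: {usage_percent(used, limit):.2f}% in use")
```

Both snapshots show usage far below the limit, which suggests (but does not prove, since these are point-in-time readings) that the cluster pool was not the bottleneck even before the increase.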

In case I haven't stated it previously, here is the CPU info:


Intel(R) Xeon(R) CPU X3360 @ 2.83GHz
Current: 2000 MHz, Max: 2834 MHz
4 CPUs: 1 package(s) x 4 core(s)
AES-NI CPU Crypto: No

stephenw10

Can you generate traffic via the Asus without using wifi? Via its switch ports, or from the device itself?

Wifi is inherently prone to errors just due to random interference, reflections, etc. If you have a wifi interface directly in pfSense, for example, you will always see some errors on it. However, I would expect to see most of those layer 1 type errors on the Asus and not passed on to pfSense. Anything at layer 2 may be, though, if you have the AP connected at L2.

Broadcom Ethernet was never anywhere near as bad as wifi. In fact they were second to Intel for a while IMO. If you can use an Intel NIC though you should.

Can you try putting a switch in between pfSense and the Asus device? That would rule out some obscure incompatibility.

Steve