LAN Interface "In" errors
-
Hello,
I have spent the past week trying to resolve an errors issue that I see only on the LAN interface. Before any modifications, the errors on the LAN interface would increment by the thousands. By default, "Hardware TCP Segmentation Offloading" and "Hardware Large Receive Offloading" are checked under "System > Advanced > Networking", indicating that they are disabled. I rebooted the system, but the errors persisted. Then I found this doc https://doc.pfsense.org/index.php/Tuning_and_Troubleshooting_Network_Cards#Broadcom_bce.284.29_Cards and added the entries from there related to Broadcom cards. The errors now increment more slowly, but they still occur.
As of right now I am stuck. What are these errors? What is considered an error?
In tcpdump, I noted many, many packets with a bad checksum, mostly from the WiFi AP.
My network is set up like so:
Cable Modem --> bge0 (WAN) pfSense running on a Dell PowerEdge R200 (LAN) bge1 --> Enterasys C2G124-48P switch (8 ports enabled, 40 ports disabled) --> WiFi AP Asus RT-AC66U
LAN Interface:
Status: up
MAC Address: 00:21:9b:fc:96:1b
IPv4 Address: 192.168.50.1
Subnet mask IPv4: 255.255.255.0
IPv6 Link Local: fe80::221:9bff:fefc:961b%bge1
MTU: 1500
Media: 1000baseT <full-duplex>
In/out packets: 8233132/17058096 (1.75 GiB/19.30 GiB)
In/out packets (pass): 8233132/17058096 (1.75 GiB/19.30 GiB)
In/out packets (block): 7943/0 (565 KiB/0 B)
In/out errors: 4594/0
Collisions: 0
System Information:
System: pfSense
BIOS: Vendor: Dell Inc., Version: 1.4.3, Release Date: Fri May 15 2009
Version: 2.4.3-RELEASE (amd64) built on Mon Mar 26 18:02:04 CDT 2018, FreeBSD 11.1-RELEASE-p7. The system is on the latest version. Version information updated at Sat Apr 14 12:11:02 CDT 2018
CPU Type: Intel(R) Xeon(R) CPU X3360 @ 2.83GHz, Current: 2000 MHz, Max: 2834 MHz, 4 CPUs: 1 package(s) x 4 core(s), AES-NI CPU Crypto: No
Kernel PTI: Enabled
Uptime: 1 Day 13 Hours 40 Minutes 48 Seconds
Current date/time: Sat Apr 14 12:36:13 CDT 2018
DNS server(s): 127.0.0.1, 8.8.8.8, 8.8.4.4
Last config change: Sat Apr 14 11:49:26 CDT 2018
State table size: 0% (193/814000)
MBUF Usage: 3% (3296/131072)
Load average: 0.20, 0.23, 0.17
CPU usage: 0%
Memory usage: 3% of 8146 MiB
SWAP usage: 0% of 3979 MiB
Disk usage: / 1% of 222GiB - ufs; /var/run 4% of 3.4MiB - ufs in RAM
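For a sense of scale, the "In" error count can be set against the total "In" packets. The following is only a minimal Python sketch using the two counters from the LAN interface status in this post; the helper name `error_ratio` is made up for illustration, and the result says nothing about what causes the errors.

```python
def error_ratio(errors: int, packets: int) -> float:
    """Return errors as a fraction of received packets (0.0 if no packets)."""
    return errors / packets if packets else 0.0


# Counters copied from the LAN interface status in this post.
in_packets = 8233132
in_errors = 4594

ratio = error_ratio(in_errors, in_packets)
# Format the fraction as a percentage with four decimal places.
print(f"In errors: {in_errors} of {in_packets} packets ({ratio:.4%})")
```

This works out to a bit under 0.06% of received packets, which helps put the raw counter in context when comparing before and after a change.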
-
Try changing the Ethernet cable between your firewall and switch, and if nothing changes, try a different port on the switch.
-
I think I may have figured something out here. Looking at my AP (Asus RT-AC66U), the interface errors being transmitted match the LAN "In" errors on pfSense. Keep in mind that the AP and pfSense are not directly connected. I'm now thinking that pfSense is not at fault and is simply subject to garbage in, garbage out. To test this theory further, I wired a computer directly to my Enterasys switch, and the error count on pfSense does not increment.
Below is the output of ifconfig from my AP:
br0       Link encap:Ethernet  HWaddr 60:45:CB:B0:2C:68
          inet addr:192.168.50.3  Bcast:192.168.50.255  Mask:255.255.255.0
          UP BROADCAST RUNNING ALLMULTI MULTICAST  MTU:1500  Metric:1
          RX packets:1475234 errors:0 dropped:0 overruns:0 frame:0
          TX packets:786186 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:270674350 (258.1 MiB)  TX bytes:143958571 (137.2 MiB)
eth0      Link encap:Ethernet  HWaddr 60:45:CB:B0:2C:68
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:45066001 errors:0 dropped:0 overruns:0 frame:0
          TX packets:15976706 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:937025919 (893.6 MiB)  TX bytes:4052578657 (3.7 GiB)
          Interrupt:179 Base address:0x4000
eth1      Link encap:Ethernet  HWaddr 60:45:CB:B0:2C:68
          UP BROADCAST RUNNING ALLMULTI MULTICAST  MTU:1500  Metric:1
          RX packets:100915 errors:0 dropped:0 overruns:0 frame:891588
          TX packets:452609 errors:3871 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:17667996 (16.8 MiB)  TX bytes:179308043 (171.0 MiB)
          Interrupt:163
eth2      Link encap:Ethernet  HWaddr 60:45:CB:B0:2C:6C
          UP BROADCAST RUNNING ALLMULTI MULTICAST  MTU:1500  Metric:1
          RX packets:16604878 errors:0 dropped:0 overruns:0 frame:6313378
          TX packets:42886378 errors:33205 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:3880400783 (3.6 GiB)  TX bytes:957843658 (913.4 MiB)
          Interrupt:169
lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING MULTICAST  MTU:16436  Metric:1
          RX packets:266157 errors:0 dropped:0 overruns:0 frame:0
          TX packets:266157 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:62906197 (59.9 MiB)  TX bytes:62906197 (59.9 MiB)
vlan1     Link encap:Ethernet  HWaddr 60:45:CB:B0:2C:68
          UP BROADCAST RUNNING ALLMULTI MULTICAST  MTU:1500  Metric:1
          RX packets:45065999 errors:0 dropped:0 overruns:0 frame:0
          TX packets:15976706 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:56569758523 (52.6 GiB)  TX bytes:3813453606 (3.5 GiB)
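To pick out which AP interfaces are reporting errors without reading the whole dump by eye, the counters can be extracted with a short script. This is a rough Python sketch that assumes the classic Linux net-tools `ifconfig` layout (one interface header followed by indented RX/TX lines); `parse_ifconfig` and the trimmed `SAMPLE` text are illustrative, with the numbers copied from the output in this post.

```python
import re

# Patterns for the classic net-tools ifconfig layout (an assumption about the AP firmware).
HEADER = re.compile(r"^(\S+)\s+Link encap:")
RX = re.compile(r"RX packets:(\d+) errors:(\d+) dropped:\d+ overruns:\d+ frame:(\d+)")
TX = re.compile(r"TX packets:(\d+) errors:(\d+)")


def parse_ifconfig(text):
    """Return {interface: {counter: value}} for the RX/TX packet and error counters."""
    stats = {}
    current = None
    for line in text.splitlines():
        header = HEADER.match(line)
        if header:
            current = stats.setdefault(header.group(1), {})
        if current is None:
            continue
        rx = RX.search(line)
        if rx:
            current.update(rx_packets=int(rx.group(1)),
                           rx_errors=int(rx.group(2)),
                           rx_frame=int(rx.group(3)))
        tx = TX.search(line)
        if tx:
            current.update(tx_packets=int(tx.group(1)),
                           tx_errors=int(tx.group(2)))
    return stats


# Trimmed excerpt of the AP output from this post (only the counter lines are kept).
SAMPLE = """\
eth0      Link encap:Ethernet  HWaddr 60:45:CB:B0:2C:68
          RX packets:45066001 errors:0 dropped:0 overruns:0 frame:0
          TX packets:15976706 errors:0 dropped:0 overruns:0 carrier:0
eth1      Link encap:Ethernet  HWaddr 60:45:CB:B0:2C:68
          RX packets:100915 errors:0 dropped:0 overruns:0 frame:891588
          TX packets:452609 errors:3871 dropped:0 overruns:0 carrier:0
eth2      Link encap:Ethernet  HWaddr 60:45:CB:B0:2C:6C
          RX packets:16604878 errors:0 dropped:0 overruns:0 frame:6313378
          TX packets:42886378 errors:33205 dropped:0 overruns:0 carrier:0
"""

stats = parse_ifconfig(SAMPLE)
for name, counters in stats.items():
    # Report only interfaces that show TX errors or RX frame errors.
    if counters["tx_errors"] or counters["rx_frame"]:
        print(f"{name}: tx_errors={counters['tx_errors']} rx_frame={counters['rx_frame']}")
```

In this excerpt only eth1 and eth2 show non-zero TX error and RX frame counters, while eth0 is clean.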
-
Hmm, that's fun. ;)
I would also disable 'Hardware Checksum Offloading' if you have not done so already.
Steve
-
I used to get these errors when I enabled traffic shaping
-
I found something else now regarding my LAN "In" errors.
Probing sysctl dev.bge.1 shows the following:
dev.bge.1.stats.tx.BroadcastPkts: 512
dev.bge.1.stats.tx.MulticastPkts: 3593
dev.bge.1.stats.tx.UnicastPkts: 2039687
dev.bge.1.stats.tx.LateCollisions: 0
dev.bge.1.stats.tx.ExcessiveCollisions: 0
dev.bge.1.stats.tx.DeferredTransmissions: 0
dev.bge.1.stats.tx.MultipleCollisionFrames: 0
dev.bge.1.stats.tx.SingleCollisionFrames: 0
dev.bge.1.stats.tx.InternalMacTransmitErrors: 0
dev.bge.1.stats.tx.XoffSent: 0
dev.bge.1.stats.tx.XonSent: 0
dev.bge.1.stats.tx.Collisions: 0
dev.bge.1.stats.tx.ifHCOutOctets: 2812725981
dev.bge.1.stats.rx.UndersizePkts: 0
dev.bge.1.stats.rx.Jabbers: 0
dev.bge.1.stats.rx.FramesTooLong: 0
dev.bge.1.stats.rx.xoffStateEntered: 0
dev.bge.1.stats.rx.ControlFramesReceived: 0
dev.bge.1.stats.rx.xoffPauseFramesReceived: 0
dev.bge.1.stats.rx.xonPauseFramesReceived: 0
dev.bge.1.stats.rx.AlignmentErrors: 0
dev.bge.1.stats.rx.FCSErrors: 0
dev.bge.1.stats.rx.BroadcastPkts: 795
dev.bge.1.stats.rx.MulticastPkts: 372
dev.bge.1.stats.rx.UnicastPkts: 1255895
dev.bge.1.stats.rx.Fragments: 0
dev.bge.1.stats.rx.ifHCInOctets: 440643493
dev.bge.1.stats.RecvThresholdHit: 0
dev.bge.1.stats.InputErrors: 0
dev.bge.1.stats.InputDiscards: 1924
dev.bge.1.stats.NoMoreRxBDs: 0
dev.bge.1.stats.DmaWriteHighPriQueueFull: 0
dev.bge.1.stats.DmaWriteQueueFull: 0
dev.bge.1.stats.FramesDroppedDueToFilters: 0
dev.bge.1.forced_udpcsum: 0
dev.bge.1.msi: 1
dev.bge.1.forced_collapse: 0
dev.bge.1.%parent: pci3
dev.bge.1.%pnpinfo: vendor=0x14e4 device=0x1659 subvendor=0x1028 subdevice=0x023c class=0x020000
dev.bge.1.%location: slot=0 function=0 dbsf=pci0:4:0:0
dev.bge.1.%driver: bge
dev.bge.1.%desc: Broadcom NetXtreme Gigabit Ethernet Controller, ASIC rev. 0x004201
What are InputDiscards?
dev.bge.1.stats.InputDiscards: 1924
If I connect my laptop or computer to the bge1 interface and run speed tests, the discards do not increment. However, if I connect my RT-AC66U router (in AP mode) directly to the interface and run the same speed test, the InputDiscards increment like crazy: 300-500 discards per test.
Is there anything I can do to reduce those discards, or to find out why a packet is getting discarded?
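One way to see exactly which counters move during a speed test is to capture the `sysctl dev.bge.1.stats` output before and after the test and compare the two. A minimal Python sketch follows; it assumes the `name: value` output format shown in this post, the helper names are invented for illustration, and the "after" value in the example is hypothetical. The `snapshot` helper additionally assumes Python and `sysctl` are available on the same machine; otherwise the two outputs can be saved to files and parsed elsewhere.

```python
import subprocess


def parse_sysctl(text):
    """Parse 'name: value' lines into {name: int}, skipping non-numeric values."""
    counters = {}
    for line in text.splitlines():
        name, sep, value = line.partition(": ")
        if sep and value.strip().isdigit():
            counters[name.strip()] = int(value)
    return counters


def counter_deltas(before, after):
    """Return {name: increase} for every counter whose value changed."""
    return {name: after[name] - before[name]
            for name in after
            if name in before and after[name] != before[name]}


def snapshot(node="dev.bge.1.stats"):
    """Read the live counters; assumes the sysctl binary is on PATH (not called here)."""
    out = subprocess.run(["sysctl", node], capture_output=True,
                         text=True, check=True).stdout
    return parse_sysctl(out)


# Example with captured text: the first value is from this post, the second is hypothetical.
before = parse_sysctl("dev.bge.1.stats.InputErrors: 0\n"
                      "dev.bge.1.stats.InputDiscards: 1924\n")
after = parse_sysctl("dev.bge.1.stats.InputErrors: 0\n"
                     "dev.bge.1.stats.InputDiscards: 2310\n")
deltas = counter_deltas(before, after)
print(deltas)
```

On the live system the same comparison would be `counter_deltas(first, second)` with two `snapshot()` calls taken around the speed test, which shows whether anything besides InputDiscards (for example NoMoreRxBDs or the DMA queue counters) is incrementing.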
-
Try some of the tuning mentioned here: https://doc.pfsense.org/index.php/Tuning_and_Troubleshooting_Network_Cards
There is a section about "Packet loss with many (small) UDP packets". You can also try making a VMware ESXi VM and virtualizing pfSense to see if the errors go away.
-
I went through this wiki previously. It dramatically decreased the errors but did not eliminate them. I re-read it and decided to increase kern.ipc.nmbclusters to 1 million in /boot/loader.conf.local, since I have 8GB of RAM on this machine.
Here are my settings from /boot/loader.conf.local (the only item changed since the last update is kern.ipc.nmbclusters, from 131072 to 1000000):
net.inet.tcp.tso=0
kern.ipc.nmbclusters="1000000"
hw.bge.tso_enable=0
hw.pci.enable_msix=0
net.isr.direct_force=1
net.isr.direct=1
I read it again, and it said that the two ISR variables should be on the System Tunables page, but they should still work here, yes?
-
Not sure if it will work. I would add it in both places just to be sure.
I would also try disabling all the hardware offloading at the bottom of the page under "System > Advanced" on the "Networking" tab.
Edit: One thing you can try is getting a 2- or 4-port Intel gigabit NIC, if you have expansion slots.
-
I was thinking about doing that. I despise Broadcom devices on *nix. It brings back flashbacks of using fw-cutter and "reverse engineering" a driver suitable for my Linux kernel at the time under CentOS 5. Even then it worked "flaky at best".
I think I might locate an Intel NIC and disable the onboard Broadcom stuff.
It's so strange though: the only device that is having these errors is bge1 (LAN); bge0 has no issues. When I swapped the interfaces so that bge0 was LAN and bge1 was WAN, that interface had the issues. I know I'm running a modern pfSense on an ancient dinosaur of a machine, but it's strange. Why is my LAN so "noisy", with all of it coming from the Asus RT-AC66U? If I connect via Ethernet to anything (the switch, the Asus RT-AC66U, even the pfSense box directly), there are no errors.
What's frustrating is that there's no ethtool or any other way to diagnose which packets are being discarded and at what layer. Is it the CPU doing the discarding, or what?
I have disabled all of the hardware offloading and changed a few parameters here and there, but without any tools, so to speak, I'm flying blind.
-
So I am reading about FreeBSD 11, and there are so, so many tunables for networking.
Here are my current tunables in System > Advanced > System Tunables:
Changed Tunables:
net.inet.tcp.log_debug = 1
net.inet.tcp.tso = 0
net.isr.direct_force = 1
net.isr.direct = 1
hw.pci.enable_msix = 0
hw.pci.enable_msi = 0
hw.bge.tso_enable = 0
net.inet.icmp.icmplim = 1000
net.inet.tcp.delayed_ack = 1
net.inet.tcp.drop_synfin = 0
net.inet.tcp.syncookies = 0
net.inet.ip.fastforwarding = 1
Unchanged Tunables:
net.inet.ip.portrange.first = 1024
net.inet.tcp.blackhole = 2
net.inet.udp.blackhole = 1
net.inet.ip.random_id = 1
net.inet.ip.redirect = 1
net.inet6.ip6.redirect = 1
net.inet6.ip6.use_tempaddr = 0
net.inet6.ip6.prefer_tempaddr = 0
net.inet.tcp.recvspace = 65228
net.inet.tcp.sendspace = 65228
net.inet.udp.maxdgram = 57344
net.link.bridge.pfil_onlyip = 0
net.link.bridge.pfil_member = 1
net.link.bridge.pfil_bridge = 0
net.link.tap.user_open = 1
net.link.vlan.mtag_pcp = 1
kern.randompid = 347
net.inet.ip.intr_queue_maxlen = 1000
hw.syscons.kbd_reboot = 0
vfs.read_max = 32
kern.ipc.maxsockbuf = 4262144
net.inet.ip.process_options = 0
kern.random.harvest.mask = 351
net.route.netisr_maxqlen = 1024
net.inet.udp.checksum = 1
net.inet.icmp.reply_from_interface = 1
net.inet6.ip6.rfc6204w3 = 1
net.enc.out.ipsec_bpf_mask = 0x0001
net.enc.out.ipsec_filter_mask = 0x0001
net.enc.in.ipsec_bpf_mask = 0x0002
net.enc.in.ipsec_filter_mask = 0x0002
net.key.preferred_oldsa = 0
net.inet.carp.senderr_demotion_factor = 0
net.pfsync.carp_demotion_factor = 0
net.raw.recvspace = 65536
net.raw.sendspace = 65536
net.inet.raw.recvspace = 131072
net.inet.raw.maxdgram = 131072
kern.corefile = /root/%N.core
I have increased my "speed", but these settings have done nothing for the errors. I am beginning to think I'm worrying about the errors for nothing, as I'm getting 515 Mbps down over WiFi and close to 850 Mbps via Ethernet.
I have gigabit Internet over cable.
What do you all think? Am I overly concerned about the errors? I'm not seeing any indications of "serious errors" in dmesg, just these messages about missing timestamps, but those come from the TCP debug logging. I was hoping to find out which packets are being dropped.
TCP: [192.168.50.30]:62732 to [192.168.50.1]:80 tcpflags 0x10<ACK>; tcp_do_segment: Timestamp missing, no action
TCP: [192.168.50.30]:62735 to [192.168.50.1]:80 tcpflags 0x10<ACK>; tcp_do_segment: Timestamp missing, no action
TCP: [192.168.50.30]:62734 to [192.168.50.1]:80 tcpflags 0x10<ACK>; tcp_do_segment: Timestamp missing, no action
TCP: [192.168.50.30]:62730 to [192.168.50.1]:80 tcpflags 0x10<ACK>; tcp_do_segment: Timestamp missing, no action
TCP: [192.168.50.30]:62732 to [192.168.50.1]:80 tcpflags 0x10<ACK>; tcp_do_segment: Timestamp missing, no action
TCP: [192.168.50.30]:62735 to [192.168.50.1]:80 tcpflags 0x10<ACK>; tcp_do_segment: Timestamp missing, no action
TCP: [192.168.50.30]:62734 to [192.168.50.1]:80 tcpflags 0x10<ACK>; tcp_do_segment: Timestamp missing, no action
TCP: [208.123.73.93]:443 to [72.47.40.251]:44841 tcpflags 0x12<SYN,ACK>; tcp_do_segment: Timestamp not expected, no action
TCP: [208.123.73.93]:443 to [72.47.40.251]:28208 tcpflags 0x12<SYN,ACK>; tcp_do_segment: Timestamp not expected, no action
TCP: [10.0.0.1]:45674 to [192.168.50.1]:22 tcpflags 0x2<SYN>; syncache_add: Received duplicate SYN, resetting timer and retransmitting SYN|ACK
TCP: [10.0.0.1]:45674 to [192.168.50.1]:22 tcpflags 0x2<SYN>; syncache_add: Received duplicate SYN, resetting timer and retransmitting SYN|ACK
Are there any other buffer tunables I can use to increase the buffer size?
Below are my state table sizes and MBUF:
State table size: 0% (317/814000)
MBUF Usage: 0% (3046/1000000)
In case I haven't stated it previously, here is the CPU info:
Intel(R) Xeon(R) CPU X3360 @ 2.83GHz, Current: 2000 MHz, Max: 2834 MHz, 4 CPUs: 1 package(s) x 4 core(s), AES-NI CPU Crypto: No
-
Can you generate traffic via the Asus without using WiFi? Via its switch ports, or from the device itself?
WiFi is inherently prone to errors just due to random interference, reflections, etc. If you have a WiFi interface directly in pfSense, for example, you will always see some errors on it. However, I would expect to see most of those layer 1 type errors on the Asus and not passed on to pfSense. Anything at layer 2 may be, though, if you have the AP connected at L2.
Broadcom Ethernet was never anywhere near as bad as their WiFi. In fact they were second to Intel for a while, IMO. If you can use an Intel NIC, though, you should.
Can you try putting a switch in between pfSense and the Asus device? That would rule out some obscure incompatibility.
Steve