Analyze / solve "errors in" on interface and "errors out" on vlan
-
Hey everyone,
so I recently switched from an X550-T2 to an X710-DA4 and also switched to LC OM2, which works great.
After a minor NVM firmware upgrade (8.xx to 9.40), the NIC now also works with full flow control and so on. Today, though, I noticed some "errors in" on the interfaces, which wasn't the case before. The "errors out" on the VLANs, on the other hand, are "normal". Is there any way to figure out where these errors come from and how to resolve them?
It's not a big number yet, but it's rising. The network is configured like this:
LAN(ixl0) -> MikroTik SW C3
VMNET(ixl1) -> MikroTik SW C4
Guests(ixl0.100) -> MikroTik SW C3
TESTLAB(ixl1.30) -> MikroTik SW C4
IOT(ixl1.40) -> MikroTik SW C4
I already have all the offloading disabled:
Update:
The SFP+ Modules used are all the same:
Ubiquiti OM-MM-10G-D 850nm
-
Which interface there is the X710?
How many packets has that passed to see those errors?
If you check the sysctl mac stats you can probably see what type of errors those are.
sysctl dev.ixl
Steve
-
@stephenw10 so I see MAC Checksum errors, which are enabled by default. Does this mean the card / driver does not support it?
sysctl dev.ixl | grep err dev.ixl.3.mac.checksum_errors: 0 dev.ixl.3.mac.rx_length_errors: 0 dev.ixl.3.mac.crc_errors: 0 dev.ixl.3.pf.rxq03.desc_err: 0 dev.ixl.3.pf.rxq02.desc_err: 0 dev.ixl.3.pf.rxq01.desc_err: 0 dev.ixl.3.pf.rxq00.desc_err: 0 dev.ixl.3.pf.rx_errors: 0 dev.ixl.3.iflib.override_nrxds: 0 dev.ixl.3.iflib.override_ntxds: 0 dev.ixl.3.iflib.override_qs_enable: 0 dev.ixl.3.iflib.override_nrxqs: 0 dev.ixl.3.iflib.override_ntxqs: 0 dev.ixl.2.mac.checksum_errors: 0 dev.ixl.2.mac.rx_length_errors: 0 dev.ixl.2.mac.crc_errors: 0 dev.ixl.2.pf.rxq03.desc_err: 0 dev.ixl.2.pf.rxq02.desc_err: 0 dev.ixl.2.pf.rxq01.desc_err: 0 dev.ixl.2.pf.rxq00.desc_err: 0 dev.ixl.2.pf.rx_errors: 0 dev.ixl.2.iflib.override_nrxds: 0 dev.ixl.2.iflib.override_ntxds: 0 dev.ixl.2.iflib.override_qs_enable: 0 dev.ixl.2.iflib.override_nrxqs: 0 dev.ixl.2.iflib.override_ntxqs: 0 dev.ixl.1.mac.checksum_errors: 169 dev.ixl.1.mac.rx_length_errors: 0 dev.ixl.1.mac.crc_errors: 0 dev.ixl.1.pf.rxq03.desc_err: 0 dev.ixl.1.pf.rxq02.desc_err: 0 dev.ixl.1.pf.rxq01.desc_err: 0 dev.ixl.1.pf.rxq00.desc_err: 0 dev.ixl.1.pf.rx_errors: 169 dev.ixl.1.iflib.override_nrxds: 0 dev.ixl.1.iflib.override_ntxds: 0 dev.ixl.1.iflib.override_qs_enable: 0 dev.ixl.1.iflib.override_nrxqs: 0 dev.ixl.1.iflib.override_ntxqs: 0 dev.ixl.0.mac.checksum_errors: 116 dev.ixl.0.mac.rx_length_errors: 0 dev.ixl.0.mac.crc_errors: 0 dev.ixl.0.pf.rxq03.desc_err: 0 dev.ixl.0.pf.rxq02.desc_err: 0 dev.ixl.0.pf.rxq01.desc_err: 0 dev.ixl.0.pf.rxq00.desc_err: 0 dev.ixl.0.pf.rx_errors: 116 dev.ixl.0.iflib.override_nrxds: 0 dev.ixl.0.iflib.override_ntxds: 0 dev.ixl.0.iflib.override_qs_enable: 0 dev.ixl.0.iflib.override_nrxqs: 0 dev.ixl.0.iflib.override_ntxqs: 0
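With that many counters it is easy to miss the few that are not zero. Below is a minimal Python sketch that filters such output down to the nonzero entries. The helper name `nonzero_counters` is made up for illustration, and the sample text is only a handful of lines copied from the output above; in practice the text would presumably come from running `sysctl dev.ixl` on the firewall.

```python
# A few lines copied from the sysctl output above (trimmed for brevity).
SAMPLE = """\
dev.ixl.3.mac.checksum_errors: 0
dev.ixl.1.mac.checksum_errors: 169
dev.ixl.1.mac.crc_errors: 0
dev.ixl.1.pf.rx_errors: 169
dev.ixl.0.mac.checksum_errors: 116
dev.ixl.0.pf.rx_errors: 116
"""


def nonzero_counters(text):
    """Return {name: value} for every 'name: integer' line whose value is not 0.

    Hypothetical helper, not part of any pfSense or FreeBSD tooling.
    """
    found = {}
    for line in text.splitlines():
        name, sep, value = line.partition(": ")
        if not sep:
            continue  # not a 'name: value' line
        value = value.strip()
        if value.isdigit() and int(value) != 0:
            found[name.strip()] = int(value)
    return found


for name, value in sorted(nonzero_counters(SAMPLE).items()):
    print(f"{name} = {value}")
```

In this sample only the checksum_errors and rx_errors counters on ixl0 and ixl1 remain, which matches what the full listing shows.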
To answer your question:
I only took a screenshot of the X710 interfaces. There are just two others that aren't in the screenshot:
pppoe over ix0 and modemaccess ix0 (both go into the modem)
-
Hmm, well you can certainly try just disabling hardware checksum offloading and see if that changes anything. You will have to reboot to apply that.
-
@sysadminfromhell how many packets have been processed? Unless it's a really low number I wouldn't worry about a few errors.
So I looked for which of my interfaces had the highest number of errors
In/out packets 200980945/288683912 (177.97 GiB/303.77 GiB)
In/out packets (pass) 200980945/288683912 (177.97 GiB/303.77 GiB)
In/out packets (block) 50855/15 (8.82 MiB/900 B)
In/out errors 7841/0
7841 out of 200980945 is like 0.0039 % (a fraction of 0.00003901364). I wouldn't worry about such a thing ;)
Are the numbers constantly increasing? Is it a high percentage value for total number of packets that interface has seen?
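As a quick sanity check of that kind of ratio, here is a minimal Python sketch using the error and packet counts quoted in this post. The function name `error_rate` is just an illustrative choice, not something from pfSense.

```python
def error_rate(errors, packets):
    """Return (percent_of_packets, packets_per_error) for an interface counter pair."""
    if packets <= 0:
        raise ValueError("packets must be positive")
    percent = errors / packets * 100
    # With zero errors there is no meaningful "one error every N packets" figure.
    per_error = packets / errors if errors else float("inf")
    return percent, per_error


# Inbound errors and inbound packets from the interface stats in this post.
percent, per_error = error_rate(7841, 200980945)
print(f"{percent:.4f} % of packets, about 1 error per {per_error:,.0f} packets")
```

Looking at both numbers helps: the percentage shows how small the share is, and the packets-per-error figure is easier to compare between interfaces.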
-
@stephenw10 said in Analyze / solve "errors in" on interface and "errors out" on vlan:
Hmm, well you can certainly try just disabling hardware checksum offloading and see if that changes anything. You will have to reboot to apply that.
I disabled it and the errors stopped rising.
So what does that mean overall? Does the driver / card have issues with it, or is my hardware configuration just not "good"?
@johnpoz said in Analyze / solve "errors in" on interface and "errors out" on vlan:
@sysadminfromhell how many packets have been processed? Unless it's a really low number I wouldn't worry about a few errors.
So I looked for which of my interfaces had the highest number of errors
In/out packets 200980945/288683912 (177.97 GiB/303.77 GiB)
In/out packets (pass) 200980945/288683912 (177.97 GiB/303.77 GiB)
In/out packets (block) 50855/15 (8.82 MiB/900 B)
In/out errors 7841/0
7841 out of 200980945 is like 0.0039 % (a fraction of 0.00003901364). I wouldn't worry about such a thing ;)
Are the numbers constantly increasing? Is it a high percentage value for total number of packets that interface has seen?
So here are the numbers so far. It's a small percentage, to be honest, but does that mean a small percentage of errors is normal?
Kind regards,
PS: For my inner monk it is weird to see "errors" combined with the words "nothing to worry about". I know the concept of a firewall and how to operate one, just not at this level of detail.
-
@sysadminfromhell now and then you will see errors.. What would be the point of reporting them if you never saw them.. Some errors are bound to happen.. Maybe something throws out a few mangled packets when it loses power, etc. Or just now and then for whatever software reason, etc.
I'd only be concerned if the number was increasing where you could see it actually changing as you hit refresh, etc.. Not if you come back next week and the number is 2 higher than it was last week, etc.
169 out of 5 million is a tiny fraction, I mean tiny hehe.. I would just chalk that up to yeah, now and then you're going to see some errors. When we are checking the network for issues and we come across an interface that has errors on it, we reset the counters and look to see if the numbers are going up in real time, etc.
You're seeing like 1 error for every 30k packets or so.. Not sure I would be concerned ;)
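The "are the numbers going up in real time" check can also be approximated without resetting anything, by comparing two snapshots of the counters taken some time apart. This is only a rough Python sketch: `counter_deltas` is a hypothetical helper, the first snapshot uses values quoted earlier in the thread, and the second snapshot is invented purely for illustration.

```python
def counter_deltas(before, after):
    """Return {name: increase} for counters that grew between two snapshots."""
    return {
        name: after[name] - before[name]
        for name in after
        if name in before and after[name] > before[name]
    }


# First snapshot: values quoted earlier in the thread.
before = {
    "dev.ixl.0.mac.checksum_errors": 116,
    "dev.ixl.1.mac.checksum_errors": 169,
    "dev.ixl.1.mac.crc_errors": 0,
}
# Second snapshot: made-up numbers, only to show the comparison.
after = {
    "dev.ixl.0.mac.checksum_errors": 116,
    "dev.ixl.1.mac.checksum_errors": 172,
    "dev.ixl.1.mac.crc_errors": 0,
}

print(counter_deltas(before, after))
```

An empty result between two snapshots a few minutes apart would suggest the errors are old; a counter that shows up every time is the one worth chasing.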
-
@johnpoz The funny part is that after I enabled LRO, TSO and checksum offloading again (by unticking all the options), the counter is still at zero. I will keep an eye on it and see if it stays that way. For now the packet count is low (due to the restart).
I don't know where the VLAN errors come from, but that is a count I normally see, and it has never risen above 10.
-
Yup some errors are not unexpected. Especially if the link gets reconnected.
-
@stephenw10 Update after about a day:
-
Hmm, that's a pretty low percentage. Does it increment slowly?
-
@stephenw10 yes it does. The errors are always MAC checksum errors, but now they're only on one interface, LAN.
-
Hmm, and that's after disabling hardware checksum offloading?
Does it actually show as disabled on the LAN NIC in the
ifconfig -v
output?
-
@sysadminfromhell you have 79 errors in 3355323 packets, which is 1 error every 42k packets or so..
Maybe you just have those errors coming in? Be it you offload the checksum or not..
-
@johnpoz I was going to give @stephenw10 the same answer: I still get these errors with offloading enabled or disabled. Funny part: there are now fewer than before, with LRO and TSO also enabled. They're still rising, but I'm not concerned, given the nice side effect that the bufferbloat has also vanished. No more problems there, even at 100% utilization of my download rate. Browsing and gaming at the same time are still stable.
-
@sysadminfromhell I just wanted to figure out where to start digging to find out where they come from. On the LAN side there aren't many devices, but there are some wifi devices that might cause this. Will I be able to find out where these errors come from, to at least understand what's going on?
-
@sysadminfromhell can you disable your wifi for a while, to isolate it to wired or wireless devices?
Or is there only wifi on this connection? I would turn off wifi to validate it really is a wifi device, and then it's a matter of figuring out which one or ones..
-
@johnpoz I cannot disable the wifi right now, but in a few days I can verify whether this is the issue. I also read that some MikroTik switches have issues that cause this problem. I'm going to verify whether the switch is the problem as well.
-
@sysadminfromhell so, a short update here @johnpoz: the switch isn't the issue, and I'm also getting errors on VMNET now. The number is small but rising hourly.
Because of the new errors on the other interface, I believe wifi isn't the problem, since no wifi SSID is broadcasting for that network.
I might have an idea, but I need to redesign the network a bit, so I'll try that tomorrow and update you.
Updated Numbers:
-
@sysadminfromhell I personally wouldn't be too concerned with such a minor amount of errors, unless something was actually not working how it should and I had tracked it down to these sorts of errors.
But I would be interested in what you find, etc. Sometimes such little minor things can be fun to track down, but they can also be huge time sucks - hehehe
I can not tell you the amount of time I spent trying to figure out why plex will send out ssdp every freaking 10 seconds, when everything it might have a use for doing that is disabled.
Posted over on the plex forums - got back crickets.. Couple of users posted that they noticed it too.. But no solution, in the long run I just ended up blocking such traffic at the switch port.. Plex can send them out every 10 seconds, it goes no farther than the switch port at the end of its wire... Stupid shit!! hehehe
So yeah would be very interested in what you find.. You never know might run into such a thing sometime down the road and what you find could be the solution there.. So good luck! Hope you track it down..
I recall something similar as well, on some cheap smart switch.. It would mark RxBadPkt, and the counter would constantly go up - even though everything was working fine.. It was just a cosmetic error, any packets with VLAN tags got marked as RxBadPkt. So native untagged traffic wouldn't trigger the stat, but all tagged traffic coming in would.. All the vlans actually worked, etc. but they would just increase that counter.. That was a time suck for sure.. Finally just had to let it go ;)