Any hints on disambiguating NIC failure?
-
Here's the issue: I have a dual-port Marvell Yukon PCI Express Ethernet card installed in my system.
I noticed that one port showed close to 80% packet loss when I pinged it. Reassigning the logical interface from that hardware port to a USB Ethernet interface instantly cleared up the networking mess I was experiencing. I also swapped cables, and I now use the same cables with the USB interface.
On the other hand, the first interface on the card seems to behave just fine (from what I can tell). So I know that the card isn't totally fried, I know the OS recognizes it, I know the cables are OK, and I know the configuration isn't to blame.
What I don't know is whether there's a plausible way for a card to fail such that both interfaces are visible, but one of them is damaged enough to pass only a small amount of traffic rather than none at all.
Basically, I'm trying to figure out whether this could be a driver issue (I'm on an amd64 install).
I can try to track it down by moving the NIC into another system and/or switching to an i386 install, but both will take a lot of time. If there are known issues with cards like these, now would be a good time to know, before I spend the better part of a day taking the system apart, switching components, reinstalling the OS, etc. Also, if you have any other ideas on how to test this further, I'm all ears and eyes…
Here's the OS's idea of what this card is:
mskc0: <Marvell Yukon 88E8062CU Gigabit Ethernet> port 0xd800-0xd8ff mem 0xfaefc000-0xfaefffff irq 16 at device 0.0 on pci2
msk0: <Marvell Technology Group Ltd. Yukon XL Id 0xb3 Rev 0x03> on mskc0
miibus1: <MII bus> on msk0
e1000phy0: <Marvell 88E1112 Gigabit PHY> PHY 0 on miibus1
e1000phy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
msk1: <Marvell Technology Group Ltd. Yukon XL Id 0xb3 Rev 0x03> on mskc0
miibus2: <MII bus> on msk1
e1000phy1: <Marvell 88E1112 Gigabit PHY> PHY 0 on miibus2
e1000phy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
mskc0: [ITHREAD]

mskc0@pci0:2:0:0: class=0x020000 card=0x622211ab chip=0x434311ab rev=0x14 hdr=0x00
    class = network
    subclass = ethernet
    bar [10] = type Memory, range 64, base 0xfaefc000, size 16384, enabled
    bar [18] = type I/O Port, range 32, base 0xd800, size 256, enabled
    cap 01[48] = powerspec 2 supports D0 D1 D2 D3 current D0
    cap 03[50] = VPD
    cap 05[5c] = MSI supports 2 messages, 64 bit enabled with 1 message
    cap 10[e0] = PCI-Express 1 legacy endpoint max data 128(128) link x4(x4)
-
Have you tried disabling hardware checksum offload?
-
> Have you tried disabling hardware checksum offload?
Nope, have not. Will do. I'll post if that helps the matter.
Is there something about it that would make it likely to work with only one out of two ports on the same card, or is it just that the symptoms indicate it could be related to that?
-
> Is there something about it that would make it likely to work with only one out of two ports on the same card, or is it just that the symptoms indicate it could be related to that?
Some drivers have been known to have problems with hardware checksum offload, and (if I recall correctly) I have seen reports of similar error rates. I have no idea whether the msk driver is affected by these problems, or whether these errors are in any way traffic dependent.
-
> Have you tried disabling hardware checksum offload?
Is there a way to turn this off on a per device or per driver family basis?
I.e., if the Marvell Yukon driver or card has an issue with this, I don't necessarily want to turn it off on my motherboard's built-in NIC. Also, since the first port on the card seems to work with hardware checksum offload enabled, it would be even better if it could be enabled or disabled per port rather than just per device or driver family.
-
> Is there a way to turn this off on a per device or per driver family basis?
Not from the web GUI. I suggest you try it and see if it makes a difference. If there is no difference you will have to look further.
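From a shell, though, FreeBSD's ifconfig can toggle checksum offload one interface at a time, which would also cover the per-port case. A minimal sketch follows, assuming the misbehaving port is msk1 (the thread only says the first port works) and that the driver honours the rxcsum/txcsum flags. The function name and the IFCONFIG override are made up for illustration, and the change is not expected to survive a reboot.

```shell
#!/bin/sh
# Sketch: turn checksum offload off (or back on) for one interface only.
# Assumes FreeBSD-style ifconfig, where -rxcsum/-txcsum disable receive and
# transmit checksum offload and rxcsum/txcsum enable them again.
# Set IFCONFIG=echo to preview the command without changing anything.
set_csum_offload() {
    iface="$1"   # e.g. msk1 (assumed name of the misbehaving port)
    state="$2"   # "off" or "on"
    case "$state" in
        off) "${IFCONFIG:-ifconfig}" "$iface" -rxcsum -txcsum ;;
        on)  "${IFCONFIG:-ifconfig}" "$iface" rxcsum txcsum ;;
        *)   echo "usage: set_csum_offload <iface> on|off" >&2; return 1 ;;
    esac
}
```

Run as root, `set_csum_offload msk1 off` would leave msk0 and the on-board NIC untouched, so the packet loss can be re-measured with only that one variable changed.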
Also, does it make a difference to swap the use of the two msk interfaces?
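To compare the two ports with numbers rather than impressions, the loss figure can be pulled out of ping's summary line before and after the swap. This is only a sketch: the helper name is hypothetical, and the addresses in the usage comment are placeholders, not values from this thread.

```shell
#!/bin/sh
# Sketch: extract the packet-loss percentage from ping output.
# Reads ping output on stdin and prints the number that precedes
# "% packet loss" in the summary line; prints nothing if no summary is found.
loss_pct() {
    sed -n 's/.* \([0-9.][0-9.]*\)% packet loss.*/\1/p'
}

# Usage (placeholder addresses; -S picks the source address on FreeBSD's ping):
#   ping -c 100 -S 192.0.2.10 192.0.2.1 | loss_pct
```

If the high loss follows the physical port regardless of which logical interface is assigned to it, that points towards hardware; if it follows the logical assignment, configuration or the driver become more likely suspects.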
-
> So I know that the card isn't totally fried, I know the OS recognizes it, I know the cables are OK, and I know the configuration isn't to blame.
> What I don't know is whether there's a plausible way for a card to fail such that both interfaces are visible, but one of them is damaged enough to pass only a small amount of traffic rather than none at all.
> Basically, I'm trying to figure out whether this could be a driver issue (I'm on an amd64 install).
> I can try to track it down by moving the NIC into another system and/or switching to an i386 install, but both will take a lot of time. If there are known issues with cards like these, now would be a good time to know, before I spend the better part of a day taking the system apart, switching components, reinstalling the OS, etc. Also, if you have any other ideas on how to test this further, I'm all ears and eyes…
Inspect the RJ45 jacks and look for any deformation of the pins. Or maybe some "dirt" has ended up in there… use a good torch!