Never seen this kind of link issue before… SG1537 to Aruba CX6100 on 10GbE SFP+

keyser

Hi all.

I consider myself fairly skilled in network troubleshooting and have 20+ years of experience in link issues and switch “behaviour”.

But I have never met a mystery like the one I uncovered this week while trying to get an SG1537 to play nice with an Aruba CX 6100 switch over a 10GbE SFP+ interface.

TEST SETUP: Greenfield link test with a factory-new switch (tried 3 different units) and a factory-reset SG1537 (also tried 3 different units).
CX6100 config, apart from factory default VLAN 1:
Spanning tree disabled, client on 1GbE and SG1537 on 10GbE SFP+
SG1537 config, apart from factory default:
Static interface IP address configured on IX0 (OPT1) and an allow any/any rule for OPT1

PROBLEM: At every link-up it is completely random whether the link works exactly as expected or drops a few, a lot, or all frames in the 1537 TX -> 6100 RX direction, with CRC errors logged on the switchport side. There are no RX or TX errors recorded on the pfSense interface (sysctl -a for ix0). The test is a simple ping from the 1GbE-linked client to the OPT1 10GbE-linked interface, so the packets being dropped are actually the echo reply frames from pfSense to the client.
Without touching the cables, administratively shutting and unshutting the switch interface makes the randomness reoccur: I might get a good link or an even worse link this time. When the link is good I can start hammering it with 1460-byte pings 5 ms apart and nothing is dropped. Once good, the link remains good until link-down/power-off.
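
To put numbers on the "random at every link-up" behaviour, one option is to save the ping summary from the client after each shut/no-shut cycle and tally the outcomes. The following is only a rough sketch: it assumes the client prints a Linux- or FreeBSD-style ping summary line, and the function names and the good/degraded/dead buckets are made up for illustration.

```python
import re
from collections import Counter

# Summary line as printed by Linux and FreeBSD ping, for example:
#   "200 packets transmitted, 143 received, 28.5% packet loss"
#   "200 packets transmitted, 143 packets received, 28.5% packet loss"
SUMMARY = re.compile(r"(\d+) packets transmitted, (\d+) (?:packets )?received")


def parse_summary(ping_output):
    """Return (sent, received) from the text of one ping run."""
    match = SUMMARY.search(ping_output)
    if match is None:
        raise ValueError("no ping summary line found")
    return int(match.group(1)), int(match.group(2))


def classify(sent, received):
    """Bucket one link-up trial (hypothetical buckets, not a standard)."""
    if sent == 0:
        raise ValueError("no packets were sent")
    if received == sent:
        return "good"
    if received == 0:
        return "dead"
    return "degraded"


def summarize(trial_outputs):
    """Count outcomes over many trials, one ping output per link bounce."""
    return Counter(classify(*parse_summary(text)) for text in trial_outputs)
```

Feeding it one saved ping output per link bounce gives a count of good, degraded and dead link-ups, which is easier to compare across cables, transceivers and firmware versions than impressions from watching the ping window.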

JUST TO RULE HARDWARE OUT: I have tried 3 different 1 m DAC cables (one Netgate, one Aruba and one Cisco) and 6 different supported SR transceivers in combination with 3 different OM4 50-micron cables! Obviously I tested all four SFP+ ports on the switches and both IX0 and IX1 on the firewall. It’s NOT the cable, the transceiver, the individual switch or the individual 1537 causing this.
The setup behaves completely as expected if I instead use a 1GbE switch link and IGB0 or IGB1 in the 1537.

I’m good at methodically testing whether speed autonegotiation, flow control autonegotiation or a combination thereof causes issues like this. That is not the issue here. I have tried all valid configs: confirmed full auto on both sides, confirmed no auto (i.e. static 10GbE), each with and without flow control autonegotiation on both sides, and everything in between. It makes no difference.

In other tests with devices linked to the CX 6100 switch(es) at 10GbE, I do not see CRC error drops. I have tried servers, SG6100s and other switches. Likewise, with the SG1537(s) linked to e.g. Aruba CX6200 or CX6300 switches I do not see any echo reply frames dropped in the switches. I have tried 3 different firmware generations of the latest LSR releases on the CX switches (10.7, 10.10 and 10.13).

CONCLUSION: I’m going borderline mad… It just seems these two device models cannot interoperate in a predictable manner.

AN OBSERVATION: When the link is really bad or just bad (dropping all, most, or just a lot of frames) I have observed that if I start hammering the link with the 1460-byte pings at 5 ms spacing, the link still does not go down, but suddenly no packets pass for about 5 seconds. Then it passes frames again, but now, usually, with A LOT better quality, sometimes even a perfect link.
It’s like the switch goes through a recalibration of its RX link upon learning it’s dropping A LOT of frames. Usually that results in a much better link, though not always a perfect one.
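
For anyone who wants to check a capture for the same pattern, the stall-then-improve behaviour could be picked out of a per-probe reply log roughly like this. It is a sketch under assumptions: the input is a list of booleans (one per ping, in send order), the 5 ms interval matches the test above, and the 1-second stall threshold and all names are arbitrary choices for illustration.

```python
def longest_loss_run(replies):
    """Return (start_index, length) of the longest run of consecutive lost probes."""
    best_start, best_len = 0, 0
    run_start, run_len = 0, 0
    for i, ok in enumerate(replies):
        if ok:
            run_len = 0
        else:
            if run_len == 0:
                run_start = i
            run_len += 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
    return best_start, best_len


def loss_rate(replies):
    """Fraction of probes without a reply (0.0 for an empty slice)."""
    return replies.count(False) / len(replies) if replies else 0.0


def stall_report(replies, interval_s=0.005, min_stall_s=1.0):
    """Describe the longest blackout and the loss rate on either side of it.

    Returns None when no blackout lasts at least min_stall_s.
    """
    start, length = longest_loss_run(replies)
    if length * interval_s < min_stall_s:
        return None
    return {
        "stall_seconds": length * interval_s,
        "loss_before": loss_rate(replies[:start]),
        "loss_after": loss_rate(replies[start + length:]),
    }
```

Note that if the link was dropping every frame before the stall, the blackout merges with the preceding loss and the reported stall will be longer than the real one, so this only separates the phases cleanly when at least some replies got through beforehand.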

QUESTION: Is there any merit to this theory?
Are there well-established vendors with well-known and widely used devices that just do not link dependably to one particular model of another well-known vendor’s widely used device, both of them with a very good track record and reviews?

Any ideas???
