Redundant connections on blade servers
-
I'm going crazy - what am I doing wrong?
I have a 3U Microblade - it is a blade server from Supermicro: https://www.supermicro.com/en/products/microblade. It has two built-in switches (hot-swappable, of course) and room for 14 servers.
The 14 servers have two network connections each, one to a port on each of the two internal switches through hidden internal wiring. My understanding is that the blade switches operate as normal switches. Therefore, I have an active-backup setup on all 14 blades (either in Windows or Linux).
A handful of the servers stop responding to ping and lose their connection if I disconnect one of the blade switches (by pulling out the cable), and I don't understand why. The rest, about 70%, keep working. I see no difference between Linux and Windows; it happens on both OSes.
Once every 12-24 hours, I also see ping timeouts on some of the servers (1-2), but it is not as pronounced as when I disconnect one of the blade switches, where 30% drop out. When they drop out by themselves (and not because I pulled out the cable), they recover automatically within seconds or minutes.
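To pin down how long these intermittent dropouts really last, the ping results can be grouped into outage windows. This is only a minimal sketch: the function name `outage_windows` and the input format (a time-sorted list of `(timestamp, ok)` pairs, one per ping) are assumptions for illustration, not something taken from my actual monitoring.

```python
from datetime import datetime, timedelta


def outage_windows(samples):
    """Group consecutive failed pings into outage windows.

    samples: time-sorted iterable of (datetime, ok) pairs, one per ping.
    Returns a list of (first_failure, first_reply_after, missed_count).
    first_reply_after is None if the host was still down at the end of the log.
    """
    windows = []
    start = None
    missed = 0
    for ts, ok in samples:
        if not ok:
            if start is None:
                # First lost ping of a new outage.
                start = ts
                missed = 0
            missed += 1
        elif start is not None:
            # First successful reply closes the current outage.
            windows.append((start, ts, missed))
            start = None
    if start is not None:
        windows.append((start, None, missed))
    return windows


# Hypothetical one-ping-per-second log: two outages, the second still ongoing.
t0 = datetime(2024, 1, 1, 12, 0, 0)
pattern = [True, True, False, False, False, True, True, False]
log = [(t0 + timedelta(seconds=i), ok) for i, ok in enumerate(pattern)]
```

Comparing the window lengths against the time it takes for the blades to recover after a switch disconnect would show whether the two symptoms behave the same way.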
I have tried disabling an individual port in the team inside the blade servers (for instance, eno1/eth1 or eno2/eth2), and traffic automatically fails over to the second port. No issue, on both Linux and Windows. I have also tried configuring the network teams to prefer (keep active) one of the switch ports, and that seems to work until I disconnect the switch.
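On the Linux blades, the bonding driver reports the active interface and the per-interface link state in `/proc/net/bonding/<bond>`. Below is a minimal sketch of reading that file, assuming the bond is named `bond0` (the bond name, the function names, and the sample text are assumptions for illustration). Comparing the output on a blade that drops out with one that keeps working, right after the switch is disconnected, shows whether the bond noticed the failure at all.

```python
def parse_bonding_status(text):
    """Extract mode, active slave and per-slave link state from bonding status text."""
    status = {"mode": None, "active_slave": None, "slaves": {}}
    current = None
    for raw in text.splitlines():
        key, sep, value = raw.strip().partition(":")
        if not sep:
            continue  # blank line or line without a "key: value" pair
        key, value = key.strip(), value.strip()
        if key == "Bonding Mode":
            status["mode"] = value
        elif key == "Currently Active Slave":
            status["active_slave"] = value
        elif key == "Slave Interface":
            # Everything after this line belongs to this slave.
            current = value
            status["slaves"][current] = {}
        elif current is not None and key in ("MII Status", "Link Failure Count"):
            status["slaves"][current][key] = value
    return status


def read_bond_status(bond="bond0"):
    """Read the live status on a Linux host; 'bond0' is an assumed name."""
    with open(f"/proc/net/bonding/{bond}") as fh:
        return parse_bonding_status(fh.read())


# Illustrative sample in the layout the bonding driver uses (not real output).
SAMPLE = """\
Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: eno1
MII Status: up

Slave Interface: eno1
MII Status: up
Link Failure Count: 0

Slave Interface: eno2
MII Status: up
Link Failure Count: 2
"""
```

If both interfaces still show `MII Status: up` on a blade that has lost connectivity, the bond never saw a link failure on its own ports.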
MSTP is running on everything, but I can't see that being the issue here. Could it be that something needs a long time to notice the disconnect, and that it would resolve itself if I waited long enough?