Weird issue
-
I have TNSR installed on a couple of HP servers with 4 10g NICs each, divided into 2x 20g LACP bonds per server (LAN and WAN).
VRRP and NAT are both configured.
When I set the VRRP LAN IP as a default gateway for a client device and ping something on the internet, I get 3 pings, a timeout, and repeat. When doing a speedtest I get a transient drop in speed every few seconds. Otherwise it works fine. What did I do wrong?...
Thanks for any help!
-
I would check the switch and the server; maybe the LACP settings are mismatched?
-
@kiokoman Good point, but I think it's all OK. Bond settings:
<bond-table> <bond> <instance>0</instance> <mode>lacp</mode> <load-balance>l34</load-balance>
As far as I can tell this should be compatible with the Unifi switches. I am using budget Chinese SFP+ DAC cables but hopefully that has nothing to do with it...
-
@schnitzel_itdept
From the documentation:
there is a default timeout of 3 seconds when monitoring bonding peers with LACP.
Could this be related to the problem? (3 sec = 3 pings -> timeout)
https://docs.netgate.com/tnsr/en/latest/interfaces/types-bond.html#bond-interface-settings
-
@kiokoman
Works great if I turn off one of the servers and/or disable the bonds on one of the servers, so the other one takes over as VRRP master. So I think the problem is to do with VRRP....
-
I set it up as per https://docs.netgate.com/tnsr/en/latest/recipes/vrrp-nat/index.html and I can see the second node being elected master every few seconds.... Argh!
-
@schnitzel_itdept
Storm control / rate limiting on multicast?
master will transmit advertisements. If other nodes fail to see advertisements from a higher priority node in a timely manner defined by the settings, control of the virtual address is assumed by the backup node with the next highest priority.
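To put a number on "in a timely manner", here is a rough sketch of the backup's takeover timer. It assumes TNSR follows the VRRPv3 timers from RFC 5798, which I have not confirmed for your build; the function name and the sample priority/interval are made up for illustration. Intervals are in centiseconds.

```python
def master_down_interval_cs(priority: int, adver_interval_cs: int = 100) -> float:
    """Centiseconds a backup waits without advertisements before taking over.

    Assumes the RFC 5798 (VRRPv3) formulas:
      Skew_Time            = ((256 - priority) * Master_Adver_Interval) / 256
      Master_Down_Interval = 3 * Master_Adver_Interval + Skew_Time
    """
    if not 1 <= priority <= 255:
        raise ValueError("VRRP priority must be between 1 and 255")
    skew_time = ((256 - priority) * adver_interval_cs) / 256
    return 3 * adver_interval_cs + skew_time


if __name__ == "__main__":
    # Hypothetical backup node: priority 100, advertisements every 1 s (100 cs).
    print(master_down_interval_cs(100, 100) / 100, "seconds")
```

If those assumptions hold, a backup with a 1 second advertisement interval only takes over after roughly three advertisements in a row go missing, so it may be worth checking whether something between the two nodes is dropping the multicast in bursts.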
-
Can you try different cables just to rule that out as a cause?
-
@kiokoman we have flow control enabled on a few Unifi switches in order to speed up wifi, but they are quite a ways downstream from where the TNSR machines are
@audian I tried different cables to different switches, different SFP+ cards (Intel X520), and a whole different server for node A. No luck...
Here are the VRRP settings... node A is internal IP .11 and external .181, B is .12 and .182. NAT is configured.
-
@schnitzel_itdept Can you do a packet capture on server B to double-check that all advertisements are received correctly?
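Once you have the capture, something like this could help spot where advertisements go missing. It is only a minimal sketch: `find_advert_gaps` is a hypothetical helper, the input is assumed to be the arrival times (in seconds) of node A's advertisements as seen on node B, and the threshold is whatever takeover timeout applies to your settings.

```python
from typing import List, Tuple


def find_advert_gaps(timestamps: List[float], max_gap_s: float) -> List[Tuple[float, float]]:
    """Return (previous, next) arrival times wherever the gap exceeds max_gap_s."""
    ordered = sorted(timestamps)
    # Compare each arrival time with the one that follows it.
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > max_gap_s]


if __name__ == "__main__":
    # Made-up sample data: one advertisement per second, then a 4 s hole.
    sample = [0.0, 1.0, 2.0, 6.0, 7.0]
    for start, end in find_advert_gaps(sample, max_gap_s=3.6):
        print(f"no advertisements between t={start}s and t={end}s")
```

If the gaps line up with the moments node B becomes master, the advertisements are being lost on the way rather than node A failing to send them.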