In a switch, we found a link aggregation that was looping instead of aggregating, as well as some ports that were untagged for multiple networks; these have now been disabled. This has lowered the load but has not fixed the problem.
I saw this in the gateway log this morning:
Feb 28 04:04:32 dpinger 81222 WANGWv6_GC a:b:c::1: sendto error: 55
Feb 27 21:06:24 dpinger 80779 WAN_GlobalConnectGW a.b.c.193: sendto error: 55
Feb 27 21:06:24 dpinger 81222 WANGWv6_GC a:b:c::1: sendto error: 55
Feb 27 21:06:15 dpinger 81222 send_interval 500ms loss_interval 2000ms time_period 60000ms report_interval 0ms data_len 1 alert_interval 1000ms latency_alarm 500ms loss_alarm 20% dest_addr a:b:c::1 bind_addr a:b:c::3 identifier "WANGWv6_GC "
Feb 27 21:06:15 dpinger 80779 send_interval 500ms loss_interval 2000ms time_period 60000ms report_interval 0ms data_len 1 alert_interval 1000ms latency_alarm 500ms loss_alarm 20% dest_addr a.b.c.193 bind_addr a.b.c.195 identifier "WAN_GlobalConnectGW "
Feb 27 21:02:37 dpinger 40064 send_interval 500ms loss_interval 2000ms time_period 60000ms report_interval 0ms data_len 1 alert_interval 1000ms latency_alarm 500ms loss_alarm 20% dest_addr a:b:c::1 bind_addr a:b:c::3 identifier "WANGWv6_GC "
Feb 27 21:02:37 dpinger 36210 send_interval 500ms loss_interval 2000ms time_period 60000ms report_interval 0ms data_len 1 alert_interval 1000ms latency_alarm 500ms loss_alarm 20% dest_addr a.b.c.193 bind_addr a.b.c.195 identifier "WAN_GlobalConnectGW "
Unlike before, it now also seems to affect CPU and RAM heavily: RAM usage is around 8 GB, and the load average is 4.29, 5.47, 5.91. Swap is unused.
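To see at a glance which gateway monitor is failing, and with what settings, the dpinger lines above can be tallied with a short script. This is a minimal sketch in Python, assuming the exact log format shown above; the names LOG_LINES, summarize and FREEBSD_ERRNO are hypothetical. It also assumes the gateway runs a FreeBSD-based system (pfSense, for example, uses dpinger for gateway monitoring), in which case errno 55 is ENOBUFS, "No buffer space available". The OS is not stated above, so treat that mapping as an assumption.

```python
import re
from collections import Counter

# Sample lines copied from the gateway log above (addresses are anonymized there).
LOG_LINES = [
    'Feb 28 04:04:32 dpinger 81222 WANGWv6_GC a:b:c::1: sendto error: 55',
    'Feb 27 21:06:24 dpinger 80779 WAN_GlobalConnectGW a.b.c.193: sendto error: 55',
    'Feb 27 21:06:24 dpinger 81222 WANGWv6_GC a:b:c::1: sendto error: 55',
    'Feb 27 21:06:15 dpinger 81222 send_interval 500ms loss_interval 2000ms '
    'time_period 60000ms report_interval 0ms data_len 1 alert_interval 1000ms '
    'latency_alarm 500ms loss_alarm 20% dest_addr a:b:c::1 bind_addr a:b:c::3 '
    'identifier "WANGWv6_GC "',
]

# Assumption: errno numbering as in FreeBSD's sys/errno.h; Linux numbers differ.
FREEBSD_ERRNO = {55: "ENOBUFS (No buffer space available)"}

# "<timestamp> dpinger <pid> <gateway> <dest>: sendto error: <errno>"
ERROR_RE = re.compile(
    r'^(?P<ts>\w{3} +\d+ [\d:]{8}) dpinger (?P<pid>\d+) '
    r'(?P<gw>\S+) (?P<dest>\S+): sendto error: (?P<errno>\d+)$'
)
# "<timestamp> dpinger <pid> send_interval ... identifier "<name> ""
START_RE = re.compile(
    r'^(?P<ts>\w{3} +\d+ [\d:]{8}) dpinger (?P<pid>\d+) (?P<params>send_interval .*)$'
)
# key/value pairs; a value is either a quoted string or a run of non-spaces
PARAM_RE = re.compile(r'(\w+) ("[^"]*"|\S+)')


def parse_params(text):
    """Turn 'key value key "value "' into a dict, dropping quotes and padding."""
    return {k: v.strip('"').strip() for k, v in PARAM_RE.findall(text)}


def summarize(lines):
    """Count sendto errors per (gateway, errno) and keep each gateway's settings."""
    errors = Counter()
    configs = {}
    for line in lines:
        if m := ERROR_RE.match(line):
            errors[(m['gw'], int(m['errno']))] += 1
        elif m := START_RE.match(line):
            params = parse_params(m['params'])
            configs[params['identifier']] = params
    return errors, configs


if __name__ == "__main__":
    errors, configs = summarize(LOG_LINES)
    for (gw, err), count in errors.items():
        print(f"{gw}: {count} x sendto error {err} = {FREEBSD_ERRNO.get(err, 'unknown')}")
    for gw, p in configs.items():
        print(f"{gw}: probes {p['dest_addr']} every {p['send_interval']}, "
              f"latency alarm {p['latency_alarm']}, loss alarm {p['loss_alarm']}")
```

Fed the full exported gateway log instead of the sample lines, the same loop shows whether the errors hit both the IPv4 and IPv6 monitors at the same moments, which would point at the shared WAN interface rather than one upstream path.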
Specs:
Intel(R) Xeon(R) Silver 4110 CPU @ 2.10GHz
16 CPUs: 1 package(s) x 8 core(s) x 2 hardware threads
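Putting the load averages quoted above against these specs, a rough rule of thumb is to divide the load by the number of logical CPUs. This minimal Python sketch does that arithmetic; the names LOAD_AVG, LOGICAL_CPUS and load_per_cpu are hypothetical. It assumes all 16 hardware threads count equally, which overstates capacity somewhat because hyper-threads are not full cores, and a single saturated thread (for example one NIC queue or interrupt handler) can still be a bottleneck while the per-CPU average looks low.

```python
# Load averages (1, 5, 15 minutes) and logical CPU count quoted in this post.
LOAD_AVG = (4.29, 5.47, 5.91)
LOGICAL_CPUS = 1 * 8 * 2  # 1 package x 8 cores x 2 hardware threads


def load_per_cpu(load_avg, cpus):
    """Divide each load average by the number of logical CPUs, rounded to 2 places."""
    return tuple(round(load / cpus, 2) for load in load_avg)


print(load_per_cpu(LOAD_AVG, LOGICAL_CPUS))  # (0.27, 0.34, 0.37)
```

Roughly a third of the logical CPUs busy on average suggests the box as a whole is not CPU-bound, so per-core and per-interrupt figures are likely more telling than the overall load average.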
I did a state table reset to clean up, but there was no change in latency.
I am starting to think this is simply traffic hitting a "threshold" we haven't reached before, which is slowing down processing. Yet, as far as I can see, we are not processing any more data than we did previously.
I am unsure how to proceed. Can anyone suggest good ways to troubleshoot and locate the source of the problem?