XG-7100 member randomly stops passing traffic
sippycups last edited by sippycups
HA pair in CARP running 2.4.4_1. Single uplinks to a pair of MC-LAG edge routers. Single downlinks to the core switch. These things ran for 6 months with zero issues. I upgraded to 2.4.4_1 36 days ago. Today, out of the blue, 01 stopped routing traffic. In a panic, I went to the CARP config, hit "Temporarily disable CARP", then re-enabled it. That brought things back up, but it stopped passing traffic again after about 30 minutes. This time I put CARP on 01 into maintenance mode, keeping things on 02. Smooth sailing.
Any ideas what could have caused this? The logs are fairly benign. There are some gateway/dpinger alarms, but those have been happening as far back as I can see. Nothing has changed in the edge or core config. I know Juniper treats traffic destined for the routing and management engine (itself) like dirt. Ping times one hop away can be all over the place, because it always prioritizes transit traffic.
My plan for after hours tonight is to upgrade the pair to 2.4.4_2 (even though the release notes have nothing related) and tune the gateway monitoring a bit and hope for the best.
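As a rough illustration of why tuning the gateway monitoring might help: if the monitored next hop answers pings erratically, a short averaging window can trip a latency alarm on a single slow reply, while a longer window absorbs it. The sketch below is not dpinger's actual algorithm. The function name `alarm_count`, the RTT samples, and the 400 ms threshold are all made up for illustration.

```python
from statistics import mean

def alarm_count(rtts_ms, window, high_ms):
    """Count rolling-average windows whose mean RTT exceeds the threshold.

    Hypothetical helper for illustration only; not how dpinger is implemented.
    """
    alarms = 0
    for end in range(window, len(rtts_ms) + 1):
        if mean(rtts_ms[end - window:end]) > high_ms:
            alarms += 1
    return alarms

# Made-up samples: mostly ~2 ms, with two slow replies from a busy control plane.
samples = [2, 2, 900, 2, 2, 2, 2, 2, 900, 2, 2, 2]

short_window_alarms = alarm_count(samples, window=2, high_ms=400)
long_window_alarms = alarm_count(samples, window=6, high_ms=400)

print("alarms with 2-sample window:", short_window_alarms)
print("alarms with 6-sample window:", long_window_alarms)
```

With these invented numbers the short window alarms on every slow reply and the long window never does. Real results depend on the probe interval, averaging period, and thresholds actually configured on the gateway.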