SG-5100 WAN failover at gigabit saturation
-
@steveits Thanks for your reply. Just seems odd that the gateway stays live and has no latency issues when it is the only gateway but once failover is introduced it starts misbehaving.
Will look into shaping / limiting, but it's definitely a band-aid and not a solution.
-
@ashlm The latency triggers the failover. Raising the latency threshold to, say, 1500ms would keep it from triggering. So would increasing the "Time Period" on the gateway, which makes it average over a longer time. That's of course not ideal if it is always that slow, but that's what we found to avoid the 30-second-busy failovers.
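To illustrate why either change helps, here is a rough sketch in Python. It assumes the gateway monitor behaves like a simple moving average over the last N probes, which is a simplification and may not match what pfSense actually does internally. The function name, probe counts, and latency values are all made up for the example.

```python
from collections import deque

def alarm_events(samples_ms, window, threshold_ms):
    """Return the sample indices where the windowed average exceeds the threshold.

    Assumption: a plain moving average over the last `window` probes.
    This is an illustration only, not the real monitoring algorithm.
    """
    recent = deque(maxlen=window)
    alarms = []
    for i, rtt in enumerate(samples_ms):
        recent.append(rtt)
        avg = sum(recent) / len(recent)
        if avg > threshold_ms:
            alarms.append(i)
    return alarms

# Hypothetical probes: 20 ms baseline with a 30-probe burst at 1200 ms.
samples = [20] * 60 + [1200] * 30 + [20] * 60

# Short averaging window, 1000 ms threshold: the burst trips the alarm.
short_window = alarm_events(samples, window=10, threshold_ms=1000)

# Longer averaging window, same threshold: the burst is averaged away.
long_window = alarm_events(samples, window=120, threshold_ms=1000)

# Short window, threshold raised to 1500 ms: the burst never reaches it.
high_threshold = alarm_events(samples, window=10, threshold_ms=1500)

print(len(short_window), len(long_window), len(high_threshold))
```

With these made-up numbers, only the short window with the 1000ms threshold ever alarms. The trade-off is the same as on the real gateway: a longer window or a higher threshold also reacts more slowly to a link that really is down.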
And yes, a limiter or shaping is in some ways a band-aid, but it avoids the latency. IOW it's not really a pfSense problem; the problem is that the device is flooding the connection, so pfSense is doing what it's been told and failing over when latency spikes.
In our case it was a client and we aren't on site so it took a long time to catch it while it was happening and track it down to a Mac, by MAC address. We think it was doing a backup or maybe a long video upload, never quite figured that out as we didn't get a great answer from the person. (which is why I think it was a backup)
-
@steveits Thanks again for the reply, it's very helpful.
@steveits said in SG-5100 WAN failover at gigabit saturation:
The latency triggers the failover.
Yes, but latency on the gigabit interface reaches the failover threshold (>1s) only when failover is enabled. RTTsd remains below 400ms, well below the failover threshold, when the gigabit interface is set as the solitary gateway, and the gigabit interface remains up for the entire test.
RTTsd only exceeds 1000ms when failover is enabled.
-
@ashlm Oh, I get what you're saying now! I hadn't noticed that but wasn't looking for it. That would explain why we only saw it at that client. We thought it was the Mac because that's the only device we ever saw "cause" the problem, on several occasions.
Since it sounds like you can reproduce it, I suggest opening a case at redmine.pfsense.org and linking to this thread.
-
@ashlm Is that the right URL? It talks about traffic shaping, and is from 8 days ago. :)
-
You're testing that in 21.02?
Can you upgrade to 22.01 and see if it's still happening?
Steve
-
@stephenw10 Apologies, that's a mistake. It's "22.01-RELEASE (amd64) built on Mon Feb 07 16:37:59 UTC 2022".
-
Ah, Ok. Do you know if this is new behaviour in 22.01?
-
@stephenw10 The same failover scenario manifested in 21.02 on the SG-3100, though that device couldn't achieve gigabit down on the WAN interface and was replaced with the SG-5100 without further testing with a solitary gateway. I updated the SG-5100 to the latest release before deployment, so I can't say for certain whether it would happen on 21.02 on the SG-5100.
-
@ashlm The issue is resolved, or rather it is not an issue, as my description was not accurate. The same latency increase to >1s was recorded while testing the solitary gateway config this morning, so it is no longer confined or attributable to enabling failover.
-
Ah, Ok thanks for the update. I couldn't replicate it here.