SG-3100 Loadbalance and failover
-
Hello all, I am having problems setting up load balancing and failover for a dual-WAN setup. I see the gateway groups and alarms being generated, but it never fails over:
May 23 16:31:14 rc.gateway_alarm 82750 >>> Gateway alarm: WAN_STARLINK_DHCP (Addr:8.8.8.8 Alarm:0 RTT:39.250ms RTTsd:10.998ms Loss:5%)
May 23 16:31:14 check_reload_status 374 updating dyndns WAN_STARLINK_DHCP
May 23 16:31:14 check_reload_status 374 Restarting ipsec tunnels
May 23 16:31:14 check_reload_status 374 Restarting OpenVPN tunnels/interfaces
May 23 16:31:14 check_reload_status 374 Reloading filter
May 23 16:31:15 php-fpm 360 /rc.openvpn: MONITOR: WAN_STARLINK_DHCP is available now, adding to routing group LoadBalanced
May 23 16:31:15 php-fpm 360 8.8.8.8|192.168.1.232|WAN_STARLINK_DHCP|39.065ms|10.932ms|2%|online|none
May 23 16:32:08 rc.gateway_alarm 67343 >>> Gateway alarm: WAN_STARLINK_DHCP (Addr:8.8.8.8 Alarm:1 RTT:37.056ms RTTsd:8.504ms Loss:21%)
May 23 16:32:08 check_reload_status 374 updating dyndns WAN_STARLINK_DHCP
May 23 16:32:08 check_reload_status 374 Restarting ipsec tunnels
May 23 16:32:08 check_reload_status 374 Restarting OpenVPN tunnels/interfaces
May 23 16:32:08 check_reload_status 374 Reloading filter
May 23 16:32:09 php-fpm 360 /rc.openvpn: MONITOR: WAN_STARLINK_DHCP has packet loss, omitting from routing group LoadBalanced
May 23 16:32:09 php-fpm 360 8.8.8.8|192.168.1.232|WAN_STARLINK_DHCP|37.204ms|8.526ms|23%|down|highloss
May 23 16:33:39 rc.gateway_alarm 86689 >>> Gateway alarm: WAN_STARLINK_DHCP (Addr:8.8.8.8 Alarm:0 RTT:116.274ms RTTsd:281.754ms Loss:5%)
May 23 16:33:39 check_reload_status 374 updating dyndns WAN_STARLINK_DHCP
May 23 16:33:39 check_reload_status 374 Restarting ipsec tunnels
May 23 16:33:39 check_reload_status 374 Restarting OpenVPN tunnels/interfaces
May 23 16:33:39 check_reload_status 374 Reloading filter
May 23 16:33:41 php-fpm 23581 /rc.openvpn: MONITOR: WAN_STARLINK_DHCP is available now, adding to routing group LoadBalanced
May 23 16:33:41 php-fpm 23581 8.8.8.8|192.168.1.232|WAN_STARLINK_DHCP|114.721ms|278.25ms|3%|online|none
Does anyone have a good troubleshooting guide to help determine what is set up wrong here? I have the gateway groups, firewall LAN rules, and DNS set up in the General Settings tab.
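For anyone who wants to watch these values over time, here is a minimal Python sketch that parses the pipe-delimited status lines from the log above. The field layout is inferred from the log itself, not from any documented format; in particular, treating the second field as the WAN-side source address is an assumption, and `GatewayStatus` and `parse_status_line` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class GatewayStatus:
    monitor_ip: str   # address being pinged, e.g. 8.8.8.8
    source_ip: str    # assumed: the WAN-side address the probes leave from
    name: str         # gateway name, e.g. WAN_STARLINK_DHCP
    rtt_ms: float
    rtt_sd_ms: float
    loss_pct: float
    status: str       # "online" or "down" in the log above
    substatus: str    # "none" or "highloss" in the log above


def _number(value: str, unit: str) -> float:
    """Strip a trailing unit such as 'ms' or '%' and convert to float."""
    if not value.endswith(unit):
        raise ValueError(f"expected {unit!r} suffix in {value!r}")
    return float(value[: -len(unit)])


def parse_status_line(line: str) -> GatewayStatus:
    """Parse one pipe-delimited status line (layout inferred from the log)."""
    fields = line.strip().split("|")
    if len(fields) != 8:
        raise ValueError(f"expected 8 fields, got {len(fields)}")
    monitor, source, name, rtt, rtt_sd, loss, status, substatus = fields
    return GatewayStatus(
        monitor, source, name,
        _number(rtt, "ms"), _number(rtt_sd, "ms"), _number(loss, "%"),
        status, substatus,
    )


sample = "8.8.8.8|192.168.1.232|WAN_STARLINK_DHCP|37.204ms|8.526ms|23%|down|highloss"
gw = parse_status_line(sample)
print(gw.name, gw.loss_pct, gw.status, gw.substatus)
```

Feeding it each status line in turn makes it easy to see at which loss percentage the gateway flips between online and down.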
-
@kramer9 said in SG-3100 Loadbalance and failover:
I see the gateway groups and alarms being generated, but it is never failing over
Hi,
What is your exact GW Group setting? Can you show us?
-tier(s)
and
Trigger Level:
-member down
-high latency
-packet loss
-loss + high latency
BTW:
Known DNS server(s) as the monitor IP are not a good choice here: they do not exactly reflect your ISP connection, because the ping result depends on the DNS server's load too....
Can you remove your Dual-NAT configuration? (bridge mode)
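To make the trigger levels listed above concrete, here is a rough Python model of how each one might decide whether a gateway is left out of its tier. This is a simplified sketch, not pfSense source code: the function name and the three boolean inputs are hypothetical, and it assumes a fully down member is omitted at every trigger level.

```python
# Hypothetical model, not pfSense source code. Trigger names follow the
# list above; the three boolean inputs are assumptions for illustration.
TRIGGERS = {
    "member down",
    "packet loss",
    "high latency",
    "packet loss or high latency",
}


def omit_from_group(trigger: str, member_down: bool,
                    high_loss: bool, high_latency: bool) -> bool:
    """Return True if the gateway should be left out of its tier."""
    if trigger not in TRIGGERS:
        raise ValueError(f"unknown trigger level: {trigger!r}")
    if member_down:
        # Assumed: a gateway that is fully down is omitted at every level.
        return True
    if trigger == "packet loss":
        return high_loss
    if trigger == "high latency":
        return high_latency
    if trigger == "packet loss or high latency":
        return high_loss or high_latency
    return False  # "member down": loss or latency alone does not trigger


# A lossy but still reachable link under two different trigger levels.
print(omit_from_group("packet loss", False, True, False))
print(omit_from_group("member down", False, True, False))
```

Under this model a link with frequent short loss bursts would flap in and out of the group with the packet loss trigger, but stay in with member down.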
-
-
@kramer9 said in SG-3100 Loadbalance and failover:
best way to show
What I'd do if I can't eliminate the Dual-NAT: either leave the monitor IP on the DNS server(s), or run "tracert" to find a nearby upstream gateway of the ISP that responds to ping
(this results in a better measurement and a more stable value for "dpinger").
I would not configure it for packet loss; I would choose member down.
Is there a big speed difference between the two ISP links?
If not, a plain load balance will answer all your questions and no failover setup is needed.
This will help (it's a rough link, I couldn't find a better one for you on short notice):
https://www.cyberciti.biz/faq/howto-configure-dual-wan-load-balance-failover-pfsense-router/
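The "tracert" suggestion above could be sketched like this in Python: given the hop addresses from a traceroute, pick the first one that is globally routable as a candidate monitor IP. The function name and the hop list are hypothetical (only 8.8.8.8 comes from this thread), and you would still have to confirm by hand that the chosen hop answers ping reliably.

```python
import ipaddress
from typing import Iterable, Optional


def pick_monitor_candidate(hops: Iterable[str]) -> Optional[str]:
    """Return the first hop with a globally routable address, if any."""
    for hop in hops:
        try:
            addr = ipaddress.ip_address(hop)
        except ValueError:
            continue  # e.g. "*" for a hop that did not answer
        if addr.is_global:
            return str(addr)
    return None


# Hypothetical hop list; only 8.8.8.8 is taken from the thread.
hops = ["192.168.1.1", "*", "100.64.0.1", "8.8.8.8"]
print(pick_monitor_candidate(hops))
```

Private and shared (CGNAT) hops are skipped because `is_global` is false for them, so the result is the first hop that sits on the public side of the provider's network.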
-
Many thanks! I made a couple of minor tweaks based on the URL and your notes. It works like a champ now for both load balancing and failover. I switched to speedtest.net to test the connection from the endpoints to make sure; the only thing they see is the speed dropping drastically. There is a HUGE speed difference: Starlink gives me about 140M/down and USCellular 15M/down. Since Starlink still has lags and gaps, that's why cellular is the backup for LB and failover.
So where are you seeing the Dual-Nat? I haven't switched the house over to this setup yet; I'm waiting for the mesh hardware, which is on backorder.
-
@kramer9 said in SG-3100 Loadbalance and failover:
There is a HUGE speed difference Starlink gives me about 140M/down and USCellular 15M/down.
Okay,
so I understand your dual approach, loadbalance / failover.
@kramer9 "So where are you seeing the Dual-Nat?"
I think I saw an RFC1918 IP address on the WAN_STARLINK_DHCP gateway, correct me if I'm wrong and this is just a test....
-
@daddygo said in SG-3100 Loadbalance and failover:
RFC1918 IP address on the WAN_STARLINK_DHCP gateway
I've seen comments elsewhere that Starlink uses CGNAT.
-
@steveits said in SG-3100 Loadbalance and failover:
I've seen comments elsewhere that Starlink uses CGNAT.
Well then I saw it right
Aha, this is not the best situation, because you can only hope that the CGNAT is there just because the provider is short of IPv4 address space, and that there are no nonsense filtering rules on the NAT.
It's like when you're at work and you need two hands, but one is tied behind your back.
It's also strange that they use 192.168.0.0/16 and not 10.0.0.0/8; they're not that short of addresses then, hmmm?
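On the address-range question, a small Python sketch for checking which range a WAN-side address falls into. RFC 1918 defines the three private blocks, and RFC 6598 sets aside 100.64.0.0/10 as shared space for carrier-grade NAT, so a 192.168.x.x address by itself may equally come from a router sitting in front of pfSense. The function name and labels are hypothetical.

```python
import ipaddress

# RFC 1918 private blocks and the RFC 6598 shared block used for CGN.
RFC1918 = [ipaddress.ip_network(n) for n in
           ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]
RFC6598 = ipaddress.ip_network("100.64.0.0/10")


def classify(address: str) -> str:
    """Label an address as RFC 1918 private, RFC 6598 shared, or other."""
    addr = ipaddress.ip_address(address)
    if any(addr in net for net in RFC1918):
        return "rfc1918-private"
    if addr in RFC6598:
        return "rfc6598-shared"
    return "other"


# 192.168.1.232 is the address seen in the status lines of this thread.
for ip in ("192.168.1.232", "100.64.0.1", "8.8.8.8"):
    print(ip, classify(ip))
```

Either of the first two labels on the WAN interface means there is at least one more NAT layer between pfSense and the public internet.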