Netgate Discussion Forum

    Dual WAN Failover bounces on and off…

      tseanf

      Last night I switched the two connections.  Everything seemed to work fine: when I pulled the plug on the DSL, it switched over to the cable modem and seemed to stay active.

      When I switched it back the other way around, my issue unfortunately persisted.  The DSL connection on the failover is just really flaky.  I really think it's something with my setup or the monitoring, but I'm not sure.

      But it's not a big deal.  It just would have been nice to get working.  Thanks for your help though.

      -Sean

        tseanf

        The other thing to note in case anyone has any weird things to try, is that I flipped my pool to load balance, and everything is working fine.  Both connections show Online constantly.  And I verified it was balancing between the two.

          MindTwist

          @tseanf:

          The other thing to note in case anyone has any weird things to try, is that I flipped my pool to load balance, and everything is working fine.  Both connections show Online constantly.  And I verified it was balancing between the two.

          Tseanf,
          I am finishing setting up multi-WAN with failover, and I was wondering: how or where do you see whether one of the links is "Online" or not? Where do you go to see that status?
          Thanks!

            hoba

            Current status can be seen at status>loadbalancer, and historical changes at status>systemlogs, loadbalancer.

              drees

              You know, I noticed a similar issue with failover "flapping" just today on my Multi WAN setup (T1 and DSL line, static IPs on both). Running pfSense 1.2.

              I have three pools, a load balance pool, and a WAN1-failto-WAN2 and a WAN2-failto-WAN1 pool so that I can prefer certain gateways depending on the type of traffic.

              What happened was that over a period of about 5-10 minutes (I can get logs tomorrow) both WAN connections would fail ping tests for a few seconds, then they'd be OK for a few seconds, then they'd fail again for a few seconds, and so on. It sounds strangely familiar to the issue tseanf describes, so I will try testing failover of each line tomorrow to see if one line failing triggers the flapping for some reason.

                tseanf

                I didn't know about the load balancer logs until Hoba mentioned them.  I will also play with it more tonight and report what logs I have when my connection is flapping.

                  hoba

                  The quality graphs under status>rrd graphs might be interesting as well. If the connections become unreliable, you should see it there too. By comparing other graphs (states, pps,…) with the times when your WANs go down, you might even be able to tell whether something else is going on.

                    drees

                    Looking at my incident yesterday, here is a log snippet:

                    Apr 29 16:12:16 	slbd[427]: Service WAN1FailsToWAN2 changed status, reloading filter policy
                    Apr 29 16:12:16 	slbd[427]: ICMP poll succeeded for 12.213.4.24, marking service UP
                    Apr 29 16:12:16 	slbd[427]: ICMP poll succeeded for 68.94.156.1, marking service UP
                    Apr 29 16:12:15 	slbd[427]: Switching to sitedown for VIP 127.0.0.1:666
                    Apr 29 16:12:15 	slbd[427]: Switching to sitedown for VIP 127.0.0.1:666
                    Apr 29 16:12:14 	slbd[427]: Service WAN2FailsToWAN1 changed status, reloading filter policy
                    Apr 29 16:12:14 	slbd[427]: ICMP poll succeeded for 68.94.156.1, marking service UP
                    Apr 29 16:12:14 	slbd[427]: ICMP poll succeeded for 12.213.4.24, marking service UP
                    Apr 29 16:12:11 	slbd[427]: Service WAN1FailsToWAN2 changed status, reloading filter policy
                    Apr 29 16:12:11 	slbd[427]: ICMP poll failed for 12.213.4.24, marking service DOWN
                    Apr 29 16:12:11 	slbd[427]: ICMP poll failed for 68.94.156.1, marking service DOWN
                    Apr 29 16:12:10 	slbd[427]: Switching to sitedown for VIP 127.0.0.1:666
                    Apr 29 16:12:10 	slbd[427]: Switching to sitedown for VIP 127.0.0.1:666
                    Apr 29 16:12:09 	slbd[427]: Service WAN2FailsToWAN1 changed status, reloading filter policy
                    Apr 29 16:12:09 	slbd[427]: ICMP poll failed for 68.94.156.1, marking service DOWN
                    Apr 29 16:12:09 	slbd[427]: ICMP poll failed for 12.213.4.24, marking service DOWN
                    Apr 29 16:12:06 	slbd[427]: Service LoadBalance changed status, reloading filter policy
                    Apr 29 16:12:06 	slbd[427]: ICMP poll failed for 68.94.156.1, marking service DOWN
                    Apr 29 16:12:06 	slbd[427]: ICMP poll failed for 12.213.4.24, marking service DOWN
                    

                    68.94.156.1 is the monitor IP for WAN1, 12.213.4.24 is the monitor IP for WAN2.

                    This basically continued for nearly 10 minutes (started at 16:12:06 and ended at 16:21:18), with the pools going up and down every few seconds, until it mysteriously cleared itself up.
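
For anyone who wants to tally this kind of flapping rather than eyeball it, here is a minimal Python sketch. It is not part of slbd or pfSense; the function names `count_transitions` and `window_seconds` are made up for illustration, and the regex simply assumes the "ICMP poll ... marking service UP/DOWN" wording shown in the snippet above. The log lines in `LOG` are copied from that snippet.

```python
import re
from collections import Counter
from datetime import datetime

# Lines copied from the snippet above (newest first, as the log viewer shows them).
LOG = """\
Apr 29 16:12:16 slbd[427]: Service WAN1FailsToWAN2 changed status, reloading filter policy
Apr 29 16:12:16 slbd[427]: ICMP poll succeeded for 12.213.4.24, marking service UP
Apr 29 16:12:16 slbd[427]: ICMP poll succeeded for 68.94.156.1, marking service UP
Apr 29 16:12:14 slbd[427]: ICMP poll succeeded for 68.94.156.1, marking service UP
Apr 29 16:12:14 slbd[427]: ICMP poll succeeded for 12.213.4.24, marking service UP
Apr 29 16:12:11 slbd[427]: ICMP poll failed for 12.213.4.24, marking service DOWN
Apr 29 16:12:11 slbd[427]: ICMP poll failed for 68.94.156.1, marking service DOWN
Apr 29 16:12:09 slbd[427]: ICMP poll failed for 68.94.156.1, marking service DOWN
Apr 29 16:12:09 slbd[427]: ICMP poll failed for 12.213.4.24, marking service DOWN
Apr 29 16:12:06 slbd[427]: ICMP poll failed for 68.94.156.1, marking service DOWN
Apr 29 16:12:06 slbd[427]: ICMP poll failed for 12.213.4.24, marking service DOWN
"""

# Assumed message format, based only on the lines quoted in this thread.
POLL = re.compile(
    r"ICMP poll (?:succeeded|failed) for ([\d.]+), marking service (UP|DOWN)"
)


def count_transitions(log_text):
    """Count UP/DOWN poll messages per monitor IP; other lines are ignored."""
    counts = Counter()
    for line in log_text.splitlines():
        match = POLL.search(line)
        if match:
            ip, status = match.groups()
            counts[(ip, status)] += 1
    return counts


def window_seconds(start, end):
    """Length of an incident in seconds, given two HH:MM:SS times on the same day."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())


if __name__ == "__main__":
    for (ip, status), n in sorted(count_transitions(LOG).items()):
        print(ip, status, n)
    print("incident length (s):", window_seconds("16:12:06", "16:21:18"))
```

With the start and end times quoted above, the incident works out to 552 seconds (9 minutes 12 seconds).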

                    The quality graphs indicated heavy packet loss on both links during this time.

                    Now, I also use SmokePing to monitor the pfSense box and both lines using several other machines around the world, and while one did pick up some packet loss on both lines, the others picked up some packet loss on WAN2 but did not see any at all on WAN1.

                    Based on this I think it's possible that WAN2 did in fact go down for a bit, but WAN1 most likely did not.

                    How does the ICMP poll detect failures and then determine that a host is down?

                      hoba

                      It pings the monitor IPs every few seconds, and if a run of x pings in a row fails (not sure how many atm, iirc 5 or something in that range), the link is considered down. Maybe try some different monitor IPs and see if that makes a difference? Unreliable monitor IPs can cause a link to be detected as down even though it is still up.
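
The description above can be sketched roughly as follows. This is only an illustration of the "x failed pings in a row" idea, not slbd's actual code: the `LinkMonitor` class is hypothetical, the default threshold of 5 is just the number recalled above ("iirc 5 or something in that range"), and marking the link up again after a single successful ping is an assumption.

```python
class LinkMonitor:
    """Tracks one monitor IP and applies a consecutive-failure threshold."""

    def __init__(self, fail_threshold=5):
        # Assumed threshold; the real value was not confirmed in this thread.
        self.fail_threshold = fail_threshold
        self.consecutive_failures = 0
        self.is_up = True

    def record(self, ping_ok):
        """Feed in one ping result and return the current link state."""
        if ping_ok:
            # Assumption: one good reply is enough to mark the link up again.
            self.consecutive_failures = 0
            self.is_up = True
        else:
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.fail_threshold:
                self.is_up = False
        return self.is_up


if __name__ == "__main__":
    monitor = LinkMonitor()
    for result in [False, False, False, False, False, True]:
        print(result, "->", "UP" if monitor.record(result) else "DOWN")
```

With a scheme like this, a monitor IP that drops several pings in a row gets the link marked down even when the line itself is fine, which is why an unreliable monitor IP can produce false failovers.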

                        tseanf

                        Ok I did some testing tonight to grab the logs.

                        Cable Monitor IP: 68.87.77.130
                        DSL Monitor IP: 216.17.3.122

                        I disconnected Cable at 17:40:00 and plugged it back in at 17:44:30

                        Load Balancer Logs:

                        
                        Apr 30 17:40:11 	slbd[68488]: ICMP poll failed for 68.87.77.130, marking service DOWN
                        Apr 30 17:40:11 	slbd[68488]: Service FailoverInternet changed status, reloading filter policy
                        Apr 30 17:40:48 	slbd[68488]: ICMP poll failed for 216.17.3.122, marking service DOWN
                        Apr 30 17:40:48 	slbd[68488]: Service FailoverInternet changed status, reloading filter policy
                        Apr 30 17:40:49 	slbd[68488]: Switching to sitedown for VIP 127.0.0.1:666
                        Apr 30 17:41:00 	last message repeated 2 times
                        Apr 30 17:41:00 	slbd[68488]: ICMP poll succeeded for 216.17.3.122, marking service UP
                        Apr 30 17:41:00 	slbd[68488]: Service FailoverInternet changed status, reloading filter policy
                        Apr 30 17:42:03 	slbd[68488]: ICMP poll failed for 216.17.3.122, marking service DOWN
                        Apr 30 17:42:03 	slbd[68488]: Service FailoverInternet changed status, reloading filter policy
                        Apr 30 17:42:06 	slbd[68488]: Switching to sitedown for VIP 127.0.0.1:666
                        Apr 30 17:42:21 	last message repeated 3 times
                        Apr 30 17:42:25 	slbd[68488]: ICMP poll succeeded for 216.17.3.122, marking service UP
                        Apr 30 17:42:25 	slbd[68488]: Service FailoverInternet changed status, reloading filter policy
                        Apr 30 17:42:56 	slbd[68488]: ICMP poll failed for 216.17.3.122, marking service DOWN
                        Apr 30 17:42:56 	slbd[68488]: Service FailoverInternet changed status, reloading filter policy
                        Apr 30 17:42:58 	slbd[68488]: Switching to sitedown for VIP 127.0.0.1:666
                        Apr 30 17:43:13 	last message repeated 3 times
                        Apr 30 17:43:13 	slbd[68488]: ICMP poll succeeded for 216.17.3.122, marking service UP
                        Apr 30 17:43:13 	slbd[68488]: Service FailoverInternet changed status, reloading filter policy
                        Apr 30 17:43:55 	slbd[68488]: ICMP poll failed for 216.17.3.122, marking service DOWN
                        Apr 30 17:43:55 	slbd[68488]: Service FailoverInternet changed status, reloading filter policy
                        Apr 30 17:43:57 	slbd[68488]: Switching to sitedown for VIP 127.0.0.1:666
                        Apr 30 17:44:17 	last message repeated 4 times
                        Apr 30 17:44:17 	slbd[68488]: ICMP poll succeeded for 216.17.3.122, marking service UP
                        Apr 30 17:44:17 	slbd[68488]: Service FailoverInternet changed status, reloading filter policy
                        Apr 30 17:44:34 	slbd[68488]: ICMP poll succeeded for 68.87.77.130, marking service UP
                        Apr 30 17:44:34 	slbd[68488]: Service FailoverInternet changed status, reloading filter policy
                        
                        

                        RRD Quality Graph (Cable):

                        RRD Quality Graph (Cable):

                        Note on the graphs: I was messing around a bit before 5:40 too, and both were flapping.  Also, before settling on those monitor IPs (which are DNS servers for each ISP), I tried many different monitor IPs.
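
To put numbers on the flapping in the load balancer log above, here is a small Python sketch that measures how long each monitor IP stayed DOWN before coming back UP. The helper name `down_intervals` is made up for illustration, the log is assumed to be in oldest-first order as posted, and the regex assumes the message wording seen in this thread. The `LOG` lines are the ICMP poll lines copied from that log.

```python
import re
from datetime import datetime

# ICMP poll lines copied from the load balancer log above (oldest first).
LOG = """\
Apr 30 17:40:11 slbd[68488]: ICMP poll failed for 68.87.77.130, marking service DOWN
Apr 30 17:40:48 slbd[68488]: ICMP poll failed for 216.17.3.122, marking service DOWN
Apr 30 17:41:00 slbd[68488]: ICMP poll succeeded for 216.17.3.122, marking service UP
Apr 30 17:42:03 slbd[68488]: ICMP poll failed for 216.17.3.122, marking service DOWN
Apr 30 17:42:25 slbd[68488]: ICMP poll succeeded for 216.17.3.122, marking service UP
Apr 30 17:42:56 slbd[68488]: ICMP poll failed for 216.17.3.122, marking service DOWN
Apr 30 17:43:13 slbd[68488]: ICMP poll succeeded for 216.17.3.122, marking service UP
Apr 30 17:43:55 slbd[68488]: ICMP poll failed for 216.17.3.122, marking service DOWN
Apr 30 17:44:17 slbd[68488]: ICMP poll succeeded for 216.17.3.122, marking service UP
Apr 30 17:44:34 slbd[68488]: ICMP poll succeeded for 68.87.77.130, marking service UP
"""

# Assumed line format, based only on the log lines quoted in this thread.
EVENT = re.compile(
    r"(\d\d:\d\d:\d\d)\s+slbd\[\d+\]: ICMP poll \w+ for ([\d.]+), "
    r"marking service (UP|DOWN)"
)


def down_intervals(log_text, ip):
    """Return the length in seconds of each DOWN period for one monitor IP."""
    intervals = []
    down_since = None
    for line in log_text.splitlines():
        match = EVENT.search(line)
        if not match or match.group(2) != ip:
            continue
        when = datetime.strptime(match.group(1), "%H:%M:%S")
        status = match.group(3)
        if status == "DOWN" and down_since is None:
            down_since = when
        elif status == "UP" and down_since is not None:
            intervals.append(int((when - down_since).total_seconds()))
            down_since = None
    return intervals


if __name__ == "__main__":
    print("DSL   216.17.3.122:", down_intervals(LOG, "216.17.3.122"))
    print("Cable 68.87.77.130:", down_intervals(LOG, "68.87.77.130"))
```

On this log the cable monitor shows one long DOWN period matching the unplugged window, while the DSL monitor shows several short ones that only occur while cable is down.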

                        Thanks,

                        -Sean

                          drees

                          tseanf, is your Cable line OPT1 and your DSL line WAN or the other way around?

                          Your logs look just like mine (except you only have one load balance pool).

                          Have you checked your routing tables - are there static routes to the monitor IPs for each interface? Mine look OK now, but I have changed my loadbalance config since the flapping occurred. It seems like the pings for one of the monitor IPs are getting routed out the wrong interface.
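
One way to do that check without reading the table by eye is sketched below in Python. Everything here is an assumption for illustration: the helper `route_interface` is made up, the column layout is what `netstat -rn` typically prints on FreeBSD (it can differ between versions, so the code looks up the `Netif` column from the header instead of hard-coding a position), and the gateways and interface names in `SAMPLE` are placeholders, not anyone's real routing table.

```python
# Hypothetical routing table text; gateways and interface names are placeholders.
SAMPLE = """\
Routing tables

Internet:
Destination        Gateway            Flags    Refs      Use  Netif Expire
default            192.0.2.1          UGS         0     1234    em0
68.87.77.130       192.0.2.1          UGHS        0       10    em0
216.17.3.122       198.51.100.1       UGHS        0       10    em1
"""


def route_interface(netstat_output, monitor_ip):
    """Return the interface of the host route for monitor_ip, or None if absent."""
    header = None
    for line in netstat_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "Destination":
            # Remember the header so the Netif column can be located by name.
            header = fields
            continue
        if header and "Netif" in header and fields[0] == monitor_ip:
            index = header.index("Netif")
            if index < len(fields):
                return fields[index]
    return None


if __name__ == "__main__":
    for ip in ("68.87.77.130", "216.17.3.122"):
        print(ip, "->", route_interface(SAMPLE, ip))
```

If the function returns None for a monitor IP, or returns the interface of the other WAN, that would be consistent with pings for that monitor going out the wrong line.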

                            tseanf

                            Cable is WAN and DSL is OPT1…

                            Yeah, I only set it up to fail over to DSL.  My Cable connection is 16 Mbps compared to my 1.5 Mbps DSL, so I don't really care about load balancing.

                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.