Netgate Discussion Forum

    Multi-WAN - One of two WAN in failover drops ~1-2 min. for unknown reason

    Routing and Multi WAN
      wm408

      Hi,

      Note: the log sample below shows newer records at the top, older at the bottom.

      I get recurring log messages about one of my gateways going down intermittently. Sometimes the event repeats within minutes; other times, hours go by without any activity:

      Sep 23 17:11:37	php-fpm	330	/rc.dyndns.update: MONITOR: WAN2GW is available now, adding to routing group WAN2failtoWAN 8.8.4.4|74.51.222.14|WAN2GW|15.522ms|4.223ms|0.0%|none
      Sep 23 17:11:36	check_reload_status		Reloading filter
      Sep 23 17:11:36	check_reload_status		Restarting OpenVPN tunnels/interfaces
      Sep 23 17:11:36	check_reload_status		Restarting ipsec tunnels
      Sep 23 17:11:36	check_reload_status		updating dyndns WAN2GW
      Sep 23 17:10:20	php-fpm	328	/rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed its IP. Reloading endpoints that may use WAN2GW.
      Sep 23 17:10:20	php-fpm	328	/rc.dyndns.update: MONITOR: WAN2GW is down, omitting from routing group WAN2failtoWAN 8.8.4.4|74.51.222.14|WAN2GW|1341.057ms|2554.332ms|0.0%|down
      Sep 23 17:10:19	check_reload_status		Reloading filter
      Sep 23 17:10:19	check_reload_status		Restarting OpenVPN tunnels/interfaces
      Sep 23 17:10:19	check_reload_status		Restarting ipsec tunnels
      Sep 23 17:10:19	check_reload_status		updating dyndns WAN2GW
      Sep 23 17:06:32	php-fpm	329	/rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed its IP. Reloading endpoints that may use WAN2GW.
      Sep 23 17:06:32	php-fpm	329	/rc.dyndns.update: MONITOR: WAN2GW is available now, adding to routing group WAN2failtoWAN 8.8.4.4|74.51.222.14|WAN2GW|69.166ms|119.861ms|0.0%|none
      Sep 23 17:06:31	check_reload_status		Reloading filter
      Sep 23 17:06:31	check_reload_status		Restarting OpenVPN tunnels/interfaces
      Sep 23 17:06:31	check_reload_status		Restarting ipsec tunnels
      Sep 23 17:06:31	check_reload_status		updating dyndns WAN2GW
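
The pipe-delimited tail of each MONITOR message carries the measurements that drove the state change. Here is a minimal sketch of splitting it apart, assuming the fields are monitor IP, source IP, gateway name, latency, standard deviation, loss, and status in that order. The field names and the `MonitorStatus` class are my own labels for illustration, not names taken from pfSense.

```python
from dataclasses import dataclass


@dataclass
class MonitorStatus:
    # Field names are assumptions; the log only shows positional values.
    monitor_ip: str
    source_ip: str
    gateway: str
    latency_ms: float
    stddev_ms: float
    loss_pct: float
    status: str


def parse_monitor_fields(fields: str) -> MonitorStatus:
    """Split the pipe-delimited tail of a MONITOR log message."""
    monitor_ip, source_ip, gateway, latency, stddev, loss, status = fields.split("|")
    return MonitorStatus(
        monitor_ip,
        source_ip,
        gateway,
        float(latency.removesuffix("ms")),  # requires Python 3.9+
        float(stddev.removesuffix("ms")),
        float(loss.removesuffix("%")),
        status,
    )


# Values copied from the 17:10:20 and 17:11:37 entries above.
down = parse_monitor_fields("8.8.4.4|74.51.222.14|WAN2GW|1341.057ms|2554.332ms|0.0%|down")
up = parse_monitor_fields("8.8.4.4|74.51.222.14|WAN2GW|15.522ms|4.223ms|0.0%|none")
print(down.gateway, down.status, down.latency_ms, down.loss_pct)
```

Read this way, the "down" entry reports 0.0% loss but a latency above one second, which suggests the gateway was marked down for latency rather than for dropped pings.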
      
      

      This only seems to happen when check_reload_status runs.

      Any thoughts or comments?

      Thanks.

        wm408

        Bump.

        Jimp? or cmb?

        Any thoughts from you guys?  8)

          heper

          CMB has gone to where the grass appears greener.

          Check your gateway logs. They should give more insight into why the gateway goes down.

            wm408

            Hi,

            I am going to test the "Set ping payload size" option on the problematic gateway. cmb advised this for "…buggy upstream devices...", which I may be experiencing here. Refer to this post: https://forum.pfsense.org/index.php?topic=110043.0

            This is a sample from my Gateways log:

            Oct 1 10:45:23 dpinger WAN2GW 8.8.4.4: Clear latency 209224us stddev 412652us loss 0%
            Oct 1 10:43:40 dpinger WAN2GW 8.8.4.4: Alarm latency 762289us stddev 1626523us loss 0%
            Oct 1 10:29:15 dpinger WAN2GW 8.8.4.4: Clear latency 42259us stddev 123654us loss 0%
            Oct 1 10:27:10 dpinger WAN2GW 8.8.4.4: Alarm latency 755434us stddev 2017180us loss 4%
            Oct 1 10:14:02 dpinger WAN2GW 8.8.4.4: Clear latency 443760us stddev 978113us loss 0%
            Oct 1 10:13:35 dpinger WAN2GW 8.8.4.4: Alarm latency 504314us stddev 971819us loss 0%
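
To see how long each alarm lasted, the Alarm and Clear lines can be paired up. This is a rough sketch that assumes the line layout shown in the sample above; the regular expression and the function names are illustrative, not part of dpinger or pfSense.

```python
import re
from datetime import datetime

# Matches the dpinger Alarm/Clear lines as they appear in the sample above.
EVENT_RE = re.compile(
    r"(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+dpinger\s+(?P<gw>\S+)\s+\S+:\s+"
    r"(?P<event>Alarm|Clear)\s+latency\s+(?P<lat>\d+)us"
)


def parse_event(line):
    """Return (timestamp, gateway, event, latency_ms), or None for other lines."""
    m = EVENT_RE.search(line)
    if m is None:
        return None
    # Syslog timestamps carry no year; a placeholder year is fine for same-day gaps.
    ts = datetime.strptime("2000 " + m["ts"], "%Y %b %d %H:%M:%S")
    return ts, m["gw"], m["event"], int(m["lat"]) / 1000.0


def alarm_durations(lines):
    """Pair each Alarm with the next Clear and return the gaps in seconds."""
    events = sorted(e for e in map(parse_event, lines) if e is not None)
    durations, started = [], None
    for ts, _gw, event, _lat in events:
        if event == "Alarm" and started is None:
            started = ts
        elif event == "Clear" and started is not None:
            durations.append(int((ts - started).total_seconds()))
            started = None
    return durations


SAMPLE = [
    "Oct 1 10:45:23 dpinger WAN2GW 8.8.4.4: Clear latency 209224us stddev 412652us loss 0%",
    "Oct 1 10:43:40 dpinger WAN2GW 8.8.4.4: Alarm latency 762289us stddev 1626523us loss 0%",
    "Oct 1 10:29:15 dpinger WAN2GW 8.8.4.4: Clear latency 42259us stddev 123654us loss 0%",
    "Oct 1 10:27:10 dpinger WAN2GW 8.8.4.4: Alarm latency 755434us stddev 2017180us loss 4%",
    "Oct 1 10:14:02 dpinger WAN2GW 8.8.4.4: Clear latency 443760us stddev 978113us loss 0%",
    "Oct 1 10:13:35 dpinger WAN2GW 8.8.4.4: Alarm latency 504314us stddev 971819us loss 0%",
]
durations = alarm_durations(SAMPLE)
print(durations)  # [27, 125, 103]
```

For this sample the alarms last 27, 125, and 103 seconds, which lines up with the one-to-two-minute drops in the thread title.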

            I didn't know Chris had moved on until now, when I saw his post. Makes sense! Thanks for the heads-up.

              Derelict LAYER 8 Netgate

              Well, there you go. dpinger is doing its job.

              If you have gateway monitoring on WAN (the default setting), the system is automatically keeping track of two pings per second in Status > Monitoring.

              From there, select Settings and change the left axis to Quality / WANGW (or the local equivalent).

              A good place to start is Options: 8 hours, Resolution: 1 minute.

              Another place to check is in Status > System Logs, Gateways. Any events there with "Alarm" in them are times when the ping monitor had excessive loss or latency.

              A failure will look something like this: Jan 7 15:05:31 dpinger WANGW 8.8.8.8: Alarm latency 0us stddev 0us loss 100%

              Lines like this are just the dpinger process starting or reloading and are normal:

              dpinger send_interval 500ms loss_interval 2000ms time_period 60000ms report_interval 0ms data_len 0 alert_interval 1000ms latency_alarm 500ms loss_alarm 20% dest_addr 8.8.4.4 bind_addr 198.51.0.16 identifier "DSLGW "
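
The startup line is also the easiest place to read the thresholds a given gateway is using. A small sketch that turns it into a dictionary follows; `parse_dpinger_params` and `ms` are hypothetical helper names, and the arithmetic at the end only approximates how many probes fall inside one averaging window.

```python
import re


def parse_dpinger_params(line):
    """Return the key/value pairs of a dpinger startup line as a dict of strings."""
    _, _, params = line.partition("dpinger ")
    # Values are either a quoted string (the identifier) or a single token.
    return dict(re.findall(r'(\w+) ("[^"]*"|\S+)', params))


def ms(value):
    """Convert a value such as '500ms' to an integer number of milliseconds."""
    return int(value.removesuffix("ms"))  # requires Python 3.9+


line = (
    'dpinger send_interval 500ms loss_interval 2000ms time_period 60000ms '
    'report_interval 0ms data_len 0 alert_interval 1000ms latency_alarm 500ms '
    'loss_alarm 20% dest_addr 8.8.4.4 bind_addr 198.51.0.16 identifier "DSLGW "'
)
params = parse_dpinger_params(line)

pings_per_second = 1000 / ms(params["send_interval"])
probes_per_window = ms(params["time_period"]) // ms(params["send_interval"])
print(pings_per_second, probes_per_window)
print(params["latency_alarm"], params["loss_alarm"])
```

With a 500 ms send interval this works out to the two pings per second mentioned earlier, averaged over roughly 120 probes per 60-second window, with alarms at 500 ms latency or 20% loss.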

              Sometimes it is beneficial to change your monitoring address to something further out. In that example you can see that I am monitoring a Google DNS server. In general, monitoring the ISP gateway is fine if it reliably responds to pings. The monitor IP address can be changed in System > Routing by editing the appropriate gateway.

              Chattanooga, Tennessee, USA
              A comprehensive network diagram is worth 10,000 words and 15 conference calls.
              DO NOT set a source address/port in a port forward or firewall rule unless you KNOW you need it!
              Do Not Chat For Help! NO_WAN_EGRESS(TM)

                wm408

                Hi Derelict,

                Typically, for the Monitor IP I choose the ISP gateway or one hop past it (as observed with traceroute). Lately, at least for testing, I've set the problematic gateway's Monitor IP to a Google DNS server as well, since that's been a popular choice throughout the forums.

                Thanks for your other tips. I will circle back and review each of your points after I look at the results with the topic I mentioned in an earlier post, re: ping payload size.

                  wm408

                  Hi,

                  After reviewing the ping payload size, and also your recommendations, I still have the same issue.
                  Let me know if any other suggestions come to mind. Thx.

                  Oct 7 15:31:19	dpinger		WAN2GW 8.8.4.4: duplicate echo reply received
                  Oct 7 15:31:19	dpinger		WAN2GW 8.8.4.4: duplicate echo reply received
                  Oct 7 15:29:46	dpinger		WAN2GW 8.8.4.4: Alarm latency 46725667us stddev 0us loss 95%
                  Oct 7 15:28:14	dpinger		WAN2GW 8.8.4.4: Alarm latency 15032us stddev 3426us loss 25%
                  Oct 7 15:26:44	dpinger		WAN2GW 8.8.4.4: Clear latency 15014us stddev 2740us loss 0%
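
One way to tell whether an alarm was raised for latency or for loss is to compare each Alarm line against the thresholds. The sketch below assumes the 500 ms and 20% values from the dpinger startup line quoted earlier in the thread apply to WAN2GW as well, and that an alarm fires when a value exceeds its threshold; both are assumptions, and `alarm_causes` is a hypothetical helper.

```python
import re

# Assumed thresholds, taken from the example startup line in this thread.
LATENCY_ALARM_MS = 500.0
LOSS_ALARM_PCT = 20

ALARM_RE = re.compile(r"Alarm latency (\d+)us stddev (\d+)us loss (\d+)%")


def alarm_causes(line):
    """Return which thresholds an Alarm line exceeds; empty list for other lines."""
    m = ALARM_RE.search(line)
    if m is None:
        return []
    latency_ms = int(m.group(1)) / 1000.0  # dpinger logs microseconds
    loss_pct = int(m.group(3))
    causes = []
    if latency_ms > LATENCY_ALARM_MS:
        causes.append("latency")
    if loss_pct > LOSS_ALARM_PCT:
        causes.append("loss")
    return causes


lines = [
    "Oct 7 15:31:19 dpinger WAN2GW 8.8.4.4: duplicate echo reply received",
    "Oct 7 15:29:46 dpinger WAN2GW 8.8.4.4: Alarm latency 46725667us stddev 0us loss 95%",
    "Oct 7 15:28:14 dpinger WAN2GW 8.8.4.4: Alarm latency 15032us stddev 3426us loss 25%",
    "Oct 1 10:43:40 dpinger WAN2GW 8.8.4.4: Alarm latency 762289us stddev 1626523us loss 0%",
]
for entry in lines:
    print(alarm_causes(entry), entry)
```

Under those assumptions the Oct 7 alarms are driven by packet loss (25%, then 95%), while the Oct 1 alarms were latency-only with little or no loss, so the two samples may not share a cause.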
                  

                  Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.