
    WAN2 goes down for packet loss, doesn't come back up until gateways page viewed

    Routing and Multi WAN
    10 Posts 3 Posters 1.1k Views
    • SteveITS

      I have a client's SG-2440 with two WANs (it's actually the same one as https://forum.netgate.com/topic/147889/member-down-triggering-with-0-loss, though that's probably not relevant since that issue is on WAN1). It's on 2.4.4-p3. On multiple occasions this year, when WAN2 has gone down, it has stayed down until I log in to the router and view the gateways page, at which point pfSense suddenly realizes the connection is up again. Logs:

      Jul 6 18:01:32 	php-cgi 		notify_monitor.php: Message sent to support@example.com OK
      Jul 6 18:01:30 	php-fpm 	77236 	/system_gateways.php: 77236MONITOR: WAN2_DHCP is available now, adding to routing group GWGROUP 8.8.8.8|172.16.0.51|WAN2_DHCP|18.194ms|0.461ms|0.0%|none
      Jul 6 18:01:22 	php-fpm 	3179 	/index.php: Successful login for user 'admin' from: 173.x.x.x (Local Database)
      Jul 6 17:06:12 	check_reload_status 		Reloading filter
      Jul 6 17:06:12 	check_reload_status 		Restarting OpenVPN tunnels/interfaces
      Jul 6 17:06:12 	check_reload_status 		Restarting ipsec tunnels
      Jul 6 17:06:12 	check_reload_status 		updating dyndns WAN2_DHCP
      Jul 6 17:06:12 	rc.gateway_alarm 	99608 	>>> Gateway alarm: WAN2_DHCP (Addr:8.8.8.8 Alarm:0 RTT:18.124ms RTTsd:.421ms Loss:13%)
      Jul 6 17:03:26 	php-cgi 		notify_monitor.php: Message sent to support@example.com OK
      Jul 6 17:03:26 	php-fpm 	3179 	/rc.openvpn: MONITOR: WAN2_DHCP is down, omitting from routing group GWGROUP 8.8.8.8|172.16.0.51|WAN2_DHCP|18.312ms|0.485ms|22%|down
      Jul 6 17:03:25 	check_reload_status 		Reloading filter
      Jul 6 17:03:25 	check_reload_status 		Restarting OpenVPN tunnels/interfaces
      Jul 6 17:03:25 	check_reload_status 		Restarting ipsec tunnels
      Jul 6 17:03:25 	check_reload_status 		updating dyndns WAN2_DHCP
      Jul 6 17:03:25 	rc.gateway_alarm 	79942 	>>> Gateway alarm: WAN2_DHCP (Addr:8.8.8.8 Alarm:1 RTT:18.312ms RTTsd:.460ms Loss:21%)
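For reference, each MONITOR line above ends with a pipe-separated status string. Below is a minimal PHP sketch that splits one into named fields. The function name parse_monitor_status() and the field names (monitor_ip, source_ip, and so on) are hypothetical, inferred from the values in these logs rather than taken from pfSense's gwlb.inc.

```php
<?php
// Sketch only: field names are inferred from the log values in this thread
// (8.8.8.8 is the monitor IP; 172.16.0.51 is presumably the WAN2 address).
// They are assumptions, not names taken from pfSense source.
function parse_monitor_status(string $line): ?array
{
    // The status string is the last whitespace-separated token on the line
    $tokens = preg_split('/\s+/', trim($line));
    $fields = explode('|', end($tokens));

    // Expect the 7-field layout seen in the logs above
    if (count($fields) !== 7) {
        return null;
    }

    $keys = ['monitor_ip', 'source_ip', 'gateway', 'rtt', 'rtt_stddev', 'loss', 'status'];
    return array_combine($keys, $fields);
}

$line = '/rc.openvpn: MONITOR: WAN2_DHCP is down, omitting from routing group GWGROUP '
      . '8.8.8.8|172.16.0.51|WAN2_DHCP|18.312ms|0.485ms|22%|down';
print_r(parse_monitor_status($line));
```

Applied to the 17:03:26 line this gives loss 22% and status down; the 18:01:30 line gives 0.0% and none.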
      

      It doesn't matter whether I log in an hour or a couple of days later; the gateway comes back immediately when the system_gateways.php page is viewed. Is there some way to get pfSense to realize on its own that WAN2 is online again?

      WAN1 doesn't seem to have this problem.

      Pre-2.7.2/23.09: Only install packages for your version, or risk breaking it. Select your branch in System/Update/Update Settings.
      When upgrading, allow 10-15 minutes to restart, or more depending on packages and device speed.
      Upvote 👍 helpful posts!

      • serbus

        Hello!

        Could be related to:

        https://redmine.pfsense.org/issues/9450

        John

        Lex parsimoniae

        • SteveITS

          Hmm, sounds similar. dpinger logged:

          Jul 6 17:06:12 dpinger WAN2_DHCP 8.8.8.8: Clear latency 18124us stddev 421us loss 13%
          Jul 6 17:03:25 dpinger WAN2_DHCP 8.8.8.8: Alarm latency 18312us stddev 460us loss 21%

          So dpinger cleared the gateway-down alarm because packet loss had dropped under 20%?
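One way to read these numbers, as a simplified sketch: assume the loss figure is compared against a low and a high threshold, with 10% and 20% assumed here as the GUI defaults (the real values are in the gateway's Advanced settings). Above the high threshold the gateway counts as down; between the two it only gets a packet-loss warning. The function classify_loss() is hypothetical and is not a pfSense API; the strict ">" comparisons are also an assumption.

```php
<?php
// Simplified sketch, not pfSense's code: classify a packet-loss reading
// against low/high thresholds. 10% and 20% are assumed defaults.
function classify_loss(float $lossPct, float $low = 10.0, float $high = 20.0): string
{
    if ($lossPct > $high) {
        return 'down';  // alarm: gateway treated as down
    }
    if ($lossPct > $low) {
        return 'loss';  // warning only: gateway is still considered up
    }
    return 'none';      // healthy
}

// Loss values from the dpinger alarm (21%) and clear (13%) lines above
foreach ([21.0, 13.0] as $loss) {
    printf("%.1f%% -> %s\n", $loss, classify_loss($loss));
}
```

Under these assumptions the 21% alarm maps to down and the 13% clear maps to the loss warning, which, if the group's trigger level is Member Down, should not by itself keep WAN2 out of the group.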

          I definitely don't have to save the gateway, although I have clicked the edit button to open it. Next time I can try just sitting on the system_gateways.php page for a bit to see whether it sends the email.

            • serbus

            Hello!

            If you want to go the kludge route, you could run something like this when the gateway status is out of sync:

            /***********************************************************************/
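            #!/usr/local/bin/php-cgi -q
            <?php
            /* Re-evaluate a gateway group's members so that a gateway whose
               dpinger alarm has already cleared may be added back to the group. */
            require_once("gwlb.inc");

            // Group name is passed as -g="GROUP_NAME"
            $options = getopt("g:");
            $gwgroup = isset($options['g']) ? $options['g'] : "";

            $members = [];
            if (!empty($gwgroup)) {
                    $members = get_gwgroup_members($gwgroup);
            }

            var_dump($members);
            ?>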
            /***********************************************************************/

            Run...

            php /saved/here/named_this.php -g="GWGRP_Name"

            ...from a shell, cron, Diagnostics/Command Prompt, etc.

            This might prod get_gwgroup_members_inner() to reactivate the member.

            John

            • netblues @serbus

              I wouldn't trust pinging Google DNS for gateway availability. I have seen Google rate-limit pings, so the checks failed while everything else was working.
              You can always find something closer, within your ISP's network, for such checks.

              As for the Redmine bug, just hitting Edit certainly doesn't do anything until you save.
              I don't see this on other multi-WAN setups, though.

              • SteveITS

                I edited that a bit and ran this from Diagnostics/Command Prompt:

                require_once("gwlb.inc");
                // Recompute the active members of gateway group 'GWGROUP'
                $gwgroup = 'GWGROUP';
                $members = get_gwgroup_members($gwgroup);
                var_dump($members);
                

                That reconnected the gateway as you theorized. In practice of course just viewing the gateways page is easier. :)

                Re: which IP to ping: I've tried picking an ISP router partway up the chain, but over time those can change. Since this is at a client's site, that would be difficult to correct if the link goes down, although in this case both WANs would likely not drop together. Pinging the ISP's router at the other end of the patch cable is of course not that helpful, though I've seen people leave the monitoring IP empty, which does exactly that. :)

                  • serbus

                  Hello!

                  You could schedule the command to run every so often, and then you wouldn't have to log in to refresh the group manually.

                  John

                    • SteveITS

                    I set up a cron job. Before that, some interesting notes for posterity:

                    Twice in the last couple of weeks the WAN2 gateway status reset by itself at 1:01 am. There is a cron job that runs /etc/rc.dyndns.update at that time. No, we don't have DDNS set up. However, on other days over the last few months it did not reset itself at that time. It's unclear why.

                    I found by accident this morning that if I edit/add a firewall rule and save/reload, that also updates the gateway status.

                      • SteveITS @netblues

                      @netblues For what it's worth, switching the gateway monitor target away from Google DNS didn't "prevent" the packet loss.

                        • SteveITS @SteveITS

                        I noticed this was fixed in 2.5/21.2:
                        https://redmine.pfsense.org/issues/10546
                        "In this case, pfsense will consider a gateway down when it has actually returned to a normal state, necessitating administrator action to return it back to a proper state."

                        Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.