Netgate Discussion Forum

    "Member Down" problem

    Routing and Multi WAN
    34 Posts 5 Posters 7.9k Views
    • K Offline
      kevindd992002
      last edited by

      I'm sorry, I thought it was fine to do a daily bump?

      • P Offline
        phil.davis
        last edited by

        I always thought "Member down" meant you had to take away the electrical signals on the physical port (unplug the cable, power off the thing at the other end of the cable…) for pfSense to consider the interface down.
        I will also be happy to hear from someone who knows what the intended behaviour of "Member Down" is.

        As the Greek philosopher Isosceles used to say, "There are 3 sides to every triangle."
        If I helped you, then help someone else - buy someone a gift from the INF catalog http://secure.inf.org/gifts/usd/

        • K Offline
          kevindd992002
          last edited by

          @phil.davis:

          I always thought "Member down" meant you had to take away the electrical signals on the physical port (unplug the cable, power off the thing at the other end of the cable…) for pfSense to consider the interface down.
          I will also be happy to hear from someone who knows what the intended behaviour of "Member Down" is.

          Well, there are "thresholds" in the advanced section of "System: Gateways: Edit gateway" that you can set for the member down feature. pfSense constantly probes the monitor IP THROUGH that specific interface and checks for replies before it considers the member down.

          • K Offline
            kevindd992002
            last edited by

            Does anybody have more ideas?

            • luckman212L Offline
              luckman212 LAYER 8
              last edited by

              In my testing I was also under the impression that a "member down" event was only triggered by a physical interruption, i.e. the attached device was powered down or the cable was unplugged. That's why I usually choose "packet loss or high latency" when setting up my gateway groups. As far as I understand it, unplugging the cable certainly causes packet loss, so that usually covers both cases.
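A rough sketch of how the gateway group trigger levels discussed here relate to monitor alarms. The four trigger names match the options in the gateway group settings; the status strings ("down", "loss", "delay") and the function name are hypothetical labels chosen for illustration, and the mapping is an assumption about the behaviour rather than code taken from pfSense.

```python
from enum import Enum


class Trigger(Enum):
    MEMBER_DOWN = "Member Down"
    PACKET_LOSS = "Packet Loss"
    HIGH_LATENCY = "High Latency"
    LOSS_OR_LATENCY = "Packet Loss or High Latency"


def member_excluded(trigger: Trigger, status: str) -> bool:
    """Return True if a gateway with this monitor status would leave the group.

    `status` is an illustrative label: "down", "loss", "delay" or "ok".
    """
    if status == "down":
        # Assumption: a down member is excluded under every trigger level.
        return True
    if status == "loss":
        return trigger in (Trigger.PACKET_LOSS, Trigger.LOSS_OR_LATENCY)
    if status == "delay":
        return trigger in (Trigger.HIGH_LATENCY, Trigger.LOSS_OR_LATENCY)
    return False


if __name__ == "__main__":
    for trig in Trigger:
        print(trig.value, "->", member_excluded(trig, "loss"))
```

Under this reading, "Packet Loss or High Latency" is the broadest setting, which is why it would also catch an unplugged cable.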

              • K Offline
                kevindd992002
                last edited by

                @luckman212:

                In my testing I was also under the impression that a "member down" event was only triggered by a physical interruption, i.e. the attached device was powered down or the cable was unplugged. That's why I usually choose "packet loss or high latency" when setting up my gateway groups. As far as I understand it, unplugging the cable certainly causes packet loss, so that usually covers both cases.

                That was my first impression too. But if you look at the threshold settings for the monitor IP, you'll see something like the screenshot I've just attached, and you'll realize that probing still happens first before a member is considered down.

                Capture.JPG

                • C Offline
                  cmb
                  last edited by

                  Down == above the defined thresholds you have on the gateway for what should be considered down.
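One way to read the statement above, as a minimal sketch: a gateway counts as down once measured loss or latency reaches the limit configured on that gateway. The function name, the limit values, and the inclusive comparison are all assumptions made for illustration; nothing here is read from pfSense itself.

```python
def gateway_state(loss_pct, latency_ms, loss_limit_pct, latency_limit_ms):
    """Classify a gateway from measured loss and latency.

    The limits stand in for whatever is configured on the gateway.
    Treating "equal to the limit" as down is an assumption.
    """
    if loss_pct >= loss_limit_pct:
        return "down: loss"
    if latency_ms >= latency_limit_ms:
        return "down: latency"
    return "online"


# Illustrative limits only (assumed values, not confirmed defaults).
LOSS_LIMIT_PCT = 20.0
LATENCY_LIMIT_MS = 500.0

if __name__ == "__main__":
    print(gateway_state(3.0, 40.0, LOSS_LIMIT_PCT, LATENCY_LIMIT_MS))
    print(gateway_state(22.0, 40.0, LOSS_LIMIT_PCT, LATENCY_LIMIT_MS))
    print(gateway_state(3.0, 800.0, LOSS_LIMIT_PCT, LATENCY_LIMIT_MS))
```

The point of the sketch is that no cable has to be unplugged: a live link with poor enough measurements ends up in the same state.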

                  • luckman212L Offline
                    luckman212 LAYER 8
                    last edited by

                    Chris thanks for the clarification … very good to know. I've definitely been misinterpreting this for a long time!

                    • K Offline
                      kevindd992002
                      last edited by

                      @cmb:

                      Down == above the defined thresholds you have on the gateway for what should be considered down.

                      Exactly my point. Any idea why the issue is happening on my end?

                      • luckman212L Offline
                        luckman212 LAYER 8
                        last edited by

                        What does your System Logs > Gateways look like when this happens?

                        • K Offline
                          kevindd992002
                          last edited by

                          Nov 11 13:12:29 apinger: alarm canceled: WAN1_DHCP(8.8.8.8) *** loss ***
                          Nov 11 13:21:23 apinger: ALARM: WAN1_DHCP(8.8.8.8) *** loss ***
                          Nov 11 13:22:04 apinger: alarm canceled: WAN1_DHCP(8.8.8.8) *** loss ***
                          Nov 11 14:28:51 apinger: ALARM: WAN2_DHCP(8.8.4.4) *** delay ***
                          Nov 11 14:28:59 apinger: alarm canceled: WAN2_DHCP(8.8.4.4) *** delay ***
                          Nov 11 21:42:44 apinger: ALARM: WAN3_DHCP(208.67.222.222) *** delay ***
                          Nov 11 21:42:54 apinger: alarm canceled: WAN3_DHCP(208.67.222.222) *** delay ***
                          Nov 11 21:48:24 apinger: ALARM: WAN3_DHCP(208.67.222.222) *** delay ***
                          Nov 11 21:49:04 apinger: alarm canceled: WAN3_DHCP(208.67.222.222) *** delay ***
                          Nov 11 21:51:18 apinger: ALARM: WAN3_DHCP(208.67.222.222) *** delay ***
                          Nov 11 21:52:10 apinger: alarm canceled: WAN3_DHCP(208.67.222.222) *** delay ***
                          Nov 11 21:52:46 apinger: ALARM: WAN3_DHCP(208.67.222.222) *** delay ***
                          Nov 11 21:53:01 apinger: alarm canceled: WAN3_DHCP(208.67.222.222) *** delay ***
                          Nov 12 06:06:03 apinger: ALARM: WAN2_DHCP(8.8.4.4) *** loss ***
                          Nov 12 06:06:11 apinger: ALARM: WAN1_DHCP(8.8.8.8) *** loss ***
                          Nov 12 06:06:44 apinger: alarm canceled: WAN2_DHCP(8.8.4.4) *** loss ***
                          Nov 12 06:06:59 apinger: alarm canceled: WAN1_DHCP(8.8.8.8) *** loss ***
                          Nov 12 06:28:57 apinger: ALARM: WAN1_DHCP(8.8.8.8) *** loss ***
                          Nov 12 06:29:43 apinger: alarm canceled: WAN1_DHCP(8.8.8.8) *** loss ***
                          Nov 12 17:38:58 apinger: ALARM: WAN3_DHCP(208.67.222.222) *** loss ***
                          Nov 12 17:38:59 apinger: ALARM: WAN1_DHCP(8.8.8.8) *** loss ***
                          Nov 12 17:39:38 apinger: alarm canceled: WAN1_DHCP(8.8.8.8) *** loss ***
                          Nov 12 17:40:58 apinger: alarm canceled: WAN3_DHCP(208.67.222.222) *** loss ***
                          Nov 12 19:28:12 apinger: ALARM: WAN1_DHCP(8.8.8.8) *** delay ***
                          Nov 12 19:30:50 apinger: alarm canceled: WAN1_DHCP(8.8.8.8) *** delay ***
                          Nov 12 19:30:58 apinger: ALARM: WAN1_DHCP(8.8.8.8) *** delay ***
                          Nov 12 19:38:44 apinger: alarm canceled: WAN1_DHCP(8.8.8.8) *** delay ***
                          Nov 12 19:38:59 apinger: ALARM: WAN1_DHCP(8.8.8.8) *** delay ***
                          Nov 12 19:39:28 apinger: alarm canceled: WAN1_DHCP(8.8.8.8) *** delay ***
                          Nov 12 19:43:09 apinger: ALARM: WAN1_DHCP(8.8.8.8) *** delay ***
                          Nov 12 19:48:12 apinger: alarm canceled: WAN1_DHCP(8.8.8.8) *** delay ***
                          Nov 13 13:20:26 apinger: ALARM: WAN1_DHCP(8.8.8.8) *** WAN1_DHCPdown ***
                          Nov 13 13:20:26 apinger: ALARM: WAN3_DHCP(208.67.222.222) *** WAN3_DHCPdown ***
                          Nov 13 13:20:26 apinger: ALARM: WAN2_DHCP(8.8.4.4) *** WAN2_DHCPdown ***
                          Nov 13 13:23:35 apinger: alarm canceled: WAN3_DHCP(208.67.222.222) *** WAN3_DHCPdown ***
                          Nov 13 13:23:36 apinger: alarm canceled: WAN2_DHCP(8.8.4.4) *** WAN2_DHCPdown ***
                          Nov 13 13:23:36 apinger: alarm canceled: WAN1_DHCP(8.8.8.8) *** WAN1_DHCPdown ***
                          Nov 13 13:25:47 apinger: ALARM: WAN2_DHCP(8.8.4.4) *** loss ***
                          Nov 13 13:25:50 apinger: ALARM: WAN1_DHCP(8.8.8.8) *** loss ***
                          Nov 13 13:26:33 apinger: alarm canceled: WAN2_DHCP(8.8.4.4) *** loss ***
                          Nov 13 13:26:34 apinger: alarm canceled: WAN1_DHCP(8.8.8.8) *** loss ***
                          Nov 15 04:28:55 apinger: Starting Alarm Pinger, apinger(23592)
                          Nov 15 04:28:59 apinger: SIGHUP received, reloading configuration.
                          Nov 15 04:29:00 apinger: SIGHUP received, reloading configuration.
                          Nov 15 04:29:03 apinger: SIGHUP received, reloading configuration.
                          Nov 15 17:22:49 apinger: ALARM: WAN3_DHCP(208.67.222.222) *** delay ***
                          Nov 15 17:22:51 apinger: ALARM: WAN1_DHCP(8.8.8.8) *** delay ***
                          Nov 15 17:23:01 apinger: alarm canceled: WAN3_DHCP(208.67.222.222) *** delay ***
                          Nov 15 17:23:03 apinger: alarm canceled: WAN1_DHCP(8.8.8.8) *** delay ***
                          Nov 15 17:23:14 apinger: SIGHUP received, reloading configuration.
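A log like the one above is easier to reason about once the alarms are tallied per gateway and type. The following is a small sketch that assumes the line format shown in this post; the regular expression and the `count_alarms` helper are hypothetical and only cover the "ALARM" / "alarm canceled" lines, skipping everything else (such as the SIGHUP messages).

```python
import re
from collections import Counter

# Matches e.g. "apinger: ALARM: WAN1_DHCP(8.8.8.8) *** loss ***"
LINE = re.compile(
    r"apinger: (ALARM|alarm canceled): (\w+)\(([\d.]+)\) \*\*\* (\w+) \*\*\*"
)


def count_alarms(lines):
    """Count raised alarms per (gateway, kind); canceled alarms are ignored."""
    counts = Counter()
    for line in lines:
        match = LINE.search(line)
        if not match or match.group(1) != "ALARM":
            continue
        gateway, kind = match.group(2), match.group(4)
        if kind.endswith("down"):
            # The log writes down alarms as "<gateway>down"; normalise them.
            kind = "down"
        counts[(gateway, kind)] += 1
    return counts


# A few lines copied from the log above.
sample = [
    "Nov 11 13:21:23 apinger: ALARM: WAN1_DHCP(8.8.8.8) *** loss ***",
    "Nov 11 13:22:04 apinger: alarm canceled: WAN1_DHCP(8.8.8.8) *** loss ***",
    "Nov 11 14:28:51 apinger: ALARM: WAN2_DHCP(8.8.4.4) *** delay ***",
    "Nov 13 13:20:26 apinger: ALARM: WAN1_DHCP(8.8.8.8) *** WAN1_DHCPdown ***",
    "Nov 15 04:28:59 apinger: SIGHUP received, reloading configuration.",
]

if __name__ == "__main__":
    for (gateway, kind), n in sorted(count_alarms(sample).items()):
        print(gateway, kind, n)
```

Run against the full log, a tally like this would show which WAN raises loss alarms most often and how rare the actual down alarms are by comparison.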

                          • C Offline
                            cmb
                            last edited by

                            How are your latency/loss settings configured in your gateway? What latency and loss is Status>Gateways showing when that happens? Alternatively, check the quality RRD graph (Status>RRD Graph) to see past values.

                            • K Offline
                              kevindd992002
                              last edited by

                              @cmb:

                              How are your latency/loss settings configured in your gateway? What latency and loss is Status>Gateways showing when that happens? Alternatively, check the quality RRD graph (Status>RRD Graph) to see past values.

                              My latency and loss settings in all three gateways are blank (default). What exact information do I need to check in the RRD graphs? There is a ton of information there.

                              EDIT: I've attached the RRD graph that I think is relevant. I just got another notification from pfSense that my WAN2_DHCP gateway went down, and the packet loss and latency at that time seem quite high. But why would that affect the probing of the interface and cause it to be tagged as "down"?

                              Capture.JPG

                              • K Offline
                                kejianshi
                                last edited by

                                I just turned off gateway monitoring on one of mine not long ago because it was more important that my pfsense work than that I have a pretty graph.

                                • C Offline
                                  cmb
                                  last edited by

                                  Your averaged out loss is upwards of 18%, you're definitely getting cycles where it's over 20%, and 20% will take down the WAN. Increase the loss threshold if that's normal behavior for your WAN. I suspect you either have shaping or limiters configured in such a way that you're de-prioritizing and dropping your monitor pings, or you have an issue of some sort with that connection if it gets that bad under load.
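To illustrate why an averaged loss of about 18% can still cross a 20% limit, here is a sketch that computes loss over a sliding window of probes. The probe data is synthetic (not measured from this thread), and the window size and function name are arbitrary choices for the example, not how the gateway monitor is known to sample.

```python
from collections import deque


def windowed_loss(replies, window):
    """Return the loss percentage for each full sliding window of probes.

    `replies` is an iterable of booleans, True meaning a reply was received.
    """
    recent = deque(maxlen=window)
    losses = []
    for ok in replies:
        recent.append(ok)
        if len(recent) == window:
            losses.append(100.0 * recent.count(False) / window)
    return losses


# Synthetic data: 50 probes, 9 lost (18% overall), with one burst of 5 losses.
replies = [True] * 50
for lost_index in (3, 11, 20, 21, 22, 23, 24, 37, 45):
    replies[lost_index] = False

overall = 100.0 * replies.count(False) / len(replies)
peaks = windowed_loss(replies, window=10)

if __name__ == "__main__":
    print("overall loss %:", overall)
    print("worst 10-probe window %:", max(peaks))
```

With losses arriving in bursts, the worst window is far above the long-run average, so a link averaging below the limit can still be marked down during a burst.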

                                  • K Offline
                                    kevindd992002
                                    last edited by

                                    @cmb:

                                    Your averaged out loss is upwards of 18%, you're definitely getting cycles where it's over 20%, and 20% will take down the WAN. Increase the loss threshold if that's normal behavior for your WAN. I suspect you either have shaping or limiters configured in such a way that you're de-prioritizing and dropping your monitor pings, or you have an issue of some sort with that connection if it gets that bad under load.

                                    No shaping or limiters configured, I guess it's just the normal behavior of our ISP since I'm from the Philippines. So the packet loss there can translate to a failed "probe" for the member down criterion?

                                    • K Offline
                                      kejianshi
                                      last edited by

                                      Mine here is Globe DSL. For sure they do LOTS of really poorly executed traffic shaping,
                                      especially where UDP VPNs are concerned. Pretty much only TCP 80 and 443 are reliable.

                                      • K Offline
                                        kevindd992002
                                        last edited by

                                        @kejianshi:

                                        Mine here is Globe DSL. For sure they do LOTS of really poorly executed traffic shaping,
                                        especially where UDP VPNs are concerned. Pretty much only TCP 80 and 443 are reliable.

                                        That's on their side. What we're talking about here is a simple probe of an IP address (in my case, public DNS servers). cmb is talking about traffic shaping on the pfSense side itself, not by the ISP.

                                        • K Offline
                                          kejianshi
                                          last edited by

                                          Could be.
                                          But I didn't necessarily see it that way.
                                          I think traffic shaping on the ISP side done badly is just as bad.
                                          BTW - I have the same problem as yours with one of these running in Texas on Time Warner Cable.
                                          No shaping on pfSense. Definitely the ISP. Just crap latency. Terrible network.
                                          That's the one that I gave up on; I turned off the gateway monitor and things were then much improved.

                                          BTW - My Globe DSL router has a few things I had to change.
                                          One of which was DDOS protection.  Particularly "ping to death" protection.
                                          That was screwing things up here.

                                          • K Offline
                                            kevindd992002
                                            last edited by

                                            @kejianshi:

                                            Could be.
                                            But I didn't necessarily see it that way.
                                            I think traffic shaping on the ISP side done badly is just as bad.
                                            BTW - I have the same problem as yours with one of these running in Texas on Time Warner Cable.
                                            No shaping on pfSense. Definitely the ISP. Just crap latency. Terrible network.
                                            That's the one that I gave up on; I turned off the gateway monitor and things were then much improved.

                                            Yeah, but I wouldn't want to disable gateway monitoring altogether, as failover won't work if you do that. Increasing the thresholds should fix this problem; it's a no-brainer.

                                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.