Netgate Discussion Forum

    Monitor is FALSE detecting one of my WANs as DOWN and another WAN as UP

    Routing and Multi WAN
    39 Posts 7 Posters 3.1k Views
    • D
      dims
      last edited by

      Here is the proof that ping is OK:

      • D
        dims
        last edited by

        And here is the tcpdump. 192.168.100.2 is the address of the pfSense re3 interface, which is connected to the provider3 modem.

        I left a blank line in the output at the moment just before I started a ping from another session, in parallel with the running tcpdump.

         tcpdump -vnni re3 icmp
        tcpdump: listening on re3, link-type EN10MB (Ethernet), capture size 65535 bytes
        17:24:27.148884 IP (tos 0x0, ttl 64, id 4107, offset 0, flags [none], proto ICMP (1), length 28, bad cksum 0 (->6685)!)
            192.168.100.2 > 95.165.128.1: ICMP echo request, id 44623, seq 774, length 8
        17:24:37.150298 IP (tos 0x0, ttl 64, id 4605, offset 0, flags [none], proto ICMP (1), length 28, bad cksum 0 (->6493)!)
            192.168.100.2 > 95.165.128.1: ICMP echo request, id 44623, seq 775, length 8
        17:24:37.151978 IP (tos 0x20, ttl 254, id 4605, offset 0, flags [none], proto ICMP (1), length 28)
            95.165.128.1 > 192.168.100.2: ICMP echo reply, id 44623, seq 775, length 8
        17:24:47.152278 IP (tos 0x0, ttl 64, id 13649, offset 0, flags [none], proto ICMP (1), length 28, bad cksum 0 (->413f)!)
            192.168.100.2 > 95.165.128.1: ICMP echo request, id 44623, seq 776, length 8
        17:24:57.154279 IP (tos 0x0, ttl 64, id 56463, offset 0, flags [none], proto ICMP (1), length 28, bad cksum 0 (->9a00)!)
            192.168.100.2 > 95.165.128.1: ICMP echo request, id 44623, seq 777, length 8
        17:25:07.155922 IP (tos 0x0, ttl 64, id 37697, offset 0, flags [none], proto ICMP (1), length 28, bad cksum 0 (->e34e)!)
            192.168.100.2 > 95.165.128.1: ICMP echo request, id 44623, seq 778, length 8
        17:25:07.157606 IP (tos 0x20, ttl 254, id 37697, offset 0, flags [none], proto ICMP (1), length 28)
            95.165.128.1 > 192.168.100.2: ICMP echo reply, id 44623, seq 778, length 8
        17:25:11.218697 IP (tos 0x0, ttl 64, id 64431, offset 0, flags [DF], proto ICMP (1), length 84, bad cksum 0 (->f5a4)!)
            192.168.100.2 > 192.168.100.1: ICMP echo request, id 33766, seq 0, length 64
        17:25:11.219159 IP (tos 0x0, ttl 64, id 5506, offset 0, flags [none], proto ICMP (1), length 84)
            192.168.100.1 > 192.168.100.2: ICMP echo reply, id 33766, seq 0, length 64
        17:25:12.219773 IP (tos 0x0, ttl 64, id 64514, offset 0, flags [DF], proto ICMP (1), length 84, bad cksum 0 (->f551)!)
            192.168.100.2 > 192.168.100.1: ICMP echo request, id 33766, seq 1, length 64
        17:25:12.220202 IP (tos 0x0, ttl 64, id 5567, offset 0, flags [none], proto ICMP (1), length 84)
            192.168.100.1 > 192.168.100.2: ICMP echo reply, id 33766, seq 1, length 64
        17:25:13.220858 IP (tos 0x0, ttl 64, id 64601, offset 0, flags [DF], proto ICMP (1), length 84, bad cksum 0 (->f4fa)!)
            192.168.100.2 > 192.168.100.1: ICMP echo request, id 33766, seq 2, length 64
        17:25:13.221296 IP (tos 0x0, ttl 64, id 5593, offset 0, flags [none], proto ICMP (1), length 84)
            192.168.100.1 > 192.168.100.2: ICMP echo reply, id 33766, seq 2, length 64
        17:25:17.157276 IP (tos 0x0, ttl 64, id 61699, offset 0, flags [none], proto ICMP (1), length 28, bad cksum 0 (->858c)!)
            192.168.100.2 > 95.165.128.1: ICMP echo request, id 44623, seq 779, length 8
        17:25:17.158900 IP (tos 0x20, ttl 254, id 61699, offset 0, flags [none], proto ICMP (1), length 28)
            95.165.128.1 > 192.168.100.2: ICMP echo reply, id 44623, seq 779, length 8
        17:25:27.159275 IP (tos 0x0, ttl 64, id 53364, offset 0, flags [none], proto ICMP (1), length 28, bad cksum 0 (->a61b)!)
            192.168.100.2 > 95.165.128.1: ICMP echo request, id 44623, seq 780, length 8
        17:25:37.161277 IP (tos 0x0, ttl 64, id 4829, offset 0, flags [none], proto ICMP (1), length 28, bad cksum 0 (->63b3)!)
            192.168.100.2 > 95.165.128.1: ICMP echo request, id 44623, seq 781, length 8
        17:25:47.163278 IP (tos 0x0, ttl 64, id 34933, offset 0, flags [none], proto ICMP (1), length 28, bad cksum 0 (->ee1a)!)
            192.168.100.2 > 95.165.128.1: ICMP echo request, id 44623, seq 782, length 8
        17:25:57.165273 IP (tos 0x0, ttl 64, id 18232, offset 0, flags [none], proto ICMP (1), length 28, bad cksum 0 (->2f58)!)
            192.168.100.2 > 95.165.128.1: ICMP echo request, id 44623, seq 783, length 8
        
        17:26:00.753899 IP (tos 0x0, ttl 64, id 13197, offset 0, flags [none], proto ICMP (1), length 84, bad cksum 0 (->42cb)!)
            192.168.100.2 > 95.165.128.1: ICMP echo request, id 62267, seq 0, length 64
        17:26:00.758603 IP (tos 0x20, ttl 254, id 13197, offset 0, flags [none], proto ICMP (1), length 84)
            95.165.128.1 > 192.168.100.2: ICMP echo reply, id 62267, seq 0, length 64
        17:26:01.755266 IP (tos 0x0, ttl 64, id 38153, offset 0, flags [none], proto ICMP (1), length 84, bad cksum 0 (->e14e)!)
            192.168.100.2 > 95.165.128.1: ICMP echo request, id 62267, seq 1, length 64
        17:26:01.756999 IP (tos 0x20, ttl 254, id 38153, offset 0, flags [none], proto ICMP (1), length 84)
            95.165.128.1 > 192.168.100.2: ICMP echo reply, id 62267, seq 1, length 64
        17:26:02.756272 IP (tos 0x0, ttl 64, id 49644, offset 0, flags [none], proto ICMP (1), length 84, bad cksum 0 (->b46b)!)
            192.168.100.2 > 95.165.128.1: ICMP echo request, id 62267, seq 2, length 64
        17:26:02.758051 IP (tos 0x20, ttl 254, id 49644, offset 0, flags [none], proto ICMP (1), length 84)
            95.165.128.1 > 192.168.100.2: ICMP echo reply, id 62267, seq 2, length 64
        17:26:03.757266 IP (tos 0x0, ttl 64, id 18970, offset 0, flags [none], proto ICMP (1), length 84, bad cksum 0 (->2c3e)!)
            192.168.100.2 > 95.165.128.1: ICMP echo request, id 62267, seq 3, length 64
        17:26:03.759090 IP (tos 0x20, ttl 254, id 18970, offset 0, flags [none], proto ICMP (1), length 84)
            95.165.128.1 > 192.168.100.2: ICMP echo reply, id 62267, seq 3, length 64
        17:26:04.758269 IP (tos 0x0, ttl 64, id 39956, offset 0, flags [none], proto ICMP (1), length 84, bad cksum 0 (->da43)!)
            192.168.100.2 > 95.165.128.1: ICMP echo request, id 62267, seq 4, length 64
        17:26:04.760083 IP (tos 0x20, ttl 254, id 39956, offset 0, flags [none], proto ICMP (1), length 84)
            95.165.128.1 > 192.168.100.2: ICMP echo reply, id 62267, seq 4, length 64
        17:26:07.167272 IP (tos 0x0, ttl 64, id 19983, offset 0, flags [none], proto ICMP (1), length 28, bad cksum 0 (->2881)!)
            192.168.100.2 > 95.165.128.1: ICMP echo request, id 44623, seq 784, length 8
        17:26:07.169021 IP (tos 0x20, ttl 254, id 19983, offset 0, flags [none], proto ICMP (1), length 28)
            95.165.128.1 > 192.168.100.2: ICMP echo reply, id 44623, seq 784, length 8
        17:26:11.265949 IP (tos 0x0, ttl 64, id 10046, offset 0, flags [DF], proto ICMP (1), length 84, bad cksum 0 (->ca16)!)
            192.168.100.2 > 192.168.100.1: ICMP echo request, id 53386, seq 0, length 64
        17:26:11.266466 IP (tos 0x0, ttl 64, id 5877, offset 0, flags [none], proto ICMP (1), length 84)
            192.168.100.1 > 192.168.100.2: ICMP echo reply, id 53386, seq 0, length 64
        17:26:12.267013 IP (tos 0x0, ttl 64, id 10061, offset 0, flags [DF], proto ICMP (1), length 84, bad cksum 0 (->ca07)!)
            192.168.100.2 > 192.168.100.1: ICMP echo request, id 53386, seq 1, length 64
        17:26:12.267436 IP (tos 0x0, ttl 64, id 5911, offset 0, flags [none], proto ICMP (1), length 84)
            192.168.100.1 > 192.168.100.2: ICMP echo reply, id 53386, seq 1, length 64
        17:26:13.268090 IP (tos 0x0, ttl 64, id 10073, offset 0, flags [DF], proto ICMP (1), length 84, bad cksum 0 (->c9fb)!)
            192.168.100.2 > 192.168.100.1: ICMP echo request, id 53386, seq 2, length 64
        17:26:13.268560 IP (tos 0x0, ttl 64, id 5913, offset 0, flags [none], proto ICMP (1), length 84)
            192.168.100.1 > 192.168.100.2: ICMP echo reply, id 53386, seq 2, length 64
        17:26:17.169272 IP (tos 0x0, ttl 64, id 18055, offset 0, flags [none], proto ICMP (1), length 28, bad cksum 0 (->3009)!)
            192.168.100.2 > 95.165.128.1: ICMP echo request, id 44623, seq 785, length 8
        17:26:17.170811 IP (tos 0x20, ttl 254, id 18055, offset 0, flags [none], proto ICMP (1), length 28)
            95.165.128.1 > 192.168.100.2: ICMP echo reply, id 44623, seq 785, length 8
        
        
        • D
          dims
          last edited by

          @Derelict:

          host route is created to steer all traffic to that address out a specific interface.

          It should not happen silently. A link to this route should appear near the relevant configuration window so that the administrator can check whether the route interferes with something set elsewhere. This is a case of bad design.

          If the system isn't smart enough to run automatically out of the box, it should be configurable. If it is not configurable, it should be smart enough to run automatically.

          You can't be like Apple iOS, which doesn't work and can't be configured.

          • johnpozJ
            johnpoz LAYER 8 Global Moderator
            last edited by

            You have bad checksums.. and sending multiple requests and getting back 1 reply..

            You have something borked there… So looks like your monitor is going out BAD... And what your dump is showing as answer is your normal ping..

            An intelligent man is sometimes forced to be drunk to spend time with his fools
            If you get confused: Listen to the Music Play
            Please don't Chat/PM me for help, unless mod related
            SG-4860 24.11 | Lab VMs 2.8, 24.11

            • D
              dims
              last edited by

              @johnpoz:

              You have bad checksums.. and sending multiple requests and getting back 1 reply..

              Yes. Why would that happen? It never happens with normal ping. I ran it for dozens of minutes and saw no packet loss at all.

              And what your dump is showing as answer is your normal ping..

              I sent only 5 pings, then pressed Ctrl-C. All the other pings come from something else; I assume dpinger.

              • johnpozJ
                johnpoz LAYER 8 Global Moderator
                last edited by

                Exactly. If the monitor pings are going out bad and not getting a response, then yes, it would show the gateway offline, since it's not getting an answer to its ping.

                • DerelictD
                  Derelict LAYER 8 Netgate
                  last edited by

                  @dims:

                  @Derelict:

                  host route is created to steer all traffic to that address out a specific interface.

                  It should not happen silently. A link to this route should appear near the relevant configuration window so that the administrator can check whether the route interferes with something set elsewhere. This is a case of bad design.

                  There is no choice. It MUST force that route out the set gateway or it will go out the default gateway. The whole point is to bind the traffic to the specific interface. You can always look at the entire routing table in Diagnostics > Routes which is what a competent administrator would do.
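To illustrate why the host route catches everything sent to the monitor address, here is a minimal Python sketch of longest-prefix-match routing. It is only a toy model, not how pfSense or FreeBSD implement the lookup. The `re3` interface and the 95.165.128.1 monitor IP come from the capture earlier in this thread; the default interface name `re0` is a made-up placeholder.

```python
import ipaddress

# Toy routing table: (destination network, outgoing interface).
# "re0" is a hypothetical default WAN; re3 and the monitor IP are from the thread.
ROUTES = [
    (ipaddress.ip_network("0.0.0.0/0"), "re0"),        # default route
    (ipaddress.ip_network("95.165.128.1/32"), "re3"),  # host route for the monitor IP
]

def lookup(dst, routes=ROUTES):
    """Return the interface of the most specific route that contains dst."""
    addr = ipaddress.ip_address(dst)
    matches = [(net, ifname) for net, ifname in routes if addr in net]
    # The longest prefix wins, so a /32 host route always beats the default route.
    net, ifname = max(matches, key=lambda m: m[0].prefixlen)
    return ifname

print(lookup("95.165.128.1"))  # re3
print(lookup("8.8.8.8"))       # re0
```

In this model any packet addressed to the monitor IP leaves via re3, whichever program sent it, which is why the full table in Diagnostics > Routes is worth checking.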

                  If the system isn't smart enough to run automatically out of the box, it should be configurable. If it is not configurable, it should be smart enough to run automatically.

                  You have the special case. That is never going to work out-of-the-box. It will require configuration and tuning to make multi-wan work how you need it to work.

                  You can't be like Apple iOS, which doesn't work and can't be configured.

                  I have no idea what that means.
                  Diagnostics > Routes

                  It is smart enough to run automatically. You have the special case/requirements.

                  Chattanooga, Tennessee, USA
                  A comprehensive network diagram is worth 10,000 words and 15 conference calls.
                  DO NOT set a source address/port in a port forward or firewall rule unless you KNOW you need it!
                  Do Not Chat For Help! NO_WAN_EGRESS(TM)

                  • DerelictD
                    Derelict LAYER 8 Netgate
                    last edited by

                    @johnpoz:

                    You have bad checksums.. and sending multiple requests and getting back 1 reply..

                    You have something borked there… So looks like your monitor is going out BAD... And what your dump is showing as answer is your normal ping..

                    The bad checksums are probably the result of checksum offloading. The OS doesn't calculate the checksums so they are 0 when tcpdump sees it. They are calculated and inserted by the NIC hardware. They are only being displayed because of his tcpdump flags.
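As a sanity check on the offloading explanation, the value tcpdump prints after `->` is the checksum it expected to see. The Python sketch below rebuilds the IPv4 header of the first packet in the capture (id 4107, length 28, no options, which is an assumption based on the 28-byte total) and computes the standard ones-complement header checksum. The helper names are mine, not from any tool.

```python
import ipaddress
import struct

def build_header(ident, total_length, src, dst, tos=0, ttl=64, flags_frag=0):
    """Pack a 20-byte IPv4 header (no options) with the checksum field set to 0."""
    return struct.pack(
        "!BBHHHBBH4s4s",
        0x45,            # version 4, header length 5 words
        tos,
        total_length,
        ident,
        flags_frag,
        ttl,
        1,               # protocol 1 = ICMP
        0,               # checksum left at zero, as the OS does when offloading
        ipaddress.IPv4Address(src).packed,
        ipaddress.IPv4Address(dst).packed,
    )

def ipv4_header_checksum(header):
    """Ones-complement sum of the 16-bit words, folded and inverted."""
    total = 0
    for (word,) in struct.iter_unpack("!H", header):
        total += word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

hdr = build_header(4107, 28, "192.168.100.2", "95.165.128.1")
print(f"{ipv4_header_checksum(hdr):04x}")  # 6685
```

The result matches the `(->6685)` in the first captured line, so the packet contents are fine; only the checksum field is still empty at the point where tcpdump sees it.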

                    
                    17:24:27.148884 IP (tos 0x0, ttl 64, id 4107, offset 0, flags [none], proto ICMP (1), length 28, bad cksum 0 (->6685)!)
                        192.168.100.2 > 95.165.128.1: ICMP echo request, id 44623, seq 774, length 8
                    17:24:37.150298 IP (tos 0x0, ttl 64, id 4605, offset 0, flags [none], proto ICMP (1), length 28, bad cksum 0 (->6493)!)
                        192.168.100.2 > 95.165.128.1: ICMP echo request, id 44623, seq 775, length 8
                    17:24:37.151978 IP (tos 0x20, ttl 254, id 4605, offset 0, flags [none], proto ICMP (1), length 28)
                        95.165.128.1 > 192.168.100.2: ICMP echo reply, id 44623, seq 775, length 8
                    17:24:47.152278 IP (tos 0x0, ttl 64, id 13649, offset 0, flags [none], proto ICMP (1), length 28, bad cksum 0 (->413f)!)
                        192.168.100.2 > 95.165.128.1: ICMP echo request, id 44623, seq 776, length 8
                    17:24:57.154279 IP (tos 0x0, ttl 64, id 56463, offset 0, flags [none], proto ICMP (1), length 28, bad cksum 0 (->9a00)!)
                        192.168.100.2 > 95.165.128.1: ICMP echo request, id 44623, seq 777, length 8
                    17:25:07.155922 IP (tos 0x0, ttl 64, id 37697, offset 0, flags [none], proto ICMP (1), length 28, bad cksum 0 (->e34e)!)
                        192.168.100.2 > 95.165.128.1: ICMP echo request, id 44623, seq 778, length 8
                    17:25:07.157606 IP (tos 0x20, ttl 254, id 37697, offset 0, flags [none], proto ICMP (1), length 28)
                        95.165.128.1 > 192.168.100.2: ICMP echo reply, id 44623, seq 778, length 8
                    
                    

                    You are not getting a LOT of responses. Of course the GW is marked as down.

                    The default ping interval is two per second. You are doing one every 10 seconds. What else have you changed? How about you post the settings for that gateway - particularly the advanced settings which you have obviously changed from the defaults.
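To put a number on the missing responses, here is a small Python sketch that pairs echo requests with echo replies by ICMP id and sequence number. It only parses the text that `tcpdump -v` printed in the excerpt quoted above, and the function name is my own.

```python
import re
from collections import defaultdict

# The seven packets from the quoted excerpt (tcpdump -v prints two lines per packet).
SAMPLE = """\
17:24:27.148884 IP (tos 0x0, ttl 64, id 4107, offset 0, flags [none], proto ICMP (1), length 28, bad cksum 0 (->6685)!)
    192.168.100.2 > 95.165.128.1: ICMP echo request, id 44623, seq 774, length 8
17:24:37.150298 IP (tos 0x0, ttl 64, id 4605, offset 0, flags [none], proto ICMP (1), length 28, bad cksum 0 (->6493)!)
    192.168.100.2 > 95.165.128.1: ICMP echo request, id 44623, seq 775, length 8
17:24:37.151978 IP (tos 0x20, ttl 254, id 4605, offset 0, flags [none], proto ICMP (1), length 28)
    95.165.128.1 > 192.168.100.2: ICMP echo reply, id 44623, seq 775, length 8
17:24:47.152278 IP (tos 0x0, ttl 64, id 13649, offset 0, flags [none], proto ICMP (1), length 28, bad cksum 0 (->413f)!)
    192.168.100.2 > 95.165.128.1: ICMP echo request, id 44623, seq 776, length 8
17:24:57.154279 IP (tos 0x0, ttl 64, id 56463, offset 0, flags [none], proto ICMP (1), length 28, bad cksum 0 (->9a00)!)
    192.168.100.2 > 95.165.128.1: ICMP echo request, id 44623, seq 777, length 8
17:25:07.155922 IP (tos 0x0, ttl 64, id 37697, offset 0, flags [none], proto ICMP (1), length 28, bad cksum 0 (->e34e)!)
    192.168.100.2 > 95.165.128.1: ICMP echo request, id 44623, seq 778, length 8
17:25:07.157606 IP (tos 0x20, ttl 254, id 37697, offset 0, flags [none], proto ICMP (1), length 28)
    95.165.128.1 > 192.168.100.2: ICMP echo reply, id 44623, seq 778, length 8
"""

PATTERN = re.compile(r"ICMP echo (request|reply), id (\d+), seq (\d+)")

def loss_by_id(text):
    """Map each ICMP id to (requests sent, requests answered, loss percent)."""
    requests = defaultdict(set)
    replies = defaultdict(set)
    for kind, ident, seq in PATTERN.findall(text):
        bucket = requests if kind == "request" else replies
        bucket[int(ident)].add(int(seq))
    result = {}
    for ident, sent in requests.items():
        answered = sent & replies[ident]
        loss = 100.0 * (len(sent) - len(answered)) / len(sent)
        result[ident] = (len(sent), len(answered), loss)
    return result

for ident, (sent, answered, loss) in loss_by_id(SAMPLE).items():
    print(f"id {ident}: {sent} sent, {answered} answered, {loss:.0f}% loss")
```

On this excerpt it reports 5 requests and 2 replies for id 44623, i.e. 60% loss.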

                    • D
                      dims
                      last edited by

                      @Derelict:

                      You are not getting a LOT of responses.

                      Why? How does dpinger manage to get that result?

                      The default ping interval is two per second. You are doing one every 10 seconds. What else have you changed?

                      I changed the intervals because I suspected the target host has some sort of flood protection. No setting helped, including the default one.

                      How about you post the settings for that gateway - particularly the advanced settings which you have obviously changed from the defaults.

                      I would not have changed the defaults if they had worked.

                      Here you go:

                      • D
                        dims
                        last edited by

                        @Derelict:

                        It is smart enough to run automatically. You have the special case/requirements.

                        Sure, this is what I am saying: pfSense does not work in the "special case" of having 3 WANs in round robin.

                        • D
                          dims
                          last edited by

                          @johnpoz:

                          exactly if the monitors are going out bad and not getting a response

                          Why can this happen, given that normal ping works?

                          • DerelictD
                            Derelict LAYER 8 Netgate
                            last edited by

                            Only your ISP can tell you why sent echo requests are not responded to.

                            Set all of those settings back to the default. Does it work?

                            If not, take a quick packet capture for posting here then try setting the data payload to 2. Does it work?

                            If not, take a quick packet capture for posting here then try setting the data payload to 64. Does it work?

                            If not, take a quick packet capture for posting here then post all of the above.

                            • D
                              dims
                              last edited by

                              @Derelict:

                              Only your ISP can tell you why sent echo requests are not responded to.

                              How can the provider technically distinguish pings sent by the ping command from pings sent by dpinger?

                              • DerelictD
                                Derelict LAYER 8 Netgate
                                last edited by

                                The payload size of a normal ping from the ping command is 56 bytes. The payload size of a dpinger ping is 0 by default.

                                Hence why I asked you to do what I did.

                                It's pretty rare but some devices freak out with the 0-byte payload even though it's completely legal.
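The size difference is visible in the capture: the monitor probes show ICMP length 8 and IP length 28, while the manual pings show 64 and 84. Below is a rough Python sketch that builds an echo request with a chosen payload size so the arithmetic is explicit. The zero-filled payload and the function names are assumptions for illustration; this is not what ping or dpinger actually put on the wire.

```python
import struct

ICMP_ECHO_REQUEST = 8
IPV4_HEADER_LEN = 20  # assumes an IPv4 header without options

def icmp_checksum(data):
    """Ones-complement checksum over the whole ICMP message."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def echo_request(ident, seq, payload_len):
    """Build an ICMP echo request: 8-byte header plus payload_len zero bytes."""
    payload = bytes(payload_len)  # zero fill is a simplification
    header = struct.pack("!BBHHH", ICMP_ECHO_REQUEST, 0, 0, ident, seq)
    csum = icmp_checksum(header + payload)
    return struct.pack("!BBHHH", ICMP_ECHO_REQUEST, 0, csum, ident, seq) + payload

for size in (0, 56):
    msg = echo_request(44623, 774, size)
    print(f"payload {size}: ICMP length {len(msg)}, IP length {IPV4_HEADER_LEN + len(msg)}")
```

A payload of 0 gives ICMP length 8 and IP length 28; a payload of 56 gives 64 and 84, the same numbers tcpdump printed for the two kinds of ping.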

                                • D
                                  dims
                                  last edited by

                                  Setting the payload to 56 immediately made the monitor report the gateway as up, thank you!

                                  I have set it to 56 on the other monitors too, and the picture improved there as well: the RTT and RTTsd columns now show zeros!

                                  • D
                                    dims
                                    last edited by

                                    No, my last statement was wrong: after some time these columns showed positive values.

                                    • P
                                      pwood999
                                      last edited by

                                      I had false GW down reports when I first configured multi-WAN. I then changed the monitor IPs from Google to OpenDNS, and the problems went away.

                                      Also, this gives RTT and RTTsd figures that are probably more realistic for internet connectivity than those from the next-hop provider IP.

                                      DSL GW = 7.5 ms & 0.2 ms
                                      Cable GW = 15 ms & 3.5 ms

                                      Cable is always longer due to the way DOCSIS works!

                                      • G
                                        Gektor
                                        last edited by Gektor

                                        I have the same issue and the same setup as the author of this topic, with a gateway group:
                                        Gateway 1 is tier 2 (connected directly to the internet provider, used as backup)
                                        Gateway 2 is tier 1 (connected to external router 1 over VLAN2, via DHCP)
                                        Gateway 3 is tier 1 (connected to external router 2 over VLAN3, via DHCP)
                                        Gateway 3 never returns to the "UP" state after external router 2 changes its WAN IP. The monitoring IP is 8.8.4.4 on gateway 2 and 8.8.8.8 on gateway 3.
                                        Changing the payload and the monitoring IP did not help at all. When I set "Trigger Level" to "High Latency" it works a little better, but when I set it to "Packet Loss" and router 2 on gateway 3 changes its WAN IP, pfSense marks gateway 3 as offline forever in 100% of cases, even though in Diagnostics -> Ping I can ping any address through gateway 3 without any problems. If I go to System -> Routing and save and apply any gateway without changes, gateway 3 goes back to Online until router 2 changes its WAN IP address.
                                        Gateway 2 has the same type of router as gateway 3, but when it changes IP, pfSense marks it Offline for a few minutes and then brings it back Online, with any Trigger Level.
                                        It seems that pfSense with dpinger is buggy in some scenarios.

                                        • G
                                          Gektor
                                          last edited by Gektor

                                          It gets more interesting: I switch Trigger Level to High Latency, and some time later pfSense switches it to Packet Loss by itself! I don't understand why that happens.

                                          And I am sure it is related to dpinger bugs, because I checked "Disable Gateway Monitoring Action" and then forced a reconnect on router 2 (gateway 3), and "Gateway status" showed "Danger, Packetloss: 100%" for gateway 3. I checked: traffic still goes through gateway 3 (router 2) without any problems, but dpinger considers it dead forever, until I do "save and apply" in any gateway's settings.

                                          I don't know how to make monitoring work in pfSense. :(

                                          Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.