Apinger doesn't recover opt wan when connection returns: still an issue?



  • I am using the 2.0.1 stable release in a dual-WAN configuration and am experiencing the symptoms described in bug #742 and here: http://forum.pfsense.org/index.php/topic,32010.0.html

    In the dashboard, the gateway shows as offline, with RTT ~30 ms and 100% loss. It is quite strange that both of these values are shown when apinger thinks the gateway is down.
    Edit: I originally thought that RTT is zero when the gateway is really down. That is not true, I was wrong. The dashboard simply shows the last measured RTT, so ignore that remark.

    Also in my logs I have:
    May 2 11:16:36 php: : Gateways status could not be determined, considering all as up/active.
    May 2 11:16:36 php: : MONITOR: GW_WIND is down, removing from routing group
    May 2 11:16:36 php: : MONITOR: GW_WIND is down, removing from routing group
    May 2 11:16:36 php: : MONITOR: GW_WIND is down, removing from routing group
    May 2 11:16:36 php: : Message sent to @.gr OK
    May 2 11:15:58 php: : MONITOR: GW_WIND is down, removing from routing group
    May 2 11:15:53 check_reload_status: Reloading filter
    May 2 11:15:43 apinger: ALARM: GW_WIND(62.169.255.45) *** GW_WINDdown ***

    Edit2: I have this issue in both a regular PC installation and an embedded installation on an ALIX board.

    I was under the impression that this was resolved in 2.0. Is it not?

    Thanks in advance.



  • This hasn’t been an issue since long before the 2.0 release.



  • Thanks for confirming.
    This is odd, however, because I see exactly the behaviour described in #742. I can easily reproduce it by temporarily dropping the connection of the second WAN router (ADSL).

    The two WAN connections are set up for load balancing. Both have static IPs. I have tried using a local subnet and using public IPs between pfSense and the ADSL routers; it made no difference.

    Anyone have any suggestions on how to troubleshoot the issue? Any info I could provide that would shed some light?
    I have some limited experience with Linux and almost none with FreeBSD, but I have tried everything I could think of without any success.

    Any help would be greatly appreciated.



  • I did some more debugging. Here’s a packet capture from WAN2 (taken with pfSense’s own packet capture, so some packets may have been dropped) while apinger fails to ping the gateway even though the link is up:

    12:30:55.017351 IP (tos 0x0, ttl 64, id 35835, offset 0, flags [none], proto ICMP (1), length 64)
        10.0.2.103 > 62.169.255.45: ICMP echo request, id 7969, seq 8193, length 44
    12:30:56.017341 IP (tos 0x0, ttl 64, id 57869, offset 0, flags [none], proto ICMP (1), length 64)
        10.0.2.103 > 62.169.255.45: ICMP echo request, id 7969, seq 8449, length 44
    12:30:57.017374 IP (tos 0x0, ttl 64, id 43995, offset 0, flags [none], proto ICMP (1), length 64)
        10.0.2.103 > 62.169.255.45: ICMP echo request, id 7969, seq 8705, length 44
    12:30:58.017486 IP (tos 0x0, ttl 64, id 3231, offset 0, flags [none], proto ICMP (1), length 64)
        10.0.2.103 > 62.169.255.45: ICMP echo request, id 7969, seq 8961, length 44
    12:30:59.017906 IP (tos 0x0, ttl 64, id 55248, offset 0, flags [none], proto ICMP (1), length 64)
        10.0.2.103 > 62.169.255.45: ICMP echo request, id 7969, seq 9217, length 44
    12:31:00.017994 IP (tos 0x0, ttl 64, id 42689, offset 0, flags [none], proto ICMP (1), length 64)
        10.0.2.103 > 62.169.255.45: ICMP echo request, id 7969, seq 9473, length 44
    12:31:01.018099 IP (tos 0x0, ttl 64, id 43292, offset 0, flags [none], proto ICMP (1), length 64)
        10.0.2.103 > 62.169.255.45: ICMP echo request, id 7969, seq 9729, length 44

    Then, without changing anything, I ran:

    ping -S 10.0.2.103 -c 4  62.169.255.45

    and here is the packet capture (note that apinger ICMP requests are also included):

    12:41:25.119149 IP (tos 0x0, ttl 64, id 16190, offset 0, flags [none], proto ICMP (1), length 64)
        10.0.2.103 > 62.169.255.45: ICMP echo request, id 7969, seq 38403, length 44
    12:41:26.120109 IP (tos 0x0, ttl 64, id 1185, offset 0, flags [none], proto ICMP (1), length 64)
        10.0.2.103 > 62.169.255.45: ICMP echo request, id 7969, seq 38659, length 44
    12:41:26.751510 IP (tos 0x0, ttl 64, id 15068, offset 0, flags [none], proto ICMP (1), length 84)
        10.0.2.103 > 62.169.255.45: ICMP echo request, id 15242, seq 0, length 64
    12:41:26.781472 IP (tos 0x0, ttl 126, id 15068, offset 0, flags [none], proto ICMP (1), length 84)
        62.169.255.45 > 10.0.2.103: ICMP echo reply, id 15242, seq 0, length 64
    12:41:27.121100 IP (tos 0x0, ttl 64, id 29878, offset 0, flags [none], proto ICMP (1), length 64)
        10.0.2.103 > 62.169.255.45: ICMP echo request, id 7969, seq 38915, length 44
    12:41:27.752088 IP (tos 0x0, ttl 64, id 7106, offset 0, flags [none], proto ICMP (1), length 84)
        10.0.2.103 > 62.169.255.45: ICMP echo request, id 15242, seq 1, length 64
    12:41:27.780434 IP (tos 0x0, ttl 126, id 7106, offset 0, flags [none], proto ICMP (1), length 84)
        62.169.255.45 > 10.0.2.103: ICMP echo reply, id 15242, seq 1, length 64
    12:41:28.122462 IP (tos 0x0, ttl 64, id 64529, offset 0, flags [none], proto ICMP (1), length 64)
        10.0.2.103 > 62.169.255.45: ICMP echo request, id 7969, seq 39171, length 44
    12:41:28.753091 IP (tos 0x0, ttl 64, id 6296, offset 0, flags [none], proto ICMP (1), length 84)
        10.0.2.103 > 62.169.255.45: ICMP echo request, id 15242, seq 2, length 64
    12:41:28.781626 IP (tos 0x0, ttl 126, id 6296, offset 0, flags [none], proto ICMP (1), length 84)
        62.169.255.45 > 10.0.2.103: ICMP echo reply, id 15242, seq 2, length 64
    12:41:29.123152 IP (tos 0x0, ttl 64, id 29476, offset 0, flags [none], proto ICMP (1), length 64)
        10.0.2.103 > 62.169.255.45: ICMP echo request, id 7969, seq 39427, length 44
    12:41:29.754032 IP (tos 0x0, ttl 64, id 46050, offset 0, flags [none], proto ICMP (1), length 84)
        10.0.2.103 > 62.169.255.45: ICMP echo request, id 15242, seq 3, length 64
    12:41:29.781492 IP (tos 0x0, ttl 126, id 46050, offset 0, flags [none], proto ICMP (1), length 84)
        62.169.255.45 > 10.0.2.103: ICMP echo reply, id 15242, seq 3, length 64
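
    A capture like this can also be checked mechanically instead of by eye. The following is a minimal Python sketch, assuming the capture has been saved as text in the tcpdump verbose format shown above; the function name `reply_stats` and the `SAMPLE` variable are made up for this illustration. It pairs echo requests with echo replies by ICMP id and sequence number and counts how many requests per id were answered.

```python
import re
from collections import defaultdict

# Matches the detail line of each record in the capture, e.g.
#   10.0.2.103 > 62.169.255.45: ICMP echo request, id 7969, seq 8193, length 44
ICMP_LINE = re.compile(
    r"(?P<src>\d+\.\d+\.\d+\.\d+) > (?P<dst>\d+\.\d+\.\d+\.\d+): "
    r"ICMP echo (?P<kind>request|reply), id (?P<id>\d+), seq (?P<seq>\d+)"
)


def reply_stats(capture_text):
    """Return {icmp_id: (requests_seen, requests_answered)} for a text capture."""
    requests = defaultdict(set)
    replies = defaultdict(set)
    for line in capture_text.splitlines():
        match = ICMP_LINE.search(line)
        if match is None:
            continue  # header lines with tos/ttl carry no id or seq
        icmp_id = int(match.group("id"))
        seq = int(match.group("seq"))
        if match.group("kind") == "request":
            requests[icmp_id].add(seq)
        else:
            replies[icmp_id].add(seq)
    # A request counts as answered when a reply with the same id and seq exists.
    return {
        icmp_id: (len(seqs), len(seqs & replies[icmp_id]))
        for icmp_id, seqs in requests.items()
    }


# Four detail lines copied from the capture above.
SAMPLE = """\
10.0.2.103 > 62.169.255.45: ICMP echo request, id 7969, seq 38403, length 44
10.0.2.103 > 62.169.255.45: ICMP echo request, id 7969, seq 38659, length 44
10.0.2.103 > 62.169.255.45: ICMP echo request, id 15242, seq 0, length 64
62.169.255.45 > 10.0.2.103: ICMP echo reply, id 15242, seq 0, length 64
"""

if __name__ == "__main__":
    for icmp_id, (sent, answered) in reply_stats(SAMPLE).items():
        print(f"id {icmp_id}: {answered}/{sent} requests answered")
```

    In this capture, id 7969 is the apinger probe and id 15242 is the manual ping, so a result of zero answered requests for one id and all answered for the other would match what is visible in the raw output.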

    Notice that ping gets a response, while apinger does not. If I kill apinger and then restart it, it works fine until the line drops again.
    The only differences I can see are:

    • the packet/data length - it shouldn’t matter because it works if I restart apinger

    • the sequence #: in ping it starts from 0 while in apinger it continues from where it left off in the previous try. If I restart apinger, sequence # restarts at 0. Could this be the issue?
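
    On the sequence numbers: the apinger values in both captures (8193, 8449, 8705, ... and 38403, 38659, ...) go up by 256 per probe rather than by 1. One possible reading, and it is only an assumption, is that the 16-bit counter is displayed with its two bytes swapped. The Python sketch below swaps the bytes back and checks whether the result is a plain counter; `swap16` and `is_consecutive` are helper names invented for this illustration.

```python
def swap16(value):
    """Swap the two bytes of a 16-bit integer."""
    return ((value & 0xFF) << 8) | ((value >> 8) & 0xFF)


def is_consecutive(values):
    """True if each value is exactly one more than the previous one."""
    return all(b - a == 1 for a, b in zip(values, values[1:]))


# apinger sequence numbers (id 7969) as they appear in the two captures.
FIRST_CAPTURE = [8193, 8449, 8705, 8961, 9217, 9473, 9729]
SECOND_CAPTURE = [38403, 38659, 38915, 39171, 39427]

if __name__ == "__main__":
    for name, seqs in (("first", FIRST_CAPTURE), ("second", SECOND_CAPTURE)):
        swapped = [swap16(s) for s in seqs]
        print(name, swapped, "consecutive:", is_consecutive(swapped))
```

    With the bytes swapped, the first capture reads 288 to 294 and the second 918 to 922. The first probes of the two captures are 630 seconds apart (12:30:55 and 12:41:25) and their swapped sequence numbers differ by 630, which fits one probe per second from a counter that keeps running. This only shows that the counter was not reset; it does not by itself explain why the replies are missing.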

    Next I want to insert a sniffer directly on the LAN segment between pfSense and the ADSL router. (I’ve done this before but do not remember whether the ADSL router actually replied - I think it did, but I have to confirm.)



  • I’ve had a similar issue before … changing the DSL router solved it for me.

    In my case it was an old Cisco 800 series DSL router that caused the problem … it was replaced by a cheap D-Link.



  • I’ve experienced this with a Gennet Oxygen router and a Pirelli router. At least for the Pirelli one, replacing it is not an option - the router is provided and controlled by the ISP.

