Apinger doesn't recover opt wan when connection returns: still an issue?
-
I am using the 2.0.1 stable release in a dual-WAN configuration and am experiencing the symptoms described in bug #742 and here: http://forum.pfsense.org/index.php/topic,32010.0.html
In the dashboard, the gateway shows as offline with RTT ~30 ms and 100% loss. It is quite strange that an RTT value is shown alongside the loss while apinger considers the gateway down.
I am not sure right now (I will have to confirm), but I think that when the gateway is really down, RTT is zero. Edit: This is not true, I was wrong. It simply shows the last RTT, so ignore this comment.
Also, in my logs I have:
May 2 11:16:36 php: : Gateways status could not be determined, considering all as up/active.
May 2 11:16:36 php: : MONITOR: GW_WIND is down, removing from routing group
May 2 11:16:36 php: : MONITOR: GW_WIND is down, removing from routing group
May 2 11:16:36 php: : MONITOR: GW_WIND is down, removing from routing group
May 2 11:16:36 php: : Message sent to @.gr OK
May 2 11:15:58 php: : MONITOR: GW_WIND is down, removing from routing group
May 2 11:15:53 check_reload_status: Reloading filter
May 2 11:15:43 apinger: ALARM: GW_WIND(62.169.255.45) *** GW_WINDdown ***
Edit 2: I have this issue on both a regular PC installation and an embedded installation on an ALIX board.
I was under the impression that this was resolved in 2.0. Is it not?
Thanks in advance.
-
It hasn't been an issue since long before the 2.0 release.
-
Thanks for confirming.
This is odd, however, because I see exactly the behaviour described in #742. I can easily reproduce it by temporarily dropping the connection of the second WAN router (ADSL). The two WAN connections are set up for load balancing, and both have static IPs. I have tried both a local subnet and a public IP between pfSense and the ADSL routers; it made no difference.
Does anyone have suggestions on how to troubleshoot this? Is there any information I could provide that would shed some light on it?
I have some limited experience with Linux and almost none with FreeBSD, but I have tried everything I could think of without any success. Any help would be greatly appreciated.
-
I did some more debugging. Here's a packet capture from WAN2 (taken with pfSense's own packet capture, so some packets may have been dropped) while apinger fails to ping the gateway even though the link is up:
12:30:55.017351 IP (tos 0x0, ttl 64, id 35835, offset 0, flags [none], proto ICMP (1), length 64)
10.0.2.103 > 62.169.255.45: ICMP echo request, id 7969, seq 8193, length 44
12:30:56.017341 IP (tos 0x0, ttl 64, id 57869, offset 0, flags [none], proto ICMP (1), length 64)
10.0.2.103 > 62.169.255.45: ICMP echo request, id 7969, seq 8449, length 44
12:30:57.017374 IP (tos 0x0, ttl 64, id 43995, offset 0, flags [none], proto ICMP (1), length 64)
10.0.2.103 > 62.169.255.45: ICMP echo request, id 7969, seq 8705, length 44
12:30:58.017486 IP (tos 0x0, ttl 64, id 3231, offset 0, flags [none], proto ICMP (1), length 64)
10.0.2.103 > 62.169.255.45: ICMP echo request, id 7969, seq 8961, length 44
12:30:59.017906 IP (tos 0x0, ttl 64, id 55248, offset 0, flags [none], proto ICMP (1), length 64)
10.0.2.103 > 62.169.255.45: ICMP echo request, id 7969, seq 9217, length 44
12:31:00.017994 IP (tos 0x0, ttl 64, id 42689, offset 0, flags [none], proto ICMP (1), length 64)
10.0.2.103 > 62.169.255.45: ICMP echo request, id 7969, seq 9473, length 44
12:31:01.018099 IP (tos 0x0, ttl 64, id 43292, offset 0, flags [none], proto ICMP (1), length 64)
10.0.2.103 > 62.169.255.45: ICMP echo request, id 7969, seq 9729, length 44
Then, without changing anything, I ran:
ping -S 10.0.2.103 -c 4 62.169.255.45
and here is the packet capture (note that apinger ICMP requests are also included):
12:41:25.119149 IP (tos 0x0, ttl 64, id 16190, offset 0, flags [none], proto ICMP (1), length 64)
10.0.2.103 > 62.169.255.45: ICMP echo request, id 7969, seq 38403, length 44
12:41:26.120109 IP (tos 0x0, ttl 64, id 1185, offset 0, flags [none], proto ICMP (1), length 64)
10.0.2.103 > 62.169.255.45: ICMP echo request, id 7969, seq 38659, length 44
12:41:26.751510 IP (tos 0x0, ttl 64, id 15068, offset 0, flags [none], proto ICMP (1), length 84)
10.0.2.103 > 62.169.255.45: ICMP echo request, id 15242, seq 0, length 64
12:41:26.781472 IP (tos 0x0, ttl 126, id 15068, offset 0, flags [none], proto ICMP (1), length 84)
62.169.255.45 > 10.0.2.103: ICMP echo reply, id 15242, seq 0, length 64
12:41:27.121100 IP (tos 0x0, ttl 64, id 29878, offset 0, flags [none], proto ICMP (1), length 64)
10.0.2.103 > 62.169.255.45: ICMP echo request, id 7969, seq 38915, length 44
12:41:27.752088 IP (tos 0x0, ttl 64, id 7106, offset 0, flags [none], proto ICMP (1), length 84)
10.0.2.103 > 62.169.255.45: ICMP echo request, id 15242, seq 1, length 64
12:41:27.780434 IP (tos 0x0, ttl 126, id 7106, offset 0, flags [none], proto ICMP (1), length 84)
62.169.255.45 > 10.0.2.103: ICMP echo reply, id 15242, seq 1, length 64
12:41:28.122462 IP (tos 0x0, ttl 64, id 64529, offset 0, flags [none], proto ICMP (1), length 64)
10.0.2.103 > 62.169.255.45: ICMP echo request, id 7969, seq 39171, length 44
12:41:28.753091 IP (tos 0x0, ttl 64, id 6296, offset 0, flags [none], proto ICMP (1), length 84)
10.0.2.103 > 62.169.255.45: ICMP echo request, id 15242, seq 2, length 64
12:41:28.781626 IP (tos 0x0, ttl 126, id 6296, offset 0, flags [none], proto ICMP (1), length 84)
62.169.255.45 > 10.0.2.103: ICMP echo reply, id 15242, seq 2, length 64
12:41:29.123152 IP (tos 0x0, ttl 64, id 29476, offset 0, flags [none], proto ICMP (1), length 64)
10.0.2.103 > 62.169.255.45: ICMP echo request, id 7969, seq 39427, length 44
12:41:29.754032 IP (tos 0x0, ttl 64, id 46050, offset 0, flags [none], proto ICMP (1), length 84)
10.0.2.103 > 62.169.255.45: ICMP echo request, id 15242, seq 3, length 64
12:41:29.781492 IP (tos 0x0, ttl 126, id 46050, offset 0, flags [none], proto ICMP (1), length 84)
62.169.255.45 > 10.0.2.103: ICMP echo reply, id 15242, seq 3, length 64
Notice that ping gets a response, while apinger does not. If I kill apinger and then restart it, it works fine until the line drops.
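To check that observation mechanically, the capture text can be tallied per ICMP id. This is a minimal sketch of a hypothetical helper (not part of pfSense or apinger); it assumes the two-line `tcpdump -v` output format shown above, matches each echo reply to the request with the same id and seq, and does not handle captures that cross midnight. The sample is the first seven packets of the second capture.

```python
import re
from collections import defaultdict
from datetime import datetime

# Hypothetical helper: assumes each packet is printed as a header line that
# starts with the timestamp, followed by an ICMP summary line.
TS_RE = re.compile(r"^(\d{2}:\d{2}:\d{2}\.\d+) IP ")
ICMP_RE = re.compile(r"ICMP echo (request|reply), id (\d+), seq (\d+)")

SAMPLE = """\
12:41:25.119149 IP (tos 0x0, ttl 64, id 16190, offset 0, flags [none], proto ICMP (1), length 64)
10.0.2.103 > 62.169.255.45: ICMP echo request, id 7969, seq 38403, length 44
12:41:26.120109 IP (tos 0x0, ttl 64, id 1185, offset 0, flags [none], proto ICMP (1), length 64)
10.0.2.103 > 62.169.255.45: ICMP echo request, id 7969, seq 38659, length 44
12:41:26.751510 IP (tos 0x0, ttl 64, id 15068, offset 0, flags [none], proto ICMP (1), length 84)
10.0.2.103 > 62.169.255.45: ICMP echo request, id 15242, seq 0, length 64
12:41:26.781472 IP (tos 0x0, ttl 126, id 15068, offset 0, flags [none], proto ICMP (1), length 84)
62.169.255.45 > 10.0.2.103: ICMP echo reply, id 15242, seq 0, length 64
12:41:27.121100 IP (tos 0x0, ttl 64, id 29878, offset 0, flags [none], proto ICMP (1), length 64)
10.0.2.103 > 62.169.255.45: ICMP echo request, id 7969, seq 38915, length 44
12:41:27.752088 IP (tos 0x0, ttl 64, id 7106, offset 0, flags [none], proto ICMP (1), length 84)
10.0.2.103 > 62.169.255.45: ICMP echo request, id 15242, seq 1, length 64
12:41:27.780434 IP (tos 0x0, ttl 126, id 7106, offset 0, flags [none], proto ICMP (1), length 84)
62.169.255.45 > 10.0.2.103: ICMP echo reply, id 15242, seq 1, length 64
"""


def parse_capture(text):
    """Yield (timestamp, kind, icmp_id, seq) for each ICMP echo packet."""
    ts = None
    for raw in text.splitlines():
        line = raw.strip()
        m = TS_RE.match(line)
        if m:
            # Time only: the date part defaults to 1900-01-01.
            ts = datetime.strptime(m.group(1), "%H:%M:%S.%f")
            continue
        m = ICMP_RE.search(line)
        if m and ts is not None:
            yield ts, m.group(1), int(m.group(2)), int(m.group(3))
            ts = None


def summarize(text):
    """Count requests/replies per ICMP id and collect RTTs in milliseconds."""
    counts = defaultdict(lambda: {"request": 0, "reply": 0})
    pending = {}  # (id, seq) -> time the request was seen
    rtts = defaultdict(list)
    for ts, kind, icmp_id, seq in parse_capture(text):
        counts[icmp_id][kind] += 1
        if kind == "request":
            pending[(icmp_id, seq)] = ts
        elif (icmp_id, seq) in pending:
            delta = ts - pending.pop((icmp_id, seq))
            rtts[icmp_id].append(delta.total_seconds() * 1000)
    return dict(counts), dict(rtts)


if __name__ == "__main__":
    counts, rtts = summarize(SAMPLE)
    for icmp_id, c in sorted(counts.items()):
        print(icmp_id, c, [round(r, 1) for r in rtts.get(icmp_id, [])])
```

On this sample, id 7969 (apinger) has three requests and no replies, while id 15242 (ping) gets its replies after roughly 28 to 30 ms, which is in line with the ~30 ms RTT shown on the dashboard.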
The only differences I can see are:
- the packet/data length: it shouldn't matter, because it works if I restart apinger;
- the sequence number: in ping it starts from 0, while in apinger it continues from where it left off in the previous attempt. If I restart apinger, the sequence number restarts at 0. Could this be the issue?
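The apinger sequence numbers in the captures step by 256 (8193, 8449, 8705, ...) instead of by 1. One possible reading, which is an assumption and not confirmed against apinger's source, is that apinger fills the 16-bit ICMP sequence field in host byte order (little-endian on x86) while tcpdump prints it in network byte order. The sketch below byte-swaps the printed values to test that reading; `swap16` is a hypothetical helper for illustration only.

```python
import struct


def swap16(value):
    """Return a 16-bit value with its two bytes swapped.

    Assumption for illustration: apinger writes the sequence number in host
    byte order, while tcpdump decodes the field as big-endian.
    """
    return struct.unpack("<H", struct.pack(">H", value))[0]


# apinger's "seq" values (ICMP id 7969) exactly as tcpdump printed them.
FIRST = [8193, 8449, 8705, 8961, 9217, 9473, 9729]   # 12:30:55 to 12:31:01
SECOND = [38403, 38659, 38915, 39171, 39427]         # 12:41:25 to 12:41:29

if __name__ == "__main__":
    # The printed values step by 256, i.e. only the high byte changes.
    print([b - a for a, b in zip(FIRST, FIRST[1:])])  # [256, 256, 256, 256, 256, 256]
    print([swap16(s) for s in FIRST])   # [288, 289, 290, 291, 292, 293, 294]
    print([swap16(s) for s in SECOND])  # [918, 919, 920, 921, 922]
```

Under this reading, the counter goes from 294 at 12:31:01 to 918 at 12:41:25, an increase of 624 over about 624 seconds, which fits one probe per second with no reset in between. It does not by itself show that the router reacts to the sequence number.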
Next, I want to put a sniffer directly on the LAN segment between pfSense and the ADSL router. (I've done this before but do not remember whether the ADSL router actually replied; I think it did, but I have to confirm.)
-
I've had a similar issue before; changing the DSL router solved it for me.
In my case, an old Cisco 800 series DSL router was causing the problem; it was replaced by a cheap D-Link.
-
I've experienced this with a Gennet Oxygen router and a Pirelli router. At least for the Pirelli one, replacing it is not an option: the router is provided and controlled by the ISP.