Apinger and check_reload_status
-
After a number of failed attempts to download a 2GB movie file through Firefox I started to look around to see if I could see something that might contribute to the problem. (Often the download stalls for over ten minutes with no failure reported, web pages downloaded fine; maybe Firefox bug but I think I have seen similar things in Chrome. Sometimes the browser transfer only part of a file, most recently 135MB of a 2GB file, but doesn't report any error.)
I am running
2.0.1-RELEASE (i386)
built on Mon Dec 12 17:53:52 EST 2011
FreeBSD 8.1-RELEASE-p6I looked at my pfSense system log and saw numerous instances of sequences like: Apr 27 22:51:49 pfsense apinger: ALARM: WAN(121.50.212.9) *** delay ***
Apr 27 22:51:59 pfsense apinger: ALARM: WAN(121.50.212.9) *** down ***
Apr 27 22:51:59 pfsense check_reload_status: Reloading filter
Apr 27 22:52:00 pfsense apinger: alarm canceled: WAN(121.50.212.9) *** down ***
Apr 27 22:52:09 pfsense check_reload_status: Reloading filter
Apr 27 22:52:10 pfsense check_reload_status: Reloading filter
apinger reports something, then 10 seconds later check_reload_status report "Reloading filter".In System -> Routing -> Gateways I have set Frequency Probe to 10. Does that also set the 10 seconds between apinger reports and the reloading filter?
What does the "Reloading Filter indicate"?
In Apr 27 23:00:50 pfsense apinger: alarm canceled: WAN(121.50.212.9) *** delay ***
Apr 27 23:00:54 pfsense kernel: Bump sched buckets to 64 (was 0)
Apr 27 23:00:59 pfsense kernel: Bump sched buckets to 64 (was 0)
Apr 27 23:01:00 pfsense apinger: ALARM: WAN(121.50.212.9) *** down ***
Apr 27 23:01:00 pfsense check_reload_status: Reloading filter
Apr 27 23:01:00 pfsense apinger: alarm canceled: WAN(121.50.212.9) *** down ***
Apr 27 23:01:10 pfsense check_reload_status: Reloading filter
Apr 27 23:01:10 pfsense apinger: ALARM: WAN(121.50.212.9) *** down ***
Apr 27 23:01:10 pfsense check_reload_status: Reloading filter
Apr 27 23:01:10 pfsense apinger: alarm canceled: WAN(121.50.212.9) *** down ***
Apr 27 23:01:11 pfsense kernel: Bump sched buckets to 64 (was 0)
Apr 27 23:01:20 pfsense check_reload_status: Reloading filter
Apr 27 23:01:28 pfsense kernel: Bump sched buckets to 64 (was 0)
Apr 27 23:01:33 pfsense kernel: Bump sched buckets to 64 (was 0)
Apr 27 23:01:39 pfsense kernel: Bump sched buckets to 64 (was 0)
Apr 27 23:01:40 pfsense apinger: ALARM: WAN(121.50.212.9) *** down ***
Apr 27 23:01:40 pfsense apinger: alarm canceled: WAN(121.50.212.9) *** down ***
Apr 27 23:01:50 pfsense check_reload_status: Reloading filter
Apr 27 23:01:50 pfsense check_reload_status: Reloading filterI ran a tcpdump packet capture filtering on the ping target and compared with apinger and check_reload_status entries in the system log. I have removed the packet capture entries that don't seem related to the apinger/check_reload_status system log reports:
14:19:33.989677 IP 203.144.23.83 > 121.50.212.9: ICMP echo request, id 31505, seq 52757, length 44
14:19:34.344118 IP 121.50.212.9 > 203.144.23.83: ICMP echo reply, id 31505, seq 52757, length 44
14:19:43.989693 IP 203.144.23.83 > 121.50.212.9: ICMP echo request, id 31505, seq 53013, length 44
14:19:44.287608 IP 121.50.212.9 > 203.144.23.83: ICMP echo reply, id 31505, seq 53013, length 44
14:19:53.989713 IP 203.144.23.83 > 121.50.212.9: ICMP echo request, id 31505, seq 53269, length 44
14:20:03.989710 IP 203.144.23.83 > 121.50.212.9: ICMP echo request, id 31505, seq 53525, length 44
14:20:04.825708 IP 121.50.212.9 > 203.144.23.83: ICMP echo reply, id 31505, seq 53525, length 44
14:20:13.989802 IP 203.144.23.83 > 121.50.212.9: ICMP echo request, id 31505, seq 53781, length 44
14:20:14.542881 IP 121.50.212.9 > 203.144.23.83: ICMP echo reply, id 31505, seq 53781, length 4414:32:14.002351 IP 203.144.23.83 > 121.50.212.9: ICMP echo request, id 31505, seq 6678, length 44
14:32:14.304134 IP 121.50.212.9 > 203.144.23.83: ICMP echo reply, id 31505, seq 6678, length 44
14:32:24.002356 IP 203.144.23.83 > 121.50.212.9: ICMP echo request, id 31505, seq 6934, length 44
14:32:34.002373 IP 203.144.23.83 > 121.50.212.9: ICMP echo request, id 31505, seq 7190, length 44
14:32:34.195932 IP 121.50.212.9 > 203.144.23.83: ICMP echo reply, id 31505, seq 7190, length 44
14:32:44.002495 IP 203.144.23.83 > 121.50.212.9: ICMP echo request, id 31505, seq 7446, length 44
14:32:54.002603 IP 203.144.23.83 > 121.50.212.9: ICMP echo request, id 31505, seq 7702, length 44
14:32:54.043619 IP 121.50.212.9 > 203.144.23.83: ICMP echo reply, id 31505, seq 7702, length 44
14:33:04.002510 IP 203.144.23.83 > 121.50.212.9: ICMP echo request, id 31505, seq 7958, length 44
14:33:04.044866 IP 121.50.212.9 > 203.144.23.83: ICMP echo reply, id 31505, seq 7958, length 44
14:33:14.002512 IP 203.144.23.83 > 121.50.212.9: ICMP echo request, id 31505, seq 8214, length 44
14:33:14.043729 IP 121.50.212.9 > 203.144.23.83: ICMP echo reply, id 31505, seq 8214, length 44Corresponding system log entries:
Apr 28 14:19:55 pfsense apinger: ALARM: WAN(121.50.212.9) *** down ***
Apr 28 14:20:04 pfsense apinger: alarm canceled: WAN(121.50.212.9) *** down ***
Apr 28 14:20:05 pfsense check_reload_status: Reloading filter
Apr 28 14:20:14 pfsense check_reload_status: Reloading filter
Apr 28 14:32:25 pfsense apinger: ALARM: WAN(121.50.212.9) *** down ***
Apr 28 14:32:34 pfsense apinger: alarm canceled: WAN(121.50.212.9) *** down ***
Apr 28 14:32:35 pfsense check_reload_status: Reloading filter
Apr 28 14:32:44 pfsense apinger: ALARM: WAN(121.50.212.9) *** down ***
Apr 28 14:32:44 pfsense check_reload_status: Reloading filter
Apr 28 14:32:54 pfsense apinger: alarm canceled: WAN(121.50.212.9) *** down ***
Apr 28 14:32:54 pfsense check_reload_status: Reloading filter
Apr 28 14:33:04 pfsense check_reload_status: Reloading filter
Apr 28 14:33:54 pfsense apinger: alarm canceled: WAN(121.50.212.9) *** delay ***
Apr 28 14:34:04 pfsense check_reload_status: Reloading filterI have set the gateway Frequency Probe (probably better called Probe Interval) to 10.
It seems that 1 lost ping reply causes apinger to decide the line is down. That SEEMS to trigger a filter reload 10 seconds later. The next ping getting a reply apparently "cancels" the preceding "line down" (the cancel is really a "line up"?) and seems to trigger a filter reload 10 seconds later. This seems far too sensitive to lost ping ping replies. Maybe I also need to tweak the gateway Down time (currently "default") to be a bit more to require a number of consecutive lost pings and a bit for slow response, say 33 to require 3 consecutive lost pings. I'll try that to see if it reduces the filter reload frequency and if the downloads become more reliable.
-
Changing the gateway down time to 33 didn't seem to help the download problem but it did seem to reduce the noise in the syslog log file considerably.
I wonder if an unfortunate set of gateway parameters might be contributing to the problem a number of people have reported with link startup. If I recall correctly, a number of those reports included apinger down reports and reloading filter reports.
-
I know this doesn't resolve the question itself but if you are downloading 2G files in Firefox, I would recommend installing the DownThemAll extension to assist you with it. It does a great job of multi-threading, error recovery and will search for mirrors of commonly downloaded files.