Apinger and check_reload_status

wallabybob

After a number of failed attempts to download a 2GB movie file through Firefox I started to look around to see if I could see something that might contribute to the problem. (Often the download stalls for over ten minutes with no failure reported, web pages downloaded fine; maybe Firefox bug but I think I have seen similar things in Chrome. Sometimes the browser transfer only part of a file, most recently 135MB of a 2GB file, but doesn't report any error.)

I am running

2.0.1-RELEASE (i386)
built on Mon Dec 12 17:53:52 EST 2011
FreeBSD 8.1-RELEASE-p6

I looked at my pfSense system log and saw numerous instances of sequences like: Apr 27 22:51:49 pfsense apinger: ALARM: WAN(121.50.212.9) *** delay ***
Apr 27 22:51:59 pfsense apinger: ALARM: WAN(121.50.212.9) *** down ***
Apr 27 22:51:59 pfsense check_reload_status: Reloading filter
Apr 27 22:52:00 pfsense apinger: alarm canceled: WAN(121.50.212.9) *** down ***
Apr 27 22:52:09 pfsense check_reload_status: Reloading filter
Apr 27 22:52:10 pfsense check_reload_status: Reloading filter
apinger reports something, then 10 seconds later check_reload_status report "Reloading filter".

In System -> Routing -> Gateways I have set Frequency Probe to 10. Does that also set the 10 seconds between apinger reports and the reloading filter?

What does the "Reloading Filter indicate"?

In Apr 27 23:00:50 pfsense apinger: alarm canceled: WAN(121.50.212.9) *** delay ***
Apr 27 23:00:54 pfsense kernel: Bump sched buckets to 64 (was 0)
Apr 27 23:00:59 pfsense kernel: Bump sched buckets to 64 (was 0)
Apr 27 23:01:00 pfsense apinger: ALARM: WAN(121.50.212.9) *** down ***
Apr 27 23:01:00 pfsense check_reload_status: Reloading filter
Apr 27 23:01:00 pfsense apinger: alarm canceled: WAN(121.50.212.9) *** down ***
Apr 27 23:01:10 pfsense check_reload_status: Reloading filter
Apr 27 23:01:10 pfsense apinger: ALARM: WAN(121.50.212.9) *** down ***
Apr 27 23:01:10 pfsense check_reload_status: Reloading filter
Apr 27 23:01:10 pfsense apinger: alarm canceled: WAN(121.50.212.9) *** down ***
Apr 27 23:01:11 pfsense kernel: Bump sched buckets to 64 (was 0)
Apr 27 23:01:20 pfsense check_reload_status: Reloading filter
Apr 27 23:01:28 pfsense kernel: Bump sched buckets to 64 (was 0)
Apr 27 23:01:33 pfsense kernel: Bump sched buckets to 64 (was 0)
Apr 27 23:01:39 pfsense kernel: Bump sched buckets to 64 (was 0)
Apr 27 23:01:40 pfsense apinger: ALARM: WAN(121.50.212.9) *** down ***
Apr 27 23:01:40 pfsense apinger: alarm canceled: WAN(121.50.212.9) *** down ***
Apr 27 23:01:50 pfsense check_reload_status: Reloading filter
Apr 27 23:01:50 pfsense check_reload_status: Reloading filter

I ran a tcpdump packet capture filtering on the ping target and compared with apinger and check_reload_status entries in the system log. I have removed the packet capture entries that don't seem related to the apinger/check_reload_status system log reports:

14:19:33.989677 IP 203.144.23.83 > 121.50.212.9: ICMP echo request, id 31505, seq 52757, length 44
14:19:34.344118 IP 121.50.212.9 > 203.144.23.83: ICMP echo reply, id 31505, seq 52757, length 44
14:19:43.989693 IP 203.144.23.83 > 121.50.212.9: ICMP echo request, id 31505, seq 53013, length 44
14:19:44.287608 IP 121.50.212.9 > 203.144.23.83: ICMP echo reply, id 31505, seq 53013, length 44
14:19:53.989713 IP 203.144.23.83 > 121.50.212.9: ICMP echo request, id 31505, seq 53269, length 44
14:20:03.989710 IP 203.144.23.83 > 121.50.212.9: ICMP echo request, id 31505, seq 53525, length 44
14:20:04.825708 IP 121.50.212.9 > 203.144.23.83: ICMP echo reply, id 31505, seq 53525, length 44
14:20:13.989802 IP 203.144.23.83 > 121.50.212.9: ICMP echo request, id 31505, seq 53781, length 44
14:20:14.542881 IP 121.50.212.9 > 203.144.23.83: ICMP echo reply, id 31505, seq 53781, length 44

14:32:14.002351 IP 203.144.23.83 > 121.50.212.9: ICMP echo request, id 31505, seq 6678, length 44
14:32:14.304134 IP 121.50.212.9 > 203.144.23.83: ICMP echo reply, id 31505, seq 6678, length 44
14:32:24.002356 IP 203.144.23.83 > 121.50.212.9: ICMP echo request, id 31505, seq 6934, length 44
14:32:34.002373 IP 203.144.23.83 > 121.50.212.9: ICMP echo request, id 31505, seq 7190, length 44
14:32:34.195932 IP 121.50.212.9 > 203.144.23.83: ICMP echo reply, id 31505, seq 7190, length 44
14:32:44.002495 IP 203.144.23.83 > 121.50.212.9: ICMP echo request, id 31505, seq 7446, length 44
14:32:54.002603 IP 203.144.23.83 > 121.50.212.9: ICMP echo request, id 31505, seq 7702, length 44
14:32:54.043619 IP 121.50.212.9 > 203.144.23.83: ICMP echo reply, id 31505, seq 7702, length 44
14:33:04.002510 IP 203.144.23.83 > 121.50.212.9: ICMP echo request, id 31505, seq 7958, length 44
14:33:04.044866 IP 121.50.212.9 > 203.144.23.83: ICMP echo reply, id 31505, seq 7958, length 44
14:33:14.002512 IP 203.144.23.83 > 121.50.212.9: ICMP echo request, id 31505, seq 8214, length 44
14:33:14.043729 IP 121.50.212.9 > 203.144.23.83: ICMP echo reply, id 31505, seq 8214, length 44

Corresponding system log entries:
Apr 28 14:19:55 pfsense apinger: ALARM: WAN(121.50.212.9) *** down ***
Apr 28 14:20:04 pfsense apinger: alarm canceled: WAN(121.50.212.9) *** down ***
Apr 28 14:20:05 pfsense check_reload_status: Reloading filter
Apr 28 14:20:14 pfsense check_reload_status: Reloading filter
Apr 28 14:32:25 pfsense apinger: ALARM: WAN(121.50.212.9) *** down ***
Apr 28 14:32:34 pfsense apinger: alarm canceled: WAN(121.50.212.9) *** down ***
Apr 28 14:32:35 pfsense check_reload_status: Reloading filter
Apr 28 14:32:44 pfsense apinger: ALARM: WAN(121.50.212.9) *** down ***
Apr 28 14:32:44 pfsense check_reload_status: Reloading filter
Apr 28 14:32:54 pfsense apinger: alarm canceled: WAN(121.50.212.9) *** down ***
Apr 28 14:32:54 pfsense check_reload_status: Reloading filter
Apr 28 14:33:04 pfsense check_reload_status: Reloading filter
Apr 28 14:33:54 pfsense apinger: alarm canceled: WAN(121.50.212.9) *** delay ***
Apr 28 14:34:04 pfsense check_reload_status: Reloading filter

I have set the gateway Frequency Probe (probably better called Probe Interval) to 10.

It seems that 1 lost ping reply causes apinger to decide the line is down. That SEEMS to trigger a filter reload 10 seconds later. The next ping getting a reply apparently "cancels" the preceding "line down" (the cancel is really a "line up"?) and seems to trigger a filter reload 10 seconds later. This seems far too sensitive to lost ping ping replies. Maybe I also need to tweak the gateway Down time (currently "default") to be a bit more to require a number of consecutive lost pings and a bit for slow response, say 33 to require 3 consecutive lost pings. I'll try that to see if it reduces the filter reload frequency and if the downloads become more reliable.

wallabybob

Changing the gateway down time to 33 didn't seem to help the download problem but it did seem to reduce the noise in the syslog log file considerably.

I wonder if an unfortunate set of gateway parameters might be contributing to the problem a number of people have reported with link startup. If I recall correctly, a number of those reports included apinger down reports and reloading filter reports.

acald

I know this doesn't resolve the question itself but if you are downloading 2G files in Firefox, I would recommend installing the DownThemAll extension to assist you with it. It does a great job of multi-threading, error recovery and will search for mirrors of commonly downloaded files.