Adjusting apinger impact

sonicbuddha

My standard router just died, so I took the opportunity to build a new one and upgrade to the newest version of pfSense, and I've been updating to the newest build every day. I've noticed that whenever the router is under heavy traffic, such as routing my VPN traffic to work or a BitTorrent download that maxes out my throughput, apinger raises an alarm that it cannot ping my ISP gateway and, when the gateway responds again, the firewall is reloaded. This seems to drop any persistent connections, such as my VPN connection, which is understandably frustrating. I've seen several threads commenting on apinger but no discussion of how to tune it or avoid this impact. Is there a way to reduce apinger's sensitivity, or to make its result informational instead of an alarm that triggers a reload of the firewall?

Efonnes

The advanced settings for the gateway under System: Routing may be useful.

I don't really know if there is a way to disable this behavior entirely, but it probably isn't supposed to be doing it in a single-WAN configuration.

sonicbuddha

I love responding to my own posts. ;)

Under routing>route (default) there is an advanced section where the Latency thresholds, Packet Loss thresholds, and Down time are adjustable. Do these apply? And, if so, what are reasonable values? I'm loath to set a value so high that the alarm loses its actual value just to work around my ISP gateway not responding to pings, assuming these settings apply at all.

sonicbuddha

Glad to know I found the right place. Funny how taking the time to post is enough to get the mind rolling. It also means this is probably resolvable, which is fantastic.

So, can you recommend reasonable values?

sonicbuddha

You know, there's a reason why I prefer Unix based open source firewalls.

I popped open a shell and checked the auto-generated /var/etc/apinger.conf and the file that generates it, /etc/inc/gwlb.inc. Yes, the advanced settings under routes define the configuration of apinger. The defaults:

Alarm down: 10s
Alarm delay: low (alarm off) 200ms, high (alarm on) 500ms
Alarm loss: low 10% high 20%

Strangely, the RRD graph for this gateway under Quality doesn't seem to show my router ever exceeding these high marks, specifically the 'delay' alarm. I've raised all the values to at least 2x the defaults (in the GUI) to see the result.
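For anyone who wants to see how the low/high pair behaves, here is a minimal Python sketch of a two-threshold alarm using the default delay values quoted above. It is an illustration only, not apinger's actual code; the class name, the sample delays, and the doubled thresholds are all made up for the example.

```python
from dataclasses import dataclass


@dataclass
class DelayAlarm:
    """Two-threshold alarm modeled on the delay defaults quoted in this post."""
    low_ms: float = 200.0   # alarm clears once the average delay drops below this
    high_ms: float = 500.0  # alarm fires once the average delay rises above this
    active: bool = False

    def update(self, avg_delay_ms: float) -> bool:
        """Feed one averaged delay sample and return the resulting alarm state."""
        if not self.active and avg_delay_ms > self.high_ms:
            self.active = True
        elif self.active and avg_delay_ms < self.low_ms:
            self.active = False
        return self.active


def count_transitions(samples, alarm):
    """Count how many times the alarm state flips over a series of samples."""
    flips = 0
    previous = alarm.active
    for sample in samples:
        current = alarm.update(sample)
        if current != previous:
            flips += 1
        previous = current
    return flips


# Hypothetical averaged delays (ms) on a saturated line, not measured data.
samples = [40, 620, 180, 650, 150, 700, 120]
default_flips = count_transitions(samples, DelayAlarm())
doubled_flips = count_transitions(samples, DelayAlarm(low_ms=400, high_ms=1000))
print(default_flips, doubled_flips)
```

With these invented samples the default thresholds flip the alarm on every spike, while the doubled thresholds never fire, which is the effect I am hoping for by raising the values in the GUI.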

sonicbuddha

I'm able to mitigate the impact by setting the alarm levels very high, which is good. The question I have, however, is why apinger reloads the firewall when it detects an alarm state. What is the reason for this? And can I disable it? I appreciate the alarm, but the firewall reload seems extreme.

Adding the entries from the log:
Sep 17 17:44:54 apinger: ALARM: ISP_Gateway(64.142.6.1) *** ISP_Gatewaydelay ***
Sep 17 17:45:04 check_reload_status: reloading filter
Sep 17 17:45:16 apinger: alarm canceled: ISP_Gateway(64.142.6.1) *** ISP_Gatewaydelay ***
Sep 17 17:45:26 check_reload_status: reloading filter
Sep 17 17:45:29 apinger: ALARM: ISP_Gateway(64.142.6.1) *** ISP_Gatewaydelay ***
Sep 17 17:45:39 apinger: alarm canceled: ISP_Gateway(64.142.6.1) *** ISP_Gatewaydelay ***
Sep 17 17:45:39 check_reload_status: reloading filter
Sep 17 17:45:49 check_reload_status: reloading filter
Sep 17 17:46:08 apinger: ALARM: ISP_Gateway(64.142.6.1) *** ISP_Gatewaydelay ***
Sep 17 17:46:18 check_reload_status: reloading filter
Sep 17 17:46:19 apinger: alarm canceled: ISP_Gateway(64.142.6.1) *** ISP_Gatewaydelay ***
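
To put numbers on the flapping, here is a small Python sketch that parses the excerpt above and counts alarms, filter reloads, and how long each alarm lasted. The regular expression and the assumed year are my own choices for illustration (syslog lines carry no year), and the script only understands the three message types shown in this excerpt.

```python
import re
from datetime import datetime

LOG = """\
Sep 17 17:44:54 apinger: ALARM: ISP_Gateway(64.142.6.1) *** ISP_Gatewaydelay ***
Sep 17 17:45:04 check_reload_status: reloading filter
Sep 17 17:45:16 apinger: alarm canceled: ISP_Gateway(64.142.6.1) *** ISP_Gatewaydelay ***
Sep 17 17:45:26 check_reload_status: reloading filter
Sep 17 17:45:29 apinger: ALARM: ISP_Gateway(64.142.6.1) *** ISP_Gatewaydelay ***
Sep 17 17:45:39 apinger: alarm canceled: ISP_Gateway(64.142.6.1) *** ISP_Gatewaydelay ***
Sep 17 17:45:39 check_reload_status: reloading filter
Sep 17 17:45:49 check_reload_status: reloading filter
Sep 17 17:46:08 apinger: ALARM: ISP_Gateway(64.142.6.1) *** ISP_Gatewaydelay ***
Sep 17 17:46:18 check_reload_status: reloading filter
Sep 17 17:46:19 apinger: alarm canceled: ISP_Gateway(64.142.6.1) *** ISP_Gatewaydelay ***
"""

# Timestamp, process name, message. Written for this excerpt only.
LINE = re.compile(r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d) (\S+): (.*)$")
ASSUMED_YEAR = 2010  # assumption: the log lines do not record a year


def summarize(log_text):
    """Return (alarm count, reload count, list of alarm durations in seconds)."""
    alarms, reloads, durations = 0, 0, []
    started = None
    for line in log_text.splitlines():
        match = LINE.match(line.strip())
        if not match:
            continue
        stamp, process, message = match.groups()
        when = datetime.strptime(f"{ASSUMED_YEAR} {stamp}", "%Y %b %d %H:%M:%S")
        if process == "apinger" and message.startswith("ALARM:"):
            alarms += 1
            started = when
        elif process == "apinger" and message.startswith("alarm canceled:"):
            if started is not None:
                durations.append((when - started).total_seconds())
                started = None
        elif process == "check_reload_status" and "reloading filter" in message:
            reloads += 1
    return alarms, reloads, durations


alarms, reloads, durations = summarize(LOG)
print(f"{alarms} alarms, {reloads} filter reloads, durations: {durations}")
```

For this excerpt that works out to three alarms and five filter reloads in under a minute and a half.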

cmb

If a connection is down, you don't want your states to be stuck on that connection, as active connections will never fail over to your remaining available connections. So a dead connection triggers killing all states on that connection. Just set your levels appropriately so it's not triggered unless it's truly down.

sonicbuddha

Thank you for the reply.

What if this is a single-WAN configuration (which it is)? Why set it to fail over when there is nothing to fail over to? Why not just let connections eventually time out, or resume if the connection returns? Is there an advantage to having an alarm trigger a reload, breaking any persistent connections, in a single-WAN situation?

dreamslacker

High latency on the line is a sign of your connection being saturated. You might want to use the traffic shaper to limit the bandwidth of the line to its actual capacity. This will prevent the latencies from sky-rocketing.
Another thing I've found useful in pfSense 2.0 is to set floating rules that direct ICMP into the highest priority queue (qACKs in my case).
Otherwise, ICMP is relegated to the default queue, which, if the line is under heavy usage, may drop or greatly delay the packets that apinger uses to test your line status. Since the ICMP packets are small and we only need apinger to determine whether the line is up, this works very well and doesn't adversely affect the connection as a whole.
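
As a rough back-of-the-envelope illustration of why the queue matters, here is a Python sketch of how long a ping waits behind packets already queued on the uplink. The 1 Mbit/s uplink, the 1500-byte packets, and the queue depth of 50 are hypothetical numbers chosen for the example, and the model ignores everything except serialization delay.

```python
def queue_delay_ms(packets_ahead, packet_bytes=1500, link_mbps=1.0):
    """Time (ms) for the packets already queued to drain before the ping is sent."""
    bits_ahead = packets_ahead * packet_bytes * 8
    return bits_ahead * 1000 / (link_mbps * 1_000_000)


# Default queue: the ping waits behind 50 full-size packets (hypothetical depth).
fifo_delay = queue_delay_ms(50)

# High-priority queue: at worst the ping waits for the one packet being sent.
priority_delay = queue_delay_ms(1)

print(f"default queue: {fifo_delay:.0f} ms, priority queue: {priority_delay:.0f} ms")
```

Under these assumptions the ping in the default queue is delayed by about 600 ms, which is already past the 500 ms default delay alarm mentioned earlier in the thread, while the prioritized ping waits about 12 ms.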

sonicbuddha

These are some great ideas and something I'll definitely work on in the near future, especially when I start tuning the connection. Thanks for the recommendations.

I've also noticed, by monitoring my quality RRD graphs, that the alarms can occur randomly when there is no heavy network usage. It may be that my ISP gateway occasionally throttles back on ping responses, or it may be something completely different.

cmb

Ah yeah, that runs now regardless of whether you have multi-WAN, and that behavior isn't desirable if you have a single connection.
http://redmine.pfsense.org/issues/911

sonicbuddha

Thank you very much! Your responsiveness is appreciated.

lyserge

@cmb:

Ah yeah, that runs now regardless of whether you have multi-WAN, and that behavior isn't desirable if you have a single connection.
http://redmine.pfsense.org/issues/911

Hmm, I think this explains a problem I have had (it sounds related to Bug #911):

A couple of weeks ago my ISP had a failure that made my monitor IP (GW) on WAN (DHCP) unavailable for a longer period.

I thought it was extra weird that I wasn't able to connect to my other router residing outside WAN2 when the GW on my primary WAN was down.

What I remember was that I was able to get the login screen but then the login process just appeared to be hung (states got cleared?).

The monitor IP on WAN2 (DHCP) was never down. I can't trigger this again as it is not the same thing as just unplugging the cable on WAN.

AhnHEL

@dreamslacker:

High latency on the line is a sign of your connection being saturated. You might want to use the traffic shaper to limit the bandwidth of the line to its actual capacity. This will prevent the latencies from sky-rocketing.
Another thing I've found useful in pfSense 2.0 is to set floating rules that direct ICMP into the highest priority queue (qACKs in my case).
Otherwise, ICMP is relegated to the default queue, which, if the line is under heavy usage, may drop or greatly delay the packets that apinger uses to test your line status. Since the ICMP packets are small and we only need apinger to determine whether the line is up, this works very well and doesn't adversely affect the connection as a whole.

This, along with the new System/Advanced/Miscellaneous/Gateway Monitoring setting, has solved all the apinger problems I was having, most especially with gaming. Thank you. I believe this should be stickied in the 2.0 Beta forum.

RossN

Happy New Year folks!

Despite the adjustments I have tried to make, I still constantly observe this issue.
Please advise how to disable it, or how to stop getting this warning and dropping the connections.
Something is definitely wrong with the router, because with the spare Linksys no interruptions are observed.
Jan 1 15:43:57 check_reload_status: reloading filter
Jan 1 15:43:57 check_reload_status: reloading filter
Jan 1 15:43:47 apinger: alarm canceled: Gateway(xx.xx.xx.1) *** Gatewaydown ***
Jan 1 15:43:47 apinger: ALARM: Gateway(xx.xx.xx.1) *** Gatewaydown ***
Jan 1 15:40:36 check_reload_status: reloading filter
Jan 1 15:40:36 check_reload_status: reloading filter
Jan 1 15:40:26 apinger: alarm canceled: Gateway(xx.xx.xx.1) *** Gatewaydown ***
Jan 1 15:40:26 apinger: ALARM: Gateway(xx.xx.xx.1) *** Gatewaydown ***
Jan 1 15:37:21 check_reload_status: reloading filter
Jan 1 15:37:21 check_reload_status: reloading filter
Jan 1 15:37:11 apinger: alarm canceled: Gateway(xx.xx.xx.1) *** Gatewaydown ***
Jan 1 15:37:11 apinger: ALARM: Gateway(xx.xx.xx.1) *** Gatewaydown ***
Jan 1 15:33:33 check_reload_status: reloading filter
Jan 1 15:33:33 check_reload_status: reloading filter
Jan 1 15:33:23 apinger: alarm canceled: Gateway(xx.xx.xx.1) *** Gatewaydown ***
Jan 1 15:33:23 apinger: ALARM: Gateway(xx.xx.xx.1) *** Gatewaydown ***

Some additional information: I have one WAN with a static IP.
Gateway is set to 10 seconds down time. The rest is empty (but I did test with the values mentioned above).
Gateway monitor is disabled.

eri--

Disabling state killing will not disable the notifications in the logs.
It just means that the states will not get killed, and that's it.