WAN Gateway randomly down

Inperpetuammemoriam

Hey guys,

I am running pfSense in a single WAN configuration and my WAN Gateway goes down at random. This behaviour started weeks ago when I had "squid" installed (but not even running). At that time the problem occurred about once a day, and I could only restart the Gateway when "Snort" and "pfBlockerNG" were not running. After uninstalling "squid" I thought I had got rid of the problem, since it didn't occur for some time. I have never been so wrong… Is anyone else experiencing the same thing? Does anybody have an idea what could cause the problem?

Probably related to the same problem: after rebooting my device I get no internet connectivity unless I manually start "Snort" before "pfBlockerNG" becomes active.

Feel free to ask for logs!

Thanks in advance for any support provided!

muswellhillbilly

@Inperpetuammemoriam:

Feel free to ask for logs!

Have you checked the logs yourself to see if anything is showing up? Might be worth having a look first and posting the relevant bits.

Inperpetuammemoriam

Thank you for your quick reply! I understand that you might be concerned about users posting questions they could answer themselves by taking a short look at their logs. However, if I had been able to solve this burning issue myself, I guarantee you that I would have done so weeks ago without this forum's help.

The problem occurred again today and, luckily before they were overwritten, I managed to capture some of the log entries. In the gateway log this line looks suspicious, since it marks the beginning of the increased packet loss:


Nov 13 17:07:10 	apinger: Starting: /usr/local/sbin/pfSctl -c 'service reload dyndns GW_WAN' -c 'service reload ipsecdns' -c 'service reload openvpn GW_WAN' -c 'filter reload'

At the same time the general log printed the following:


Nov 13 17:07:11 	php-fpm[98341]: /rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed its IP. Reloading endpoints that may use GW_WAN.
Nov 13 17:07:10 	check_reload_status: Reloading filter
Nov 13 17:07:10 	check_reload_status: Restarting OpenVPN tunnels/interfaces
Nov 13 17:07:10 	check_reload_status: Restarting ipsec tunnels
Nov 13 17:07:10 	check_reload_status: updating dyndns GW_WAN

Note that I have configured "OpenVPN" and "IPsec", but both of them are currently disabled. "DynDNS" isn't even configured…

Neither "Snort" nor "pfBlockerNG" printed informative lines. "pfBlockerNG" ran its sync process just before:


Nov 13 17:04:01 	php: pfblockerng.php: [pfBlockerNG] Starting sync process.

PS: I increased the log file sizes so that I hopefully won't miss any of the information next time.
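
To see which events cluster around a gateway alarm, the captured lines can be sorted by their offset from the first apinger entry. The following is only a minimal Python sketch, not a pfSense tool: `parse_line` and `events_near` are hypothetical helper names, the syslog timestamp is assumed to occupy exactly the first 15 characters of each line, and the year is a placeholder because the log lines do not carry one.

```python
from datetime import datetime


def parse_line(line, year=2000):
    """Split 'Nov 13 17:07:10 process: message' into (timestamp, process, message).

    Assumption: the timestamp is the first 15 characters of the line.
    The year is a placeholder (a leap year, so that Feb 29 would parse).
    """
    stamp, rest = line[:15], line[15:].strip()
    process, _, message = rest.partition(": ")
    when = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
    return when, process, message


def events_near(lines, trigger="apinger", window_s=300, year=2000):
    """Return (offset_seconds, process, message) tuples within window_s of the
    first line logged by `trigger`, sorted by offset. Returns [] if the
    trigger process never appears."""
    parsed = [parse_line(line, year) for line in lines if line.strip()]
    anchor = next((when for when, proc, _ in parsed if proc.startswith(trigger)), None)
    if anchor is None:
        return []
    nearby = []
    for when, proc, msg in parsed:
        offset = (when - anchor).total_seconds()
        if abs(offset) <= window_s:
            nearby.append((offset, proc, msg))
    return sorted(nearby, key=lambda event: event[0])
```

Fed the lines quoted above, this places the pfBlockerNG sync 189 seconds before the apinger entry and the OpenVPN message one second after it.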

David_W

@Inperpetuammemoriam:

The problem occurred again today and, luckily before they were overwritten, I managed to capture some of the log entries. In the gateway log this line looks suspicious, since it marks the beginning of the increased packet loss:
Nov 13 17:07:10 	apinger: Starting: /usr/local/sbin/pfSctl -c 'service reload dyndns GW_WAN' -c 'service reload ipsecdns' -c 'service reload openvpn GW_WAN' -c 'filter reload'

You're mixing up cause and effect. That line, and all the other log entries you mention, are what happens when apinger detects that your gateway has changed state (up -> down, or down -> up). Some of the logged actions might amount to a no-op, depending on your configuration.

Either apinger is correctly detecting your gateway is going down randomly, or apinger is malfunctioning. apinger can be rather troublesome and is, I believe, due for replacement for pfSense 2.3. With a single WAN you can run without apinger enabled if you wish: System -> Routing, Gateways tab, 'e' next to your gateway, check 'Disable gateway monitoring' and Save. Rather than disabling gateway monitoring, you might find it better to tweak the monitoring thresholds using the Advanced button on that page.

Inperpetuammemoriam

Thank you for your hints!

I have set some of the values to what I would consider more conservative ones:

high latency threshold: 1000
high packet loss threshold: 50
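
As a rough illustration of what raising those two thresholds changes, here is a simplified Python sketch. It is not apinger's actual algorithm: the function names, the sample format (round-trip times in milliseconds, `None` for a lost probe) and the plain averaging are all assumptions made for the example.

```python
def summarize(samples):
    """Return (average latency in ms or None, packet loss in percent).

    `samples` is a list of round-trip times in ms, with None for a lost probe.
    """
    if not samples:
        raise ValueError("need at least one sample")
    answered = [s for s in samples if s is not None]
    loss_pct = 100.0 * (len(samples) - len(answered)) / len(samples)
    avg_ms = sum(answered) / len(answered) if answered else None
    return avg_ms, loss_pct


def alarms(samples, high_latency_ms=1000, high_loss_pct=50):
    """List which alarms the samples would raise under the given thresholds.

    The defaults mirror the two values listed above; the comparison itself
    is a hypothetical simplification.
    """
    avg_ms, loss_pct = summarize(samples)
    raised = []
    if loss_pct >= high_loss_pct:
        raised.append("loss")
    if avg_ms is not None and avg_ms >= high_latency_ms:
        raised.append("delay")
    return raised
```

With these values, a window such as `[20, 25, None, 30]` (25% loss) raises nothing, whereas the same window with a hypothetical stricter `high_loss_pct=20` would raise a loss alarm.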

I will let you know about the outcome as soon as possible. I am, however, still wondering why "Snort" and "pfBlockerNG" prevent the gateway from coming back up…

Either apinger is correctly detecting your gateway is going down randomly, or apinger is malfunctioning. apinger can be rather troublesome and is,

Could misbehaviour of "apinger" cause the gateway to go down (and the internet connectivity with it) or am I again mixing cause and effect?

I believe, due for replacement for pfSense 2.3.

Where did you get that information?

David_W

apinger is the component that current versions of pfSense use to monitor gateways - whether each gateway is up or down, also working out latency and packet loss. Unfortunately, apinger is a rather troublesome program, as searches of this forum and especially Redmine will attest.

There have been various comments endorsing the desirability of replacing apinger, such as Chris Buechler's comment on Redmine #4081 suggesting pfSense 2.3 will use something different.

What is clear is that apinger thinks your gateway is going down, and pfSense is responding accordingly. What is unclear is whether your gateway is actually going down, or whether apinger is falsely concluding the gateway is down when it is not.
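
One way to tell the two cases apart is to probe the gateway independently of apinger and compare timestamps with the gateway log. The following is a minimal Python sketch under stated assumptions: it shells out to the system `ping` with `-c`, it assumes the summary contains the text "% packet loss", and `parse_loss`, `probe` and `log_probe` are hypothetical helper names.

```python
import re
import subprocess
from datetime import datetime

# Assumption: the ping summary line contains e.g. "0.0% packet loss".
LOSS_RE = re.compile(r"(\d+(?:\.\d+)?)% packet loss")


def parse_loss(ping_output):
    """Extract the packet loss percentage from ping output, or None if absent."""
    match = LOSS_RE.search(ping_output)
    return float(match.group(1)) if match else None


def probe(host, count=5):
    """Send `count` echo requests to `host` and return the loss percentage."""
    result = subprocess.run(
        ["ping", "-c", str(count), host],
        capture_output=True,
        text=True,
    )
    return parse_loss(result.stdout)


def log_probe(host, count=5):
    """Return one timestamped line describing the current loss to `host`."""
    loss = probe(host, count)
    stamp = datetime.now().strftime("%b %d %H:%M:%S")
    status = "no summary found" if loss is None else f"{loss:.0f}% loss"
    return f"{stamp} {host}: {status}"
```

Calling `log_probe` every minute or so against the gateway's monitor IP and appending the result to a file gives a second opinion: if apinger reports the gateway down while this log shows no loss at the same time, a false alarm becomes the more likely explanation.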

If you are experiencing problems with apinger, it can help to check 'State Killing on Gateway Failure' in System -> Advanced, Miscellaneous tab. This option is arguably named incorrectly - it should really be called 'No state killing on gateway failure'. When checked, it stops pfSense from resetting all states using a gateway when that gateway is reported to have gone down.