pfSense 2.1 randomly reloading and dropping all connections during the day



  • During the work day my users are randomly getting kicked out of RDP and TeamViewer/LogMeIn sessions.

    Looking at the system log, I see the following logged every time these disconnects happen.

    Jun 3 16:50:10 check_reload_status: updating dyndns WANGW
    Jun 3 16:50:10 check_reload_status: Restarting ipsec tunnels
    Jun 3 16:50:10 check_reload_status: Restarting OpenVPN tunnels/interfaces
    Jun 3 16:50:10 check_reload_status: Reloading filter
    Jun 3 16:50:12 php: : OpenVPN: One or more OpenVPN tunnel endpoints may have changed its IP. Reloading endpoints that may use WANGW.
    Jun 3 16:50:12 php: : OpenVPN: Resync server1 Split VPN Tunnel
    Jun 3 16:50:12 php: : IPSEC: One or more IPsec tunnel endpoints has changed its IP. Refreshing.
    Jun 3 16:50:12 kernel: in6_purgeaddr: link-local all-nodesmulticast address deletion error
    Jun 3 16:50:12 kernel: in6_purgeaddr: node-local all-nodesmulticast address deletion error
    Jun 3 16:50:12 kernel: ovpns1: link state changed to DOWN
    Jun 3 16:50:13 kernel: ovpns1: link state changed to UP
    Jun 3 16:50:13 check_reload_status: rc.newwanip starting ovpns1
    Jun 3 16:50:13 kernel: rn_addmask: mask impossibly already in tree
    Jun 3 16:50:15 php: : rc.newwanip: Informational is starting ovpns1.
    Jun 3 16:50:15 php: : rc.newwanip: on (IP address: 10.10.1.1) (interface: ) (real interface: ovpns1).
    Jun 3 16:50:15 php: : pfSense package system has detected an ip change -> 10.10.1.1 … Restarting packages.
    Jun 3 16:50:15 check_reload_status: Starting packages
    Jun 3 16:50:17 php: : Restarting/Starting all packages.
    Jun 3 16:50:39 check_reload_status: updating dyndns WANGW
    Jun 3 16:50:39 check_reload_status: Restarting ipsec tunnels
    Jun 3 16:50:39 check_reload_status: Restarting OpenVPN tunnels/interfaces
    Jun 3 16:50:39 check_reload_status: Reloading filter
    Jun 3 16:50:41 php: : IPSEC: One or more IPsec tunnel endpoints has changed its IP. Refreshing.
    Jun 3 16:50:41 php: : OpenVPN: One or more OpenVPN tunnel endpoints may have changed its IP. Reloading endpoints that may use WANGW.
    Jun 3 16:50:41 php: : OpenVPN: Resync server1 Split VPN Tunnel
    Jun 3 16:50:41 kernel:
    Jun 3 16:50:41 kernel: in6_purgeaddr: link-local all-nodesmulticast address deletion error
    Jun 3 16:50:41 kernel: in6_purgeaddr: node-local all-nodesmulticast address deletion error
    Jun 3 16:50:41 kernel: ovpns1: link state changed to DOWN
    Jun 3 16:50:42 kernel: ovpns1: link state changed to UP
    Jun 3 16:50:42 check_reload_status: rc.newwanip starting ovpns1
    Jun 3 16:50:42 kernel: Non-unique normal route, mask not entered
    Jun 3 16:50:42 kernel: pfr_unroute_kentry: delete failed.
    Jun 3 16:50:44 php: : rc.newwanip: Informational is starting ovpns1.
    Jun 3 16:50:44 php: : rc.newwanip: on (IP address: 10.10.1.1) (interface: ) (real interface: ovpns1).
    Jun 3 16:50:44 php: : pfSense package system has detected an ip change -> 10.10.1.1 ... Restarting packages.
    Jun 3 16:50:44 check_reload_status: Starting packages
    Jun 3 16:50:46 php: : Restarting/Starting all packages.

    How do I resolve this?



  • Guess 1: Are there any HOTPLUG events in the system log before this happens? If so, then something physical is happening on WAN that makes pfSense see it go away and come back, thus resetting a bunch of stuff.
    Guess 2: The WAN gateway monitoring has decided the gateway is down, maybe due to packet loss or latency measurements. If there are huge downloads happening, the gateway monitoring ping packets can easily get delayed (or even lost). In that case, either:
    a) Change the gateway monitoring advanced parameters to allow for higher packet loss and/or latency before the gateway is declared down; or
    b) If WAN is your only gateway to the internet, then you don't gain much in real life from having gateway monitoring, as there is nowhere to fail over to anyway when WAN is down. Disable gateway monitoring on the WAN gateway.

    That's my 2c worth:)
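For anyone who wants to check Guess 1 against an exported system log, here is a minimal Python sketch. It assumes the log is plain text with one entry per line in the same `Mon D HH:MM:SS message` layout as the excerpt above, and that a hotplug entry contains the word "hotplug" somewhere in its message (an assumption, since no such line appears in this thread). The function name `classify_reloads`, the 30-second window, and the placeholder year are all hypothetical choices for illustration.

```python
import re
from datetime import datetime, timedelta

# Matches "Jun 3 16:50:10 <message>"; tolerates leading and repeated whitespace.
LINE_RE = re.compile(r"^\s*(\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+(.*)$")


def parse_line(line, assumed_year=2000):
    """Return (raw_stamp, datetime, message), or None if the line does not match.

    The log lines carry no year, so a placeholder year is assumed.
    """
    match = LINE_RE.match(line)
    if match is None:
        return None
    raw_stamp = " ".join(match.group(1).split())
    stamp = datetime.strptime(f"{assumed_year} {raw_stamp}", "%Y %b %d %H:%M:%S")
    return raw_stamp, stamp, match.group(2)


def classify_reloads(lines, window_seconds=30):
    """For each filter reload, report whether a hotplug entry preceded it.

    Returns a list of (timestamp_text, preceded_by_hotplug) tuples.
    Assumes the lines are in chronological order, oldest first.
    """
    hotplug_times = []
    results = []
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        raw_stamp, stamp, message = parsed
        if "hotplug" in message.lower():
            hotplug_times.append(stamp)
        elif message.startswith("check_reload_status: Reloading filter"):
            window_start = stamp - timedelta(seconds=window_seconds)
            preceded = any(window_start <= t <= stamp for t in hotplug_times)
            results.append((raw_stamp, preceded))
    return results
```

If every reload comes back with `False`, a physical link flap is less likely and Guess 2 (gateway monitoring) becomes the better suspect.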



  • I had the same problem.  I noticed that my gateway logs were filling with:

    
    Jun 8 02:48:54 	apinger: alarm canceled: WANGW(109.224.xx.xxx) *** delay ***
    Jun 8 02:48:34 	apinger: ALARM: WANGW(109.224.xx.xxx) *** delay ***
    Jun 8 02:47:13 	apinger: alarm canceled: WANGW(109.224.xx.xxx) *** delay ***
    Jun 8 02:47:05 	apinger: ALARM: WANGW(109.224.xx.xxx) *** delay ***
    
    

    Disabling gateway checks fixed the problem.  Bandwidth still stutters from upstream, but the FW resetting and reloading rules every time the gateway check decided it was failing was both burning up the CPU and creating additional interruptions in service.

    Most apps seem to work ok with the flakiness imposed by the ISP, but were failing with the amplification of that flakiness from the reloads.
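To get a feel for how often and for how long the gateway monitor raises alarms, the apinger lines above can be paired up with a short script. This is only a sketch: it assumes the gateway log has been exported as plain text in exactly the format quoted in this post, the function name `alarm_durations` is made up for illustration, and the year is a placeholder because the log lines do not include one.

```python
import re
from datetime import datetime

# Matches lines such as:
#   Jun 8 02:48:34  apinger: ALARM: WANGW(109.224.xx.xxx) *** delay ***
APINGER_RE = re.compile(
    r"^\s*(\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+apinger:\s+"
    r"(ALARM|alarm canceled):\s+(\S+?)\((.*?)\)\s+\*\*\*\s+(\w+)\s+\*\*\*"
)


def alarm_durations(lines, assumed_year=2000):
    """Pair each ALARM with its matching 'alarm canceled' entry.

    Returns a list of (gateway, alarm_kind, seconds) tuples, oldest first.
    Input order does not matter; events are sorted by timestamp.
    """
    events = []
    for line in lines:
        match = APINGER_RE.match(line)
        if match is None:
            continue
        raw_stamp, action, gateway, _address, kind = match.groups()
        stamp = datetime.strptime(
            f"{assumed_year} {' '.join(raw_stamp.split())}", "%Y %b %d %H:%M:%S"
        )
        events.append((stamp, action, gateway, kind))

    events.sort(key=lambda event: event[0])

    open_alarms = {}  # (gateway, kind) -> time the alarm was raised
    durations = []
    for stamp, action, gateway, kind in events:
        key = (gateway, kind)
        if action == "ALARM":
            open_alarms.setdefault(key, stamp)
        elif key in open_alarms:
            started = open_alarms.pop(key)
            durations.append((gateway, kind, int((stamp - started).total_seconds())))
    return durations
```

Fed the four lines quoted above, this yields two delay alarms on WANGW lasting 8 and 20 seconds. Each one would have triggered the reload cycle described here while gateway monitoring actions were enabled.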



  • Hi, I can confirm the existence of this problem… disabling gateway monitoring fixes it.

    I also used the "cron" package to delete the dyndns update task, because every time this problem occurs I also find an entry in the system log related to the dyndns update. I don't know if this is useful... unfortunately it had no impact on the problem.

    Thanks,
    Michele



  • Disabling gateway monitoring isn't the right solution, you should disable state killing upon gateway failure instead under System>Advanced. The root cause is your gateway is going down. If you don't want that to drop connections, disable the state killing.



  • @cmb:

    Disabling gateway monitoring isn't the right solution, you should disable state killing upon gateway failure instead under System>Advanced. The root cause is your gateway is going down. If you don't want that to drop connections, disable the state killing.

    Ok, I did that (System>Advanced>Misc.>States>checked!); tomorrow I will tell you if this workaround works. Actually, I do see some packet loss sometimes, and I have already called my provider to investigate, but in the meantime at least I won't go crazy…

    Thanks a lot!
    Michele



  • @cmb:

    Disabling gateway monitoring isn't the right solution, you should disable state killing upon gateway failure instead under System>Advanced. The root cause is your gateway is going down. If you don't want that to drop connections, disable the state killing.

    I am curious why disabling state killing is a better solution for single gateway systems.  The only reason I disable just state killing is that I like having the gateway latency graphs, which I think only work with gateway monitoring.  This is on a static IP system.  Is there another reason why you should leave gateway monitoring on?  I assumed it was only to kill states for dual WAN, which I don't want to happen on single gateway HA setups.



  • Because you don't want to lose the quality graphs, unless you just don't want them at all. Disabling monitoring on a flaky Internet connection is less than desirable, since it hides what in many circumstances is the best, if not the only, way to clearly see the flakiness.



  • That is what I suspected.  Thanks for the clarification.



  • Great.  I will make this change tomorrow and see where it leaves me and also see if I can get some additional info from my ISP, Verizon FIOS.

    Thank you.

