[Solved] Site-to-Site and Client-Server OpenVPN randomly reconnecting from client side



  • This question is already solved, but I am posting it here to save someone else some time.

    We run multiple Site-to-Site tunnels plus Remote Access VPN servers:
    Central Office, Satellite Offices and Road Warriors.
    pfSense on each side of the tunnel.
    The Central Office runs pfSense in a Hyper-V virtual machine.
    The Satellite Offices run pfSense on PC Engines APU2D2 hardware.
    All versions are fully updated.
    All hardware has AES-NI enabled.

    Site-to-Site was deployed with:

    • Peer to Peer SSL configuration
    • AES-128-GCM
    • SHA256 Auth
    • LZ4-v2
    • Subnet Topology
    • UDP Fast I/O

    After some weeks running like a charm in test mode, we switched to production.
    Some weeks later we noticed that all clients (Site-to-Site and Remote Users) were restarting the tunnel once, twice or even more times an hour, always with the same log events:

    Nov 15 15:22:20	openvpn	90690	SIGUSR1[soft,ping-restart] received, process restarting
    Nov 15 15:22:20	openvpn	90690	[VPN Server] Inactivity timeout (--ping-restart), restarting
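
    To see how often the restarts really happen, the log can be tallied per hour. This is only a minimal sketch in Python, not a pfSense tool: it assumes the log has been exported as plain text in the format shown above, and the names count_ping_restarts and SAMPLE_LOG are made up for illustration.

```python
import re
from collections import Counter

# Sample lines copied from the log excerpt above (fields are tab-separated).
SAMPLE_LOG = (
    "Nov 15 15:22:20\topenvpn\t90690\t"
    "SIGUSR1[soft,ping-restart] received, process restarting\n"
    "Nov 15 15:22:20\topenvpn\t90690\t"
    "[VPN Server] Inactivity timeout (--ping-restart), restarting\n"
)

# Captures "Mon DD HH" from a syslog-style "Mon DD HH:MM:SS" prefix.
HOUR_PREFIX = re.compile(r"^(\w{3}\s+\d{1,2}\s+\d{2}):\d{2}:\d{2}")


def count_ping_restarts(log_text):
    """Return a Counter mapping 'Mon DD HH' to the number of ping-restart events."""
    per_hour = Counter()
    for line in log_text.splitlines():
        # Count only the timeout line so that each restart is counted once,
        # not once per log line it produces.
        if "Inactivity timeout (--ping-restart)" not in line:
            continue
        match = HOUR_PREFIX.match(line)
        if match:
            per_hour[match.group(1)] += 1
    return per_hour


if __name__ == "__main__":
    for hour, restarts in sorted(count_ping_restarts(SAMPLE_LOG).items()):
        print(hour, restarts)
```

    Running the same count before and after a configuration change shows whether the change actually made a difference.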
    

    Each restart caused about 10 seconds of packet loss on the tunnel, and some applications misbehaved because of it.
    Running continuous pings from the client-side network to the server-side network did not help keep the tunnel alive.
    No problems were found on any other link (internet, intranet, etc.) managed by the firewalls.
    tcpdump showed nothing except lost ICMP replies during the restart and the OpenVPN renegotiation triggered by the client restarting.

    We tried every solution we could find for this problem, which is similar to this unsolved question: https://forum.netgate.com/topic/115125/openvpn-tunnel-allways-reconnects

    We tried changing the compression, the encryption algorithm and the keepalive settings, but nothing worked.
    We tried swapping some hardware, but it was of no use.
    Then we changed from SSL to Shared Key, and the tunnel stayed up without restarts. This was a workaround but not a solution, as we wanted to use AES-GCM as the encryption algorithm, which is not possible with Shared Key.

    So we concluded that something was wrong with SSL or the keepalives.
    A very useful clue was found at https://forum.netgate.com/post/393487

    [The 60-second timeout is a generic timeout error, not indicative of any specific problem. The server-side logs are better indications of the problem in these cases.
    Most likely explanations:
    Server side blocking the traffic in firewall rules (or failing to pass it, as the case may be)
    ISP/Uplink blocking the traffic
    Time mismatch between client and server
    Certificate/CA mismatch between client and server
    TLS Key mismatch between client and server
    Other setting mismatch between client and server
    The exact mismatch or error would be found in the server logs.]

    Finally we found that the Central Server, where the VPN server side runs, was out of time sync due to improper virtual machine configuration.
    The server was 3 minutes 30 seconds ahead of the NTP time used by the rest of the pfSense boxes.
    The central pfSense server had NTP configured properly but also had the Time Sync Integration Service enabled, and the host machine was not properly synced with an NTP server.

    So pfSense synced with NTP properly, but the virtual hardware immediately corrected the time back to the wrong time running on the host machine.

    So finally the solution was:

    • Disable the Time Sync Integration Service in the Hyper-V configuration
    • Force an ntpdate on the server side in order to sync date and time
    • Use the same pool of NTP servers as the time reference on all of them

    So far the tunnels are working properly without any problems.

    Conclusion:

    Keep your infrastructure time synced with a reliable source, and double-check it when using virtualization services and integrations.

