OpenVPN drops all clients during late-night hours



  • Hello all!

    I have searched the web and this forum pretty thoroughly for an answer to my question, but I have not found any satisfactory answers, explanations or solutions.

    Here goes:

    During the late-night hours (EDT), OpenVPN decides that it does not want to continue activity with users who are logged on and will not allow new connections. This happens between 00:00 and 06:00 (often multiple times a night), lasts only 1 to 5 minutes, and does not happen at any other time of day.

    Here is a sample of the errors I see in the OpenVPN log:

    openvpn[14011]: User_F/[someone's public ip]:61970 write TCPv4_SERVER: Operation not permitted (code=1)
    May 14 04:00:08
    openvpn[14011]: User_L/[someone's public ip]:48308 write TCPv4_SERVER: Operation not permitted (code=1)
    May 14 04:00:07
    openvpn[14011]: User_E/[someone's public ip]:59777 write TCPv4_SERVER: Operation not permitted (code=1)
    May 14 04:00:07
    openvpn[14011]: User_C/[someone's public ip]:1902 write TCPv4_SERVER: Operation not permitted (code=1)
    May 14 04:00:06
    openvpn[14011]: User_D/[someone's public ip]:47225 write TCPv4_SERVER: Operation not permitted (code=1)
    May 14 04:00:05
    openvpn[14011]: User_B/[someone's public ip]:42140 write TCPv4_SERVER: Operation not permitted (code=1)

    I have attached a longer log so you can see how the OpenVPN server literally goes nuts for 2-3 minutes then decides to operate normally again.
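To see exactly when the bursts start and stop, the "Operation not permitted" lines can be counted per minute. The following is only a minimal sketch: it assumes each log entry is a single syslog-style line with the timestamp in front (`May 14 04:00:08 openvpn[14011]: ...`), and the function name, the sample lines and the 192.0.2.x addresses are made up for illustration.

```python
import re
from collections import Counter

# Assumed line layout (not confirmed by the attached log):
#   "May 14 04:00:08 openvpn[14011]: User_F/<ip>:<port> write TCPv4_SERVER: Operation not permitted (code=1)"
LINE_RE = re.compile(
    r"^(?P<month>\w{3})\s+(?P<day>\d{1,2})\s+(?P<hhmm>\d{2}:\d{2}):\d{2}\s+"
    r"openvpn\[\d+\]:.*Operation not permitted"
)

def errors_per_minute(lines):
    """Count 'Operation not permitted' entries, keyed by 'Mon DD HH:MM'."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            key = f"{match['month']} {int(match['day']):02d} {match['hhmm']}"
            counts[key] += 1
    return counts

# Hypothetical sample lines in the assumed layout.
sample = [
    "May 14 04:00:05 openvpn[14011]: User_B/192.0.2.10:42140 write TCPv4_SERVER: Operation not permitted (code=1)",
    "May 14 04:00:07 openvpn[14011]: User_E/192.0.2.11:59777 write TCPv4_SERVER: Operation not permitted (code=1)",
    "May 14 04:01:02 openvpn[14011]: User_F/192.0.2.12:61970 write TCPv4_SERVER: Operation not permitted (code=1)",
    "May 14 04:05:00 openvpn[14011]: Initialization Sequence Completed",
]

for minute, count in sorted(errors_per_minute(sample).items()):
    print(minute, count)
```

To run it against a real file, pass the open file object instead of `sample`, for example `errors_per_minute(open("openvpn_05-14.14.log-scrubbed.txt"))`, adjusting the regular expression if the actual line layout differs.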

    I am running pfSense 2.0-Release (i386) with the following specs:

    System: Dell R210
    NIC: Intel(R) PRO/1000 Network Connection version - 2.2.3 (4 Ports)
    OpenVPN version: OpenVPN 2.2.0 i386-portbld-freebsd8.1

    Here is a sample of our client configurations:

    # Change these lines to match the .crt and .key files on your computer

    cert USER.crt
    key USER.key

    # Don't change anything below this line

    port 8080
    client
    dev tun
    proto tcp
    remote myvpn.hostname.com
    resolv-retry infinite
    nobind
    persist-key
    persist-tun
    tls-client
    ca ca.crt
    ns-cert-type server
    route-method exe
    route-delay 2
    comp-lzo
    pull
    verb 3

    Here is our OpenVPN Server config:

    dev ovpns1
    dev-type tun
    dev-node /dev/tun1
    writepid /var/run/openvpn_server1.pid
    #user nobody
    #group nobody
    script-security 3
    daemon
    keepalive 10 60
    ping-timer-rem
    persist-tun
    persist-key
    proto tcp-server
    cipher BF-CBC
    up /usr/local/sbin/ovpn-linkup
    down /usr/local/sbin/ovpn-linkdown
    local [public ip]
    tls-server
    server [openVPN network and mask]
    client-config-dir /var/etc/openvpn-csc
    lport 8080
    management /var/etc/openvpn/server1.sock unix
    push "route [private net and mask]"
    push "dhcp-option DOMAIN mydomain.com"
    push "dhcp-option DNS <pfSense internal IP>"
    duplicate-cn
    ca /var/etc/openvpn/server1.ca
    cert /var/etc/openvpn/server1.cert
    key /var/etc/openvpn/server1.key
    dh /etc/dh-parameters.1024
    crl-verify /var/etc/openvpn/server1.crl-verify
    comp-lzo
    persist-remote-ip
    float
    […. bunch of push statements for production systems]

    I am at a loss as to what is causing this problem. No other service on our pfSense firewall loses its mind like this one does.
    LAN stays up, WAN stays up, the IPsec tunnel stays up, ntop stays up, and our external monitoring server (pings, port checks, etc.) reports no failures.

    Any help you can provide given the information I've put here would be greatly appreciated.

    Thanks!

    openvpn_05-14.14.log-scrubbed.txt



  • Something is deleting some or all of the states. First, upgrade: there are a few bugs along those lines in the original 2.0-release. Possibly the gateway monitor IP is failing; 2.0 will kill some states by default when that happens, where 2.1.x doesn't. You should also be running 64-bit on that hardware, not 32-bit, though that's unrelated to this problem.
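One way to test the "monitor IP fails, states get killed" theory before upgrading is to check whether the write errors cluster around gateway alarm times. This is a rough sketch only: it assumes you have already pulled both sets of timestamps out of the logs by hand, and the function name and the gateway alarm time below are hypothetical.

```python
from datetime import datetime, timedelta

def count_errors_near(event_times, error_times, window_seconds=30):
    """For each event, count the errors logged within +/- window_seconds of it."""
    window = timedelta(seconds=window_seconds)
    return {
        event: sum(1 for err in error_times if abs(err - event) <= window)
        for event in event_times
    }

def at(hour, minute, second):
    # Syslog timestamps carry no year; 2000 is an arbitrary placeholder.
    return datetime(2000, 5, 14, hour, minute, second)

# Timestamps of the write errors quoted in the first post, plus one
# made-up unrelated error at 02:15:00 to show that it is not counted.
write_errors = [at(4, 0, 5), at(4, 0, 6), at(4, 0, 7), at(4, 0, 7), at(4, 0, 8), at(2, 15, 0)]

# Hypothetical gateway alarm time; take the real ones from the system log.
gateway_alarms = [at(4, 0, 3)]

for event, count in count_errors_near(gateway_alarms, write_errors).items():
    print(event.time(), count)
```

If nearly every burst of write errors lands within a few seconds of a gateway alarm, that points at state killing on gateway failure; if there are no alarms at those times, something else is clearing the states.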



  • Where would be a good place to start? We don't have anything that is going in and deleting states. Is there an option that needs to be changed?



  • @JonTheGuy:

    Where would be a good place to start?

    Upgrading. That in and of itself might fix it, since state killing isn't done by default on gateway failure in newer releases, or it might be related to other fixes in one of the 6 stable releases since. In 2.0-rel, there isn't an option to disable that state killing short of editing the source; IIRC, 2.0.1 was the first release with that as a GUI option.

