Netgate Discussion Forum

    OpenVPN process(es) die sporadically

    OpenVPN
    • JeGrJ
      JeGr LAYER 8 Moderator

      pfSense 1.2, embedded version, 256 MB RAM, 1 GHz VIA Eden CPU

      Hi folks,

      After quite a while with this problem, I definitely need some help solving it. I don't know whether it's a configuration issue or an OpenVPN bug. Here we go:

      We have the device mentioned above running as our (multi-WAN) firewall gateway at our company's main location. It works very well apart from some very strange OpenVPN issues.
      Historically we had an OpenVPN site-to-site (s2s) connection to our datacenter and an OpenVPN server at work that our guys use for remote working. During a reorganization I merged the VPN tunnel server, the VPN worker server and the old, badly working firewall into one strong pfSense installation. Yay! That worked for quite some time, and the site-to-site tunnel is very solid (the pfSense device is configured as the client for the tunnel). But the dial-in OpenVPN service has gone bad.

      For our team, I merged the old configurations into pfSense by creating 11 server configs within pfSense, one for each team member (the way the old OpenVPN server was configured). The problem is that every now and then some of these server processes simply die. Yesterday I checked after getting a bug report and found that server #10 was not running, so I restarted it and checked the process list:

      
      ??  Ss     1:09.45 openvpn --config /var/etc/openvpn_server0.conf
      ??  Ss     1:07.35 openvpn --config /var/etc/openvpn_server1.conf
      ??  Ss     1:02.15 openvpn --config /var/etc/openvpn_server2.conf
      ??  Ss     1:03.42 openvpn --config /var/etc/openvpn_server3.conf
      ??  Ss     1:01.38 openvpn --config /var/etc/openvpn_server4.conf
      ??  Ss     1:06.45 openvpn --config /var/etc/openvpn_server5.conf
      ??  Ss     1:10.11 openvpn --config /var/etc/openvpn_server6.conf
      ??  Ss     0:43.88 openvpn --config /var/etc/openvpn_server7.conf
      ??  Ss     0:43.12 openvpn --config /var/etc/openvpn_server8.conf
      ??  Ss     0:41.23 openvpn --config /var/etc/openvpn_server9.conf
      ??  Ss     0:40.20 openvpn --config /var/etc/openvpn_server10.conf
      ??  Ss    40:51.03 openvpn --config /var/etc/openvpn_client0.conf
      
      

      So as you can see, all servers and the client (running the s2s tunnel) are up and running. No errors in the logs, and the team can dial in (via dynamic DSL lines).

      This morning I checked again:

      
        553  ??  Ss     1:09.45 openvpn --config /var/etc/openvpn_server0.conf
        792  ??  Ss     0:43.88 openvpn --config /var/etc/openvpn_server7.conf
        857  ??  Ss    40:51.03 openvpn --config /var/etc/openvpn_client0.conf
      
      

      Besides the s2s tunnel, only 2 (!) servers are running; all other server processes died overnight (including the 3 bosses' links, which was quite … stunning :()

      What's wrong with openvpn in this scenario? Isn't it possible to constantly run the processes?

      I see some errors in the logs like:

      
      Dec  2 19:13:47 gate23 openvpn[696]: event_wait : Interrupted system call (code=4)
      Dec  2 19:13:47 gate23 openvpn[696]: /etc/rc.filter_configure tun5 1500 1545 10.0.1.17 10.0.1.18 init
      Dec  2 19:14:07 gate23 openvpn[696]: SIGHUP[hard,] received, process restarting
      Dec  2 19:14:07 gate23 openvpn[696]: OpenVPN 2.0.6 i386-portbld-freebsd6.2 [SSL] [LZO] built on Sep 13 2007
      Dec  2 19:14:10 gate23 openvpn[696]: LZO compression initialized
      Dec  2 19:14:10 gate23 openvpn[696]: TUN/TAP device /dev/tun6 opened
      Dec  2 19:14:10 gate23 openvpn[696]: /sbin/ifconfig tun6 10.0.1.17 10.0.1.18 mtu 1500 netmask 255.255.255.255 up
      Dec  2 19:14:11 gate23 openvpn[696]: /etc/rc.filter_configure tun6 1500 1545 10.0.1.17 10.0.1.18 init
      Dec  2 19:14:27 gate23 openvpn[696]: UDPv4 link local (bound): [undef]:5005
      Dec  2 19:14:27 gate23 openvpn[696]: UDPv4 link remote: [undef]
      Dec  2 23:22:26 gate23 openvpn[696]: event_wait : Interrupted system call (code=4)
      Dec  2 23:22:26 gate23 openvpn[696]: /etc/rc.filter_configure tun6 1500 1545 10.0.1.17 10.0.1.18 init
      Dec  2 23:22:52 gate23 openvpn[696]: SIGHUP[hard,] received, process restarting
      Dec  2 23:22:52 gate23 openvpn[696]: OpenVPN 2.0.6 i386-portbld-freebsd6.2 [SSL] [LZO] built on Sep 13 2007
      Dec  2 23:22:55 gate23 openvpn[696]: WARNING: file '/var/etc/openvpn_server6.secret' is group or others accessible
      Dec  2 23:22:55 gate23 openvpn[696]: LZO compression initialized
      Dec  2 23:22:55 gate23 openvpn[696]: TUN/TAP device /dev/tun6 opened
      Dec  2 23:22:55 gate23 openvpn[696]: /sbin/ifconfig tun6 10.0.1.17 10.0.1.18 mtu 1500 netmask 255.255.255.255 up
      Dec  2 23:22:55 gate23 openvpn[696]: /etc/rc.filter_configure tun6 1500 1545 10.0.1.17 10.0.1.18 init
      Dec  2 23:22:58 gate23 openvpn[696]: UDPv4 link local (bound): [undef]:5005
      Dec  2 23:22:58 gate23 openvpn[696]: UDPv4 link remote: [undef]
      Dec  3 00:16:44 gate23 openvpn[696]: event_wait : Interrupted system call (code=4)
      Dec  3 00:16:44 gate23 openvpn[696]: /etc/rc.filter_configure tun6 1500 1545 10.0.1.17 10.0.1.18 init
      Dec  3 09:12:05 gate23 openvpn[696]: SIGHUP[hard,] received, process restarting
      Dec  3 09:12:05 gate23 openvpn[696]: OpenVPN 2.0.6 i386-portbld-freebsd6.2 [SSL] [LZO] built on Sep 13 2007
      Dec  3 09:12:07 gate23 openvpn[696]: WARNING: file '/var/etc/openvpn_server6.secret' is group or others accessible
      Dec  3 09:12:07 gate23 openvpn[696]: LZO compression initialized
      Dec  3 09:12:07 gate23 openvpn[696]: TUN/TAP device /dev/tun3 opened
      Dec  3 09:12:07 gate23 openvpn[696]: /sbin/ifconfig tun3 10.0.1.17 10.0.1.18 mtu 1500 netmask 255.255.255.255 up
      Dec  3 09:12:07 gate23 openvpn[696]: FreeBSD ifconfig failed: shell command exited with error status: 1
      Dec  3 09:12:07 gate23 openvpn[696]: Exiting
      
      

      I suspect that the load balancer changing the routing or switching lines (UP state) near the "Exiting" timestamps of OpenVPN has something to do with the situation, but I'm not sure whether that is possible and whether it can have the side effect that my OpenVPN server processes die when the load balancing changes. As we have a few problems with our second WAN line at the moment (connected to OPT1; the VPN processes are running on the "good" line on WAN), slbd changes the weighting quite often, so if that is the problem I'm doomed ;) I can restart the processes manually, but that's not a workable long-term solution.
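
A minimal sketch for pulling the fatal restarts out of a log in the format shown above, so that their timestamps can be compared with the slbd events. It assumes the syslog layout and message wording of the excerpt ("TUN/TAP device ... opened", "ifconfig failed"); the function name `failed_tun_opens` and the file name in the usage comment are hypothetical.

```sh
#!/bin/sh
# Print timestamp, process tag and the tun device last opened by that
# process for every "ifconfig failed" line.
# Assumes the field layout of the excerpt:
#   $1 $2 $3 = date/time, $5 = "openvpn[PID]:", $8 = /dev/tunN on "opened" lines
failed_tun_opens() {
    awk '
        /TUN\/TAP device .* opened/ { dev[$5] = $8 }
        /ifconfig failed/           { print $1, $2, $3, $5, dev[$5] }
    ' "$@"
}

# Usage (hypothetical file name):
#   failed_tun_opens < saved_openvpn_log.txt
```

For the last restart in the excerpt this reports tun3 at Dec 3 09:12:07, which can then be checked against the load balancer's log around the same time.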

      Anyone? Any ideas to help?

      Don't forget to upvote 👍 those who kindly offered their time and brainpower to help you!

      If you're interested, I'm available to discuss details of German-speaking paid support (for companies) if needed.

      • JeGrJ
        JeGr LAYER 8 Moderator

        There seems to be some sort of connection with our bandwidth problems on WAN2: the load balancer (slbd) very often has to realign the routing, bringing the second interface up and down. All VPN daemon kills are logged shortly after slbd balances the outgoing connection from WAN2->WAN and back. It seems the VPN servers are restarted (or something like that) and don't re-use their former tunnel interface correctly. So the OpenVPN server daemon that was formerly using configuration #3 with tun3 as its interface tries to restart with tun5, for example, and that fails because tun5 cannot be configured with the same interface settings as tun3 (duplicate IP etc.). So the daemon terminates. And so on. After many (or most) daemons have terminated, the remaining ones can be configured correctly (because there are not that many other interfaces left that could hold a duplicate).
        I tried that manually today after I had again lost 6 daemons (that had been restarted this morning). After starting a few of them I ran into the "tunX IP already configured" issue. After I restarted them in an order in which the tun interfaces don't collide, everything was up again.

        But I don't fully understand the correlation between slbd and the daemons restarting. Perhaps some dev has a few spare minutes to look at the issue? I suppose it doesn't happen that often when one has only a few OpenVPN daemons configured as servers.
        If someone needs further information to help, let me know.

        Greets
        Grey

        • JeGrJ
          JeGr LAYER 8 Moderator

          Another update:

          I disabled the line failover via slbd completely last night, and this morning all OpenVPN server processes were still there. But after the service guy from our cable company worked on our bad WAN line 2, we disconnected that line and plugged it back in later. After the cable modem started working again I checked the pfSense device and: 5 daemons were down, including the client one (to our datacenter). Again the interface binding problem!

          Is there some way to bind each OpenVPN configuration to a specific tunXY interface? I think that would solve the issue.

          Edit: It seems this command is causing the whole issue:

           sh -c killall -HUP openvpn 2>/dev/null 
          

          • JeGrJ
            JeGr LAYER 8 Moderator

            Sorry if it seems like I'm talking to myself. I modified all server configurations and added a custom "dev tunX" parameter to all OpenVPN configurations so they have to use their assigned tunnel interface. But after the SIGHUP of all OpenVPN processes, 2 were missing afterwards. So it seems the "SIGHUP kill" does not restart all OpenVPN tunnels/servers correctly.
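
One way to rule out a missed or duplicated device assignment is to check the generated configs directly. The sketch below assumes the configs sit in one directory as `openvpn_*.conf` (as the process list suggests), that the fixed device is written as a `dev tunN` line, and that the paths contain no spaces; `check_dev_pins` is a hypothetical helper name. It only verifies the configs and does not explain why daemons still vanish after a SIGHUP.

```sh
#!/bin/sh
# Report configs without a fixed "dev tunN" line and configs that
# share a device with another config. Returns non-zero if any are found.
check_dev_pins() {
    for f in "$1"/openvpn_*.conf; do
        [ -e "$f" ] || continue
        # first "dev" directive in the file, e.g. "tun3"
        dev=$(awk '$1 == "dev" { print $2; exit }' "$f")
        case "$dev" in
            tun[0-9]*) ;;             # numbered device: fixed
            *) dev=UNPINNED ;;        # missing, or "dev tun" without a number
        esac
        printf '%s %s\n' "$dev" "$f"
    done | sort | awk '
        $1 == "UNPINNED"               { print "no fixed dev tunN: " $2; bad = 1 }
        $1 != "UNPINNED" && $1 == prev { print "duplicate " $1 ": " $2; bad = 1 }
        { prev = $1 }
        END { exit bad + 0 }
    '
}

# Usage:
#   check_dev_pins /var/etc && echo "every config has its own tun device"
```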

            • GruensFroeschliG
              GruensFroeschli

              Hmmm. I replied to this thread suggesting "dev tunX", but it seems my post got lost somewhere ^^"

              About the SIGHUP: what exactly do you mean by 2 processes were missing afterwards?
              You mean they just died and didn't restart?

              We do what we must, because we can.

              Asking questions the smart way: http://www.catb.org/esr/faqs/smart-questions.html

              • JeGrJ
                JeGr LAYER 8 Moderator

                Exactly. I saw the "killall" in the process list and waited for ~5 min; afterwards only 8 out of 10 OpenVPN daemons were still alive. And this was after I added the "dev tunX" to the configs. I don't get it...

                • GruensFroeschliG
                  GruensFroeschli

                  Well, a kind of workaround would be to kill the processes via SIGTERM and then restart them manually in the correct order.

                  (8 processes, sounds like a 3 bit counter to me…)
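
The SIGTERM-and-restart workaround could be scripted roughly as follows. This is a sketch under assumptions: the configs are named `openvpn_serverN.conf` / `openvpn_clientN.conf` as in the process list, "correct order" means ascending N with servers before the client, and each config daemonizes on its own (otherwise the first `openvpn` call would block). The function names, the 5-second pause and the `RUN` switch are hypothetical. By default the script only prints the commands it would run.

```sh
#!/bin/sh
# List configs of one kind ("server" or "client") in numeric order,
# so that server2 comes before server10.
ordered_configs() {
    cfg_dir="$1"; kind="$2"
    for f in "$cfg_dir"/openvpn_"$kind"*.conf; do
        [ -e "$f" ] || continue
        n=${f##*openvpn_"$kind"}      # strip everything up to the index
        n=${n%.conf}                  # strip the extension
        printf '%s %s\n' "$n" "$f"
    done | sort -n | cut -d' ' -f2-
}

# Stop all daemons with SIGTERM, then start them one by one.
# Dry run unless RUN is set to an empty string.
restart_in_order() {
    dir="${1:-/var/etc}"
    run="${RUN-echo}"
    $run killall -TERM openvpn
    $run sleep 5                      # assumed grace period for shutdown
    { ordered_configs "$dir" server; ordered_configs "$dir" client; } |
    while read -r cfg; do
        $run openvpn --config "$cfg" </dev/null
    done
}

# Usage:
#   restart_in_order /var/etc          # print the commands only
#   RUN= restart_in_order /var/etc     # actually execute them
```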

                  • JeGrJ
                    JeGr LAYER 8 Moderator

                    There are even more. It's 12 altogether: 1 client (tunnel to datacenter) and 11 servers, one per co-worker.

                    • JeGrJ
                      JeGr LAYER 8 Moderator

                      OK, I re-enabled slbd today and it works: even after slbd's ICMP poll reports DOWN and the filters are reloaded, the daemons stay alive. I think the problem is related to two things:

                      1. one interface changing (DHCP, disabling/enabling)
                      2. reloading the OpenVPN daemons via the stated command (sh -c killall -HUP openvpn)

                      The SIGHUP seems to kill a random number of daemons while restarting them (for whatever reason). At the moment I'm ordering new CF cards to try a clean new installation on one of them and make some modifications. If anyone knows more about this "restart phenomenon" or has similar problems, I would be glad to hear some comments.
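
To see exactly which daemons a SIGHUP round has taken out, the config files can be compared with the process list. A minimal sketch, assuming the `openvpn --config <path>` command lines shown in the first post; `missing_daemons` is a hypothetical helper that reads `ps` output on stdin.

```sh
#!/bin/sh
# Print every config in the directory for which no
# "openvpn --config <path>" process appears in the ps output on stdin.
missing_daemons() {
    ps_out=$(cat)
    for f in "$1"/openvpn_*.conf; do
        [ -e "$f" ] || continue
        # fixed-string match on the full path, so server1 does not match server10
        printf '%s\n' "$ps_out" | grep -qF -- "--config $f" ||
            printf 'not running: %s\n' "$f"
    done
}

# Usage:
#   ps axww | missing_daemons /var/etc
```

Running it before and after the `killall -HUP` would show whether the same daemons disappear each time or a different set.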

                      Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.