OpenVPN Not Accepting Connections



  • Hi,

    Another issue I have been seeing only in recent snapshots: OpenVPN appears to be running, but it won't accept connections. If I restart the service from the web interface, I can connect again, at least for a while. Then it crashes again and I have to restart it.

    Is anyone else having this issue? Again, only in recent builds; before that it was rock solid.

    Thanks!



  • 2.4.0-BETA (amd64)
    built on Tue Jul 25 07:05:42 CDT 2017
    FreeBSD 11.0-RELEASE-p11

    Presently running 6 OpenVPN server instances.



  • I can get connections, but the server goes up and down. When I restart it, it's good again for a while... :(

    Thanks.



  • FYI, it just happened again: I was actively connected to OpenVPN, then the connection dropped (a server-side issue, it seems). I can no longer connect, so I need to get in through SSH again and reset the server... :(

    Are there any particular logs that may capture what is going on?

    Thanks.



  • Check the system log and the OpenVPN log.
    Is the openvpn process still running? ps -A | grep openvpn
    Is PHP 'busy' when it's time to authenticate again? ps -A | grep php. How many php-fpm processes are running?
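    A small sketch for answering "is it running / how many" from saved `ps -A` output. It compares the executable name instead of doing a substring match, so the `grep openvpn` process itself is never counted. It assumes FreeBSD's default `ps -A` columns (PID TT STAT TIME COMMAND) and that php-fpm processes show titles such as `php-fpm: pool ...`; `matching_pids` is a hypothetical helper name, not part of pfSense.

```python
from typing import List


def matching_pids(ps_output: str, name: str) -> List[int]:
    """Return PIDs from `ps -A` output whose command name equals `name`.

    Comparing the executable's basename (instead of a substring match, as
    `grep` does) keeps the `grep openvpn` line itself out of the result.
    """
    pids: List[int] = []
    for line in ps_output.splitlines():
        # FreeBSD `ps -A` columns: PID TT STAT TIME COMMAND [ARGS...]
        fields = line.split(None, 4)
        if len(fields) < 5 or not fields[0].isdigit():
            continue  # header line or truncated row
        command = fields[4].split()[0]
        # php-fpm retitles itself ("php-fpm: pool ..."), so drop a trailing colon.
        if command.rsplit("/", 1)[-1].rstrip(":") == name:
            pids.append(int(fields[0]))
    return pids
```

    For example, `len(matching_pids(text, "php-fpm"))` gives the php-fpm process count to compare right after a restart versus when the problem shows up.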



  • Hi,

    OK, OpenVPN is running... two instances? To answer your question:

    ps -A | grep openvpn
    46484  -  Ss       0:01.85 /usr/local/sbin/openvpn --config /var/etc/openvpn/server1.conf
    47716  -  S        0:00.09 /usr/local/sbin/openvpn --config /var/etc/openvpn/server1.conf
    

    But when I check the port it's supposed to be on:

    netstat -ln -p tcp | grep 1194
    

    Nope, no output: it's not listening, so that explains the lack of connections.
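    The same check can be done by actually attempting a connection, which avoids parsing netstat output. This is a minimal sketch assuming Python 3 is available where you run it (a stock pfSense install may not have it) and that the server listens on TCP 1194 on an address reachable from localhost; `tcp_port_accepting` is a hypothetical helper name. A successful TCP handshake says nothing about a UDP instance, which cannot be probed this way.

```python
import socket


def tcp_port_accepting(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connect() to (host, port) completes.

    A refused or timed-out connection (both OSError subclasses) means
    nothing is accepting on that TCP port.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

    For example, `tcp_port_accepting("127.0.0.1", 1194)` returning False matches the empty netstat result above.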

    I do see some errors in the logs, but it's odd: I can connect sometimes, so these errors don't make sense to me. Thoughts?

    WARNING: Failed running command (--tls-verify script): external program exited with error status: 1
    OpenSSL: error:14089086:SSL routines:ssl3_get_client_certificate:certificate verify failed
    TLS_ERROR: BIO read tls_read_plaintext error
    TLS Error: TLS object -> incoming plaintext read error
    TLS Error: TLS handshake failed
    Fatal TLS error (check_tls_errors_co), restarting
    

    But this is the same client, from the same remote IP, that had just been connected... :(

    I'm trying to get into the web interface to check further, but even after a restart it doesn't seem to come back up... :( The error in the logs is below:
    pfSense check_reload_status: Could not connect to /var/run/php-fpm.socket

    Any suggestions would be appreciated!
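    Since check_reload_status complains that it cannot connect to /var/run/php-fpm.socket, a direct connection attempt to that Unix socket tells you whether php-fpm is accepting at all. A minimal sketch, assuming Python 3 is available on the box; `unix_socket_accepting` is a hypothetical helper name.

```python
import socket


def unix_socket_accepting(path: str, timeout: float = 2.0) -> bool:
    """Return True if a stream connection to the Unix socket at `path` succeeds."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect(path)
        return True
    except OSError:
        # FileNotFoundError: the socket file is gone (php-fpm not running).
        # ConnectionRefusedError: the file exists but nothing is accepting.
        return False
    finally:
        sock.close()
```

    Usage: `unix_socket_accepting("/var/run/php-fpm.socket")`. The "(2: No such file or directory)" errors that nginx logs correspond to the FileNotFoundError case.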



  • I may be on to something: renegotiation may be the issue. At least today, it was up for exactly an hour (my renegotiation period), then failed. And now it's down / dead... :(

    I want to try restarting to see if this is really it, but now I only have SSH access. Is there an easy way to restart the OpenVPN server over SSH (CLI), so I can get back in?

    Thanks!
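    To check whether drops really line up with the renegotiation period rather than happening at random, you can compare connect and drop timestamps (taken from the OpenVPN log) against multiples of the period. OpenVPN's `reneg-sec` defaults to 3600 seconds; the 60-second tolerance and the helper name `near_reneg_boundary` are assumptions for illustration.

```python
from datetime import datetime


def near_reneg_boundary(connected: datetime, dropped: datetime,
                        reneg_sec: int = 3600, tolerance_sec: int = 60) -> bool:
    """True if `dropped` lies within `tolerance_sec` of a multiple of
    `reneg_sec` after `connected` (at least one full period in)."""
    elapsed = (dropped - connected).total_seconds()
    if elapsed < reneg_sec - tolerance_sec:
        return False  # dropped before the first renegotiation was due
    offset = elapsed % reneg_sec
    # Distance to the nearest period boundary, on either side.
    return min(offset, reneg_sec - offset) <= tolerance_sec
```

    If most drops return True, renegotiation is worth chasing; if they are scattered, the cause is more likely elsewhere.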



  • @arrmo:

    Something like the following, quoting @heper:

    This seems to work for me:

    
    pfSense shell: include('service-utils.inc');
    pfSense shell: service_control_restart("openvpn", array('vpnmode' => 'client', 'id' => '1'));
    pfSense shell: exec
    
    

    You can create a new macro for it and call it the same way from cron.

    You can figure out the client 'ID' by capturing the POST data when clicking the restart button in the GUI, or from config.xml, or …



  • While you're on SSH and the VPN won't connect, you might first try restarting the webGUI and PHP, since PHP is used during authentication. If the webGUI is in a 'problem state' (nginx gateway timeout), that can also affect OpenVPN connections. The "Failed running command (--tls-verify script)" error points a bit in that direction.

    Also, OpenVPN uses UDP by default, which could explain why your netstat (filtered to TCP) returns nothing?



  • That may be it, as my webGUI has been down constantly lately... :( Really only with recent v2.4 builds; it was rock solid until a couple of weeks back, and now I'm having all sorts of issues and it's down more than it's up. But isn't PHP still working / available even if php-fpm is down? But yes, lots of nginx gateway timeouts lately.

    FYI, I'm using TCP for OpenVPN, given my client network access.

    Thanks!



  • I think you're on to something! I SSH'd in and restarted php-fpm without touching OpenVPN, and now I can connect again! At least until php-fpm goes down again... ;)

    So that may be it: the PHP side of things. That said, any thoughts on why php-fpm keeps crashing lately? Is anyone else seeing this? I don't recall changing anything, really, just updating to the most recent v2.4 builds.

    Thanks!



  • If php-fpm is having issues, then the OpenVPN authentication script will have issues too.
    To check the listening socket, maybe use "sockstat -4 | grep openvpn" instead of netstat.

    Okay, so let's say php-fpm is likely the problem.
    How many php-fpm processes are running right after a restart, and when the problems happen?

    Maybe you can apply this patch: https://github.com/pfsense/pfsense/pull/3769

    It lets you run "curl -k https://localhost/status" to show which scripts are currently running in php-fpm.
    Then we can try to find out why they hang.

    P.S. No IPsec widget on your dashboard?



  • Do you mean to apply it through System Patches? Otherwise it will get overwritten on upgrade, no?

    FYI, I just updated to the latest release and still have issues: it was up for ~3 hours this time (a recent record... ;)), but it crashed again and took several services down with it.

    No, no IPsec widget. Why would there be?

    Thanks!



  • OK, I'm having a heck of a time: I can't keep the web interface up for more than 2-3 hours, and it often takes down unbound as well, so we can't get anywhere on the internet... unhappy family... :(

    Looking at the logs, I'm seeing A LOT of this error (just one example here). Does it mean anything to anyone?

    2017/07/28 19:36:49 [crit] 38264#100123: *18222 connect() to unix:/var/run/php-fpm.socket failed (2: No such file or directory) while connecting to upstream, client: 192.168.2.66, server: , request: "GET /widgets/widgets/thermal_sensors.widget.php?getThermalSensorsData=11501288610778 HTTP/1.1", upstream: "fastcgi://unix:/var/run/php-fpm.socket:", host: "pfsense.local.home", referrer: "http://pfsense.local.home/"
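    With many of these lines, counting failures per requested script shows which page or widget is hitting php-fpm when it is unavailable. A minimal sketch; the regular expression is written against the single nginx line above and may need adjusting for other error variants, and `failed_requests_by_path` is a hypothetical helper name.

```python
import re
from collections import Counter

# nginx [crit] lines where the FastCGI upstream (php-fpm) Unix socket is unreachable.
UPSTREAM_FAIL = re.compile(
    r'connect\(\) to unix:(?P<socket>\S+) failed \((?P<errno>\d+): [^)]*\)'
    r'.*?request: "(?P<method>[A-Z]+) (?P<path>[^?" ]+)'
)


def failed_requests_by_path(log_text: str) -> Counter:
    """Count php-fpm upstream connect failures per requested path (query string dropped)."""
    counts: Counter = Counter()
    for line in log_text.splitlines():
        m = UPSTREAM_FAIL.search(line)
        if m:
            counts[m.group("path")] += 1
    return counts
```

    Note that these entries record requests made while php-fpm was already down, so the top path shows what was polling at the time, not necessarily what caused the crash.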
    

    I just went through another cycle: web interface down, multiple attempted restarts of php-fpm and the web configurator (from the shell), no joy. I'm not sure how to restart unbound from the shell, so I rebooted (again). Ouch.

    Thanks!



  • FYI, I'm cautiously optimistic, but it looks like the issue appears when I leave a browser tab open on the dashboard. On other pages there are no nginx errors (so far, at least), but I see errors as soon as I go to the dashboard.

    Is anyone else leaving the dashboard open (and seeing nginx issues)?

    Thanks!



  • OK, this is rather interesting. I have avoided the dashboard, and in doing so, php-fpm is no longer crashing (for close to 24 hours now) and there are no issues with other services. It seems that something on the dashboard is causing php-fpm to crash, taking down all sorts of other services in the process. This may be why others aren't seeing this: if you don't leave the dashboard open, it doesn't seem to happen. I will go back to the dashboard in a bit and see if the failure occurs again.

    BTW, unbound did restart overnight (it's running, no issue there). Is it supposed to restart regularly / on some schedule?

    Thanks!



  • The patch can be applied through the System Patches package; after an upgrade it will need to be applied again. (There is a checkbox to 'auto apply' patches.)

    What widgets do you have on the dashboard? For one, I see 'thermal_sensors.widget.php' in your error log. Could you remove that from the dashboard and try a day or two without it?


  • Developer Netgate

    Yes, determining which particular widget (if any) is causing the issue would be very helpful.

    There is a "dashboard update period" setting on the "General" setup page. Does setting that to a larger value cause the failure to happen less often?



  • Hi,

    I think this is the culprit (the NUT widget):
    https://forum.pfsense.org/index.php?topic=111485.0

    I'm not sure why a bad widget should cause php-fpm to crash, but I ran for ~4.5 hours with no issues at all with this widget removed. I added it back, and within 3 minutes unbound restarted. It could be a coincidence, but that seems unlikely. I'll keep running without it for now; let's see.

    Thanks!



  • The linked forum thread only talks about some webGUI 'display issues' when the package was (re-)introduced. Nothing about it crashing any processes or PHP in the background, as far as I read, anyhow. (Or did you only provide it as a reference for which package you're talking about?)

    Anyhow, it could be that the NUT widget is the issue, once you know for sure the problem is gone without it.
    Can you trigger the problem manually by visiting the NUT config page and refreshing it a lot / fast?
    What settings do you use in that package?



  • I will keep poking, but first I want to make sure it's stable / stays up with this removed... agreed? I just want to have a reasonable baseline first.

    Thanks!



  • Yes, agreed. First make sure it stays stable with the NUT widget removed.



  • Hi,

    One thing just occurred to me. It may be normal / expected, but given that things were breaking when this happened before: is unbound supposed to restart every 2 hours? That's what the log seems to show (at least this last time, exactly 2 hours apart).

    Thanks!



  • How long do your dhcp leases last?



  • 24 hours.



  • So far, so good: up 24 hours now, which is a recent record... ;) I do see some unbound restarts, but I'm guessing this is nothing new; I just wasn't looking for it before. I'm not sure why unbound is being restarted, but you can see it below. I'm leaving the system alone for another 24 hours before messing with it.

    clog /var/log/resolver.log | grep unbound | grep stopped
    Jul 30 12:26:34 pfSense unbound: [7522:0] info: service stopped (unbound 1.6.3).
    Jul 30 16:52:10 pfSense unbound: [7522:0] info: service stopped (unbound 1.6.3).
    Jul 30 16:52:10 pfSense unbound: [7522:0] info: service stopped (unbound 1.6.3).
    Jul 30 18:52:10 pfSense unbound: [7522:0] info: service stopped (unbound 1.6.3).
    Jul 30 18:52:10 pfSense unbound: [7522:0] info: service stopped (unbound 1.6.3).
    Jul 31 04:04:17 pfSense unbound: [7522:0] info: service stopped (unbound 1.6.3).
    Jul 31 04:04:17 pfSense unbound: [7522:0] info: service stopped (unbound 1.6.3).
    Jul 31 06:04:17 pfSense unbound: [7522:0] info: service stopped (unbound 1.6.3).
    Jul 31 06:04:17 pfSense unbound: [7522:0] info: service stopped (unbound 1.6.3).
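    To see the restart pattern in numbers, the gaps between distinct "service stopped" timestamps can be computed from the clog output above (for example after copying it to a workstation). This is a sketch: syslog timestamps carry no year, so one is passed in, and a log spanning New Year would need extra handling. Each stop appears twice in this log, so identical consecutive timestamps are counted once. `restart_intervals` is a hypothetical helper name.

```python
from datetime import datetime
from typing import List


def restart_intervals(log_text: str, year: int) -> List[int]:
    """Seconds between distinct 'service stopped' timestamps in unbound's log."""
    stamps: List[datetime] = []
    for raw in log_text.splitlines():
        line = raw.strip()
        if "service stopped" not in line:
            continue
        # The first 15 characters hold the syslog timestamp, e.g. "Jul 30 12:26:34".
        stamp = datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
        if not stamps or stamp != stamps[-1]:
            stamps.append(stamp)  # collapse duplicate entries for the same stop
    return [int((b - a).total_seconds()) for a, b in zip(stamps, stamps[1:])]
```

    For the nine lines above (with year 2017) this yields gaps of 15936, 7200, 33127 and 7200 seconds: two of the four are exactly 2 hours, and each 2-hour gap follows an irregular one.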
    

    Thanks!



  • OK, I finally found the smoking gun... :) And I was wrong: I got sidetracked by a couple of things happening at the same time, but I put things back together very slowly. The widget is OK; the real culprit is the driver for my USB NIC. It's the axge driver for the ASIX AX88179 chipset, which is on the FreeBSD compatibility list but seems to have issues with pfSense. I actually found an open ticket for it: https://redmine.pfsense.org/issues/4494

    When I put this device back in place: fire and ashes in < 15 min... ;) I actually got a crash report (which I submitted) and a spontaneous reboot. I removed the adapter, and things were smooth again. I'm not quite sure how to handle this one now, as it should be supported hardware.

    Thanks for all the debug help and pointers!


  • Rebel Alliance Developer Netgate

    While USB NICs may be on the FreeBSD HCL and they may operate, none of them are known for their stability. They are best avoided at all costs. If you have a managed switch, set up and use VLANs instead of relying on a USB NIC.



  • Agreed, and I'm already on that... ;) I'm just struggling a bit to put the pieces together (setup-wise). I have found a lot of partial solutions / info, but nothing all in one place.

    Thanks!



  • FYI, there's some very good info here:
    https://blog.spirotot.com/2016/06/28/pfsense-vlans-with-one-nic-nuc-a-tp-link-tl-sg108e/

    Perhaps capture this somewhere so it's tucked away for folks? Just to make it easy for others.

    Thanks!

