Netgate Discussion Forum
    • Categories
    • Recent
    • Tags
    • Popular
    • Users
    • Search
    • Register
    • Login

    OpenVPN gateway monitoring shows offline until forced save

    Scheduled Pinned Locked Moved OpenVPN
    12 Posts 3 Posters 1.6k Views
    Loading More Posts
    • Oldest to Newest
    • Newest to Oldest
    • Most Votes
    Reply
    • Reply as topic
    Log in to reply
    This topic has been deleted. Only users with topic management privileges can see it.
    • Q
      q54e3w
      last edited by q54e3w

      I have a number of OpenVPN connections with gateways that use external DNS servers (4.2.2.1 > 4.2.2.5) to validate they are up and measure latency. When my pfSense box boots all the my VPN connections start and are accessible but the gateways report down until I open one and apply a "save" which refreshes them all correctly to up.
      Any ideas how to troubleshoot / resolve would be appreciated.

      EDIT: None of the 4.2.2.1 servers used are part of any DNS Resolver / Forwarder settings.

      1 Reply Last reply Reply Quote 0
      • jimpJ
        jimp Rebel Alliance Developer Netgate
        last edited by

        Do you see any traffic actually going to those IP addresses? In the states table, or packet captures?
        Are there dpinger instances running for those?

        Remember: Upvote with the ๐Ÿ‘ button for any user/post you find to be helpful, informative, or deserving of recognition!

        Need help fast? Netgate Global Support!

        Do not Chat/PM for help!

        1 Reply Last reply Reply Quote 0
        • Q
          q54e3w
          last edited by

          Wen I packet capture against any of the VPN interfaces (ovpnc1 > 5) I see traffic flowing as would be expected from a up and valid VPN connection. All gateways report as down due to 100% loss.

          System General logs

          Oct 1 21:06:46	php-fpm	90819	/vpn_openvpn_client.php: Gateways status could not be determined, considering all as up/active. (Group: VPN_Group)
          Oct 1 21:06:43	php-fpm	91797	/vpn_openvpn_client.php: Gateways status could not be determined, considering all as up/active. (Group: VPN_Group)
          Oct 1 21:06:41	php-fpm	4253	/vpn_openvpn_server.php: Gateways status could not be determined, considering all as up/active. (Group: VPN_Group)
          Oct 1 21:05:15	php-fpm	342	/vpn_openvpn_client.php: Gateways status could not be determined, considering all as up/active. (Group: VPN_Group)
          Oct 1 21:05:13	php-fpm	90819	/vpn_openvpn_client.php: Gateways status could not be determined, considering all as up/active. (Group: VPN_Group)
          Oct 1 21:05:12	php-fpm	91797	/vpn_openvpn_server.php: Gateways status could not be determined, considering all as up/active. (Group: VPN_Group)
          Oct 1 21:04:18	php-fpm	4253	/index.php: Successful login for user 'admin' from: 192.168.20.101 (Local Database)
          

          dpinger logs

          Oct 1 21:02:48	dpinger		VPN1_WAN 4.2.2.1: Alarm latency 0us stddev 0us loss 100%
          Oct 1 21:02:48	dpinger		VPN3_WAN 4.2.2.3: Alarm latency 0us stddev 0us loss 100%
          Oct 1 21:02:48	dpinger		VPN2_WAN 4.2.2.2: Alarm latency 0us stddev 0us loss 100%
          Oct 1 21:02:48	dpinger		VPN4_WAN 4.2.2.4: Alarm latency 0us stddev 0us loss 100%
          Oct 1 21:02:46	dpinger		send_interval 500ms loss_interval 2000ms time_period 60000ms report_interval 0ms data_len 0 alert_interval 1000ms latency_alarm 500ms loss_alarm 75% dest_addr 4.2.2.4 bind_addr 10.18.xxx.xxx identifier "VPN4_WAN "
          Oct 1 21:02:46	dpinger		send_interval 500ms loss_interval 2000ms time_period 60000ms report_interval 0ms data_len 0 alert_interval 1000ms latency_alarm 500ms loss_alarm 75% dest_addr 4.2.2.1 bind_addr 10.9.xxx.xxx identifier "VPN1_WAN "
          Oct 1 21:02:46	dpinger		send_interval 500ms loss_interval 2000ms time_period 60000ms report_interval 0ms data_len 0 alert_interval 1000ms latency_alarm 500ms loss_alarm 20% dest_addr 192.168.200.1 bind_addr 192.168.200.1 identifier "RW_VPN "
          Oct 1 21:02:46	dpinger		send_interval 500ms loss_interval 2000ms time_period 60000ms report_interval 0ms data_len 0 alert_interval 1000ms latency_alarm 500ms loss_alarm 75% dest_addr 4.2.2.3 bind_addr 10.28.xxx.xxx identifier "VPN3_WAN "
          Oct 1 21:02:46	dpinger		send_interval 500ms loss_interval 2000ms time_period 60000ms report_interval 0ms data_len 0 alert_interval 1000ms latency_alarm 500ms loss_alarm 75% dest_addr 4.2.2.2 bind_addr 10.26.xxx.xx identifier "VPN2_WAN "
          Oct 1 21:02:46	dpinger		send_interval 500ms loss_interval 2000ms time_period 60000ms report_interval 0ms data_len 0 alert_interval 1000ms latency_alarm 500ms loss_alarm 90% dest_addr 76.91.0.1 bind_addr xx.xx.xx.xx identifier "WAN_DHCP "
          

          Not that I think it matters but these are configured into two failover groups as such

          0_1538453873171_gatewaygroups.jpg

          VPN states shows connections established

          0_1538453964851_vpn1_states.jpg

          and at the same time gateways all shown as down

          0_1538454077180_gateway_status.jpg

          and logs from when I edit a gateway, make no changes and save/apply... all gateway status return to up

          Oct 1 21:21:55	dpinger		send_interval 500ms loss_interval 2000ms time_period 60000ms report_interval 0ms data_len 0 alert_interval 1000ms latency_alarm 1000ms loss_alarm 80% dest_addr 8.8.4.4 bind_addr xxx.xxx.xx.xxx identifier "WAN3 "
          Oct 1 21:21:55	dpinger		send_interval 500ms loss_interval 2000ms time_period 60000ms report_interval 0ms data_len 0 alert_interval 1000ms latency_alarm 500ms loss_alarm 75% dest_addr 4.2.2.4 bind_addr xxx.xxx.xxx.xxx identifier "VPN4_WAN "
          Oct 1 21:21:55	dpinger		send_interval 500ms loss_interval 2000ms time_period 60000ms report_interval 0ms data_len 0 alert_interval 1000ms latency_alarm 500ms loss_alarm 75% dest_addr 4.2.2.1 bind_addr 1xx.xxx.xxx.xxx identifier "VPN1_WAN "
          Oct 1 21:21:55	dpinger		send_interval 500ms loss_interval 2000ms time_period 60000ms report_interval 0ms data_len 0 alert_interval 1000ms latency_alarm 500ms loss_alarm 20% dest_addr 192.168.200.1 bind_addr 192.168.200.1 identifier "RW_VPN "
          Oct 1 21:21:55	dpinger		send_interval 500ms loss_interval 2000ms time_period 60000ms report_interval 0ms data_len 0 alert_interval 1000ms latency_alarm 500ms loss_alarm 75% dest_addr 4.2.2.3 bind_addr 10.28.xxx.xxx identifier "VPN3_WAN "
          Oct 1 21:21:55	dpinger		send_interval 500ms loss_interval 2000ms time_period 60000ms report_interval 0ms data_len 0 alert_interval 1000ms latency_alarm 500ms loss_alarm 75% dest_addr 4.2.2.2 bind_addr 10.26.xxx.xxx identifier "VPN2_WAN "
          Oct 1 21:21:55	dpinger		send_interval 500ms loss_interval 2000ms time_period 60000ms report_interval 0ms data_len 0 alert_interval 1000ms latency_alarm 500ms loss_alarm 90% dest_addr 76.91.0.1 bind_addr xxx.xxx.xxx.xxx identifier "WAN_DHCP "
          

          0_1538454262036_gateways_up.jpg

          Ignore VPN5, my VPN provider allows 5 connections and from the reboot their servers havent processed the reboots disconnection correctly yet.

          What else could I help share to track this down?

          1 Reply Last reply Reply Quote 0
          • jimpJ
            jimp Rebel Alliance Developer Netgate
            last edited by

            Filter the states and the packet captures on icmp and/or the monitor IP address of one of the VPNs. Seeing traffic flowing doesn't mean anything to the gateway status. I'm curious if, for example, there is a state for 4.2.2.2 going out WAN and not a VPN, or going out the wrong VPN.

            Remember: Upvote with the ๐Ÿ‘ button for any user/post you find to be helpful, informative, or deserving of recognition!

            Need help fast? Netgate Global Support!

            Do not Chat/PM for help!

            1 Reply Last reply Reply Quote 0
            • Q
              q54e3w
              last edited by

              thanks Jim

              0_1538494022192_states-filter-4221.jpg

              0_1538494133049_vpn1.jpg

              Traffic capture shows no traffic heading to 4.2.2.1 from the VPN1 gateway or any other WAN connection. I can however see all the 4.2.2.1 checks if I capture on lo0.

              0_1538494637243_lo0_cap.pcapng

              1 Reply Last reply Reply Quote 0
              • jimpJ
                jimp Rebel Alliance Developer Netgate
                last edited by

                That's a bit odd. It shouldn't be on localhost/lo0. Maybe it doesn't have a default route to start with?

                Remember: Upvote with the ๐Ÿ‘ button for any user/post you find to be helpful, informative, or deserving of recognition!

                Need help fast? Netgate Global Support!

                Do not Chat/PM for help!

                1 Reply Last reply Reply Quote 0
                • Q
                  q54e3w
                  last edited by

                  yeah, I was surprised for sure. Soon as I hit save on any gateway they all revert to the correct state , this is what the states show post applying that "save"

                  0_1538495646928_post-save-gateways.jpg

                  1 Reply Last reply Reply Quote 0
                  • Q
                    q54e3w
                    last edited by

                    heres a pcap post saving, you can see the ICMP correctly....

                    0_1538495830645_post-save-gateway-traffic.pcapng

                    (please feel free to nuke this file after youve looked in case it contains any thing I shouldnt share in public. )

                    1 Reply Last reply Reply Quote 0
                    • Q
                      q54e3w
                      last edited by

                      Hi Jim, wanted to check if theres any actions I can do to produce a resolution? Should I create a bug report for this matter or is this a case of user error I should look further into?

                      1 Reply Last reply Reply Quote 0
                      • jimpJ
                        jimp Rebel Alliance Developer Netgate
                        last edited by

                        I still haven't managed to reproduce anything like it here locally. There must be some difference in the states or dpinger when it works vs when it doesn't. You might have to go to /status.php on the firewall and grab an archive from when it's broken and then again when it works, and then compare the contents for any references to the gateways in question.

                        Remember: Upvote with the ๐Ÿ‘ button for any user/post you find to be helpful, informative, or deserving of recognition!

                        Need help fast? Netgate Global Support!

                        Do not Chat/PM for help!

                        1 Reply Last reply Reply Quote 0
                        • Q
                          q54e3w
                          last edited by

                          thanks Jim - I'll do that. This is 100% reproducible for me on multiple systems so let me do that comparison and see what I find. I'll report back. thanks again.

                          1 Reply Last reply Reply Quote 0
                          • kesawiK
                            kesawi
                            last edited by

                            Did you manage to resolve it? I've been having the identical issue with my pfSense install for a while as well. Currently running version 2.4.4-RELEASE-p2.

                            1 Reply Last reply Reply Quote 0
                            • First post
                              Last post
                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.