Peer to peer (shared key) OpenVpn issues



  • Configuration:

    Site A: Pfsense (2.1.3). OpenVpn: 2 servers (Road Warriors + peer to peer server)
    Site B: Pfsense (2.1.3). OpenVpn: 1 server, 1 client (Road Warriors + peer to peer client)

    Problem description
    The peer to peer link has been rock solid for years but since about a month ago I started having problems with Rsync backups over the link. I thought it was a NAS issue first, but after bringing in a third Synology NAS and doing some testing, the common failure factor was the OpenVpn link. I have seen in the firewall logs that traffic from LAN on site A to LAN on site B gets blocked by the firewall and I really don't understand why. I can see things in the OpenVpn logs I don't know how to interpret. What I do know is that this setup has worked for years but when I upgraded some time ago to a newer version the peer to peer stopped working. The Road Warriors seem to work fine though. Reading on the forum It seems like other people too have problem with peer to peer OpenVpn after upgrading to 2.1.x

    OpenVPN logs
    I get a lot of this in the OpenVpn logs on the server side, site A

    May 14 09:10:27 	openvpn[77088]: SIGTERM[hard,] received, process exiting
    May 14 09:10:27 	openvpn[77088]: /usr/local/sbin/ovpn-linkdown ovpns1 1500 1542 192.168.50.1 192.168.50.2 init
    May 14 09:10:27 	openvpn[77088]: event_wait : Interrupted system call (code=4)
    May 14 09:09:58 	openvpn[78619]: WARNING: 'ifconfig' is used inconsistently, local='ifconfig 192.168.100.1 192.168.100.2', remote='ifconfig 192.168.55.1 192.168.55.2'
    May 14 09:09:49 	openvpn[78619]: Initialization Sequence Completed
    May 14 09:09:49 	openvpn[78619]: Peer Connection Initiated with [AF_INET]79.136.1.226:9468
    May 14 09:09:48 	openvpn[78619]: UDPv4 link remote: [undef]
    May 14 09:09:48 	openvpn[78619]: UDPv4 link local (bound): [AF_INET]193.12.110.10:1193
    May 14 09:09:48 	openvpn[77371]: /usr/local/sbin/ovpn-linkup ovpns2 1500 1561 192.168.100.1 192.168.100.2 init
    May 14 09:09:48 	openvpn[77371]: /sbin/ifconfig ovpns2 192.168.100.1 192.168.100.2 mtu 1500 netmask 255.255.255.255 up
    May 14 09:09:48 	openvpn[77371]: do_ifconfig, tt->ipv6=1, tt->did_ifconfig_ipv6_setup=0
    May 14 09:09:48 	openvpn[77371]: TUN/TAP device /dev/tun2 opened
    May 14 09:09:48 	openvpn[77371]: TUN/TAP device ovpns2 exists previously, keep at program end
    May 14 09:09:48 	openvpn[77371]: NOTE: the current --script-security setting may allow this configuration to call user-defined scripts
    May 14 09:09:48 	openvpn[77371]: OpenVPN 2.3.2 i386-portbld-freebsd8.3 [SSL (OpenSSL)] [LZO] [eurephia] [MH] [IPv6] built on Mar 27 2014
    May 14 09:09:48 	openvpn[77088]: Initialization Sequence Completed
    May 14 09:09:48 	openvpn[77088]: UDPv4 link remote: [undef]
    May 14 09:09:48 	openvpn[77088]: UDPv4 link local (bound): [AF_INET]193.12.110.10:1194
    May 14 09:09:48 	openvpn[73963]: /usr/local/sbin/ovpn-linkup ovpns1 1500 1542 192.168.50.1 192.168.50.2 init
    May 14 09:09:48 	openvpn[73963]: /sbin/ifconfig ovpns1 192.168.50.1 192.168.50.2 mtu 1500 netmask 255.255.255.255 up
    May 14 09:09:48 	openvpn[73963]: do_ifconfig, tt->ipv6=1, tt->did_ifconfig_ipv6_setup=0
    May 14 09:09:48 	openvpn[73963]: TUN/TAP device /dev/tun1 opened
    May 14 09:09:48 	openvpn[73963]: TUN/TAP device ovpns1 exists previously, keep at program end
    May 14 09:09:48 	openvpn[42314]: SIGTERM[hard,] received, process exiting
    May 14 09:09:48 	openvpn[42314]: /usr/local/sbin/ovpn-linkdown ovpns2 1500 1561 192.168.100.1 192.168.100.2 init
    May 14 09:09:48 	openvpn[42314]: event_wait : Interrupted system call (code=4)
    May 14 09:09:48 	openvpn[73963]: NOTE: the current --script-security setting may allow this configuration to call user-defined scripts
    May 14 09:09:48 	openvpn[73963]: WARNING: using --duplicate-cn and --client-config-dir together is probably not what you want
    May 14 09:09:48 	openvpn[73963]: OpenVPN 2.3.2 i386-portbld-freebsd8.3 [SSL (OpenSSL)] [LZO] [eurephia] [MH] [IPv6] built on Mar 27 2014
    May 14 09:09:47 	openvpn[73193]: SIGTERM[hard,] received, process exiting
    May 14 09:09:47 	openvpn[73193]: /usr/local/sbin/ovpn-linkdown ovpns1 1500 1542 192.168.50.1 192.168.50.2 init
    May 14 09:09:47 	openvpn[73193]: event_wait : Interrupted system call (code=4)
    May 14 07:36:11 	openvpn[42314]: Peer Connection Initiated with [AF_INET]79.136.1.226:9468
    

    And on the client side B, I got this every two minutes:

    May 14 20:01:00 	openvpn[31791]: Inactivity timeout (--ping-restart), restarting
    May 14 20:00:00 	openvpn[31791]: UDPv4 link remote: [AF_INET]81.170.140.146:1198
    May 14 20:00:00 	openvpn[31791]: UDPv4 link local (bound): [AF_INET]79.136.1.226
    May 14 20:00:00 	openvpn[31791]: Preserving previous TUN/TAP instance: ovpnc4
    May 14 20:00:00 	openvpn[31791]: Re-using pre-shared static key
    May 14 20:00:00 	openvpn[31791]: NOTE: the current --script-security setting may allow this configuration to call user-defined scripts
    May 14 19:59:58 	openvpn[31791]: SIGUSR1[soft,ping-restart] received, process restarting
    

    I don't know if this shows anything abnormal since I don't remember how the logs looked like when everything was fine (Since three weeks I send all logs to syslog servers to be able to do more research in the logs)

    Blocked traffic

    I see this in the firewall logs:

    2014-05-12	04:58:50	info	akwallitx	local0	pf	    192.168.149.5.873 > 192.168.145.10.59719: Flags [P.], cksum 0x9ebf (correct), ack 774471911, win 31934, options [nop,nop,TS val 71757384 ecr 53932868], length 86
    
    2014-05-12	04:58:50	info	akwallitx	local0	pf	00:00:03.511626 rule 5/0(match): block in on ovpns2: (tos 0x0, ttl 63, id 51993, offset 0, flags [DF], proto TCP (6), length 138)
    
    2014-05-12	04:58:47	info	akwallitx	local0	pf	    192.168.149.5.873 > 192.168.145.10.59719: Flags [P.], cksum 0xac77 (correct), ack 774471911, win 31934, options [nop,nop,TS val 71753872 ecr 53932868], length 86
    
    2014-05-12	04:58:47	info	akwallitx	local0	pf	00:00:01.756794 rule 5/0(match): block in on ovpns2: (tos 0x0, ttl 63, id 51992, offset 0, flags [DF], proto TCP (6), length 138)
    

    OpenVpn rules

    ID 	Proto 	Source 	Port 	Destination 	Port 	Gateway 	Queue 	Schedule 	Description 	
    
    	IPv4* 	 *   	* 	       * 	    * 	    *    	 none 	  
    

    I need to solve this, until then I have to use another site to site OpenVpn link using two Linux servers. Any help appreciated



  • May 14 09:09:58 openvpn[78619]: WARNING: 'ifconfig' is used inconsistently, local='ifconfig 192.168.100.1 192.168.100.2', remote='ifconfig 192.168.55.1 192.168.55.2'

    This error message is quite straight forward, isn't it?  ;)

    Apparently the IPs in the firewall log don't match any IP from the openVPN config as given in the VPN log…  :o



  • Even if the ifconfig was incorrect, this configuration has worked for years. (Since pfsense 1.2)

    After writing my post yesterday I did see just that and have actually fixed the local and remote config so they use the correct ranges, but the firewall still blocks traffic.

    After pushing X mbytes of traffic over the link….suddenly pfsense says NO: @6 block drop out log inet all label "Default deny rule IPv4"

    Why!?



  • Can you show a screenshot of the firewall log in the GUI when traffic is blocked?



  • The traffic that's getting blocked is a symptom of the VPN timing out and reconnecting. The question is why is it doing that. My first guess, your WAN gateway monitor IP isn't replying reliably, it's going into and out of alarm on occasion, which will restart the VPN when it comes out of alarm.



  • Hi, That seems reasonable.

    I have been on the wrong track here thinking it was a firewall issue. A did see in a thread in the Firewall section that "chemlud" had a similar problem with blocked valid traffic on his OpenVpn interface but it is was FA blocks. I guess this is what I see here.

    So the issue here is that one of my WAN connections are unstable causing the VPN-connection to go down? Or at least the GW monitoring detects that that the WAN GW is down and restarts the VPN connection (and other connections). I should see this in the syslog right?

    Yesterday, I did setup a Site to site OpenVpn connection using two Linux servers with static routes through the pfsense boxes, and actually I got the same symptoms. Worked for a short while and then the link died + a lot of valid packets was blocked on the LAN interface this time. Adding that info, it makes sense that the WAN connection is the root cause of the problem.

    I have a symmetric 8/8 mbit SDSL connection on one end and a 100/10 fibre connection in the other, one of them must be Fubar.

    After having 4-5 hours of sleep three nights in a row because of trubleshhoting this issue my brain doesn't function as it should. Do yoy have any hints for troubleshooting further, except looking in the syslog?
    Any additional logging that I should enable? I send all logs to syslog servers so log space is not an issue.



  • If it's the gateway: Status -> System Logs -> Gateways and see if you can match the time when the VPN goes down to an apinger alarm…

    I had such an alarm yesterday when the firewall went crazy on the FA packages...

    This morning I changed the server to ping to for gateway monitoring (the address I used was apparently causing troubles from time to time and the gateway of the PPPoE side of the tunnel is not responding to pings...).

    Hope now the tunnel is more stable...



  • OK, I think I nailed it this time. It pretty much confirms the theory, right?

    System->Gateway log:

    May 17 03:48:18 	apinger: alarm canceled: WANGW(193.12.110.9) *** delay ***
    May 17 03:48:01 	apinger: ALARM: WANGW(193.12.110.9) *** delay ***
    

    System->General log:

    May 17 03:49:02 	php: rc.start_packages: Restarting/Starting all packages.
    May 17 03:49:02 	php: rc.start_packages: Restarting/Starting all packages.
    May 17 03:48:57 	php: rc.filter_configure_sync: Not installing NAT reflection rules for a port range > 500
    May 17 03:48:57 	php: rc.filter_configure_sync: Not installing NAT reflection rules for a port range > 500
    May 17 03:48:52 	check_reload_status: Starting packages
    May 17 03:48:52 	php: rc.newwanip: pfSense package system has detected an ip change -> 192.168.101.2 ... Restarting packages.
    May 17 03:48:52 	check_reload_status: Starting packages
    May 17 03:48:52 	php: rc.newwanip: pfSense package system has detected an ip change -> 192.168.50.1 ... Restarting packages.
    May 17 03:48:52 	php: rc.newwanip: rc.newwanip: on (IP address: 192.168.101.2) (interface: []) (real interface: ovpnc2).
    May 17 03:48:52 	php: rc.newwanip: rc.newwanip: on (IP address: 192.168.50.1) (interface: []) (real interface: ovpns1).
    May 17 03:48:52 	php: rc.newwanip: rc.newwanip: Informational is starting ovpnc2.
    May 17 03:48:52 	php: rc.newwanip: rc.newwanip: Informational is starting ovpns1.
    May 17 03:48:44 	php: rc.start_packages: Restarting/Starting all packages.
    May 17 03:48:43 	check_reload_status: rc.newwanip starting ovpnc2
    May 17 03:48:43 	kernel: ovpnc2: link state changed to UP
    May 17 03:48:43 	php: rc.start_packages: Restarting/Starting all packages.
    May 17 03:48:42 	check_reload_status: rc.newwanip starting ovpns1
    May 17 03:48:42 	kernel: ovpnc2: link state changed to DOWN
    May 17 03:48:42 	kernel: ovpns1: link state changed to UP
    May 17 03:48:40 	php: rc.openvpn: OpenVPN: Resync client2
    May 17 03:48:40 	kernel: ovpns1: link state changed to DOWN
    May 17 03:48:37 	php: rc.openvpn: OpenVPN: Resync server1 RoadWarriors
    May 17 03:48:37 	php: rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed its IP. Reloading endpoints that may use WANGW.
    May 17 03:48:32 	check_reload_status: Starting packages
    May 17 03:48:32 	php: rc.newwanip: pfSense package system has detected an ip change -> 192.168.101.2 ... Restarting packages.
    May 17 03:48:32 	php: rc.newwanip: rc.newwanip: on (IP address: 192.168.101.2) (interface: []) (real interface: ovpnc2).
    May 17 03:48:32 	php: rc.newwanip: rc.newwanip: Informational is starting ovpnc2.
    May 17 03:48:30 	check_reload_status: Starting packages
    May 17 03:48:30 	php: rc.newwanip: pfSense package system has detected an ip change -> 192.168.50.1 ... Restarting packages.
    May 17 03:48:30 	php: rc.newwanip: rc.newwanip: on (IP address: 192.168.50.1) (interface: []) (real interface: ovpns1).
    May 17 03:48:30 	php: rc.newwanip: rc.newwanip: Informational is starting ovpns1.
    May 17 03:48:28 	check_reload_status: Restarting OpenVPN tunnels/interfaces
    May 17 03:48:28 	check_reload_status: Restarting ipsec tunnels
    May 17 03:48:28 	check_reload_status: updating dyndns WANGW
    May 17 03:48:24 	php: rc.filter_configure_sync: Not installing NAT reflection rules for a port range > 500
    May 17 03:48:23 	php: rc.filter_configure_sync: Not installing NAT reflection rules for a port range > 500
    May 17 03:48:23 	check_reload_status: rc.newwanip starting ovpnc2
    May 17 03:48:23 	kernel: ovpnc2: link state changed to UP
    May 17 03:48:22 	check_reload_status: rc.newwanip starting ovpns1
    May 17 03:48:22 	kernel: ovpns1: link state changed to UP
    May 17 03:48:22 	kernel: ovpnc2: link state changed to DOWN
    May 17 03:48:21 	php: rc.openvpn: OpenVPN: Resync client2
    May 17 03:48:21 	check_reload_status: Reloading filter
    May 17 03:48:21 	kernel: ovpns1: link state changed to DOWN
    May 17 03:48:20 	php: rc.openvpn: OpenVPN: Resync server1 RoadWarriors
    May 17 03:48:20 	php: rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed its IP. Reloading endpoints that may use WANGW.
    May 17 03:48:11 	check_reload_status: Reloading filter
    May 17 03:48:11 	check_reload_status: Restarting OpenVPN tunnels/interfaces
    May 17 03:48:11 	check_reload_status: Restarting ipsec tunnels
    May 17 03:48:11 	check_reload_status: updating dyndns WANGW
    

    And it correlates to when Rsync fails (after a timeout of 15 minutes). I see the same thing happen again at 04:48 and 08:58. I have submitted this error to our ISP so I'll get back when they have looked in to it. I have also used Googles public DNS server, 8.8.8.8, to monitor the GW and I get the same behaviour….



  • Where did I write that a I changed to 8.8.8.8?  :o …you can see my config online?  ;D

    Maybe the NSA doesn't like openVPN tunnels...  ;)

    btw: If you are running snort, have an eye on that funny little beast, too. It took me some weeks to figure out, that under heavy load some P2P rules kill tunnels (IPsec or openVPN does not matter)... ;D



  • ;D Google…
    Update:
    My ISP does not see anything special on their side so I did changed the parameters on the GW monitor to,Probe Interval: 10and```
    Down: 60

    
    With this setting the GW Alarm does not trigger and the VPN connection and Backups run fine.
    
    However, I think I have found a bug related to OpenVPN and GW monitoring: Can anyone try this:
    
    You must have a OpenVpn tunnel with PSK.
    
    1\. Print the routes in pfsense, Diagnostics->Routes and notice the route to your LAN network on other side of OpenVpn PSK tunnel
    2\. In Systems->Routing: edit default GW and change monitor IP (you can use Google's 8.8.8.8 ), save
    3\. Print the routes in pfsense, Diagnostics->Routes: The route to the other side of the OpenVpn tunnel is gone
    4\. Even a reeboot won't help
    5\. In VPN->OpenVpn: edit the PSK tunnel, don't change anything, just save.
    6.  Print the routes in pfsense, Diagnostics->Routes: the route is back


  • @coolnicklas:

    My ISP does not see anything special on their side so I did changed the parameters on the GW monitor to,Probe Interval: 10and```
    Down: 60

    I don't think this has anything to do with the monitoring.

    From my experience this happens when you saturate (max out) the bandwidth of the VPN connection.

    I get the same problem when I initiate a file transfer in Remote Desktop, which works ok until the bandwidth goes up to maximum.

    You can try throttling the bandwidth of RSYNC to 50% of the max capacity of the WAN link it uses to see if that prevents it.  You might want to play around with that speed to see where the threshold is.

    It looks like this is definitely a bug because there are several other threads for the same root issue; all of which characterized by the same thing:
    event_wait : Interrupted system call (code=4)

    See these threads:
    https://forum.pfsense.org/index.php?topic=75989.0
    https://forum.pfsense.org/index.php?topic=76735.0
    https://forum.pfsense.org/index.php?topic=77169.0
    and possibly
    https://forum.pfsense.org/index.php?topic=77214.0

    -nb



  • This seems to be happening to me too.  I just switched from Untangle and have been experiencing problems with the pfSense OpenVPN staying stable at full bandwidth.  I limited my Rsync bandwidth and it is stable now….I hope they find a fix for this bug soon.



  • The "detected IP change" part is the root of the issue. 2.1.4 (should be released this week) has a fix for that. Under maximum upload, your latency increases enough to put the gateway into alarm, and when it comes out of alarm it incorrectly thinks your WAN IP has changed (if it actually did, it would have to restart the OpenVPN instance, which is why it's doing that).

    Two things.

    1. a one-letter fix in /etc/rc.newwanip should be enough to take care of the "detected IP change" issue. Find curnwanip in that file and remove the n, so that line shows:
      if (is_ipaddr($curwanip))
    2. under the Gateway advanced properties (System>Routing, edit gateway), if your connection's latency under load goes up enough to trigger an alarm, increase the latency thresholds so normal high latency when your connection is under heavy load won't trigger an alarm.


  • Thank you for the suggestion.  I have increased the latency thresholds to 500/1000 but I don't seem to have any typos of curwanip in that file.  I'm on 2.1.2.  Will do some testing today.  Thanks!



  • Adjusting the threshold seems to have fixed it.  Thank you!


Log in to reply