Loss of OpenVPN connectivity after idle period in multi-site site-to-site setup
-
Hi there, I am having a rather irritating issue with loss of connectivity over one of my site-to-site OpenVPN tunnels when the tunnel has been idle for a while. Our business has 4 pfSense boxes on 4 different subnets:
192.168.1.0/24 (the main office and server)
192.168.2.0/24 (client 1) tunnel 10.0.2.0/24
192.168.3.0/24 (client 2) tunnel 10.0.3.0/24
192.168.4.0/24 (client 3) tunnel 10.0.4.0/24
The server is set up so the clients can talk to all PCs on the server LAN and also to each other via the server, using multiple remote networks. My settings for server 2 are below (I have obviously removed my key). All three servers run identical setups apart from the different subnets.
<openvpn-server>
  <vpnid>2</vpnid>
  <mode>p2p_shared_key</mode>
  <protocol>UDP</protocol>
  <dev_mode>tun</dev_mode>
  <ipaddr></ipaddr>
  <interface>wan</interface>
  <local_port>11003</local_port>
  <custom_options></custom_options>
  <shared_key>xxxxxxx</shared_key>
  <crypto>AES-128-CBC</crypto>
  <digest>SHA1</digest>
  <engine>none</engine>
  <tunnel_network>10.0.3.0/24</tunnel_network>
  <tunnel_networkv6></tunnel_networkv6>
  <remote_network>192.168.3.0/24, 192.168.2.0/24, 192.168.4.0/24</remote_network>
  <remote_networkv6></remote_networkv6>
  <gwredir></gwredir>
  <local_network></local_network>
  <local_networkv6></local_networkv6>
  <maxclients></maxclients>
  <compression>adaptive</compression>
  <passtos></passtos>
  <client2client></client2client>
  <dynamic_ip></dynamic_ip>
  <pool_enable>yes</pool_enable>
  <topology>subnet</topology>
  <serverbridge_dhcp></serverbridge_dhcp>
  <serverbridge_interface>none</serverbridge_interface>
  <serverbridge_dhcp_start></serverbridge_dhcp_start>
  <serverbridge_dhcp_end></serverbridge_dhcp_end>
  <netbios_enable></netbios_enable>
  <netbios_ntype>0</netbios_ntype>
  <netbios_scope></netbios_scope>
  <no_tun_ipv6>yes</no_tun_ipv6>
  <verbosity_level>1</verbosity_level>
</openvpn-server>
This setup has worked fine for months and still works OK most of the time. However, over the last few weeks, after a while I lose all connectivity between client 2 and the rest of the network. I can't ping the computer connected to the pfSense box or even ping the pfSense box itself, yet the OpenVPN tunnel shows as up from both ends in the status pages of both routers. I am pretty sure it is a routing issue, as tracert shows the following.
Tracing route to 192.168.3.1 over a maximum of 30 hops
1 <1 ms <1 ms <1 ms pfSenseML.ShaweyecareML [192.168.2.1]
2 46 ms 38 ms 41 ms 10.0.2.1
3 38 ms 38 ms 38 ms 10.0.2.2
4 77 ms 78 ms 76 ms 10.0.2.1
5 76 ms 76 ms 76 ms 10.0.2.2
6 114 ms 114 ms 115 ms 10.0.2.1
7 120 ms 137 ms 115 ms 10.0.2.2
8 161 ms 167 ms 156 ms 10.0.2.1
9 153 ms 157 ms 154 ms 10.0.2.2
10 192 ms 192 ms 196 ms 10.0.2.1
11 196 ms 190 ms 195 ms 10.0.2.2
12 230 ms 229 ms 243 ms 10.0.2.1
13 229 ms 229 ms 230 ms 10.0.2.2
14 270 ms 276 ms 293 ms 10.0.2.1
15 290 ms 271 ms 272 ms 10.0.2.2
16 310 ms 338 ms 306 ms 10.0.2.1
17 306 ms 308 ms 342 ms 10.0.2.2
18 360 ms 362 ms 355 ms 10.0.2.1
19 345 ms 355 ms 345 ms 10.0.2.2
20 401 ms 386 ms 395 ms 10.0.2.1
21 388 ms 383 ms 383 ms 10.0.2.2
22 452 ms 423 ms 438 ms 10.0.2.1
23 450 ms 423 ms 429 ms 10.0.2.2
24 467 ms 480 ms 486 ms 10.0.2.1
25 459 ms 503 ms 460 ms 10.0.2.2
26 512 ms 522 ms 529 ms 10.0.2.1
27 509 ms 519 ms 508 ms 10.0.2.2
28 544 ms 538 ms 580 ms 10.0.2.1
29 556 ms 553 ms 556 ms 10.0.2.2
30 583 ms 588 ms 599 ms 10.0.2.1
Trace complete.
The traffic seems to be getting stuck in the tunnel of client 1 (10.0.2.0/24), bouncing between 10.0.2.1 and 10.0.2.2, rather than passing down its own tunnel (10.0.3.0/24), so it never reaches the pfSense box at the other end. If I reboot the server it is fine again for a while. Only a full reboot of the server pfSense box fixes it; stopping and starting the OpenVPN server doesn't help, nor does rebooting the client pfSense box. The other two clients are generally fine: client 1 has never had this problem, and client 3 has only done it once. I rebooted on Monday night and it was fine all day Tuesday, but today (Wednesday) exactly the same thing has happened. It has done it a couple of times before over the past 6 months as well, but has been fine 90% of the time. No changes to the config have been made in all this time. I have set a cron job to reboot the server every morning as a temporary workaround, but this is a horrible solution, and I really want to get to the bottom of it.
Thanks for your help and if you need any more info just let me know.
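For anyone wanting to check a trace like the one above automatically, here is a minimal Python sketch. It assumes Windows tracert output in the format shown and uses the LAN-to-tunnel addressing plan from this post; the function and variable names are made up for illustration. It flags a trace that alternates between the same two addresses and lists hops that sit in a tunnel other than the one expected for the destination.

```python
import ipaddress
import re

# Addressing plan from the post: client LAN -> tunnel network.
# The dictionary name is illustrative, not part of any pfSense tooling.
TUNNEL_FOR_LAN = {
    ipaddress.ip_network("192.168.2.0/24"): ipaddress.ip_network("10.0.2.0/24"),
    ipaddress.ip_network("192.168.3.0/24"): ipaddress.ip_network("10.0.3.0/24"),
    ipaddress.ip_network("192.168.4.0/24"): ipaddress.ip_network("10.0.4.0/24"),
}

IP_RE = re.compile(r"\d{1,3}(?:\.\d{1,3}){3}")


def parse_hops(trace_text):
    """Return the responding address of each numbered hop line."""
    hops = []
    for line in trace_text.splitlines():
        parts = line.split()
        if not parts or not parts[0].isdigit():
            continue  # skip the header and "Trace complete." lines
        found = IP_RE.findall(line)
        if found:
            hops.append(ipaddress.ip_address(found[-1]))
    return hops


def has_ping_pong(hops, min_repeats=3):
    """True if some stretch of the trace alternates A, B, A, B, ..."""
    width = 2 * min_repeats
    for i in range(len(hops) - width + 1):
        window = hops[i:i + width]
        a, b = window[0], window[1]
        if a != b and all(h == (a if k % 2 == 0 else b)
                          for k, h in enumerate(window)):
            return True
    return False


def hops_in_wrong_tunnel(hops, dest):
    """Hops that are inside a known tunnel, but not the destination's tunnel."""
    dest = ipaddress.ip_address(dest)
    expected = next((t for lan, t in TUNNEL_FOR_LAN.items() if dest in lan), None)
    if expected is None:
        return []
    tunnels = list(TUNNEL_FOR_LAN.values())
    return [h for h in hops
            if any(h in t for t in tunnels) and h not in expected]


# First seven hops of the trace from the post.
SAMPLE = """\
Tracing route to 192.168.3.1 over a maximum of 30 hops
  1 <1 ms <1 ms <1 ms pfSenseML.ShaweyecareML [192.168.2.1]
  2 46 ms 38 ms 41 ms 10.0.2.1
  3 38 ms 38 ms 38 ms 10.0.2.2
  4 77 ms 78 ms 76 ms 10.0.2.1
  5 76 ms 76 ms 76 ms 10.0.2.2
  6 114 ms 114 ms 115 ms 10.0.2.1
  7 120 ms 137 ms 115 ms 10.0.2.2
"""

hops = parse_hops(SAMPLE)
print("loop detected:", has_ping_pong(hops))
print("hops in the wrong tunnel:", hops_in_wrong_tunnel(hops, "192.168.3.1"))
```

Run against the trace in this post, every tunnel hop falls in 10.0.2.0/24 even though the destination 192.168.3.1 belongs behind 10.0.3.0/24, which matches the symptom described.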
-
I have confirmed that the issue is definitely linked to the multiple remote networks in the OpenVPN config: if I remove the additional remote networks and only have one subnet per VPN server, it starts working again. The problem with this is that the remote client networks then can't communicate with each other, only with the server network. While this isn't critical, as I can remote desktop into the server LAN and access the other subnets from there, it isn't very elegant.
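One possible reading of this, offered as an assumption rather than a confirmed diagnosis, is that each server instance tries to install a route for the same remote subnets, so the same destination ends up claimed by more than one tunnel. The Python sketch below only illustrates that overlap check. The remote network list for the 10.0.3.0/24 instance is taken from the config in the post; the lists for the other two instances are hypothetical, filled in on the strength of the statement that all three servers are set up identically.

```python
import ipaddress
from collections import defaultdict

# Keyed by tunnel network. Only the 10.0.3.0/24 entry comes from the posted
# config; the other two are ASSUMED to carry the same remote network list.
SERVER_REMOTE_NETWORKS = {
    "10.0.2.0/24": "192.168.3.0/24, 192.168.2.0/24, 192.168.4.0/24",  # assumed
    "10.0.3.0/24": "192.168.3.0/24, 192.168.2.0/24, 192.168.4.0/24",  # from post
    "10.0.4.0/24": "192.168.3.0/24, 192.168.2.0/24, 192.168.4.0/24",  # assumed
}

# The working variant described above: one remote subnet per server.
ONE_PER_SERVER = {
    "10.0.2.0/24": "192.168.2.0/24",
    "10.0.3.0/24": "192.168.3.0/24",
    "10.0.4.0/24": "192.168.4.0/24",
}


def parse_networks(value):
    """Split a comma-separated remote network field into network objects."""
    return [ipaddress.ip_network(p.strip()) for p in value.split(",") if p.strip()]


def find_conflicts(servers):
    """Map each remote network to the tunnels claiming it, if more than one."""
    claimed = defaultdict(list)
    for tunnel, value in servers.items():
        for net in parse_networks(value):
            claimed[net].append(tunnel)
    return {net: tunnels for net, tunnels in claimed.items() if len(tunnels) > 1}


for net, tunnels in find_conflicts(SERVER_REMOTE_NETWORKS).items():
    print(f"{net} is claimed by tunnels: {', '.join(tunnels)}")

print("conflicts with one subnet per server:", find_conflicts(ONE_PER_SERVER))
```

Under those assumptions each client LAN is claimed by all three tunnels, while the one-subnet-per-server layout has no overlaps, which is consistent with that layout being the one that works.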