No Site-to-Site VPN after upgrading CE from 2.6.0 to 2.7.0

michaelschefczyk

@jimp The tunnel does connect without issues and it does stay up. The logs are similar to those further up in the thread.

By my understanding, this will likely be a routing issue.

Server log:

Jul 6 22:11:43 openvpn 45966 library versions: OpenSSL 1.1.1t-freebsd 7 Feb 2023, LZO 2.10
Jul 6 22:11:43 openvpn 45966 OpenVPN 2.6.4 amd64-portbld-freebsd14.0 [SSL (OpenSSL)] [LZO] [LZ4] [PKCS11] [MH/RECVDA] [AEAD] [DCO]
Jul 6 22:11:42 openvpn 98100 Initialization Sequence Completed
Jul 6 22:11:42 openvpn 98100 UDPv4 link remote: [AF_UNSPEC]
Jul 6 22:11:42 openvpn 98100 UDPv4 link local (bound): [AF_INET]127.0.0.1:1196
Jul 6 22:11:42 openvpn 98100 /usr/local/sbin/ovpn-linkup ovpns3 1500 0 192.168.18.1 255.255.255.0 init
Jul 6 22:11:42 openvpn 98100 /sbin/ifconfig ovpns3 192.168.18.1/24 mtu 1500 up
Jul 6 22:11:42 openvpn 98100 TUN/TAP device /dev/tun3 opened
Jul 6 22:11:42 openvpn 98100 TUN/TAP device ovpns3 exists previously, keep at program end
Jul 6 22:11:42 openvpn 98100 WARNING: experimental option --capath /var/etc/openvpn/server3/ca
Jul 6 22:11:42 openvpn 98100 NOTE: the current --script-security setting may allow this configuration to call user-defined scripts
Jul 6 22:11:42 openvpn 98100 NOTE: your local LAN uses the extremely common subnet address 192.168.0.x or 192.168.1.x. Be aware that this might create routing conflicts if you connect to the VPN server from public locations such as internet cafes that use the same subnet.
Jul 6 22:11:42 openvpn 97789 DCO version: FreeBSD 14.0-CURRENT #1 RELENG_2_7_0-n255866-686c8d3c1f0: Wed Jun 28 04:21:19 UTC 2023 root@freebsd:/var/jenkins/workspace/pfSense-CE-snapshots-2_7_0-main/obj/amd64/LwYAddCr/var/jenkins/workspace/pfSense-CE-snapshots-2_7_0-main/sources/FreeBSD-src-REL
Jul 6 22:11:42 openvpn 97789 library versions: OpenSSL 1.1.1t-freebsd 7 Feb 2023, LZO 2.10
Jul 6 22:11:42 openvpn 97789 OpenVPN 2.6.4 amd64-portbld-freebsd14.0 [SSL (OpenSSL)] [LZO] [LZ4] [PKCS11] [MH/RECVDA] [AEAD] [DCO]

Client Log:

Jul 6 22:12:59 openvpn 5948 [srv.xxx.xxx Peer Connection Initiated with [AF_INET]xx.xx.xx.xx:1196
Jul 6 22:12:59 openvpn 5948 Preserving previous TUN/TAP instance: ovpnc2
Jul 6 22:12:59 openvpn 5948 Initialization Sequence Completed
Jul 6 22:13:49 openvpn 22998 Server poll timeout, restarting
Jul 6 22:13:49 openvpn 22998 SIGUSR1[soft,server_poll] received, process restarting
Jul 6 22:13:49 openvpn 22998 NOTE: your local LAN uses the extremely common subnet address 192.168.0.x or 192.168.1.x. Be aware that this might create routing conflicts if you connect to the VPN server from public locations such as internet cafes that use the same subnet.
Jul 6 22:13:49 openvpn 22998 NOTE: the current --script-security setting may allow this configuration to call user-defined scripts

jimp

OK so if the VPN is connected now that narrows things down a bit.

The route errors are probably the client failing to add unnecessary/duplicate routes, but whether or not that's a problem depends on what the route table looks like in the end.

If the firewalls themselves can ping the other LANs then the OS routing is probably OK and there is more likely a problem in the local firewall rules/NAT.

There are a lot of troubleshooting suggestions for that sort of stuff at https://docs.netgate.com/pfsense/en/latest/troubleshooting/connectivity.html

But to boil that down a bit, you should check:

Look at the OS routing table on both sides, make sure there are entries for the opposite side LAN(s) and that those routes are pointing to the correct OpenVPN interface(s).
When you ping from the firewall make sure to ping from both the OpenVPN interface itself (default source) and again using the LAN interface as a source. That tests routing between the LANs in both directions, not just to/from the OpenVPN interface directly, which is a much different test.
When pinging from a client on the LAN, look at its states under Diagnostics > States on both firewalls, there should be two entries on each, one as it enters the firewall and one as it exits the firewall. If something like outbound NAT is catching it, the NAT would show in these states. If the traffic is taking the wrong path, that would also show (e.g. it should go in LAN, out VPN, in VPN, out LAN).
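
The first check above (remote LAN routes pointing at the correct OpenVPN interface) can be sketched in Python. This is only an illustration: the sample text imitates the general layout of `netstat -rn -f inet` output on FreeBSD, and every address, interface name, and function name in it is a hypothetical placeholder rather than output from any system in this thread.

```python
import ipaddress

# Hypothetical sample shaped like `netstat -rn -f inet` output.
# Addresses and interface names are illustrative assumptions only.
SAMPLE_ROUTES = """\
Destination        Gateway            Flags     Netif Expire
default            203.0.113.1        UGS        igb0
192.168.1.0/24     link#2             U          igb1
192.168.12.0/24    192.168.18.2       UGS      ovpns3
192.168.18.0/24    link#9             U        ovpns3
"""


def parse_routes(text):
    """Return {network: interface} for every CIDR destination in the table."""
    routes = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4 or "/" not in fields[0]:
            continue  # skip the header, the default route, and host routes
        try:
            net = ipaddress.ip_network(fields[0], strict=False)
        except ValueError:
            continue  # not a parsable network, ignore the line
        routes[net] = fields[3]
    return routes


def check_remote_lans(text, expected):
    """Compare expected {cidr: interface} pairs against the routing table.

    Returns a list of problems; an empty list means every remote LAN
    has a route on the expected interface.
    """
    routes = parse_routes(text)
    problems = []
    for cidr, iface in expected.items():
        actual = routes.get(ipaddress.ip_network(cidr))
        if actual is None:
            problems.append(f"no route for {cidr}")
        elif actual != iface:
            problems.append(f"{cidr} points to {actual}, expected {iface}")
    return problems


if __name__ == "__main__":
    # Hypothetical expectation: the remote LAN should be reached via ovpns5.
    for problem in check_remote_lans(SAMPLE_ROUTES, {"192.168.12.0/24": "ovpns5"}):
        print(problem)
```

Running the same comparison on both firewalls, each with the other side's LAN as the expected network, covers both directions.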

That should give you a better idea of what's going on and what needs to be fixed.

michaelschefczyk

@nazelus Does that imply manually rebuilding the configuration from scratch? Did you restore parts of the configuration from the config file? If so, which parts did you not restore (OpenVPN and NAT maybe)?

michaelschefczyk

@jimp At the moment, we have a situation with many users starting their configuration from scratch to avoid an undefined configuration "error". It should be possible to avoid this by comparing an old and a new configuration of someone who was successful. Would someone in that situation please consider this or would the developers offer support in that direction?

SeaMonkey

@jimp said in No Site-to-Site VPN after upgrading CE from 2.6.0 to 2.7.0:

A configuration that worked by chance before that was never correct (e.g. routes in System > Routing instead of in OpenVPN natively)

Just going to chime in to say this was my misconfiguration that worked in 2.6 and didn't work in 2.7. Thanks for the hints.

rcoleman-netgate

@SeaMonkey said in No Site-to-Site VPN after upgrading CE from 2.6.0 to 2.7.0:

Just going to chime in to say this was my misconfiguration that worked in 2.6 and didn't work in 2.7. Thanks for the hints.

What's the config, then? 3DES? https://docs.netgate.com/pfsense/en/latest/releases/2-7-0.html#general

SeaMonkey

@rcoleman-netgate

Mode: Peer to Peer ( SSL/TLS )
Data Ciphers: AES-256-GCM
Digest: SHA256
D-H Params: 2048 bits

Edit: To be more specific, DNS domain overrides were failing much more frequently in 2.7. After I removed the redundant static routes, DNS resolution across the VPN was instantaneous, whereas even while it was working in 2.6 it seemed to take several seconds.

jimp

And so far I haven't seen anyone that has followed my troubleshooting suggestions from earlier in the thread:

https://forum.netgate.com/post/1114468

michaelschefczyk

@jimp I have since rolled back to 2.6.0. I am willing to share config.xml files from before rolling back with Netgate. My hardware is a Supermicro X10SDV-TLN4F, which is probably what's in the Netgate 1541 1U. For obvious reasons, I would not post such files in the forum.

matt84

I'm holding off upgrading to 2.7 from a working 2.6 config until this issue(s) is/are resolved. I've been following since the start and as a developer myself I was a little surprised the initial redmine ticket by @michaelschefczyk was closed so quickly. Clearly multiple people are having issues with site to site VPNs after the 2.7 upgrade.

Clearly something has changed. Even if everyone's issue(s) turns out to be a misconfiguration that somehow worked in 2.6 but no longer in 2.7, it would be good to know and have documented why this is no longer the case. Just like the PHP upgrade warning to uninstall packages prior to upgrade.

If people are willing to share their configs, is this something that can be run up in a dev/test environment by Netgate?

jimp

@matt84 said in No Site-to-Site VPN after upgrading CE from 2.6.0 to 2.7.0:

I'm holding off upgrading to 2.7 from a working 2.6 config until this issue(s) is/are resolved. I've been following since the start and as a developer myself I was a little surprised the initial redmine ticket by @michaelschefczyk was closed so quickly. Clearly multiple people are having issues with site to site VPNs after the 2.7 upgrade.

It was closed because there was no evidence it was a bug or anything we could determine programmatically, and there still isn't.

Clearly something has changed. Even if everyone's issue(s) turns out to be a misconfiguration that somehow worked in 2.6 but no longer in 2.7, it would be good to know and have documented why this is no longer the case. Just like the PHP upgrade warning to uninstall packages prior to upgrade.

So far no two people have had the same problem, but most people haven't given us enough detail to determine what their problems might be. People keep jumping into the thread saying they have the "same issue" when it most likely isn't, but trying to diagnose them all in one thread is not viable.

If people are willing to share their configs, is this something that can be run up in a dev/test environment by Netgate?

It depends on the complexity of the setup. We can lab some things but completely replicating someone's multi-site VPN infrastructure is more likely to have problems from the lab setup being wrong vs replicating the user's original problem.

jimp

I'm locking this thread for now because it really needs to be a separate thread for every different person here, and people keep lumping their issues together.

I'll try to fork off some of the different ones I can isolate, but feel free to start new threads separately if you choose.

Please keep these discussions separate and do not put your own diagnostic info in someone else's thread even if your symptoms sound similar.

@michaelschefczyk If you want to submit your configuration files to TAC, mention my name and this thread. TAC can't help you directly but they should be able to get the files to me privately.

I'll unlock this thread after I get things separated if I can.

jimp

OK, I split each different person's troubleshooting posts off into separate threads. It's likely I missed some or they're missing some context now, but having them separated will make following individual problems much less confusing.

Please keep posts in this thread relevant to OP's specific problem and keep meta discussion and separate problems in their own posts/threads and not here.

Thanks!

michaelschefczyk

@jimp I did submit four configuration files and a brief explanation under ticket number 1773311411. This took some time, because I wanted to be physically present at each side of the connection when changing and reinstating the configuration. Any feedback would be most welcome!

nazelus

@michaelschefczyk
I've updated my topic and my issue is solved. Please check it out; it may help you a bit.

jimp

I finally had some time to review the configurations you submitted and I found a number of configuration errors, some of which could combine to make it work on 2.6.x but fail on 2.7.x:

There is a conflicting IPsec tunnel with a P2 for S10m LAN <-> B72m LAN
Client on B72m has OpenVPN tunnel network filled in and it shouldn't
S10m has route to client subnet (192.168.12.0/24) on ovpns3 (port 1196), and also a conflicting entry on the ovpns4 (port 1197) server, in addition to the correct entry on Server 5.
- You cannot have the same remote network on more than one server entry
- Whichever one starts first will end up with the route in the table, and no others!
There is a rule in the ruleset on S10m which can't be resolved
- "# destination address is empty. label "USER_RULE: IPsec S10-B72""
- In config.xml this references OPT8 which doesn't exist
- This is causing the VPN traffic to fall through to the next rule which has a gateway set, which will bypass the VPN
There is a similar broken rule on B72m "IPsec S10-B72"
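
The duplicate Remote Network finding can be illustrated with a short Python sketch. The server names and the 192.168.12.0/24 network come from the findings above; the dictionary layout and the `find_conflicts` helper are assumptions made for illustration and do not reflect how pfSense stores or validates its configuration.

```python
import ipaddress
from itertools import combinations

# Hypothetical summary of the OpenVPN server entries on S10m.
# The data structure is an assumption; only the names and the
# network are taken from the findings above.
SERVERS = {
    "Server 3 (S10-B72 CATV)": ["192.168.12.0/24"],
    "Server 4 (S10-B72 DSL)": ["192.168.12.0/24"],
    "Server 5 (s2s)": ["192.168.12.0/24"],
}


def find_conflicts(servers):
    """Return (server_a, server_b, net_a, net_b) for every overlapping pair."""
    conflicts = []
    for (name_a, nets_a), (name_b, nets_b) in combinations(servers.items(), 2):
        for a in nets_a:
            for b in nets_b:
                net_a = ipaddress.ip_network(a)
                net_b = ipaddress.ip_network(b)
                # Only compare networks of the same IP version.
                if net_a.version == net_b.version and net_a.overlaps(net_b):
                    conflicts.append((name_a, name_b, a, b))
    return conflicts


if __name__ == "__main__":
    for name_a, name_b, a, b in find_conflicts(SERVERS):
        print(f"{name_a} and {name_b} both claim {a} / {b}")
```

Any pair this reports means only one of the servers can hold the route at a time, which matches the "whichever one starts first" behavior described above.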

So to fix it, you should:

Disable or delete the IPsec tunnel if you want to use OpenVPN, otherwise IPsec will be grabbing the traffic in its kernel policy and it won't ever reach OpenVPN.
- Might need to flush the SPD entries or reboot to ensure the policies are removed.
On S10m, Remove Remote Network 192.168.12.0/24 from Server 3 (S10-B72 CATV) and Server 4 (S10-B72 DSL)
- Edit/Save Server 5 (s2s) afterward to ensure it gets its route in the table properly.
Fix the broken firewall rules so they have the proper destination (Use B72 alias on S10, and use S10 alias on B72)
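
Why the broken rule matters can be sketched with a deliberately simplified first-match model in Python. It assumes interface rules are evaluated top-down with the first match winning, and that a rule whose destination cannot be resolved is effectively absent from the loaded ruleset. The second rule, its gateway name, and the assumption that the B72 alias resolves to 192.168.12.0/24 are hypothetical; this is not pf or pfSense code.

```python
import ipaddress
from typing import NamedTuple, Optional


class Rule(NamedTuple):
    label: str
    # CIDR string, "any", or None when the destination cannot be resolved
    destination: Optional[str]
    # A gateway set on a rule forces matching traffic out via that gateway
    gateway: Optional[str] = None


def first_match(rules, dst_ip):
    """Return the first usable rule whose destination covers dst_ip."""
    addr = ipaddress.ip_address(dst_ip)
    for rule in rules:
        if rule.destination is None:
            continue  # unresolvable rule: treated as missing from the ruleset
        if rule.destination == "any" or addr in ipaddress.ip_network(rule.destination):
            return rule
    return None


# Hypothetical rulesets: the catch-all rule and its gateway name are made up.
BROKEN = [
    Rule("IPsec S10-B72", None),
    Rule("LAN to any (hypothetical)", "any", gateway="WAN_GW"),
]
FIXED = [
    Rule("IPsec S10-B72", "192.168.12.0/24"),  # assumed value of the B72 alias
    Rule("LAN to any (hypothetical)", "any", gateway="WAN_GW"),
]

if __name__ == "__main__":
    for name, rules in (("broken", BROKEN), ("fixed", FIXED)):
        rule = first_match(rules, "192.168.12.10")
        print(name, "->", rule.label, "gateway:", rule.gateway)
```

In the broken ruleset the VPN-bound packet lands on the rule with a gateway set, so it is policy-routed away from the VPN; with the destination fixed, the first rule catches it and the routing table decides the path.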

michaelschefczyk

@jimp Thank you very much! Due to limited reachability of the other end during the summer holiday period, I will try this in the second half of August! Michael

pki79

Hi.
I had a similar problem. It started after I upgraded to 2.7.0.
Several OpenVPN Peer to Peer connections with Shared Keys stopped working. SSL/TLS were still operational.

After collecting all the information, I found out:

The tunnel connections are functional, but I could not communicate from the server-side LAN (where the OpenVPN server is).
Most of the clients are on pfSense 2.3.4 (because of older hardware).
I could reach the clients' LANs from the pfSense server shell.
Because of multi-WAN, the tunnels are bound to LAN.

The solution was:

Add firewall rules on LAN with source LAN net and the client-side LAN network as destination, and choose the default gateway under Advanced.