No Site-to-Site VPN after upgrading CE from 2.6.0 to 2.7.0

michaelschefczyk

Dear All,

If we remain unable to clarify this quickly, my aim is to roll back to 2.6.0. I did that on the secondary CARP member in my home office: install, restore the config, unplug WAN on the first boot, change the update version back to 2.6.0, reboot again, plug in WAN, and install packages manually. This seems to work. I will try the primary unit this evening. On Saturday, I will travel to the other end of the VPN (600 km) and try it there.

I did try to file a second bug report. Jim Pingle replied:

"Please do not open duplicate issues. Keep the discussion on the forum and if there is a proven bug and not a configuration issue, then the original can be reopened.

We cannot be responsible for making sure every possible variation of OpenVPN works across every version/upgrade, especially when OpenVPN itself changes and deprecates functions/features or changes how things work. Many users have working OpenVPN tunnels on 2.7.0 and current Plus versions that have been upgraded and working for years, it's highly unlikely to be a bug, but something in your setup that isn't correct or needs adjusted to compensate for OpenVPN changes. This is not the place to track that down, that is what the forum is for.

Be sure to post complete settings for all nodes involved, not just general description of the setup."

If anyone has better chances to involve the developers than me, help would be most welcome.

I will certainly not be glad to rebuild my > 40 MB configuration from scratch. I am also unable to post the configuration publicly, for obvious reasons.

Regards,

Michael

melik2k3

@michaelschefczyk Did you try a cold start of the 2.7.0 box? My problem somehow went away after I powered it off and on again.

jimp

As I mentioned on Redmine you most likely have a configuration problem that has always been wrong but some change on the backend changed and now your previously "working" settings which happened to be incorrect in some way stopped working.

A few common things we have seen are:

SSL/TLS setups where people had filled in a tunnel network on the client when they should not
SSL/TLS setups with a /24 tunnel network where the Client-Specific Overrides were not setup correctly breaking LAN-to-LAN routing
Static Key configurations using the wrong subnet size for the tunnel network (e.g. /24 when it should have been /30)
Not explicitly setting the same topology on both sides
Some other routing conflict preventing the correct entries from being in the tables
A configuration that worked by chance before that was never correct (e.g. routes in System > Routing instead of in OpenVPN natively)
Policy routing rules overriding the VPN and sending the client traffic in some unexpected path

Since you won't (or can't) post your settings, there isn't any way for us to really help you diagnose things, but it sounds like it's your routing/route table entries that are broken or missing. Either you are not in the correct mode (e.g. SSL/TLS with /24 tunnel network requires setting up override entries for client network routing, not static routes), or some other similar problem.
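The tunnel network points above can be turned into a quick sanity check. This is only a minimal Python sketch, not pfSense code: the function name `lint_tunnel` and its parameters (`mode`, `role`, `tunnel_network`, `has_overrides`) are made up for illustration, and the rules encode nothing beyond what is stated in this post.

```python
import ipaddress

def lint_tunnel(mode, role, tunnel_network, has_overrides=False):
    """Return a list of warnings for one OpenVPN endpoint.

    mode: "shared_key" or "tls"; role: "server" or "client".
    tunnel_network: CIDR string, or "" if the field is left empty.
    All names are illustrative, not pfSense identifiers.
    """
    warnings = []
    # SSL/TLS clients should leave the tunnel network empty.
    if mode == "tls" and role == "client" and tunnel_network:
        warnings.append("SSL/TLS client should not set a tunnel network")
        return warnings
    if not tunnel_network:
        return warnings
    prefix = ipaddress.ip_network(tunnel_network, strict=False).prefixlen
    # Static (shared) key tunnels are point-to-point and expect a /30.
    if mode == "shared_key" and prefix != 30:
        warnings.append(f"shared key tunnel is /{prefix}, expected /30")
    # SSL/TLS with a larger tunnel network relies on override entries.
    if mode == "tls" and role == "server" and prefix < 30 and not has_overrides:
        warnings.append(f"SSL/TLS /{prefix} tunnel needs Client-Specific Overrides")
    return warnings
```

For example, `lint_tunnel("shared_key", "server", "10.0.8.0/24")` flags the subnet size, while an SSL/TLS server with `192.168.18.0/24` and `has_overrides=True` returns no warnings.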

Compare your setup against the reference here: https://docs.netgate.com/pfsense/en/latest/recipes/openvpn-s2s-tls.html

If you were using a routing protocol like OSPF before, then you either had to have been using shared key, a /30 tunnel network, or maybe tap mode in certain cases.

michaelschefczyk

Dear All,

Please find the most relevant pages of my configuration below for comments. If other views are required, please let me know. Firewall rules as before are in place.

I think that this is in line with the tutorial. The tunnel does get established, but nothing else works.

My assumption is that the issues are due either to certificates or to routing issues outside OpenVPN, including LAGG.

Regards,

Michael

Server config

Screenshot 2023-07-06 at 18-07-05 pfsenses10m.schefczyk.net - VPN OpenVPN Servers Edit.png

Server override

Screenshot 2023-07-06 at 18-09-59 pfsenses10m.schefczyk.net - VPN OpenVPN Client Specific Overrides Edit.png

Client config

Screenshot 2023-07-06 at 18-12-29 pfsenseb72m.schefczyk.net - VPN OpenVPN Clients Edit.png

jimp

The tutorial was checked against 2.7.0 and 23.05, but there may be slight wording differences.

The "Automatically generate" box only shows up when you first create a tunnel; it won't show when editing.

I just re-followed the recipe a week or two ago and confirmed it all worked, so if it doesn't work for you, something isn't matched up or wasn't followed as shown.

jimp

At a glance, what stands out is that the server is bound to localhost, so maybe your port forward for that server isn't correct and the client can't reach it. Otherwise there isn't enough info to say why it might be failing (it could be certs, for example).

Also, with just the one client you probably don't want to list that client's own network as "local" to the server, since that will make the client try (and probably fail) to pull a route for its own network from the server.

Also you might try changing the TLS config so it's auth only and not auth+encryption.

If it still fails after all that, check the logs and see what it says on both sides.

michaelschefczyk

@jimp Thank you very much!

Binding to localhost is due to Multi-WAN, following this tutorial:

https://docs.netgate.com/pfsense/en/latest/multiwan/openvpn.html#bind-to-localhost-and-setup-port-forwards

The NAT port forwards and rules were there before the upgrade. If they did not work, I guess the tunnel would not come up, which it does reliably. I would very much like to keep that for extra resilience.

I never liked adding the remote network into the local network field on the server side. I never had that in place in the past. I added it because of this tutorial: https://docs.netgate.com/pfsense/en/latest/recipes/openvpn-s2s-tls.html

There it says "Enter the LAN subnets for all sites including the server: " under "IPv4 Local Network(s)". I did remove that.

I also changed from TLS Authentication and Encryption to just Authentication.

Unfortunately, that does not change the outcome.

Regards,

Michael

jimp

Having the remote networks in the "local" field lets the others know they can also be reached through the server, which is nice for >1 client but not needed for just one.

If it still won't form a link now you'll need to start looking at logs to see what is going on.

The server log should show a connection coming in from the client. If it doesn't, and the client process is running, then the client isn't reaching the server which could be DNS, your NAT/firewall rules, etc. The client logs should show what it's doing there.

Most other problems would show in the logs, too, like a key or cert mismatch and so on.
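If reading the logs by eye gets tedious, a small script can count the interesting events. This is a rough Python sketch under some assumptions: the helper name `summarize_log` is invented here, and the marker strings are simply the message texts that OpenVPN prints in the log excerpts posted in this thread.

```python
# Message fragments as they appear in OpenVPN log lines in this thread.
MARKERS = {
    "connected": "Peer Connection Initiated",
    "ready": "Initialization Sequence Completed",
    "timeout": "Server poll timeout",
}

def summarize_log(lines):
    """Count how often each marker message occurs in the given log lines."""
    counts = {name: 0 for name in MARKERS}
    for line in lines:
        for name, text in MARKERS.items():
            if text in line:
                counts[name] += 1
    return counts

# Hypothetical sample input modeled on the client log format.
sample = [
    "Jul 6 22:12:59 openvpn 5948 Initialization Sequence Completed",
    "Jul 6 22:13:49 openvpn 22998 Server poll timeout, restarting",
]
print(summarize_log(sample))
```

A client log with timeouts but no "connected" entries points at reachability (DNS, NAT, firewall rules); a log with both suggests the link forms and the problem lies elsewhere.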

michaelschefczyk

@jimp The tunnel does connect without issues and it does stay up. The logs are similar to those further up in the thread.

By my understanding, this will likely be a routing issue.

Server log:

Jul 6 22:11:43 openvpn 45966 library versions: OpenSSL 1.1.1t-freebsd 7 Feb 2023, LZO 2.10
Jul 6 22:11:43 openvpn 45966 OpenVPN 2.6.4 amd64-portbld-freebsd14.0 [SSL (OpenSSL)] [LZO] [LZ4] [PKCS11] [MH/RECVDA] [AEAD] [DCO]
Jul 6 22:11:42 openvpn 98100 Initialization Sequence Completed
Jul 6 22:11:42 openvpn 98100 UDPv4 link remote: [AF_UNSPEC]
Jul 6 22:11:42 openvpn 98100 UDPv4 link local (bound): [AF_INET]127.0.0.1:1196
Jul 6 22:11:42 openvpn 98100 /usr/local/sbin/ovpn-linkup ovpns3 1500 0 192.168.18.1 255.255.255.0 init
Jul 6 22:11:42 openvpn 98100 /sbin/ifconfig ovpns3 192.168.18.1/24 mtu 1500 up
Jul 6 22:11:42 openvpn 98100 TUN/TAP device /dev/tun3 opened
Jul 6 22:11:42 openvpn 98100 TUN/TAP device ovpns3 exists previously, keep at program end
Jul 6 22:11:42 openvpn 98100 WARNING: experimental option --capath /var/etc/openvpn/server3/ca
Jul 6 22:11:42 openvpn 98100 NOTE: the current --script-security setting may allow this configuration to call user-defined scripts
Jul 6 22:11:42 openvpn 98100 NOTE: your local LAN uses the extremely common subnet address 192.168.0.x or 192.168.1.x. Be aware that this might create routing conflicts if you connect to the VPN server from public locations such as internet cafes that use the same subnet.
Jul 6 22:11:42 openvpn 97789 DCO version: FreeBSD 14.0-CURRENT #1 RELENG_2_7_0-n255866-686c8d3c1f0: Wed Jun 28 04:21:19 UTC 2023 root@freebsd:/var/jenkins/workspace/pfSense-CE-snapshots-2_7_0-main/obj/amd64/LwYAddCr/var/jenkins/workspace/pfSense-CE-snapshots-2_7_0-main/sources/FreeBSD-src-REL
Jul 6 22:11:42 openvpn 97789 library versions: OpenSSL 1.1.1t-freebsd 7 Feb 2023, LZO 2.10
Jul 6 22:11:42 openvpn 97789 OpenVPN 2.6.4 amd64-portbld-freebsd14.0 [SSL (OpenSSL)] [LZO] [LZ4] [PKCS11] [MH/RECVDA] [AEAD] [DCO]

Client Log:

Jul 6 22:12:59 openvpn 5948 [srv.xxx.xxx Peer Connection Initiated with [AF_INET]xx.xx.xx.xx:1196
Jul 6 22:12:59 openvpn 5948 Preserving previous TUN/TAP instance: ovpnc2
Jul 6 22:12:59 openvpn 5948 Initialization Sequence Completed
Jul 6 22:13:49 openvpn 22998 Server poll timeout, restarting
Jul 6 22:13:49 openvpn 22998 SIGUSR1[soft,server_poll] received, process restarting
Jul 6 22:13:49 openvpn 22998 NOTE: your local LAN uses the extremely common subnet address 192.168.0.x or 192.168.1.x. Be aware that this might create routing conflicts if you connect to the VPN server from public locations such as internet cafes that use the same subnet.
Jul 6 22:13:49 openvpn 22998 NOTE: the current --script-security setting may allow this configuration to call user-defined scripts

jimp

OK so if the VPN is connected now that narrows things down a bit.

The route errors are probably the client failing to add unnecessary/duplicate routes but whether or not that's a problem depends on what the route table looks like in the end.

If the firewalls themselves can ping the other LANs then the OS routing is probably OK and there is more likely a problem in the local firewall rules/NAT.

There are a lot of troubleshooting suggestions for that sort of stuff at https://docs.netgate.com/pfsense/en/latest/troubleshooting/connectivity.html

But to boil that down a bit, you should check:

Look at the OS routing table on both sides, make sure there are entries for the opposite side LAN(s) and that those routes are pointing to the correct OpenVPN interface(s).
When you ping from the firewall make sure to ping from both the OpenVPN interface itself (default source) and again using the LAN interface as a source. That tests routing between the LANs in both directions, not just to/from the OpenVPN interface directly, which is a much different test.
When pinging from a client on the LAN, look at its states under Diagnostics > States on both firewalls, there should be two entries on each, one as it enters the firewall and one as it exits the firewall. If something like outbound NAT is catching it, the NAT would show in these states. If the traffic is taking the wrong path, that would also show (e.g. it should go in LAN, out VPN, in VPN, out LAN).

That should give you a better idea of what's going on and what needs fixed.
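The first check can be scripted against saved `netstat -rn` output. The following Python sketch makes several assumptions: the function name `vpn_route_ok` is invented, the sample table and its addresses are hypothetical, and it only looks for an exact match on the remote LAN rather than doing a real longest-prefix lookup.

```python
import ipaddress

def vpn_route_ok(netstat_output, remote_lan, vpn_prefix="ovpn"):
    """True if the route for remote_lan exits through an OpenVPN interface."""
    target = ipaddress.ip_network(remote_lan)
    for line in netstat_output.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # section titles and blank lines
        try:
            dest = ipaddress.ip_network(fields[0], strict=False)
        except ValueError:
            continue  # column header, "default", and similar entries
        if dest == target:
            # Fourth column is the interface (Netif) in this layout.
            return fields[3].startswith(vpn_prefix)
    return False  # no route for the remote LAN at all

# Hypothetical routing table; addresses and interfaces are made up.
SAMPLE = """\
Destination        Gateway            Flags     Netif Expire
default            203.0.113.1        UGS        igb0
192.168.1.0/24     link#2             U          igb1
192.168.72.0/24    192.168.18.2       UGS      ovpns3
"""
print(vpn_route_ok(SAMPLE, "192.168.72.0/24"))
```

Run it once per side with the opposite side's LAN as `remote_lan`; a `False` result means the route is either missing or pointing at a non-OpenVPN interface.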

michaelschefczyk

@nazelus Does that imply manually rebuilding the configuration from scratch? Did you restore parts of the configuration from the config file? If so, which parts did you not restore (OpenVPN and NAT maybe)?

michaelschefczyk

@jimp At the moment, we have a situation with many users starting their configuration from scratch to avoid an undefined configuration "error". It should be possible to avoid this by comparing an old and a new configuration of someone who was successful. Would someone in that situation please consider this or would the developers offer support in that direction?

SeaMonkey

@jimp said in No Site-to-Site VPN after upgrading CE from 2.6.0 to 2.7.0:

A configuration that worked by chance before that was never correct (e.g. routes in System > Routing instead of in OpenVPN natively)

Just going to chime in to say this was my misconfiguration that worked in 2.6 and didn't work in 2.7. Thanks for the hints.

rcoleman-netgate

@SeaMonkey said in No Site-to-Site VPN after upgrading CE from 2.6.0 to 2.7.0:

Just going to chime in to say this was my misconfiguration that worked in 2.6 and didn't work in 2.7. Thanks for the hints.

What's the config, then? 3DES? https://docs.netgate.com/pfsense/en/latest/releases/2-7-0.html#general

SeaMonkey

@rcoleman-netgate

Mode: Peer to Peer ( SSL/TLS )
Data Ciphers: AES-256-GCM
Digest: SHA256
D-H Params: 2048 bits

Edit: To be more specific, DNS domain overrides were failing much more frequently in 2.7. Once I removed the redundant static routes, DNS resolution across the VPN was instantaneous, whereas in 2.6, even while it was working, it seemed to take several seconds.
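For anyone hunting for the same kind of leftover, here is a minimal Python sketch of the idea. It assumes you have copied the destinations from System > Routing and the remote networks from the OpenVPN settings into two lists by hand; the function name `redundant_static_routes` and the example networks are hypothetical.

```python
import ipaddress

def redundant_static_routes(static_routes, openvpn_remote_networks):
    """Return static routes that overlap a network OpenVPN already routes."""
    vpn_nets = [ipaddress.ip_network(n) for n in openvpn_remote_networks]
    return [
        route
        for route in static_routes
        if any(ipaddress.ip_network(route).overlaps(net) for net in vpn_nets)
    ]

# Hypothetical example values.
print(redundant_static_routes(
    ["192.168.72.0/24", "10.20.0.0/16"],
    ["192.168.72.0/24"],
))
```

Any route it returns is a candidate for removal from System > Routing, since OpenVPN should be installing that route natively.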

jimp

And so far I haven't seen anyone that has followed my troubleshooting suggestions from earlier in the thread:

https://forum.netgate.com/post/1114468

michaelschefczyk

@jimp I have since rolled back to 2.6.0. I am willing to share the config.xml files from before rolling back with Netgate. My hardware is a Supermicro X10SDV-TLN4F, which is probably what is in the Netgate 1541 1U. For obvious reasons, I would not post such files in the forum.

matt84

I'm holding off upgrading to 2.7 from a working 2.6 config until this issue(s) is/are resolved. I've been following since the start and as a developer myself I was a little surprised the initial redmine ticket by @michaelschefczyk was closed so quickly. Clearly multiple people are having issues with site to site VPNs after the 2.7 upgrade.

Clearly something has changed. Even if everyone's issue(s) turns out to be a misconfiguration that somehow worked in 2.6 but no longer in 2.7, it would be good to know and have documented why this is no longer the case. Just like the PHP upgrade warning to uninstall packages prior to upgrade.

If people are willing to share their configs, is this something that can be run up in a dev/test environment by Netgate?

jimp

@matt84 said in No Site-to-Site VPN after upgrading CE from 2.6.0 to 2.7.0:

I'm holding off upgrading to 2.7 from a working 2.6 config until this issue(s) is/are resolved. I've been following since the start and as a developer myself I was a little surprised the initial redmine ticket by @michaelschefczyk was closed so quickly. Clearly multiple people are having issues with site to site VPNs after the 2.7 upgrade.

It was closed because there was no evidence it was a bug or anything we could determine programmatically, and there still isn't.

Clearly something has changed. Even if everyone's issue(s) turns out to be a misconfiguration that somehow worked in 2.6 but no longer in 2.7, it would be good to know and have documented why this is no longer the case. Just like the PHP upgrade warning to uninstall packages prior to upgrade.

So far no two people have had the same problem, but most people haven't given us enough detail to determine what their problems might be. People keep jumping into the thread saying they have the "same issue" when it most likely isn't, but trying to diagnose them all in one thread is not viable.

If people are willing to share their configs, is this something that can be run up in a dev/test environment by Netgate?

It depends on the complexity of the setup. We can lab some things but completely replicating someone's multi-site VPN infrastructure is more likely to have problems from the lab setup being wrong vs replicating the user's original problem.