Site-to-Site VPN stopped working after 2.6 upgrade
-
@sidekickgmbh said in Site-to-Site VPN after 2.6 upgrade stop working:
Yes, but I could not directly discover anything that could lead to the problem
If you install the System Patches package there's a patch "Fix Captive Portal handling of non-TCP traffic after login (Redmine #12834)" that affects UDP packets.
There are also a couple of forum threads about the captive portal blocking traffic when limiters are in use, though that doesn't sound like your issue.
-
Hello everyone, an interim result: the tunnels have not been running since yesterday (same symptoms). I'm now starting to downgrade the other side (so I no longer have 2.6 on both sides of the tunnel) and hope I can get everything stable again.
-
@SteveITS I will try it in the lab! Thanks for the tip!
-
Set the hardware crypto correctly: the 4100 uses QAT, not AES-NI, and there was a problem with that.
After the change, reboot so that everything loads correctly and QAT is active while AES-NI is inactive. Is it working now?
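One way to check which accelerator the kernel actually loaded (a quick sketch, assuming console or SSH shell access; module and driver names can vary by version):
# list loaded crypto-related kernel modules (QAT vs AES-NI)
kldstat | grep -Ei 'qat|aesni'
# check the boot messages for which crypto drivers attached
grep -Ei 'qat|aesni' /var/run/dmesg.boot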
-
@nocling Nope. arhh
-
@sidekickgmbh
There is something very wrong. I'm attempting a fresh 2.6 install, importing only the IPsec configs, and this isn't the only post about this. I've offered Netgate (or anyone) the chance to have a look. 2.5 was really great; 2.6 upgrades with many IPsec configs are horrible, and downgrading back to 2.5 is stable.
-
So, after an afternoon of playing, here is what I found:
Update 2.5 to 2.6 - breaks IPsec
Clean 2.6 install, restore full backup - breaks IPsec
Clean 2.6 install, restore parts (interfaces, NAT, rules, IPsec) - all OK!
What a piece of rubbish - after running for a while, the IPsec tunnels now drop and won't reconnect.
Currently, I can boot 2.6 and the tunnels come up and work 100% fine. Then some will drop and just not reconnect - no config changes on either side. The connections drop well before the natural timeouts of the VPN.
Both sides seem to be attempting to connect, but the 2.6 side doesn't reply to the 2.5 side trying to connect. There are messages of:
ignoring acquire, connection attempt pending
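For anyone trying to compare the two sides, the negotiated IKE and child SAs can be inspected from the shell (a rough sketch, assuming 2.6 is using the swanctl interface; names and output will differ per setup):
# list the IKE SAs and their child SAs currently established or connecting
swanctl --list-sas
# list the loaded connection definitions strongSwan is working from
swanctl --list-conns
-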
IPsec apparently auto-creates rules for incoming port 500 & 4500 traffic - is there any way to see these rules listed?
Interestingly, I also run NAT on 500 & 4500 to an internal IPsec VPN server and have never had an issue with this before. Turning off NAT to the server didn't seem to make any difference, and it has never been an issue previously.
-
@timboau-0 One can view the rules table:
https://docs.netgate.com/pfsense/en/latest/troubleshooting/firewall.html#ruleset-failing-to-load
"The ruleset can also be verified from the console or Diagnostics > Command in the Shell Execute box by running:
pfctl -f /tmp/rules.debug"
@steveits the remote subnets I'm having issues with are in the <vpn_networks> and <negate_networks> tables on both devices.
The only rules relating to those tables appear to be:
scrub from any to <vpn_networks> max-mss 1300
scrub from <vpn_networks> to any max-mss 1300
The NAT inbound redirects for ports 500 & 4500 to the internal IPsec server are in place.
The VPN rules for passing traffic to/from the WAN upstream gateway for each side of the IPsec connections also appear to be in place.
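One way to double-check what those tables actually contain right now, and whether IKE/NAT-T states exist, is from the shell (assuming shell access; output depends on the tunnels defined):
# dump the current contents of the tables referenced by the VPN rules
pfctl -t vpn_networks -T show
pfctl -t negate_networks -T show
# show states involving IKE (500) or NAT-T (4500)
pfctl -ss | grep -E ':(500|4500) '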
-
I've had a play with this again over the Easter break - what are the rules around both sides trying to connect to each other at the same time? (Under 2.5 it just worked; either side brought up the link - in fact, it generally never dropped.)
There are multiple entries about:
ignoring acquire, connection attempt pending
On this side, there is an incoming SA (unnamed): #8084 as a responder.
There is also an initiator outbound SA (neither connects successfully). After a while they both seem to give up, then one side manages to connect first and the link comes up. (This can take a few minutes as they battle to connect.)
I have this setup still up and running if anyone has time for a look - it doesn't take long for one of the tunnels to drop and then not reconnect for a while.
All the IPsec configs must be OK - firewall etc. all OK, as they eventually connect and work as expected. They are just dropping really often and then not reconnecting the way they did under 2.5.
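If it helps anyone reproduce this, a stuck tunnel can usually be kicked by hand from the shell while the two sides are fighting (a sketch, assuming the swanctl interface; "con1" is a hypothetical connection name, use whatever --list-conns reports):
# see which connection/child names are defined
swanctl --list-conns
# tear down any half-built IKE SA for that connection, then re-initiate the child SA
swanctl --terminate --ike con1
swanctl --initiate --child con1
-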
Want to confirm we're seeing the exact same thing here - we've got a bunch of 2.4.x boxes in production that we just upgraded to 2.6.0, with quite a few tunnels going between them, and it had been running flawlessly for 2 years. All are running virtually, and on the other side we've got a mix of Netgate 2100s recently upgraded to 23.01.
The issue only happens between some 2.6.0s - we'd see things hang with both sides trying to initiate. In the logs: "ignoring acquire, connection attempt pending". We spent nearly half a day debugging this, and the only way to get things to come up reliably (and, so far, stay up) was to roll back one side to 2.4.4. The tunnels suddenly came up.