Issue with 21.02 and not with 2.5.0
-
Good day,
I currently manage multiple sites with several SG-2220s, an SG-5100, and an XG-7100.
The XG-7100 is my head office router, which manages everything. All the others (the 2220s and the 5100) connect to the 7100 over IPsec.
When 21.02/2.5.0 was released, I tried to update my SG-5100. The update went fine, but the IPsec tunnel dropped after a few minutes, and to get it back I had to either reboot or restart the IPsec service.
As it was not working, I reinstalled 2.4.5p1, which was working perfectly, and left it like that.
Yesterday, as 21.02p1/2.5.0p1 was released, I tried updating one of my SG-2220s. Everything went fine and the tunnel did not drop. I then updated the 6 others that I manage. The night went perfectly with no drops. This morning, I tried the SG-5100 again. Still the same. I even deleted the tunnel on both devices and recreated it with a different name/description. No go.
I decided to try a “custom” box that I keep as a backup running 2.4.5p1: I updated it to 2.5.0 (as it’s not a Netgate device) and restored the backup of my SG-5100 onto it.
I then plugged it in in place of my 5100. It ran all day with no drops.
I think there may be something different between 21.02 and 2.5.0 (other than FE versus CE).
But where to look, and what to do? I can't provide any logs as I rolled back to 2.4.5p1.
With that issue, I'm not ready to update our core XG-7100.
Thanks!
Frank
-
@froussy I have this EXACT same issue! We just put in a 5100 for a client and were pulling our hair out, because we replaced a functioning setup where everything worked, and now things are broken in production. The tunnel stays up for a random period of time, completely unrelated to the TTLs we have set for the phases. It works perfectly fine and then just stops; however, the tunnel and SAs stay up, rekey, etc. The logs show nothing of value that we can see either. Basically it looks like a working configuration, and it just isn't.
This tunnel is between a 5100 and a 3100, both running the 21.02 release software.
-
I think I have a similar situation. After upgrading to version 21.02-RELEASE-p1, my Netgate XG-7100 kept getting disconnected to/from a remote Cisco RV042 even though the IPsec status showed as connected. The IPsec tunnel had been working smoothly for a year without a hiccup and has now decided to act up. There is no pattern to when the connection stops; most of the time it happens within a couple of minutes. I applied the 7 patches posted at https://forum.netgate.com/topic/161523/pfsense-2-5-to-pfsense-2-5-ipsec-tunnel-fails-to-connect/3 but still no luck. I am under pressure to get it back on track, but it looks like there is no other fix.
-
For me the problem starts when I have more than one IPsec tunnel.
If I enable only one, the connection works fine, but as soon as I enable a second tunnel, the disconnect problems start.
-
I finally got my IPsec tunnel to work without interruption on my Netgate XG-7100 by turning off Hardware Crypto. I think something in version 21.02 left IPsec unable to keep up or stay in sync with the Hardware Crypto. I consider it a workaround rather than a fix, since Hardware Crypto has to stay off.
-
If you are using a 5100 or 7100 your best bet is to change the hardware crypto from AES-NI to QAT.
If you have to stay on AES-NI then you should check and see if any of your tunnels are using SHA256 and if so, change to a different hash (or ideally change over to AES-GCM without a hash).
You could disable hardware cryptography as well but that will obviously incur a performance penalty if you are pushing significant traffic over IPsec.
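To find which tunnels still use SHA256, one option is to scan a configuration backup rather than click through each entry. The following is only a rough sketch: the sample XML is a hypothetical, trimmed-down fragment, and the element names (`phase1`, `phase2`, `descr`, and tags beginning with `hash-algorithm`) are assumptions about the config.xml layout that should be checked against your own backup file.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down config fragment for illustration only.
# Element names are assumptions; compare with a real config.xml backup.
SAMPLE_CONFIG = """\
<pfsense>
  <ipsec>
    <phase1>
      <descr>HQ to branch A</descr>
      <encryption><item><hash-algorithm>sha256</hash-algorithm></item></encryption>
    </phase1>
    <phase1>
      <descr>HQ to branch B</descr>
      <encryption><item><hash-algorithm>sha1</hash-algorithm></item></encryption>
    </phase1>
    <phase2>
      <descr>Branch A LAN</descr>
      <hash-algorithm-option>hmac_sha256</hash-algorithm-option>
    </phase2>
  </ipsec>
</pfsense>
"""


def find_sha256_entries(xml_text):
    """Return (phase, description) pairs for entries that reference SHA256."""
    root = ET.fromstring(xml_text)
    hits = []
    for phase in ("phase1", "phase2"):
        for entry in root.iter(phase):
            # Match any child whose tag starts with "hash-algorithm" so the
            # check does not depend on the exact phase 1 / phase 2 tag names.
            uses_sha256 = any(
                el.tag.startswith("hash-algorithm")
                and "sha256" in (el.text or "").lower()
                for el in entry.iter()
            )
            if uses_sha256:
                descr = entry.findtext("descr", default="(no description)")
                hits.append((phase, descr))
    return hits


if __name__ == "__main__":
    for phase, descr in find_sha256_entries(SAMPLE_CONFIG):
        print(f"{phase}: {descr} uses SHA256")
```

To run it against a real backup, read the file into a string and pass it to `find_sha256_entries` in place of `SAMPLE_CONFIG`.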
-
@jimp : Could you please explain why there is a problem with AES-NI and SHA256? I guess this is a very common setting for the SG5100 and heavily used as of today. I also failed to find the "QAT" setting in 2.4.5, so is there no chance to set this in advance?
-
@jimp Thanks for your advice! We hit the very same bug, and it cost me some hours, since we are still evaluating pfSense and there were therefore a million other reasons why it could have failed. We are using XG-7100-1U devices in the eval.
-
@lst_hoe said in Issue with 21.02 and not with 2.5.0:
@jimp : Could you please explain why there is a problem with AES-NI and SHA256? I guess this is a very common setting for the SG5100 and heavily used as of today. I also failed to find the "QAT" setting in 2.4.5, so is there no chance to set this in advance?
The newer AES-NI driver in 21.02/2.5.0 added support for accelerating hashing and that new feature appears to have some issues. It isn't consistently repeatable, though, so we haven't tracked down the exact cause yet.
-
It might be a different issue for our site. I checked the logs from the failed upgrade: with 21.02_1 (SG5100) and our imported working config, we got the following for every IKE_SA_INIT request
no IKE config found for x.x.x.x...x.x.x.x, sending NO_PROPOSAL_CHOSEN
all the time, so no IPsec tunnel was working at all. We have three P1 IKEv2 configs: one for mobile clients (mostly Windows) and two site-to-site tunnels. We do indeed have AES-NI enabled and SHA-256 for all Phase 1 entries, but we never saw any tunnel succeed at all.
I will try with the second SG5100 device and QAT, but this will take some time.
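For anyone wanting to see how often that message appears and for which address pairs, a small sketch like the one below can tally it from a saved log. The sample log lines, including their prefixes and addresses, are made up for illustration; only the message text itself is taken from the log above.

```python
import re
from collections import Counter

# Matches the message quoted above and captures the two addresses around "...".
PATTERN = re.compile(
    r"no IKE config found for (\S+?)\.\.\.(\S+?), sending NO_PROPOSAL_CHOSEN"
)

# Hypothetical log excerpt; prefixes and addresses are illustrative only.
SAMPLE_LOG = """\
charon: 07[IKE] no IKE config found for 192.0.2.1...198.51.100.7, sending NO_PROPOSAL_CHOSEN
charon: 07[NET] sending packet: from 192.0.2.1[500] to 198.51.100.7[500]
charon: 09[IKE] no IKE config found for 192.0.2.1...198.51.100.7, sending NO_PROPOSAL_CHOSEN
charon: 11[IKE] no IKE config found for 192.0.2.1...203.0.113.9, sending NO_PROPOSAL_CHOSEN
"""


def count_rejections(log_text):
    """Count NO_PROPOSAL_CHOSEN rejections per (first, second) address pair."""
    counts = Counter()
    for line in log_text.splitlines():
        match = PATTERN.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


if __name__ == "__main__":
    for (first, second), n in count_rejections(SAMPLE_LOG).most_common():
        print(f"{first} ... {second}: {n} rejected IKE_SA_INIT request(s)")
```

If every pair shows up, including the site-to-site peers, that points at the configuration not being matched at all rather than at an intermittent drop.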
-
That is likely something else. You should start your own thread with more details, but first check the existing threads that list patches to try. If it can't match the connection, one of several already-solved issues with the IPsec config itself could be at play.
-
Already found
https://redmine.pfsense.org/issues/11442
https://redmine.pfsense.org/issues/11555
The mobile clients use %any and the site-to-site tunnels use an FQDN as the remote ID :-(
-
You can apply the commits referenced on those issues using the system patches package as mentioned in several other threads.
-
@jimp I swapped my 5100 over to QAT (with a reboot) and am still seeing random drops. Since the need here is non-critical I'm going to wait for the referenced patches to land in an update and re-test.
-
@jimp Maybe this helps, since I saw you only mention SHA256: we had the described issue using AES-256 and SHA1 on the XG-7100 (see my post above).
-
@jimp Good day,
On my 5100, I changed AES-NI to QAT and it worked.
I will try our 7100 tonight!
Will keep you updated.
Frank
-
But changing or disabling hardware encryption is not a real solution!
It is a workaround, nothing more.
-
@jd-0 Wanted to follow up: 21.02.2 (as well as the RC over the last month) completely resolved my random drops issue. Many thanks!