IPsec Apply changes time out
I am having a problem applying changes to my IPsec tunnels. The GUI shows a 504 timeout.
We have 50 offices (phase 1) and 100 tunnels (phase 2) from the central HQ. If I need to change something on the central pfSense, the GUI shows a 504 timeout and nothing has actually changed after a refresh.
Is there any possible solution to this problem (setup, tweak, tuning...)? I found that it is the same with 2.4.5-p1 CE and also 2.5 CE. I tried 4 different hardware setups and it is still the same. I added patches #11435, #11442, #11486, #11487, #11488, #11475, #11518, #11564, #11555 to 2.5 CE, but the problem persists.
The only working solution is to manually edit the config.xml file and apply.
I have the same issue with pfSense 21.02 in AWS. If I reload the page it finally applies (sometimes after several retries.) Have not had to modify the config.xml
@vergilis I tried many times with the same result (504 timeout), so I had to find a workable solution (manually changing config.xml). Every attempt (a new installation with the same IPsec tunnels) ended the same way.
If there are only P1 IPsec entries, it saves fine, but if I add P2 tunnels it slows the whole process down until it causes a 504 timeout.
How many tunnels do you have? If there
@richi44 I also have about 50 tunnels. I have no issues adding P1 or P2. The timeout only happens when I hit apply.
Do you see errors in the system when this happens?
What hardware are you running on?
@stephenw10 Multiple instance types in AWS. Everything works fine, just the Apply function does not work.
The only error in my log is the time out error:
2021/02/18 13:38:18 [error] 51390#100109: *64336 upstream timed out (60: Operation timed out) while reading response header from upstream, client: 188.8.131.52, server: , request: "POST /vpn_ipsec_settings.php HTTP/2.0", upstream: "fastcgi://unix:/var/run/php-fpm.socket", host: "fwname.example.com:1234", referrer: "https://fwname.example.com:1234/vpn_ipsec_settings.php"
Hmm, that's really the only error shown? That's in the system log?
@stephenw10 Correct. The only error in the log is the one specified from the system log - for me.
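For anyone collecting these timeouts from a longer log, here is a minimal Python sketch that pulls the request method and page out of nginx "upstream timed out" lines. The regular expression is an assumption based only on the layout of the line quoted above, and `timed_out_requests` is a hypothetical helper name, not part of pfSense or nginx.

```python
import re

# Assumed layout: taken from the single nginx error line quoted in this thread.
TIMEOUT_RE = re.compile(
    r'upstream timed out .*?request: "(?P<method>\S+) (?P<path>\S+) [^"]*"'
)

def timed_out_requests(lines):
    """Return (method, path) for every line that reports an upstream timeout."""
    hits = []
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            hits.append((match.group("method"), match.group("path")))
    return hits

# Sample input: the error line posted above, plus one unrelated kernel line.
sample = [
    '2021/02/18 13:38:18 [error] 51390#100109: *64336 upstream timed out '
    '(60: Operation timed out) while reading response header from upstream, '
    'client: 188.8.131.52, server: , request: "POST /vpn_ipsec_settings.php HTTP/2.0", '
    'upstream: "fastcgi://unix:/var/run/php-fpm.socket", host: "fwname.example.com:1234"',
    'Mar 17 20:32:29 kernel module_register_init: MOD_LOAD (vesa, 0xffffffff8140c3e0, 0) error 19',
]

print(timed_out_requests(sample))
```

Grouping the results by page would show whether only the IPsec pages time out or whether other GUI pages are affected as well.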
@stephenw10 I tried it on different hardware setups, but our main pfSense router runs on a virtual machine (4 cores of a Xeon E-2236 CPU @ 3.4 GHz, 12 GB RAM).
Mar 17 20:24:04 nginx 2021/03/17 20:24:04 [error] 65880#100222: *1386 upstream timed out (60: Operation timed out) while reading response header from upstream, client: 192.168.211.3, server: , request: "POST /status_services.php HTTP/2.0", upstream: "fastcgi://unix:/var/run/php-fpm.socket", host: "192.168.211.1:8443", referrer: "https://192.168.211.1:8443/status_services.php"
Mar 17 20:32:29 kernel module_register_init: MOD_LOAD (vesa, 0xffffffff8140c3e0, 0) error 19
Mar 17 20:32:29 kernel module_register_init: MOD_LOAD (iwi_monitor_fw, 0xffffffff80765790, 0) error 1
Mar 17 20:32:29 kernel module_register_init: MOD_LOAD (iwi_ibss_fw, 0xffffffff807656e0, 0) error 1
Mar 17 20:32:29 kernel module_register_init: MOD_LOAD (iwi_bss_fw, 0xffffffff80765630, 0) error 1
Mar 17 20:32:29 kernel module_register_init: MOD_LOAD (ipw_monitor_fw, 0xffffffff8073dda0, 0) error 1
Mar 17 20:32:29 kernel module_register_init: MOD_LOAD (ipw_ibss_fw, 0xffffffff8073dcf0, 0) error 1
Mar 17 20:32:29 kernel module_register_init: MOD_LOAD (ipw_bss_fw, 0xffffffff8073dc40, 0) error 1
Those kernel module errors are unrelated and not a cause for concern.
It's still unclear why that timeout happens.
The kernel errors could be related to virtualisation on Proxmox.
I tried setting up a new router, and the timeout problem does not occur if there are only a few tunnels. After a clean installation I was able to keep adding tunnels up to 50 P1 with 50 P2, but after a reboot and applying changes the timeout problem occurred.
Could it be related to an nginx memory issue?
It seems more likely it's failing to pull the data from vici/strongswan for some reason. nginx shows it is timing out waiting for that data as far as I can see.
Is there a specific number of tunnels that seems to trigger the issue?
Or is it perhaps hitting a connection number that is failing to parse?
The way connections are numbered was changed significantly in 2.5 to allow for VTI tunnels when a large number exists. https://redmine.pfsense.org/issues/9592
@stephenw10 I currently have 46 tunnels on the failing system.
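To compare tunnel counts between a working and a failing system, a rough Python sketch like the one below could count the P1 and P2 entries in a config.xml backup. It assumes the entries are stored as `<phase1>` and `<phase2>` elements directly under an `<ipsec>` element in the root; that layout, the sample XML, and the helper name `count_ipsec_entries` are assumptions for illustration, so check them against your own backup.

```python
import xml.etree.ElementTree as ET

def count_ipsec_entries(config_xml_text):
    """Return (phase1_count, phase2_count) from config.xml text.

    Assumes <phase1>/<phase2> elements sit directly under <ipsec>.
    """
    root = ET.fromstring(config_xml_text)
    ipsec = root.find("ipsec")
    if ipsec is None:
        return (0, 0)
    return (len(ipsec.findall("phase1")), len(ipsec.findall("phase2")))

# Made-up, heavily trimmed sample; a real config.xml has many more fields.
sample = """<pfsense><ipsec>
  <phase1><ikeid>1</ikeid></phase1>
  <phase1><ikeid>2</ikeid></phase1>
  <phase2><ikeid>1</ikeid></phase2>
  <phase2><ikeid>1</ikeid></phase2>
  <phase2><ikeid>2</ikeid></phase2>
</ipsec></pfsense>"""

p1, p2 = count_ipsec_entries(sample)
print(f"phase1: {p1}, phase2: {p2}")
```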
Run ipsec statusall if you can. If that fails, it would be interesting. If it doesn't, look for a connection number that might be hitting a limit, con100000 maybe.
Does it fail with 45 tunnels?
@stephenw10 Yes. It fails to Apply with 45 tunnels all the time. ipsec statusall returns results.
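To check the connection numbers suggested above without reading the output by eye, a small Python sketch could extract every conN name from saved `ipsec statusall` output and report the largest number. The sample text below is made up for illustration and is not real output from either system; `connection_numbers` is a hypothetical helper name.

```python
import re

# Matches names such as con1000 or con100000 and captures the numeric part.
CONN_RE = re.compile(r"\bcon(\d+)")

def connection_numbers(statusall_text):
    """Return the sorted, de-duplicated connection numbers found in the text."""
    return sorted({int(num) for num in CONN_RE.findall(statusall_text)})

# Hypothetical sample, using documentation-only IP addresses.
sample = """\
Connections:
     con1000:  198.51.100.1...203.0.113.7  IKEv2
     con2000:  198.51.100.1...203.0.113.8  IKEv2
   con100000:  198.51.100.1...203.0.113.9  IKEv2
"""

numbers = connection_numbers(sample)
print("connection numbers:", numbers)
print("highest:", max(numbers) if numbers else None)
```

Saving the output first, for example with `ipsec statusall > statusall.txt`, and passing the file contents to `connection_numbers` keeps the check separate from the firewall itself.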
Can you try a 2.5.1 RC snapshot and see if it's better there?
Sorry, I mean is there a specific number where it doesn't fail? Is it something that clear cut?
@jimp The following release is still exhibiting the issue:
built on Sat Mar 20 01:04:33 EDT 2021
The first time it showed up was when I added the 33rd tunnel.