Gateway Monitoring gets stuck in infinite loop when using multiple VTIs on SG-3100
-
/e For anybody stumbling across this: The root of the problem was a race condition in strongSwan. https://redmine.pfsense.org/issues/10176 which will be fixed in 2.5.0. My description and findings are all symptom rather than cause.
Hi everyone,
I'm going to describe a very delicate problem. I know that i sometimes write long, hard to understand sentences. Please do ask questions and i'll try and clarify as much as possible!
Setup:
- Cluster of two SG-3100 nodes in a data center
- Release 2.4.4-p3 (the most recent at the time of writing)
- A total of 6 VTI tunnels
- 2 between the SG-3100 cluster and other pfsense clusters
- 4 between the SG-3100 cluster and an AWS VPC (using AWS Site-to-Site VPN)
- One WAN uplink provided by data center
- No dynamic IPs involved (!)
Everything is working fine, as long as we either disable all AWS tunnels or keep all gateway monitoring actions (including those for WAN) switched off. When the gateway monitoring action is turned on and a tunnel is renegotiated, we usually end up with all or almost all tunnels being in a very weird, broken state after a while. The P1s and P2s show as "up", outgoing traffic can be seen using tcpdump on both the VTI interface and the WAN interface (ESP on WAN as expected), but there is never a reply. They sometimes also pass traffic for a few seconds before they just die again.
I believe that this is just the symptom, and i may have found the root cause, which is the gateway monitoring action getting stuck in an infinite loop. This is what the log looks like when the tunnels are messed up:(too big for this post, will put it in a reply)
The entries that stand out to me are
php-fpm 39952 /rc.newipsecdns: IPSEC: One or more IPsec tunnel endpoints has changed its IP. Refreshing. [...] check_reload_status Restarting ipsec tunnels
The first one "Restarting ipsec tunnels" is triggered by this line in /etc/rc.gateway_alarm https://github.com/pfsense/pfsense/blob/RELENG_2_4_4/src/etc/rc.gateway_alarm#L40 (notice that there's no check whether any IPSec interface is actually affected by the alarm).
The second one stems from this line here https://github.com/pfsense/pfsense/blob/RELENG_2_4_4/src/etc/rc.newipsecdns#L61 and is actually just wrong, because there are no dynamic IPs at play, but given that there are no checks in this file either, it's logical that the line gets logged.The rest is guesswork, but my assumption is that when a tunnel is renegotiated, there is some packet loss on the corresponding VTI interface, and because of that, the gateway monitoring action is triggered. But the gateway monitoring action in turn causes all ipsec tunnels to be reloaded, and ends up triggering yet another gateway monitoring action, and we're already in a loop.
Unfortunately i was not (yet) able to reproduce this using two pfsense VMs (see also my observation below). I don't have spare SG-3100s lying around, so at this point it's hard for me to drill down any further. What i can say is this: once i remove line 40 from rc.gateway_alarm, everything works like a charm. I applied this patch yesterday, and now my tunnels go down and come back up one by one when they need to be renegotiated, as i would expect them to. My logs look like thisDec 18 07:54:44 check_reload_status Reloading filter Dec 18 07:54:44 check_reload_status Restarting OpenVPN tunnels/interfaces Dec 18 07:54:44 check_reload_status updating dyndns VTI_IF2 Dec 18 07:54:44 rc.gateway_alarm 93436 >>> Gateway alarm: VTI_IF2 (Addr:10.77.1.17 Alarm:0 RTT:388.434ms RTTsd:790.308ms Loss:0%) Dec 18 07:54:01 check_reload_status Reloading filter Dec 18 07:54:01 check_reload_status Restarting OpenVPN tunnels/interfaces Dec 18 07:54:01 check_reload_status updating dyndns VTI_IF2 Dec 18 07:54:01 rc.gateway_alarm 99143 >>> Gateway alarm: VTI_IF2 (Addr:10.77.1.17 Alarm:1 RTT:503.702ms RTTsd:942.678ms Loss:0%) Dec 18 07:49:35 check_reload_status Reloading filter Dec 18 07:49:35 check_reload_status Restarting OpenVPN tunnels/interfaces Dec 18 07:49:35 check_reload_status updating dyndns VTI_IF3 Dec 18 07:49:35 rc.gateway_alarm 52193 >>> Gateway alarm: VTI_IF3 (Addr:169.254.251.193 Alarm:0 RTT:4.489ms RTTsd:.094ms Loss:65%) Dec 18 07:47:18 check_reload_status Reloading filter Dec 18 07:47:18 check_reload_status Restarting OpenVPN tunnels/interfaces Dec 18 07:47:18 check_reload_status updating dyndns VTI_IF3 Dec 18 07:47:18 rc.gateway_alarm 864 >>> Gateway alarm: VTI_IF3 (Addr:169.254.251.193 Alarm:1 RTT:4.499ms RTTsd:.132ms Loss:81%) Dec 18 07:45:43 check_reload_status Reloading filter Dec 18 07:45:43 check_reload_status Restarting OpenVPN tunnels/interfaces Dec 18 07:45:43 check_reload_status updating dyndns VTI_IF4 Dec 18 07:45:43 rc.gateway_alarm 6448 >>> Gateway alarm: VTI_IF4 (Addr:169.254.117.21 Alarm:0 RTT:5.247ms RTTsd:.840ms Loss:65%) Dec 18 07:43:26 check_reload_status Reloading filter Dec 18 07:43:26 check_reload_status Restarting OpenVPN tunnels/interfaces Dec 18 07:43:26 check_reload_status updating dyndns VTI_IF4 Dec 18 07:43:26 rc.gateway_alarm 14381 >>> Gateway alarm: VTI_IF4 (Addr:169.254.117.21 Alarm:1 RTT:4.889ms RTTsd:.096ms Loss:81%) Dec 18 07:41:55 check_reload_status Reloading filter Dec 18 07:41:55 check_reload_status Restarting OpenVPN tunnels/interfaces Dec 18 07:41:55 check_reload_status updating dyndns VTI_IF5 Dec 18 07:41:55 rc.gateway_alarm 51415 >>> Gateway alarm: VTI_IF5 (Addr:169.254.181.145 Alarm:0 RTT:99.391ms RTTsd:.221ms Loss:65%) Dec 18 07:39:37 check_reload_status Reloading filter Dec 18 07:39:37 check_reload_status Restarting OpenVPN tunnels/interfaces Dec 18 07:39:37 check_reload_status updating dyndns VTI_IF5 Dec 18 07:39:37 rc.gateway_alarm 55432 >>> Gateway alarm: VTI_IF5 (Addr:169.254.181.145 Alarm:1 RTT:99.451ms RTTsd:.530ms Loss:81%) Dec 18 07:37:43 check_reload_status Reloading filter Dec 18 07:37:43 check_reload_status Restarting OpenVPN tunnels/interfaces Dec 18 07:37:43 check_reload_status updating dyndns VTI_IF6 Dec 18 07:37:43 rc.gateway_alarm 89501 >>> Gateway alarm: VTI_IF6 (Addr:169.254.95.249 Alarm:0 RTT:99.410ms RTTsd:.488ms Loss:65%) Dec 18 07:35:27 check_reload_status Reloading filter Dec 18 07:35:27 check_reload_status Restarting OpenVPN tunnels/interfaces Dec 18 07:35:27 check_reload_status updating dyndns VTI_IF6 Dec 18 07:35:27 rc.gateway_alarm 9333 >>> Gateway alarm: VTI_IF6 (Addr:169.254.95.249 Alarm:1 RTT:99.213ms RTTsd:.256ms Loss:82%)
I understand that this is not a fix for everything, because people may actually need such reloads when dynamic IPs are in the mix. I don't know how this could be fixed in a more generic way, either. But i would appreciate if somebody could try and reproduce this issue. I think at the bottom of it all is that the newipsecdns script gets called to often, executes vpn_ipsec_configure() too often, and maybe even simultaneously. That function decides for no apparent reason that it needs to restart all IPSec tunnels, or so i assume at least, and those factors combined are a feedback loop of death.
Some more thoughts and observations:
- While i'm not familiar enough with PHP's lock mechanisms to be certain, the way locking / unlocking is done in /etc/rc.newipsecdns seems flawed to me. Why is $ipseclck_pending unlocked after a hardcoded 12 second sleep, but before the actual call to vpn_ipsec_configure()?
- Apparently AWS IPSec tunnels take quite a while to renegotiate and reestablish. High RTT might be a factor here. Such delays could certainly amplify the problem. As i mentioned it's super hard (if at all possible) to reproduce this with VMs. If somebody has a spare static IP and some $ to spend on an AWS site-to-site VPN for testing, please do!
- Disabling all gateway monitoring actions seems to alleviate the problem too, but the logical consequence is that users can only have Multi-WAN or VTI tunnels, but not both. Not ideal imo.
- Disabling the gateway monitoring action only on VTI Gateways is somewhat stable, but there are edge cases where the problem can still occur. We actually ran into one once a few days ago i believe, but it happened over night and i have no logs to be 100% sure. In any case, i'd rather not rely on prayers that there won't be an unfortunate chain of events.
-
These are the logs prior to removing the particular line, looping like this until i reboot the machine.
(Newest at the top)Dec 17 16:01:54 php-fpm 83099 /rc.newipsecdns: The command '/sbin/ifconfig 'ipsec6000' create reqid '6000'' returned exit code '1', the output was 'ifconfig: create: bad value' Dec 17 16:01:53 php-fpm 83099 /rc.newipsecdns: The command '/sbin/ifconfig 'ipsec5000' create reqid '5000'' returned exit code '1', the output was 'ifconfig: create: bad value' Dec 17 16:01:52 php-fpm 83099 /rc.newipsecdns: The command '/sbin/ifconfig 'ipsec4000' create reqid '4000'' returned exit code '1', the output was 'ifconfig: create: bad value' Dec 17 16:01:52 php-fpm 83099 /rc.newipsecdns: The command '/sbin/ifconfig 'ipsec3000' create reqid '3000'' returned exit code '1', the output was 'ifconfig: create: bad value' Dec 17 16:01:52 php-fpm 83099 /rc.newipsecdns: The command '/sbin/ifconfig 'ipsec2000' create reqid '2000'' returned exit code '1', the output was 'ifconfig: create: bad value' Dec 17 16:01:51 php-fpm 83099 /rc.newipsecdns: The command '/sbin/ifconfig 'ipsec1000' create reqid '1000'' returned exit code '1', the output was 'ifconfig: create: bad value' Dec 17 16:01:51 check_reload_status Reloading filter Dec 17 16:01:51 php-fpm 83099 /rc.newipsecdns: IPSEC: One or more IPsec tunnel endpoints has changed its IP. Refreshing. Dec 17 16:01:35 check_reload_status Reloading filter Dec 17 16:01:35 check_reload_status Restarting OpenVPN tunnels/interfaces Dec 17 16:01:35 check_reload_status Restarting ipsec tunnels Dec 17 16:01:35 check_reload_status updating dyndns VTI_IF3 Dec 17 16:01:35 rc.gateway_alarm 29921 >>> Gateway alarm: VTI_IF3 (Addr:169.254.251.193 Alarm:0 RTT:7.814ms RTTsd:8.138ms Loss:65%) Dec 17 16:01:29 php-fpm 17605 /rc.newipsecdns: The command '/sbin/ifconfig 'ipsec6000' create reqid '6000'' returned exit code '1', the output was 'ifconfig: create: bad value' Dec 17 16:01:28 php-fpm 17605 /rc.newipsecdns: The command '/sbin/ifconfig 'ipsec5000' create reqid '5000'' returned exit code '1', the output was 'ifconfig: create: bad value' Dec 17 16:01:28 php-fpm 17605 /rc.newipsecdns: The command '/sbin/ifconfig 'ipsec4000' create reqid '4000'' returned exit code '1', the output was 'ifconfig: create: bad value' Dec 17 16:01:27 php-fpm 17605 /rc.newipsecdns: The command '/sbin/ifconfig 'ipsec3000' create reqid '3000'' returned exit code '1', the output was 'ifconfig: create: bad value' Dec 17 16:01:27 php-fpm 17605 /rc.newipsecdns: The command '/sbin/ifconfig 'ipsec2000' create reqid '2000'' returned exit code '1', the output was 'ifconfig: create: bad value' Dec 17 16:01:27 php-fpm 17605 /rc.newipsecdns: The command '/sbin/ifconfig 'ipsec1000' create reqid '1000'' returned exit code '1', the output was 'ifconfig: create: bad value' Dec 17 16:01:27 check_reload_status Reloading filter Dec 17 16:01:27 php-fpm 17605 /rc.newipsecdns: IPSEC: One or more IPsec tunnel endpoints has changed its IP. Refreshing. Dec 17 16:01:10 php-fpm 83099 /rc.newipsecdns: The command '/sbin/ifconfig 'ipsec6000' create reqid '6000'' returned exit code '1', the output was 'ifconfig: create: bad value' Dec 17 16:01:10 php-fpm 83099 /rc.newipsecdns: The command '/sbin/ifconfig 'ipsec5000' create reqid '5000'' returned exit code '1', the output was 'ifconfig: create: bad value' Dec 17 16:01:09 php-fpm 83099 /rc.newipsecdns: The command '/sbin/ifconfig 'ipsec4000' create reqid '4000'' returned exit code '1', the output was 'ifconfig: create: bad value' Dec 17 16:01:09 php-fpm 83099 /rc.newipsecdns: The command '/sbin/ifconfig 'ipsec3000' create reqid '3000'' returned exit code '1', the output was 'ifconfig: create: bad value' Dec 17 16:01:08 php-fpm 83099 /rc.newipsecdns: The command '/sbin/ifconfig 'ipsec2000' create reqid '2000'' returned exit code '1', the output was 'ifconfig: create: bad value' Dec 17 16:01:07 php-fpm 83099 /rc.newipsecdns: The command '/sbin/ifconfig 'ipsec1000' create reqid '1000'' returned exit code '1', the output was 'ifconfig: create: bad value' Dec 17 16:01:07 check_reload_status Reloading filter Dec 17 16:01:07 php-fpm 83099 /rc.newipsecdns: IPSEC: One or more IPsec tunnel endpoints has changed its IP. Refreshing. Dec 17 16:01:06 check_reload_status Reloading filter Dec 17 16:01:06 check_reload_status Restarting OpenVPN tunnels/interfaces Dec 17 16:01:06 check_reload_status Restarting ipsec tunnels Dec 17 16:01:06 check_reload_status updating dyndns VTI_IF6 Dec 17 16:01:06 rc.gateway_alarm 1205 >>> Gateway alarm: VTI_IF6 (Addr:169.254.95.249 Alarm:1 RTT:99.887ms RTTsd:1.669ms Loss:81%) Dec 17 16:00:51 check_reload_status Reloading filter Dec 17 16:00:51 check_reload_status Restarting OpenVPN tunnels/interfaces Dec 17 16:00:51 check_reload_status Restarting ipsec tunnels Dec 17 16:00:51 check_reload_status updating dyndns VTI_IF1 Dec 17 16:00:51 rc.gateway_alarm 65102 >>> Gateway alarm: VTI_IF1 (Addr:10.77.1.21 Alarm:0 RTT:9.139ms RTTsd:19.443ms Loss:65%) Dec 17 16:00:45 php-fpm 33378 /rc.newipsecdns: The command '/sbin/ifconfig 'ipsec6000' create reqid '6000'' returned exit code '1', the output was 'ifconfig: create: bad value' Dec 17 16:00:44 php-fpm 33378 /rc.newipsecdns: The command '/sbin/ifconfig 'ipsec5000' create reqid '5000'' returned exit code '1', the output was 'ifconfig: create: bad value' Dec 17 16:00:43 php-fpm 33378 /rc.newipsecdns: The command '/sbin/ifconfig 'ipsec4000' create reqid '4000'' returned exit code '1', the output was 'ifconfig: create: bad value' Dec 17 16:00:43 php-fpm 33378 /rc.newipsecdns: The command '/sbin/ifconfig 'ipsec3000' create reqid '3000'' returned exit code '1', the output was 'ifconfig: create: bad value' Dec 17 16:00:43 php-fpm 33378 /rc.newipsecdns: The command '/sbin/ifconfig 'ipsec2000' create reqid '2000'' returned exit code '1', the output was 'ifconfig: create: bad value' Dec 17 16:00:42 php-fpm 33378 /rc.newipsecdns: The command '/sbin/ifconfig 'ipsec1000' create reqid '1000'' returned exit code '1', the output was 'ifconfig: create: bad value' Dec 17 16:00:42 check_reload_status Reloading filter Dec 17 16:00:42 php-fpm 33378 /rc.newipsecdns: IPSEC: One or more IPsec tunnel endpoints has changed its IP. Refreshing. Dec 17 16:00:26 check_reload_status Reloading filter Dec 17 16:00:26 check_reload_status Restarting OpenVPN tunnels/interfaces Dec 17 16:00:26 check_reload_status Restarting ipsec tunnels Dec 17 16:00:26 check_reload_status updating dyndns VTI_IF3 Dec 17 16:00:26 rc.gateway_alarm 23058 >>> Gateway alarm: VTI_IF3 (Addr:169.254.251.193 Alarm:1 RTT:22.541ms RTTsd:30.898ms Loss:81%) Dec 17 16:00:15 php-fpm 39952 /rc.newipsecdns: The command '/sbin/ifconfig 'ipsec6000' create reqid '6000'' returned exit code '1', the output was 'ifconfig: create: bad value' Dec 17 16:00:13 php-fpm 39952 /rc.newipsecdns: The command '/sbin/ifconfig 'ipsec5000' create reqid '5000'' returned exit code '1', the output was 'ifconfig: create: bad value' Dec 17 16:00:13 php-fpm 39952 /rc.newipsecdns: The command '/sbin/ifconfig 'ipsec4000' create reqid '4000'' returned exit code '1', the output was 'ifconfig: create: bad value' Dec 17 16:00:12 php-fpm 39952 /rc.newipsecdns: The command '/sbin/ifconfig 'ipsec3000' create reqid '3000'' returned exit code '1', the output was 'ifconfig: create: bad value' Dec 17 16:00:12 php-fpm 39952 /rc.newipsecdns: The command '/sbin/ifconfig 'ipsec2000' create reqid '2000'' returned exit code '1', the output was 'ifconfig: create: bad value' Dec 17 16:00:11 php-fpm 39952 /rc.newipsecdns: The command '/sbin/ifconfig 'ipsec1000' create reqid '1000'' returned exit code '1', the output was 'ifconfig: create: bad value' Dec 17 16:00:11 check_reload_status Reloading filter Dec 17 16:00:11 php-fpm 39952 /rc.newipsecdns: IPSEC: One or more IPsec tunnel endpoints has changed its IP. Refreshing. Dec 17 15:59:55 check_reload_status Reloading filter Dec 17 15:59:55 check_reload_status Restarting OpenVPN tunnels/interfaces Dec 17 15:59:55 check_reload_status Restarting ipsec tunnels Dec 17 15:59:55 check_reload_status updating dyndns VTI_IF5 Dec 17 15:59:55 rc.gateway_alarm 12247 >>> Gateway alarm: VTI_IF5 (Addr:169.254.181.145 Alarm:1 RTT:103.380ms RTTsd:12.050ms Loss:81%) Dec 17 15:59:31 php-fpm 72332 /rc.newipsecdns: The command '/sbin/ifconfig 'ipsec6000' create reqid '6000'' returned exit code '1', the output was 'ifconfig: create: bad value' Dec 17 15:59:31 php-fpm 72332 /rc.newipsecdns: The command '/sbin/ifconfig 'ipsec5000' create reqid '5000'' returned exit code '1', the output was 'ifconfig: create: bad value' Dec 17 15:59:30 php-fpm 72332 /rc.newipsecdns: The command '/sbin/ifconfig 'ipsec4000' create reqid '4000'' returned exit code '1', the output was 'ifconfig: create: bad value' Dec 17 15:59:30 php-fpm 72332 /rc.newipsecdns: The command '/sbin/ifconfig 'ipsec3000' create reqid '3000'' returned exit code '1', the output was 'ifconfig: create: bad value' Dec 17 15:59:29 php-fpm 72332 /rc.newipsecdns: The command '/sbin/ifconfig 'ipsec2000' create reqid '2000'' returned exit code '1', the output was 'ifconfig: create: bad value' Dec 17 15:59:29 php-fpm 72332 /rc.newipsecdns: The command '/sbin/ifconfig 'ipsec1000' create reqid '1000'' returned exit code '1', the output was 'ifconfig: create: bad value' Dec 17 15:59:29 check_reload_status Reloading filter Dec 17 15:59:29 php-fpm 72332 /rc.newipsecdns: IPSEC: One or more IPsec tunnel endpoints has changed its IP. Refreshing. Dec 17 15:59:13 check_reload_status Reloading filter Dec 17 15:59:13 check_reload_status Restarting OpenVPN tunnels/interfaces Dec 17 15:59:13 check_reload_status Restarting ipsec tunnels Dec 17 15:59:13 check_reload_status updating dyndns VTI_IF2 Dec 17 15:59:13 rc.gateway_alarm 75521 >>> Gateway alarm: VTI_IF2 (Addr:10.77.1.17 Alarm:1 RTT:16.586ms RTTsd:38.979ms Loss:81%) Dec 17 15:59:01 php-fpm 69402 /rc.newipsecdns: The command '/sbin/ifconfig 'ipsec6000' create reqid '6000'' returned exit code '1', the output was 'ifconfig: create: bad value' Dec 17 15:59:00 php-fpm 69402 /rc.newipsecdns: The command '/sbin/ifconfig 'ipsec5000' create reqid '5000'' returned exit code '1', the output was 'ifconfig: create: bad value' Dec 17 15:59:00 php-fpm 69402 /rc.newipsecdns: The command '/sbin/ifconfig 'ipsec4000' create reqid '4000'' returned exit code '1', the output was 'ifconfig: create: bad value' Dec 17 15:58:59 php-fpm 69402 /rc.newipsecdns: The command '/sbin/ifconfig 'ipsec3000' create reqid '3000'' returned exit code '1', the output was 'ifconfig: create: bad value' Dec 17 15:58:59 php-fpm 69402 /rc.newipsecdns: The command '/sbin/ifconfig 'ipsec2000' create reqid '2000'' returned exit code '1', the output was 'ifconfig: create: bad value' Dec 17 15:58:59 php-fpm 69402 /rc.newipsecdns: The command '/sbin/ifconfig 'ipsec1000' create reqid '1000'' returned exit code '1', the output was 'ifconfig: create: bad value' Dec 17 15:58:59 check_reload_status Reloading filter Dec 17 15:58:59 php-fpm 69402 /rc.newipsecdns: IPSEC: One or more IPsec tunnel endpoints has changed its IP. Refreshing. Dec 17 15:58:43 check_reload_status Reloading filter Dec 17 15:58:43 check_reload_status Restarting OpenVPN tunnels/interfaces Dec 17 15:58:43 check_reload_status Restarting ipsec tunnels Dec 17 15:58:43 check_reload_status updating dyndns VTI_IF5 Dec 17 15:58:43 rc.gateway_alarm 45517 >>> Gateway alarm: VTI_IF5 (Addr:169.254.181.145 Alarm:0 RTT:99.484ms RTTsd:.337ms Loss:66%) Dec 17 15:58:43 check_reload_status Reloading filter Dec 17 15:58:43 check_reload_status Restarting OpenVPN tunnels/interfaces Dec 17 15:58:43 check_reload_status Restarting ipsec tunnels Dec 17 15:58:43 check_reload_status updating dyndns VTI_IF3 Dec 17 15:58:43 rc.gateway_alarm 43532 >>> Gateway alarm: VTI_IF3 (Addr:169.254.251.193 Alarm:0 RTT:6.814ms RTTsd:13.799ms Loss:65%) Dec 17 15:58:17 php-fpm 72332 /rc.newipsecdns: The command '/sbin/ifconfig 'ipsec6000' create reqid '6000'' returned exit code '1', the output was 'ifconfig: create: bad value' Dec 17 15:58:17 php-fpm 72332 /rc.newipsecdns: The command '/sbin/ifconfig 'ipsec5000' create reqid '5000'' returned exit code '1', the output was 'ifconfig: create: bad value' Dec 17 15:58:16 php-fpm 72332 /rc.newipsecdns: The command '/sbin/ifconfig 'ipsec4000' create reqid '4000'' returned exit code '1', the output was 'ifconfig: create: bad value' Dec 17 15:58:15 php-fpm 72332 /rc.newipsecdns: The command '/sbin/ifconfig 'ipsec3000' create reqid '3000'' returned exit code '1', the output was 'ifconfig: create: bad value' Dec 17 15:58:15 php-fpm 72332 /rc.newipsecdns: The command '/sbin/ifconfig 'ipsec2000' create reqid '2000'' returned exit code '1', the output was 'ifconfig: create: bad value' Dec 17 15:58:15 php-fpm 72332 /rc.newipsecdns: The command '/sbin/ifconfig 'ipsec1000' create reqid '1000'' returned exit code '1', the output was 'ifconfig: create: bad value' Dec 17 15:58:14 check_reload_status Reloading filter Dec 17 15:58:14 php-fpm 72332 /rc.newipsecdns: IPSEC: One or more IPsec tunnel endpoints has changed its IP. Refreshing. Dec 17 15:57:58 check_reload_status Reloading filter Dec 17 15:57:58 check_reload_status Restarting OpenVPN tunnels/interfaces Dec 17 15:57:58 check_reload_status Restarting ipsec tunnels Dec 17 15:57:58 check_reload_status updating dyndns VTI_IF6 Dec 17 15:57:58 rc.gateway_alarm 31367 >>> Gateway alarm: VTI_IF6 (Addr:169.254.95.249 Alarm:0 RTT:99.552ms RTTsd:.984ms Loss:65%) Dec 17 15:56:58 php-fpm 51431 /rc.newipsecdns: The command '/sbin/ifconfig 'ipsec6000' create reqid '6000'' returned exit code '1', the output was 'ifconfig: create: bad value' Dec 17 15:56:58 php-fpm 51431 /rc.newipsecdns: The command '/sbin/ifconfig 'ipsec5000' create reqid '5000'' returned exit code '1', the output was 'ifconfig: create: bad value' Dec 17 15:56:57 php-fpm 51431 /rc.newipsecdns: The command '/sbin/ifconfig 'ipsec4000' create reqid '4000'' returned exit code '1', the output was 'ifconfig: create: bad value' Dec 17 15:56:57 php-fpm 51431 /rc.newipsecdns: The command '/sbin/ifconfig 'ipsec3000' create reqid '3000'' returned exit code '1', the output was 'ifconfig: create: bad value' Dec 17 15:56:56 php-fpm 51431 /rc.newipsecdns: The command '/sbin/ifconfig 'ipsec2000' create reqid '2000'' returned exit code '1', the output was 'ifconfig: create: bad value' Dec 17 15:56:56 php-fpm 51431 /rc.newipsecdns: The command '/sbin/ifconfig 'ipsec1000' create reqid '1000'' returned exit code '1', the output was 'ifconfig: create: bad value' Dec 17 15:56:56 check_reload_status Reloading filter Dec 17 15:56:56 php-fpm 51431 /rc.newipsecdns: IPSEC: One or more IPsec tunnel endpoints has changed its IP. Refreshing. Dec 17 15:56:40 check_reload_status Reloading filter Dec 17 15:56:40 check_reload_status Restarting OpenVPN tunnels/interfaces Dec 17 15:56:40 check_reload_status Restarting ipsec tunnels Dec 17 15:56:40 check_reload_status updating dyndns VTI_IF2 Dec 17 15:56:40 rc.gateway_alarm 65558 >>> Gateway alarm: VTI_IF2 (Addr:10.77.1.17 Alarm:0 RTT:3.358ms RTTsd:5.134ms Loss:80%) Dec 17 15:56:26 php-fpm 83099 /rc.newipsecdns: The command '/sbin/ifconfig 'ipsec6000' create reqid '6000'' returned exit code '1', the output was 'ifconfig: create: bad value' Dec 17 15:56:25 php-fpm 83099 /rc.newipsecdns: The command '/sbin/ifconfig 'ipsec5000' create reqid '5000'' returned exit code '1', the output was 'ifconfig: create: bad value' Dec 17 15:56:24 php-fpm 83099 /rc.newipsecdns: The command '/sbin/ifconfig 'ipsec4000' create reqid '4000'' returned exit code '1', the output was 'ifconfig: create: bad value' Dec 17 15:56:24 php-fpm 83099 /rc.newipsecdns: The command '/sbin/ifconfig 'ipsec3000' create reqid '3000'' returned exit code '1', the output was 'ifconfig: create: bad value' Dec 17 15:56:24 php-fpm 83099 /rc.newipsecdns: The command '/sbin/ifconfig 'ipsec2000' create reqid '2000'' returned exit code '1', the output was 'ifconfig: create: bad value' Dec 17 15:56:23 php-fpm 83099 /rc.newipsecdns: The command '/sbin/ifconfig 'ipsec1000' create reqid '1000'' returned exit code '1', the output was 'ifconfig: create: bad value' Dec 17 15:56:23 check_reload_status Reloading filter Dec 17 15:56:23 php-fpm 83099 /rc.newipsecdns: IPSEC: One or more IPsec tunnel endpoints has changed its IP. Refreshing. Dec 17 15:56:07 check_reload_status Reloading filter Dec 17 15:56:07 check_reload_status Restarting OpenVPN tunnels/interfaces Dec 17 15:56:07 check_reload_status Restarting ipsec tunnels Dec 17 15:56:07 check_reload_status updating dyndns VTI_IF3 Dec 17 15:56:07 rc.gateway_alarm 26818 >>> Gateway alarm: VTI_IF3 (Addr:169.254.251.193 Alarm:1 RTT:9.798ms RTTsd:12.252ms Loss:81%) Dec 17 15:55:56 php-fpm 39952 /rc.newipsecdns: The command '/sbin/ifconfig 'ipsec6000' create reqid '6000'' returned exit code '1', the output was 'ifconfig: create: bad value' Dec 17 15:55:56 php-fpm 39952 /rc.newipsecdns: The command '/sbin/ifconfig 'ipsec5000' create reqid '5000'' returned exit code '1', the output was 'ifconfig: create: bad value' Dec 17 15:55:55 php-fpm 39952 /rc.newipsecdns: The command '/sbin/ifconfig 'ipsec4000' create reqid '4000'' returned exit code '1', the output was 'ifconfig: create: bad value' Dec 17 15:55:55 php-fpm 39952 /rc.newipsecdns: The command '/sbin/ifconfig 'ipsec3000' create reqid '3000'' returned exit code '1', the output was 'ifconfig: create: bad value' Dec 17 15:55:54 php-fpm 39952 /rc.newipsecdns: The command '/sbin/ifconfig 'ipsec2000' create reqid '2000'' returned exit code '1', the output was 'ifconfig: create: bad value' Dec 17 15:55:54 php-fpm 39952 /rc.newipsecdns: The command '/sbin/ifconfig 'ipsec1000' create reqid '1000'' returned exit code '1', the output was 'ifconfig: create: bad value' Dec 17 15:55:54 check_reload_status Reloading filter Dec 17 15:55:54 php-fpm 39952 /rc.newipsecdns: IPSEC: One or more IPsec tunnel endpoints has changed its IP. Refreshing. Dec 17 15:55:38 check_reload_status Reloading filter Dec 17 15:55:38 check_reload_status Restarting OpenVPN tunnels/interfaces Dec 17 15:55:38 check_reload_status Restarting ipsec tunnels Dec 17 15:55:38 check_reload_status updating dyndns VTI_IF5 Dec 17 15:55:38 rc.gateway_alarm 59775 >>> Gateway alarm: VTI_IF5 (Addr:169.254.181.145 Alarm:1 RTT:104.422ms RTTsd:13.656ms Loss:82%) Dec 17 15:55:13 php-fpm 69402 /rc.newipsecdns: The command '/sbin/ifconfig 'ipsec6000' create reqid '6000'' returned exit code '1', the output was 'ifconfig: create: bad value' Dec 17 15:55:12 php-fpm 69402 /rc.newipsecdns: The command '/sbin/ifconfig 'ipsec5000' create reqid '5000'' returned exit code '1', the output was 'ifconfig: create: bad value' Dec 17 15:55:12 php-fpm 69402 /rc.newipsecdns: The command '/sbin/ifconfig 'ipsec4000' create reqid '4000'' returned exit code '1', the output was 'ifconfig: create: bad value' Dec 17 15:55:11 php-fpm 69402 /rc.newipsecdns: The command '/sbin/ifconfig 'ipsec3000' create reqid '3000'' returned exit code '1', the output was 'ifconfig: create: bad value' Dec 17 15:55:11 php-fpm 69402 /rc.newipsecdns: The command '/sbin/ifconfig 'ipsec2000' create reqid '2000'' returned exit code '1', the output was 'ifconfig: create: bad value' Dec 17 15:55:11 php-fpm 69402 /rc.newipsecdns: The command '/sbin/ifconfig 'ipsec1000' create reqid '1000'' returned exit code '1', the output was 'ifconfig: create: bad value' Dec 17 15:55:11 check_reload_status Reloading filter Dec 17 15:55:11 php-fpm 69402 /rc.newipsecdns: IPSEC: One or more IPsec tunnel endpoints has changed its IP. Refreshing. Dec 17 15:54:54 check_reload_status Reloading filter Dec 17 15:54:54 check_reload_status Restarting OpenVPN tunnels/interfaces Dec 17 15:54:54 check_reload_status Restarting ipsec tunnels Dec 17 15:54:54 check_reload_status updating dyndns VTI_IF2 Dec 17 15:54:54 rc.gateway_alarm 35207 >>> Gateway alarm: VTI_IF2 (Addr:10.77.1.17 Alarm:1 RTT:14.119ms RTTsd:20.102ms Loss:81%) Dec 17 15:54:44 php-fpm 51431 /rc.newipsecdns: The command '/sbin/ifconfig 'ipsec6000' create reqid '6000'' returned exit code '1', the output was 'ifconfig: create: bad value' Dec 17 15:54:43 php-fpm 51431 /rc.newipsecdns: The command '/sbin/ifconfig 'ipsec5000' create reqid '5000'' returned exit code '1', the output was 'ifconfig: create: bad value' Dec 17 15:54:43 php-fpm 51431 /rc.newipsecdns: The command '/sbin/ifconfig 'ipsec4000' create reqid '4000'' returned exit code '1', the output was 'ifconfig: create: bad value' Dec 17 15:54:42 php-fpm 51431 /rc.newipsecdns: The command '/sbin/ifconfig 'ipsec3000' create reqid '3000'' returned exit code '1', the output was 'ifconfig: create: bad value' Dec 17 15:54:42 php-fpm 51431 /rc.newipsecdns: The command '/sbin/ifconfig 'ipsec2000' create reqid '2000'' returned exit code '1', the output was 'ifconfig: create: bad value' Dec 17 15:54:41 php-fpm 51431 /rc.newipsecdns: The command '/sbin/ifconfig 'ipsec1000' create reqid '1000'' returned exit code '1', the output was 'ifconfig: create: bad value' Dec 17 15:54:41 check_reload_status Reloading filter Dec 17 15:54:41 php-fpm 51431 /rc.newipsecdns: IPSEC: One or more IPsec tunnel endpoints has changed its IP. Refreshing. Dec 17 15:54:25 check_reload_status Reloading filter Dec 17 15:54:25 check_reload_status Restarting OpenVPN tunnels/interfaces Dec 17 15:54:25 check_reload_status Restarting ipsec tunnels Dec 17 15:54:25 check_reload_status updating dyndns VTI_IF5 Dec 17 15:54:25 rc.gateway_alarm 76600 >>> Gateway alarm: VTI_IF5 (Addr:169.254.181.145 Alarm:0 RTT:99.553ms RTTsd:.306ms Loss:65%) Dec 17 15:53:59 php-fpm 33378 /rc.newipsecdns: The command '/sbin/ifconfig 'ipsec6000' create reqid '6000'' returned exit code '1', the output was 'ifconfig: create: bad value' Dec 17 15:53:59 php-fpm 33378 /rc.newipsecdns: The command '/sbin/ifconfig 'ipsec5000' create reqid '5000'' returned exit code '1', the output was 'ifconfig: create: bad value' Dec 17 15:53:58 php-fpm 33378 /rc.newipsecdns: The command '/sbin/ifconfig 'ipsec4000' create reqid '4000'' returned exit code '1', the output was 'ifconfig: create: bad value' Dec 17 15:53:58 php-fpm 33378 /rc.newipsecdns: The command '/sbin/ifconfig 'ipsec3000' create reqid '3000'' returned exit code '1', the output was 'ifconfig: create: bad value' Dec 17 15:53:57 php-fpm 33378 /rc.newipsecdns: The command '/sbin/ifconfig 'ipsec2000' create reqid '2000'' returned exit code '1', the output was 'ifconfig: create: bad value' Dec 17 15:53:57 php-fpm 33378 /rc.newipsecdns: The command '/sbin/ifconfig 'ipsec1000' create reqid '1000'' returned exit code '1', the output was 'ifconfig: create: bad value' Dec 17 15:53:57 check_reload_status Reloading filter Dec 17 15:53:57 php-fpm 33378 /rc.newipsecdns: IPSEC: One or more IPsec tunnel endpoints has changed its IP. Refreshing. Dec 17 15:53:41 php-fpm 83099 /rc.newipsecdns: The command '/sbin/ifconfig 'ipsec6000' create reqid '6000'' returned exit code '1', the output was 'ifconfig: create: bad value' Dec 17 15:53:41 check_reload_status Reloading filter Dec 17 15:53:41 check_reload_status Restarting OpenVPN tunnels/interfaces Dec 17 15:53:41 check_reload_status Restarting ipsec tunnels Dec 17 15:53:41 check_reload_status updating dyndns VTI_IF2 Dec 17 15:53:41 rc.gateway_alarm 6906 >>> Gateway alarm: VTI_IF2 (Addr:10.77.1.17 Alarm:0 RTT:1.480ms RTTsd:1.715ms Loss:65%) Dec 17 15:53:40 php-fpm 83099 /rc.newipsecdns: The command '/sbin/ifconfig 'ipsec5000' create reqid '5000'' returned exit code '1', the output was 'ifconfig: create: bad value' Dec 17 15:53:40 php-fpm 83099 /rc.newipsecdns: The command '/sbin/ifconfig 'ipsec4000' create reqid '4000'' returned exit code '1', the output was 'ifconfig: create: bad value' Dec 17 15:53:39 php-fpm 83099 /rc.newipsecdns: The command '/sbin/ifconfig 'ipsec3000' create reqid '3000'' returned exit code '1', the output was 'ifconfig: create: bad value' Dec 17 15:53:39 php-fpm 83099 /rc.newipsecdns: The command '/sbin/ifconfig 'ipsec2000' create reqid '2000'' returned exit code '1', the output was 'ifconfig: create: bad value' Dec 17 15:53:39 php-fpm 83099 /rc.newipsecdns: The command '/sbin/ifconfig 'ipsec1000' create reqid '1000'' returned exit code '1', the output was 'ifconfig: create: bad value' Dec 17 15:53:39 check_reload_status Reloading filter Dec 17 15:53:39 php-fpm 83099 /rc.newipsecdns: IPSEC: One or more IPsec tunnel endpoints has changed its IP. Refreshing. Dec 17 15:53:29 check_reload_status Reloading filter Dec 17 15:53:29 check_reload_status Restarting OpenVPN tunnels/interfaces Dec 17 15:53:29 check_reload_status Restarting ipsec tunnels Dec 17 15:53:29 check_reload_status updating dyndns VTI_IF6 Dec 17 15:53:29 rc.gateway_alarm 73428 >>> Gateway alarm: VTI_IF6 (Addr:169.254.95.249 Alarm:1 RTT:107.704ms RTTsd:20.844ms Loss:81%) Dec 17 15:53:23 check_reload_status Reloading filter Dec 17 15:53:23 check_reload_status Restarting OpenVPN tunnels/interfaces Dec 17 15:53:23 check_reload_status Restarting ipsec tunnels Dec 17 15:53:23 check_reload_status updating dyndns VTI_IF3 Dec 17 15:53:23 rc.gateway_alarm 30236 >>> Gateway alarm: VTI_IF3 (Addr:169.254.251.193 Alarm:0 RTT:5.659ms RTTsd:4.316ms Loss:65%) Dec 17 15:52:58 php-fpm 39952 /rc.newipsecdns: The command '/sbin/ifconfig 'ipsec6000' create reqid '6000'' returned exit code '1', the output was 'ifconfig: create: bad value' Dec 17 15:52:58 php-fpm 39952 /rc.newipsecdns: The command '/sbin/ifconfig 'ipsec5000' create reqid '5000'' returned exit code '1', the output was 'ifconfig: create: bad value' Dec 17 15:52:57 php-fpm 39952 /rc.newipsecdns: The command '/sbin/ifconfig 'ipsec4000' create reqid '4000'' returned exit code '1', the output was 'ifconfig: create: bad value' Dec 17 15:52:57 php-fpm 39952 /rc.newipsecdns: The command '/sbin/ifconfig 'ipsec3000' create reqid '3000'' returned exit code '1', the output was 'ifconfig: create: bad value' Dec 17 15:52:56 php-fpm 39952 /rc.newipsecdns: The command '/sbin/ifconfig 'ipsec2000' create reqid '2000'' returned exit code '1', the output was 'ifconfig: create: bad value' Dec 17 15:52:56 php-fpm 39952 /rc.newipsecdns: The command '/sbin/ifconfig 'ipsec1000' create reqid '1000'' returned exit code '1', the output was 'ifconfig: create: bad value' Dec 17 15:52:56 check_reload_status Reloading filter Dec 17 15:52:56 php-fpm 39952 /rc.newipsecdns: IPSEC: One or more IPsec tunnel endpoints has changed its IP. Refreshing. Dec 17 15:52:40 check_reload_status Reloading filter Dec 17 15:52:40 check_reload_status Restarting OpenVPN tunnels/interfaces Dec 17 15:52:40 check_reload_status Restarting ipsec tunnels Dec 17 15:52:40 check_reload_status updating dyndns VTI_IF6 Dec 17 15:52:40 rc.gateway_alarm 47385 >>> Gateway alarm: VTI_IF6 (Addr:169.254.95.249 Alarm:0 RTT:104.125ms RTTsd:16.181ms Loss:66%) Dec 17 15:52:35 php-fpm 39952 /rc.newipsecdns: The command '/sbin/ifconfig 'ipsec6000' create reqid '6000'' returned exit code '1', the output was 'ifconfig: create: bad value' Dec 17 15:52:34 php-fpm 39952 /rc.newipsecdns: The command '/sbin/ifconfig 'ipsec5000' create reqid '5000'' returned exit code '1', the output was 'ifconfig: create: bad value' Dec 17 15:52:33 php-fpm 39952 /rc.newipsecdns: The command '/sbin/ifconfig 'ipsec4000' create reqid '4000'' returned exit code '1', the output was 'ifconfig: create: bad value' Dec 17 15:52:33 php-fpm 39952 /rc.newipsecdns: The command '/sbin/ifconfig 'ipsec3000' create reqid '3000'' returned exit code '1', the output was 'ifconfig: create: bad value' Dec 17 15:52:32 php-fpm 39952 /rc.newipsecdns: The command '/sbin/ifconfig 'ipsec2000' create reqid '2000'' returned exit code '1', the output was 'ifconfig: create: bad value' Dec 17 15:52:31 php-fpm 39952 /rc.newipsecdns: The command '/sbin/ifconfig 'ipsec1000' create reqid '1000'' returned exit code '1', the output was 'ifconfig: create: bad value' Dec 17 15:52:31 check_reload_status Reloading filter Dec 17 15:52:31 php-fpm 39952 /rc.newipsecdns: IPSEC: One or more IPsec tunnel endpoints has changed its IP. Refreshing. Dec 17 15:52:15 check_reload_status Reloading filter Dec 17 15:52:15 check_reload_status Restarting OpenVPN tunnels/interfaces Dec 17 15:52:15 check_reload_status Restarting ipsec tunnels Dec 17 15:52:15 check_reload_status updating dyndns VTI_IF3 Dec 17 15:52:15 rc.gateway_alarm 37136 >>> Gateway alarm: VTI_IF3 (Addr:169.254.251.193 Alarm:1 RTT:11.159ms RTTsd:15.710ms Loss:81%)
-
Sounds like https://redmine.pfsense.org/issues/9668
-
I don't think so, the problem is not limited to routing daemons or any other packages. I can't even ping the other side of the tunnel (i.e. from VTI interface to VTI interface), i don't see how restarting FRR would solve that...
I actually already tried to manually add changes from bugfixes that i found useful, but to no avail. I used this one https://redmine.pfsense.org/projects/pfsense/repository/revisions/3aea62307804fbdcc433f959072d978efe14a7f2/diff/src/etc/inc/interfaces.inc and this one https://redmine.pfsense.org/projects/pfsense/repository/revisions/affe8a552ef1f7b8e59f3b60fd1421aa46f45b03/diff/src/etc/inc/interfaces.inc (the latter is probably unrelated though).
-
Bump and some additional info:
With that line removed, we initially thought we had gotten rid of the problem, but it looks like we only added delay. After about 2 weeks, the IPSec tunnels entered the same weird state where they are up and traffic going out can be seen, but dpinger (or a manual ping) seems to never reach the remote end. Hard to tell if the exact timing is reproducible, but it will happen again eventually after a few days/weeks. (we've seen it happen twice since i initially posted here).
Rebooting seems to be the only way to fix this. Restarting ipsec service didn't do anything, neither did disconnecting the tunnels via webinterface and reconnecting.Our resort for now was to move the AWS Site-to-Site tunnels off to an EdgeRouter, as we only started to encounter the issues once they were added. Worker fine before with only pfsense<->pfsense tunnels. I'll continue to monitor the situation, and am waiting for the 2.4.5 release to become available. Hopefully the various bugfixes will in the end resolve the problem.
I'll also soon have the opportunity to make another, almost identical setup using two MB-4220T pfsense appliances (https://store.netgate.com/MBT-4220-system.aspx) instead of SG-3100s.