FRR seeing IPsec tunnels disappearing
-
I think I found the restart event, along with its cause.
ipsecdns
Edit: Oh man... so I do have a few IPsec tunnels using the DNS name of a remote gateway instead of an IPv4 address. My theory is that the IP changes, the ipsecdns process picks it up, and it restarts all tunnels. I really hope that's not the case, but if so, that's really bad.
-
Mmm, indeed. IIRC there's something specific about the way FRR interacts with it there.
Looks like it may be related to this:
https://redmine.pfsense.org/issues/10503
Though in your case no gateway actually goes down?
-
@stephenw10 No gateways go down.
The incident happened again just now, and what's good is that I now know what to look for.
Is there a way to find out which gateway is changing its IP?
Also, should I open a Redmine issue?
-
Hmm, if it's actually a gateway I'd expect to see that logged there and in the gateways log.
If it's just a remote IPSec node that changed IP that's probably in the resolver log.
-
The only entries in the gateway log are the following. They show packet loss, but nothing showing a complete loss. Considering that all VPN tunnels bounce, these error messages make sense.
The resolver log shows nothing useful. I see pfSense checking the local cache for DNS, but I don't see any related errors.
-
Hmm, it does seem to have triggered something at 14:19:13 though. Was there anything in the system log leading up to that? I can just about see there was a newipsecdns call then.
-
@stephenw10 Yep, a restart event.
-
Hmm, so in both cases the first thing logged is 'Restarting IPsec tunnels'?
That would normally be triggered by something else. Were any tunnels being renewed at that point?
-
@stephenw10 That is correct, that is the first thing logged.
-
Is it possible that coincided with the renew time for the tunnel using an FQDN remote endpoint?
-
I believe it does, for both incidents, although the change of IP and the restart are a few minutes apart, so it doesn't seem to occur right away.
Incident one: the restart event was at around 09:38.
./pfblockerng/dns_reply.log:DNS-reply,Oct 7 09:32:25,resolver,A,A,300,vpn.server4u.in,127.0.0.1,124.123.66.69,IN
./pfblockerng/dns_reply.log:DNS-reply,Oct 7 09:37:25,resolver,A,A,300,vpn.server4u.in,127.0.0.1,103.127.188.125,IN
./pfblockerng/dns_reply.log:DNS-reply,Oct 7 09:42:41,resolver,A,A,300,vpn.server4u.in,127.0.0.1,124.123.66.69,IN
Incident two: 14:18
./dns_reply.log:DNS-reply,Oct 7 14:14:03,resolver,A,A,300,vpn.networkzz.co.in,127.0.0.1,210.89.55.63,IN <---
./dns_reply.log:DNS-reply,Oct 7 14:18:33,resolver,A,A,300,vpn.networkzz.co.in,127.0.0.1,202.88.209.151,IN
I'm happy that we found something that is reproducible.
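On the earlier question of how to find out which endpoint is changing its IP: here is a rough Python sketch that scans dns_reply.log lines like the ones above and reports every time the answer for a given name changes. The field positions (timestamp in field 2, name in field 7, answer in field 9) are an assumption taken only from the lines quoted in this post, and `find_ip_changes` is just a name I made up, so adjust it if your log looks different.

```python
from typing import Iterable, List, Tuple


def find_ip_changes(lines: Iterable[str], hostname: str) -> List[Tuple[str, str, str]]:
    """Return (timestamp, old_ip, new_ip) for each change in the answer for hostname.

    Assumed layout, inferred from the quoted log lines only:
    DNS-reply,<timestamp>,<source>,<qtype>,<rtype>,<ttl>,<name>,<client>,<answer>,<country>
    """
    changes = []
    last_ip = None
    for line in lines:
        # Skip anything before "DNS-reply," such as the "file:" prefix grep adds.
        start = line.find("DNS-reply,")
        if start < 0:
            continue
        fields = line[start:].strip().split(",")
        if len(fields) < 9 or fields[6] != hostname:
            continue
        timestamp, ip = fields[1], fields[8]
        if last_ip is not None and ip != last_ip:
            changes.append((timestamp, last_ip, ip))
        last_ip = ip
    return changes


if __name__ == "__main__":
    # The three lines from incident one, as quoted above.
    sample = [
        "./pfblockerng/dns_reply.log:DNS-reply,Oct 7 09:32:25,resolver,A,A,300,vpn.server4u.in,127.0.0.1,124.123.66.69,IN",
        "./pfblockerng/dns_reply.log:DNS-reply,Oct 7 09:37:25,resolver,A,A,300,vpn.server4u.in,127.0.0.1,103.127.188.125,IN",
        "./pfblockerng/dns_reply.log:DNS-reply,Oct 7 09:42:41,resolver,A,A,300,vpn.server4u.in,127.0.0.1,124.123.66.69,IN",
    ]
    for ts, old, new in find_ip_changes(sample, "vpn.server4u.in"):
        print(ts, old, "->", new)
```

Comparing the timestamps it prints against the 'Restarting IPsec tunnels' entries in the system log should show whether each restart follows a change by a few minutes.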
-
Hmm, OK so did those endpoints actually change? Are they FQDNs that resolve to several IPs?
I'd guess there is some timeout there that has to add up over those 4 minutes.
Either way I agree it should not affect all IPSec tunnels.
-
@stephenw10 said in FRR seeing IPsec tunnels disappearing:
Hmm, OK so did those endpoints actually change? Are they FQDNs that resolve to several IPs?
Yep, those endpoints do resolve to several IPs. One of them I know for sure, because I remember setting it up recently.
I did open the Redmine issue for tracking purposes. I don't think there is any workaround for this other than getting into the weeds of how IPsec is configured/built.
-
Hmm, do they all resolve IPs? Conversely do you have any that only resolve to one IP that doesn't cause this?
That is, is this being triggered because it's resolving to a different IP address every time, or just because it is re-resolving at all?
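One quick way to see which case a given endpoint falls into is to resolve it a few times and collect the distinct answers. The sketch below is only an illustration: `distinct_answers` and `resolve_ipv4` are made-up helper names, and it uses whatever resolver the machine running it is configured with, so run it somewhere that queries the firewall's resolver. With a 300 second TTL like the one in the log, the lookups need to be spaced out or they will probably just return the cached answer.

```python
import socket
import time
from typing import Callable, List, Set


def resolve_ipv4(name: str) -> List[str]:
    """Return the IPv4 addresses the system resolver currently gives for name."""
    infos = socket.getaddrinfo(name, None, family=socket.AF_INET, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr is (ip, port).
    return sorted({info[4][0] for info in infos})


def distinct_answers(
    name: str,
    attempts: int = 5,
    delay_seconds: float = 0.0,
    resolver: Callable[[str], List[str]] = resolve_ipv4,
) -> Set[str]:
    """Resolve name several times and return every distinct address seen."""
    seen: Set[str] = set()
    for attempt in range(attempts):
        seen.update(resolver(name))
        if delay_seconds and attempt < attempts - 1:
            time.sleep(delay_seconds)
    return seen


# Example (performs real DNS lookups, so it is left commented out):
# print(distinct_answers("vpn.networkzz.co.in", attempts=3, delay_seconds=310))
```

If the set only ever contains one address and the tunnels still restart, that would point at the re-resolve itself rather than the changing answer.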
-
@stephenw10 said in FRR seeing IPsec tunnels disappearing:
do they all resolve IPs? Converse
I have a few IPsec tunnels that are by IP only. I suspect this is triggered every time it detects a change in the IP when pfSense goes to resolve the name.
-
Yup. Are you able to test that by adding a host override so it always resolves to the same IP?
-
@stephenw10 that’s a good idea. Setting up one now. I’ll observe overnight maybe for a few days.
Have you discussed this internally?
-
Yes, I think we've looked at this from other angles before. Just trying to pin down what's happening. I suspect there are several things open with the same root cause here.
-
Let's give it two days or so. Searching for events matching
/rc.newipsecdns: IPSEC:
I noticed that this occurs either every day or every two days. I think we should know by Wednesday whether the host override solves the problem.
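For keeping an eye on the frequency while the host override is in place, something like this Python sketch could count the matching events per day from exported system log lines. It assumes each line starts with a classic syslog timestamp ("Oct 7 14:19:13 ..."), which may not match your log format setting, and `restarts_per_day` is just an illustrative name.

```python
from collections import Counter
from typing import Iterable

# The search string used above.
MARKER = "/rc.newipsecdns: IPSEC:"


def restarts_per_day(lines: Iterable[str]) -> Counter:
    """Count log lines containing MARKER, keyed by "<month> <day>".

    Assumes each line begins with "<month> <day> <time>"; lines in any
    other shape are counted under whatever their first two tokens are.
    """
    counts: Counter = Counter()
    for line in lines:
        if MARKER not in line:
            continue
        parts = line.split()
        if len(parts) < 3:
            continue
        counts[f"{parts[0]} {parts[1]}"] += 1
    return counts
```

Feeding it the saved log (for example `restarts_per_day(open(path))`, with `path` pointing at wherever you exported it) gives a per-day tally. If the days after the override went in show zero, that supports the theory.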