24.03 FRR has flapping BGP neighbors
-
After the 24.09 upgrade I am running into a strange issue with FRR, specifically BGP.
I have two peerings over VPN to different locations. Routing uptime has always been stable, but since the upgrade I have one constantly flapping peer. Running a pcap on the ipsec interface, I see the following error code from BGP.
A bit of googling later, it seems this occurs when two neighbors attempt to establish a session at the same time. It is strange that this is happening only now. The solution I've read about is to place a peer into passive mode, but when I do that the peerings never come up.
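For reference, the collision rule behind that Cease message can be sketched roughly as below. This is a simplified reading of RFC 4271 section 6.8, not FRR's code: when both sides open a TCP session at once, the session initiated by the speaker with the higher BGP identifier is kept, and the other one is closed with a Cease notification (subcode 7, Connection Collision Resolution). The function names are hypothetical, and the real router IDs are not shown in this thread, so the peer addresses stand in for them as an assumption.

```python
import ipaddress


def bgp_id_to_int(bgp_id: str) -> int:
    """BGP identifiers are compared as 4-octet unsigned integers."""
    return int(ipaddress.IPv4Address(bgp_id))


def surviving_connection(local_id: str, remote_id: str) -> str:
    """Return which side's TCP session is kept after a collision.

    Simplified: the session opened by the speaker with the higher
    BGP identifier survives; the other is closed with a Cease.
    """
    if bgp_id_to_int(local_id) < bgp_id_to_int(remote_id):
        return "remote-initiated"
    return "local-initiated"


# ASSUMPTION: router IDs equal the tunnel addresses (not confirmed in the thread).
print(surviving_connection("172.28.0.6", "172.28.0.5"))
```

A single collision at session start is normal and harmless. Seeing it repeatedly, as here, suggests something else keeps tearing the session down so that both sides retry at the same moment.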
When I disable passive mode, the neighbors come up, stay established for about a minute, and then go back down.
Restarting the FRR service doesn't help.
I reverted to the previous install and the issue is gone (thank goodness for boot environments). Something is very suspect with the latest FRR release.
Anyone care to help out?
To be clear, there are two BGP neighbors on the pfsense. The other neighbor is just fine.

  Connections established 4; dropped 3
  Last reset 00:02:36, Notification received (Cease/Connection Collision Resolution)
  External BGP neighbor may be up to 1 hops away.
Local host: 172.28.0.6, Local port: 5469
Foreign host: 172.28.0.5, Foreign port: 179
Nexthop: 172.28.0.6
Nexthop global: fe80::92ec:77ff:fe34:cedc
Nexthop local: fe80::92ec:77ff:fe34:cedc
BGP connection: non shared network
BGP Connect Retry Timer in Seconds: 120
Estimated round trip time: 20 ms
Read thread: on  Write thread: on  FD used: 24
  BFD: Type: single hop
  Detect Multiplier: 3, Min Rx interval: 300, Min Tx interval: 300
  Status: Down, Last update: 0:00:13:05
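If anyone wants to track the flap counters over time, here is a minimal sketch that pulls the established/dropped counts and the last reset reason out of neighbor output like the above. The function name and regular expressions are my own and are based only on the text pasted in this post; other FRR versions may word these lines differently.

```python
import re

# Sample lines copied from the neighbor output in this post.
SAMPLE = (
    "Connections established 4; dropped 3\n"
    "Last reset 00:02:36, Notification received "
    "(Cease/Connection Collision Resolution)\n"
)


def summarize_neighbor(text: str) -> dict:
    """Extract flap counters and the last reset reason, if present."""
    counts = re.search(r"Connections established (\d+); dropped (\d+)", text)
    reset = re.search(r"Last reset (\S+),\s+([^()]+)\(([^)]+)\)", text)
    return {
        "established": int(counts.group(1)) if counts else None,
        "dropped": int(counts.group(2)) if counts else None,
        "last_reset": reset.group(1) if reset else None,
        "direction": reset.group(2).strip() if reset else None,
        "reason": reset.group(3) if reset else None,
    }


print(summarize_neighbor(SAMPLE))
```

Running it against saved output every few minutes would show whether the dropped count keeps climbing and whether the reason is always the collision Cease.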
-
There is also no ping loss over the tunnel.
IPsec is working fine. Just FRR specifically is flaky.

--- 172.28.0.5 ping statistics ---
115 packets transmitted, 115 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 20.277/20.541/23.086/0.255 ms
-
I think I figured out the cause.
https://www.netgate.com/blog/state-policy-default-change
So to add a bit more color, I have two VPNs:
- WireGuard - Stable
- IPsec VPN - Unstable
Checking the logs for the past few hours, I noticed the blocks started occurring after the upgrade.
I am pretty confident I have data showing this to be a post-upgrade issue. Now the question is why.
Digging through the changes, I noticed the bit about the firewall state policy change.
It still isn't clear to me WHY these two types of VPNs react differently to the state policy change, but there you have it.
I have updated the state policy specifically on a rule I created permitting BGP across the tunnel; the setting to adjust is under Advanced. I feel this needs a Redmine, but I'm not sure. @stephenw10 what do you think?
-
IPSec has some unique features with how it's filtered. What sort of IPSec tunnel(s) are you using, route or policy mode? How is the 'IPsec Filter Mode' set in IPSec Advanced Settings?
Depending on those you can have outbound states on the assigned interface and reply traffic on 'ipsec' and hence can be tripped up by the interface bound states.
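To illustrate the mismatch described above, here is a toy model of state lookup under the two policies. It is only a sketch and not how pf is actually implemented: the interface labels "vti" (the assigned interface) and "ipsec" are placeholders, and the class and function names are hypothetical. The endpoints are taken from the neighbor output earlier in the thread.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class State:
    flow: frozenset          # the two endpoints, direction-agnostic
    iface: Optional[str]     # None models a floating state


def make_state(src: str, dst: str, iface: str, policy: str) -> State:
    """Create a state; only an if-bound state remembers its interface."""
    bound = iface if policy == "if-bound" else None
    return State(frozenset({src, dst}), bound)


def reply_matches(state: State, src: str, dst: str, arrival_iface: str) -> bool:
    """A reply is accepted if the flow matches and the interface is allowed."""
    same_flow = state.flow == frozenset({src, dst})
    same_iface = state.iface is None or state.iface == arrival_iface
    return same_flow and same_iface


local, remote = "172.28.0.6:5469", "172.28.0.5:179"

# Outbound BGP packet creates the state on the assigned VTI interface,
# but the reply is seen on the "ipsec" interface.
bound = make_state(local, remote, "vti", "if-bound")
floating = make_state(local, remote, "vti", "floating")

print(reply_matches(bound, remote, local, "ipsec"))     # reply dropped
print(reply_matches(floating, remote, local, "ipsec"))  # reply accepted
```

In this model the interface-bound state rejects the reply because it arrives on a different interface than the one the state was created on, while the floating state accepts it. That would be consistent with the rule-level state policy change fixing BGP over the tunnel.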
-
@stephenw10
Howdy Stephen
I am using route based IPsec - VTI.
IPsec Filter Mode is set to 'Filter IPsec Tunnel, Transport and VTI on IPsec tab'.
So are you saying there are two interfaces involved here? VTI and encap0?
If so I can logically see it being asymmetrical, but I would like you to confirm. It's definitely a nuanced point specific to IPsec that should be documented, I think.
-
Yup pretty much exactly as you described it in the bug: https://redmine.pfsense.org/issues/15430.
The same issue that prevents NAT working on VTI interfaces: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=248474
-
@stephenw10
Makes sense.
I've always wondered, and you have more familiarity with what's being used out there, but should I instead use the following?
I understand the limitation here is that I can no longer use IPsec mobile, but if this is only going to be an IPsec gateway, does it matter?
What are customers using out in the field?
And if I make the above change, does that mean the asymmetrical problem is solved?
-
It's not just mobile IPSec that is blocked, it's also anything in policy mode (not VTI). If you're only using VTI mode IPSec then set that and it removes most of the limitations, including this one with state binding.
-
@stephenw10
I think that's reasonable for me then.
So this turns pfsense into a strictly VTI/routed IPsec gateway.
The remote end can do whatever it wants technically, right? (policy or routed).
The only downside is if you have a large number of IPsec tunnels. They all get their own interface and firewall rules, but would displaying all of that be a GUI limitation?
-
@michmoor
Indeed the other end can do whatever it wants. However, I've found that having routed on one and policy on the other is prone to config mistakes so I would not normally recommend that.
-
Ha! Yup if you're looking for a bad time and confusing diagnosis try mixing route and policy based IPSec.
-