24.03 FRR has flapping BGP neighbors
-
Post 24.03 upgrade I am running into a strange issue with FRR, specifically BGP.
I have two peerings over VPN to different locations. Routing uptime has always been stable, but since the upgrade I have one constantly flapping peer. Running a pcap on the IPsec interface I see the following error code from BGP.
A bit of googling later, it seems this occurs when two neighbors attempt to establish at the same time. Strange that this is happening now so the solution I've read is to place a peer into passive mode. I do that and the peerings never come up.
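For anyone wondering what "collision" means here: when both speakers open a TCP session to each other at the same time, RFC 4271 (section 6.8) has them compare BGP Identifiers as unsigned 32-bit numbers and keep only the connection initiated by the speaker with the higher one. The loser is closed with a Cease NOTIFICATION (subcode 7, Connection Collision Resolution, per RFC 4486). Below is a minimal sketch of that comparison. The function name is made up for illustration, and the router IDs are assumptions: the thread only shows the tunnel addresses, not the actual BGP Identifiers.

```python
import ipaddress


def surviving_initiator(local_id: str, remote_id: str) -> str:
    """Say whose outbound connection survives a BGP connection collision.

    Simplified model of RFC 4271 section 6.8: the connection initiated by
    the speaker with the numerically higher BGP Identifier is kept.
    """
    local = int(ipaddress.IPv4Address(local_id))
    remote = int(ipaddress.IPv4Address(remote_id))
    if local == remote:
        # Simplification: equal identifiers are not handled in this sketch.
        raise ValueError("identical BGP Identifiers are out of scope here")
    return "local" if local > remote else "remote"


# Hypothetical router IDs, borrowed from the tunnel addresses in this thread.
print(surviving_initiator("172.28.0.6", "172.28.0.5"))  # local
# The comparison is numeric, not a string comparison.
print(surviving_initiator("9.255.255.255", "10.0.0.1"))  # remote
```

A single collision like this is normally harmless, since one of the two sessions survives. Seeing the Cease over and over suggests something else keeps tearing the session down and forcing both sides to reconnect.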
I disable passive mode and the neighbors come up and establish for about a minute and go back down again.
I then restart FRR service which doesn't help.
I reverted back to the previous install and no issue (thank goodness for boot environments). Something is very suspect with the latest FRR release.
Anyone care to help out?
To be clear, there are two BGP neighbors on the pfSense. The other neighbor is just fine.
Connections established 4; dropped 3
Last reset 00:02:36, Notification received (Cease/Connection Collision Resolution)
External BGP neighbor may be up to 1 hops away.
Local host: 172.28.0.6, Local port: 5469
Foreign host: 172.28.0.5, Foreign port: 179
Nexthop: 172.28.0.6
Nexthop global: fe80::92ec:77ff:fe34:cedc
Nexthop local: fe80::92ec:77ff:fe34:cedc
BGP connection: non shared network
BGP Connect Retry Timer in Seconds: 120
Estimated round trip time: 20 ms
Read thread: on  Write thread: on  FD used: 24
BFD: Type: single hop
  Detect Multiplier: 3, Min Rx interval: 300, Min Tx interval: 300
  Status: Down, Last update: 0:00:13:05
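If you want to watch for this kind of flapping without eyeballing the output each time, the two telling fields above (the established/dropped counters and the last reset reason) can be pulled out with a couple of regular expressions. This is only a rough sketch against the exact text pasted in this post. The function name is hypothetical, and the patterns are assumptions that may not match other FRR versions or reset reasons that are worded differently.

```python
import re

# Text copied from the neighbor output in this post.
SAMPLE = (
    "Connections established 4; dropped 3 "
    "Last reset 00:02:36, Notification received "
    "(Cease/Connection Collision Resolution) "
    "External BGP neighbor may be up to 1 hops away."
)


def summarize_neighbor(text: str) -> dict:
    """Extract session counters and the last reset reason, if present."""
    counts = re.search(r"Connections established (\d+); dropped (\d+)", text)
    # Assumes the reason ends with a parenthesised detail, as in the sample.
    reset = re.search(r"Last reset ([\d:]+),\s*(.+?\))", text)
    return {
        "established": int(counts.group(1)) if counts else None,
        "dropped": int(counts.group(2)) if counts else None,
        "last_reset": reset.group(1) if reset else None,
        "reason": reset.group(2) if reset else None,
    }


info = summarize_neighbor(SAMPLE)
print(info["established"], info["dropped"], info["last_reset"])
print(info["reason"])
```

A dropped count that keeps climbing together with a "Last reset" of only a few minutes is the pattern described in this thread.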
-
There is also no ping loss over the tunnel.
IPsec is working fine. Just FRR specifically is flaky.
--- 172.28.0.5 ping statistics ---
115 packets transmitted, 115 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 20.277/20.541/23.086/0.255 ms
-
I think I figured out the cause.
https://www.netgate.com/blog/state-policy-default-change
So to add a bit more color, I have two VPNs.
- WireGuard - Stable
- IPsec VPN - Unstable
Checking logging for the past few hours, I noticed the blocks started occurring after the upgrade.
I am pretty confident that I have data showing this to be a post-upgrade issue. Now the question is why.
Digging through the changes, I noticed the bit about the firewall state policy changes.
It still isn't clear to me WHY these two types of VPNs react differently to the state policy change, but there you have it.
I have updated the state policy specifically on a rule I created permitting BGP across the tunnel; the setting to adjust is under Advanced.
I feel this needs a Redmine but I'm not sure. @stephenw10 what do you think?
-
IPSec has some unique features with how it's filtered. What sort of IPSec tunnel(s) are you using, route or policy mode? How is the 'IPsec Filter Mode' set in IPSec Advanced Settings?
Depending on those, you can end up with outbound states created on the assigned interface while the reply traffic arrives on 'ipsec', and that gets tripped up by interface-bound states.
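To picture why that breaks, here is a toy model of the two pf state policies, `if-bound` and `floating`. It is only a sketch of the idea, not how pf is implemented. The interface names `ipsec1` (the assigned VTI interface) and `enc0` (what the 'ipsec' tab filters) are illustrative assumptions, as are the class and function names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    iface: str   # interface the state was created on
    flow: tuple  # (src, dst, proto, dport); direction is ignored in this toy


def reply_matches(state: State, reply_iface: str, reply_flow: tuple,
                  policy: str) -> bool:
    """Return True if reply traffic would match the existing state."""
    if state.flow != reply_flow:
        return False
    if policy == "floating":
        # Floating states match on any interface.
        return True
    if policy == "if-bound":
        # Interface-bound states only match on the interface that created them.
        return state.iface == reply_iface
    raise ValueError(f"unknown state policy: {policy}")


bgp_flow = ("172.28.0.6", "172.28.0.5", "tcp", 179)
outbound = State(iface="ipsec1", flow=bgp_flow)  # created on the VTI interface

# The reply shows up on a different interface than the one holding the state.
print(reply_matches(outbound, "enc0", bgp_flow, "if-bound"))  # False
print(reply_matches(outbound, "enc0", bgp_flow, "floating"))  # True
```

With interface-bound states the reply finds no matching state and falls through to the ruleset, which would line up with blocked BGP traffic over the tunnel.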
-
@stephenw10
Howdy Stephen
I am using route based IPsec - VTI.
IPsec Filter Mode set to 'Filter IPsec Tunnel, Transport and VTI on IPsec tab'
So are you saying there are two interfaces involved here? VTI and encap0?
If so I can logically see it being asymmetrical, but I would like you to confirm. It's definitely a nuanced point specific to IPsec that should be documented, I think.
-
Yup pretty much exactly as you described it in the bug: https://redmine.pfsense.org/issues/15430.
The same issue that prevents NAT working on VTI interfaces. https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=248474
-
@stephenw10
Makes sense.
I've always wondered, and you have more familiarity with what's being used out there, but should I instead use the following?
I understand the limitation here is that I can no longer use IPsec mobile, but if this is only going to be an IPsec gateway does it matter?
What are customers using out in the field?
And if I make the above change, does that mean the asymmetrical problem is solved?
-
It's not just mobile IPSec that is blocked, it's also anything in policy mode (not VTI). If you're only using VTI mode IPSec then set that and it removes most of the limitations, including this one with state binding.
-
@stephenw10
I think that's reasonable for me then.
So this turns pfSense into a strictly VTI/routed IPsec gateway.
The remote end can do whatever it wants technically, right? (policy or routed).
The only downside is if you have a large number of IPsec tunnels. They all get their own interface and firewall rules, but would displaying all of that be a GUI limitation?
-
@michmoor
Indeed the other end can do whatever it wants. However, I've found that having routed on one and policy on the other is prone to config mistakes so I would not normally recommend that.
-
Ha! Yup if you're looking for a bad time and confusing diagnosis try mixing route and policy based IPSec.
-
I'm having the same issue. On 23.09, VTI IPsec tunnels worked great with FRR/BGP; now they keep flapping. If I want to go back to 23.09, where would I get that image? Or what is the fix, if there is one?
jim
-
If you're running ZFS you can just roll back the Boot Environment.
The Net Installer can install a number of versions including 23.09.1.
But you should first just try switching the State Interface Binding back to floating:
https://docs.netgate.com/pfsense/en/latest/config/advanced-firewall-nat.html#config-advanced-firewall-state-policy
-
@stephenw10 thanks! i rolled back and everything working great... This is the first time i've had to do that.
jim
-
@stephenw10 so, how will I know it's ok to upgrade in the future? Will there be a release note about an FRR fix, possibly?
jim
-
Well that's why I suggested switching the state binding back to floating. If that allows BGP to come up correctly in 24.03 then the fix here is to add floating rules for the VTI tunnels (if you have those).
The state binding changed in 24.03 to make it more secure and that isn't likely to be changed back. The underlying issue with VTI interfaces is being looked at but until then you need floating state binding rules for it.
-
@stephenw10 ok, I'll try a test on a non-production firewall :) When you say add floating rules, what exactly do you mean?
jim
-
@stephenw10 the flapping only seems to happen when both ends are on 24.03. I'll keep testing with my dev firewalls.
jim
-
@michmoor hi mich, can you give more detail on what rules you created to allow BGP across the interfaces?
thanks
jim