FRR OSPF Tuning for Fast Convergence
-
Does anyone have a guide or experience with tuning OSPF to converge faster than the default?
I see support for sub-second hellos, but I can't find documentation on how to fine-tune OSPF.
Right now I have an OSPF domain with several sites connected over IPsec VTI links, and whenever a change is made, the client's phone/VoIP system cuts out for a couple of seconds. Convergence times range from 5 to 15 seconds, and in some cases up to 45 seconds.
Any help would be appreciated. Thanks, guys.
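For reference, the knobs in question live in FRR's ospfd config: per-interface `ip ospf hello-interval` / `ip ospf dead-interval` (or `ip ospf dead-interval minimal hello-multiplier N` for sub-second hellos), plus `timers throttle spf <delay> <initial-hold> <max-hold>` (milliseconds) under `router ospf`. Below is a minimal sketch that renders such a fragment. The helper name, the interface names `ipsec1000`/`ipsec2000`, and all timer values are hypothetical examples, not recommendations, and the pfSense FRR package may generate its config from GUI fields rather than accepting a fragment like this. Keep in mind that hello and dead intervals must match on both ends of an adjacency, or it will not form.

```python
"""Render an illustrative FRR ospfd config fragment with faster timers.

Hypothetical helper: names and values are examples only.
"""
from dataclasses import dataclass


@dataclass
class OspfTimers:
    hello_s: int = 1          # ip ospf hello-interval (seconds)
    dead_s: int = 4           # ip ospf dead-interval (seconds)
    spf_delay_ms: int = 50    # timers throttle spf: initial delay
    spf_hold_ms: int = 200    # timers throttle spf: initial hold time
    spf_max_ms: int = 5000    # timers throttle spf: maximum hold time


def render_frr_ospf(interfaces: list[str], t: OspfTimers) -> str:
    # A dead interval not longer than the hello interval would drop
    # the adjacency between two normal hellos.
    if t.dead_s <= t.hello_s:
        raise ValueError("dead interval must be longer than the hello interval")
    lines = []
    for ifname in interfaces:
        # Interface-level timers; both neighbors need identical values.
        lines += [
            f"interface {ifname}",
            f" ip ospf hello-interval {t.hello_s}",
            f" ip ospf dead-interval {t.dead_s}",
            "!",
        ]
    # Router-level SPF throttling controls how quickly SPF runs after a change.
    lines += [
        "router ospf",
        f" timers throttle spf {t.spf_delay_ms} {t.spf_hold_ms} {t.spf_max_ms}",
        "!",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_frr_ospf(["ipsec1000", "ipsec2000"], OspfTimers()))
```

Faster timers only shorten failure detection and SPF scheduling; they do not help if the routing daemon itself is restarted on every link event, which is what the rest of this thread turns out to be about.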
-
I have the same problems here. The current implementation of pfSense/FRR doesn't work as expected when the links change. In my experience, the whole routing table is cleared and then rebuilt whenever anything in the known OSPF routes changes. Even an uninvolved route or interface can trigger this rebuild and lead to major timeouts, so tuning for fast convergence will have limited success. I had some success reducing timeouts by setting the hello/dead timers to 1/2 seconds, but that causes a lot more traffic, and the effect depends on WAN latency and the general performance of the routers involved. FRR/pfSense is not yet ready for uninterrupted OSPF/routing changes.
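The hello/dead trade-off described above can be put into rough numbers. This is a back-of-the-envelope sketch, assuming a single point-to-point adjacency (as on a VTI link) and counting hellos in one direction only; `timer_profile` is a hypothetical helper. OSPF declares a silent neighbor down only when the dead interval expires, so detection takes at most `dead_s`; SPF and route installation come on top of that.

```python
"""Back-of-the-envelope comparison of OSPF hello/dead timer settings."""


def timer_profile(hello_s: float, dead_s: float) -> dict:
    return {
        # A silent failure is detected when the dead timer expires,
        # so worst-case detection is bounded by the dead interval.
        "max_detection_s": dead_s,
        # Hellos sent per direction on one link, per hour.
        "hellos_per_hour": 3600 / hello_s,
        # Roughly how many consecutive hellos can be lost (e.g. to WAN
        # jitter or packet loss) before the adjacency is torn down.
        "hellos_tolerated": int(dead_s // hello_s) - 1,
    }


if __name__ == "__main__":
    # 10/40 are the standard OSPF defaults on point-to-point links.
    for name, (hello, dead) in {"default 10/40": (10, 40), "fast 1/2": (1, 2)}.items():
        p = timer_profile(hello, dead)
        print(f"{name}: detect <= {p['max_detection_s']} s, "
              f"{p['hellos_per_hour']:.0f} hellos/h, "
              f"~{p['hellos_tolerated']} lost hellos tolerated")
```

With 1/2 timers, detection drops from 40 s to 2 s and hello traffic rises tenfold, but only about one lost hello is tolerated, which is why the result depends so heavily on WAN latency and loss.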
-
The attached patch isn't ideal, but it may help avoid some restarts of FRR when interface events happen.
You can install the System Patches package and then create an entry with the contents of the attached file pasted in to apply the fix.
skip_restart_for_routing_packages-2.4.5.patch (it says 2.4.5, but it also applies to 2.4.4-p3)
-
Thank you, Jim, I appreciate this. There are lots of WAN events during the night, and I have applied the patch, so let's wait and see if it helps.
-
WOW. Yeehaw... This looks really, really good. This patch is the best one I have seen in the last 4 years with pfSense. Route changes on a 200 km WAN line with OSPF take about 300 ms without losing any pings. It just pops up and the route is changed. No packet loss. This is a real pleasure. Can we have this in 2.4.5???
For configuration changes... you mentioned it before: https://github.com/FRRouting/frr/blob/master/tools/frr-reload.py
Can we put this on the wishlist for version 2.5??? Thank you!
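For context, frr-reload.py compares a config file against the running configuration and applies only the differences, instead of restarting the daemons. As we understand the tool, `--test` prints the lines it would delete and add, and `--reload` applies them. Below is a minimal sketch of a dry-run-then-apply wrapper on a generic FRR install; the helper names and both paths are hypothetical (they differ between platforms and packages), and, as noted in the reply below, the pfSense FRR package does not currently work this way. It must run as root with `vtysh` available.

```python
"""Sketch: validate, then apply, an FRR config with frr-reload.py."""
import subprocess
import sys

# Hypothetical paths: adjust to wherever your FRR build installs these.
FRR_RELOAD = "/usr/lib/frr/frr-reload.py"
FRR_CONF = "/etc/frr/frr.conf"


def build_reload_cmd(script: str, conf: str, dry_run: bool) -> list[str]:
    # --test only reports the differences; --reload pushes them into FRR.
    mode = "--test" if dry_run else "--reload"
    return [sys.executable, script, mode, conf]


def reload_frr(script: str = FRR_RELOAD, conf: str = FRR_CONF) -> None:
    # Run the dry run first; check=True aborts before the real reload
    # if the dry run exits with an error.
    subprocess.run(build_reload_cmd(script, conf, dry_run=True), check=True)
    subprocess.run(build_reload_cmd(script, conf, dry_run=False), check=True)


if __name__ == "__main__":
    reload_frr()
```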
-
It's too late for a change like that to go into 2.4.5, but I may try to find a way to work that into 2.5.0 as an optional behavior (defaulting to off). You can use the patch for 2.4.5 if you need it.
The change to use that config script is much, much more complex. It would require completely redesigning how the FRR package generates the FRR config. And we've found some quirks with using that method on TNSR, so it's not quite as easy as it would appear to be.
-
Sounds promising. I will implement it this weekend. Thanks!!!!!!!
-
After this weekend, the results of the "don't restart FRR on link events" patch are mixed. On the one site, for routed IPsec VTI connections, it's much better; for OpenVPN it's worse.
I get WAN timeouts with OpenVPN, which lead to a situation where OSPF has the correct routes, but they are not in the routing table: the OpenVPN route is missing. After an FRR restart, they are there. This happened several times and is reproducible.
After a longer up-and-down of an IPsec VTI link, there was no connectivity, not even ping, and I had to restart FRR as well to get it working again. I didn't have a chance to look at the exact state of the routing tables.
So the conclusion is that after WAN timeouts, routing is wrong and needs manual intervention. Before the patch, it timed out for longer, but recovered afterwards.
Neither is really acceptable for production.
-
That's part of why it's still just a patch and not in the code. It helps in some situations, but not in others. That said, you can't really have it both ways: it can either restart the packages or not restart them.
The reason it has to restart is typically what you have seen: FRR needs to restart to latch back onto an interface which has changed (or, probably, been deleted and recreated).
IPsec may fare better because of the changes I made in https://redmine.pfsense.org/issues/9668, which cycle the interface in FRR live without restarting the package. I'm not sure whether a similar change for OpenVPN would be viable.