FRR Restart on configuration changes
Premise: I'm working under the assumption that a full restart is currently necessary for a configuration change to take effect. If this changes in the future to a reload, or even to inserting changes live, then this question no longer applies.
When pressing Save on any of the FRR package pages, a full restart of the package seems to be triggered. This is fine if you're just developing with it, but if you're using it for a bit more than that it can cause issues, especially where external peers and their anti-flapping timers are concerned.
Is there a way to stop this behaviour and restart FRR manually, so that a lot of changes can be made and FRR is restarted only once for all of them, rather than on each save?
Any help would be appreciated.
At the moment there isn't a good way to do that. It might be possible to reengineer the package to work that way in the future, but it would have to be very vocal about making the user go and manually apply changes. Users have enough trouble remembering to do that for squidGuard, so I am hesitant to make FRR work that way unconditionally.
There is also a reload script for FRR that might be helpful, though it would also require that the FRR package be changed in a number of ways to take advantage of it, starting with using a single unified FRR config file instead of per-daemon files.
You can also get the configuration to where you need it to be and then switch to custom configuration files instead.
Then you could make a set of changes with one save/restart.
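To illustrate the idea behind a reload script working from a single unified config: as far as I understand it, the approach is to compare the running config with the target config and apply only the differences, instead of restarting everything. Below is a minimal Python sketch of that idea, not the actual script. The function name config_delta is made up for illustration, and it only handles flat, top-level lines such as static routes; real FRR configs have nested contexts (for example router blocks) that this naive comparison ignores.

```python
from typing import List, Tuple


def config_delta(running: str, target: str) -> Tuple[List[str], List[str]]:
    """Return (commands_to_remove, commands_to_add) for flat config lines.

    Hypothetical helper: blank lines and "!" comment lines are ignored,
    and nested contexts are NOT handled.
    """
    def lines(text: str) -> List[str]:
        cleaned = (ln.strip() for ln in text.splitlines())
        return [ln for ln in cleaned if ln and not ln.startswith("!")]

    run, tgt = lines(running), lines(target)
    # Lines only in the running config are negated with a "no" prefix.
    to_remove = [f"no {ln}" for ln in run if ln not in tgt]
    # Lines only in the target config are added as-is.
    to_add = [ln for ln in tgt if ln not in run]
    return to_remove, to_add


running = """
ip route 10.0.0.0/24 192.0.2.1
ip route 10.0.1.0/24 192.0.2.1
"""
target = """
ip route 10.0.0.0/24 192.0.2.1
ip route 10.0.2.0/24 192.0.2.1
"""

to_remove, to_add = config_delta(running, target)
print(to_remove)
print(to_add)
```

The unchanged route never appears in either list, which is the whole point: only the altered entries would be touched.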
@jimp Perhaps it could simply be an option with the default set to the current behaviour?
It's just quite frustrating when you know pressing save will probably cause an alert at someone else's end because the session flapped.
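Roughly what I have in mind, as a Python sketch only (the package itself does not work this way, and FrrSettings, restart_on_save, save and apply are all made-up names): every save records a pending change, and the restart either happens immediately (the default, same as today) or is deferred until the user applies explicitly.

```python
from typing import Callable, List


class FrrSettings:
    """Hypothetical model of the package pages with an optional deferred restart."""

    def __init__(self, restart: Callable[[], None], restart_on_save: bool = True):
        self._restart = restart
        # Default True keeps the current behaviour: restart on every save.
        self.restart_on_save = restart_on_save
        self.pending: List[str] = []

    def save(self, page: str) -> None:
        """Record a saved page; restart right away unless deferral is enabled."""
        self.pending.append(page)
        if self.restart_on_save:
            self.apply()

    def apply(self) -> None:
        """Restart once for all pending changes; do nothing if none are pending."""
        if not self.pending:
            return
        self._restart()
        self.pending.clear()


restarts: List[str] = []
batched = FrrSettings(restart=lambda: restarts.append("restart"), restart_on_save=False)
for page in ("global", "bgp", "ospf"):
    batched.save(page)
batched.apply()
print(len(restarts))  # 1 restart for three saved pages
```

With restart_on_save left at its default, the same three saves would trigger three restarts, which is what happens now.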
We don't need OSPF/BGP for playing around on a few links to other sites. Dynamic routing is usually for large sites, starting from tens of routes. Adding or deleting a connection in the pfSense FRR package clears all FRR routes from the routing table (!), which leads to major timeouts and interruptions on all services that use the dynamically routed connections. The routes are inserted again afterwards (after the FRR restart), but the timeouts are far too long.
I reduced the OSPF timers to the absolute minimum, but that causes a lot of traffic when dozens of connections on weak WAN links talk to each other, and even then the timeouts are long.
This is really unusable on production sites. So please don't clear or touch other routes when something in FRR changes. With Cisco OSPF there aren't such interruptions; only the altered route appears or gets deleted.
I get what you're saying but if it's that critical you should not be making changes outside of a set maintenance window off-hours. If you're making random changes in the middle of the day and disrupting your business, that's not the fault of the package.
@jimp I would normally agree with this, but sometimes the changes you do are in response to things that are affecting your operation, such as a peer experiencing issues and needing to be disabled, or priorities changed. Changing these settings shouldn’t cause several minutes of outage on the entire platform.
The FreeBSD implementation of FRR seems to lack "service frr reload", which would be essential for a dynamic reload of the configuration. This command should read all the configs and reload the daemons without interrupting running operations; at least, that is what the FRR manuals say. Am I wrong? Maybe we can have the reload command from FRR?
Shell Output - service frr reload
/usr/local/etc/rc.d/frr: unknown directive 'reload'.
Usage: /usr/local/etc/rc.d/frr [fast|force|one|quiet](start|stop|restart|rcvar|enabled|describe|extracommands|configtest|status|poll)
That doesn't really matter. That's just the rc script, which makes it easier to run FRR in a standard rc script way. IIRC, some aspects of FRR can be reloaded with a SIGHUP, but we've seen that fail before: it doesn't quite do what you think it does, and it doesn't always catch all changes, especially to interfaces that connect, disconnect, or change addresses.
Looking at the FRR GitHub pages, especially the open issues and the results of the automatic testing, leads me to the opinion that the FRR suite needs more work and time to reach a reliable beta state. Automatic testing finds lots of failures, even on the 7.x releases, and the FRR 6.x package doesn't appear to be tested on FreeBSD 12 anymore, so it is time to update to FRR 7.x on pfSense.
To solve the reload issue we can only trust in your work and experience. You mentioned the unified configuration feature (frr.conf), which is already included in FRR, and the reload script. Adapting the current pfSense package and GUI to this may be a huge amount of work. You have done a lot of work to improve the FRR package, so don't stop halfway. I think this would be a major enhancement of the whole dynamic routing package in pfSense. On the other hand: if a change to the firewall rules interrupted all packet forwarding in pfSense, it would be really problematic, and there are good reasons why that is not the case.
Next point, for sure: there is already a feature request to sync FRR within CARP. This may be easier with a unified configuration file (frr.conf), so we would kill two birds with one stone.
pfSense 2.5.0 snapshots are based on FreeBSD 12 and are already using FRR 7.0