Major issue with QUAGGA-OSPF and VLANs (pfsense 2.3.0)
-
Hi guys, so maybe we should try changing the script and removing the -9 like Martin suggested. I think he might not be too keen to respond until that is tried, since he specifically asked for it. Is it possible that while that piece of code was removed, another one was added to do the same job of cleaning up routes or similar?
-
You can find and remove the -9 in
lines 306-325 of /usr/local/pkg/quagga_ospfd.inc
After clicking 'Save' in the webGUI, the rc file will be regenerated:
/usr/local/etc/rc.d/quagga.sh
I don't have a test environment, but I've done this on my home box. Adjusting the above should be fairly safe in a non-production environment.
I also fail to see how this will solve the issue, but it might be a hackish workaround (as jimp already mentioned).
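If you want to see where the -9 actually sits before editing anything, a quick search is enough. This is only a sketch: the helper name find_kill9 is made up for illustration, and the line range quoted above may differ between package versions, so searching for the pattern is safer than trusting line numbers.

```shell
# find_kill9 FILE: print "line-number:text" for every kill/killall/pkill
# call in FILE that passes -9 (SIGKILL). Hypothetical helper, not part of
# the package.
find_kill9() {
    grep -n -E 'kill(all)?[[:space:]]+-9([[:space:]]|$)' "$1"
}

# On the firewall (path taken from the post above):
#   find_kill9 /usr/local/pkg/quagga_ospfd.inc
```

Remember that /usr/local/etc/rc.d/quagga.sh is generated from the .inc file, so the change belongs in the .inc file, not in the rc file.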
-
Hi guys, so maybe we should try changing the script and removing the -9 like Martin suggested. I think he might not be too keen to respond until that is tried, since he specifically asked for it. Is it possible that while that piece of code was removed, another one was added to do the same job of cleaning up routes or similar?
I would say go ahead and try this and see what happens. However, even if it cleans up the routes without using -9, that won't be ideal for two reasons:
- Do you really want it to remove all OSPF routes from your firewall for a few seconds, maybe even longer? It takes some time for OSPF to start back up, establish neighbors, etc.
- What if Quagga (either zebra or ospfd) crashes at some point? You would need to restart your firewall; just starting Quagga won't work, because it didn't shut down cleanly and remove the OSPF routes.
Quagga really should be able to detect routes that it put into the kernel. Before v1.0 it did this, and it still actually does detect the routes it put there, it just doesn't remove them from the zebra RIB like it used to.
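For anyone unsure what "without -9" means in practice: plain kill sends SIGTERM, which the daemon can catch and use to clean up, while -9 (SIGKILL) cannot be caught at all. Below is a minimal sketch of a graceful stop. The function name and the default timeout are hypothetical, and this is not what the package's rc script does.

```shell
# stop_gracefully PID [TIMEOUT]: send SIGTERM, then wait up to TIMEOUT
# seconds (default 10) for the process to exit.
# Returns 0 once the process is gone, 1 if it is still alive at the timeout.
stop_gracefully() {
    pid=$1
    timeout=${2:-10}
    elapsed=0
    # If the signal cannot be delivered, treat the process as already gone.
    kill -TERM "$pid" 2>/dev/null || return 0
    # kill -0 sends no signal; it only checks that the PID still exists.
    while kill -0 "$pid" 2>/dev/null; do
        [ "$elapsed" -ge "$timeout" ] && return 1
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}
```

The caller decides what to do on a timeout. Falling back to -9 at that point would bring back exactly the stale-route problem discussed in this thread.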
-
Hello there,
I'm wondering if anything has changed this last month concerning the Quagga OSPF/kernel route problem. It seems I'm still stuck with the kernel route being written and the OSPF route not being used …
Thx
-
Still not solved?
-
Still not solved?
I don't know man … I almost want to switch to watchguard for my multi site OSPF deployments now.
Knowing that support for a major component like a routing package is nonexistent (not pfSense's fault, it seems Quagga doesn't want to acknowledge the problem) is worrisome. I don't even have any more hours to invest in troubleshooting this, as I have to catch up with projects.
-
I'd like to confirm that removing the -9 has resolved the OSPF-learned routes getting stuck as kernel routes. I have been attempting a seamless voice failover setup using 2 OpenVPN tunnels and was running OSPF on those interfaces. This had been the only issue preventing it from working. After several tests in my lab, everything appears to be working without issue.
-
@reqlez could you report to your contacts at Quagga that removing the -9 works around the issue?
We need a permanent fix that also works with -9.
-
Did https://github.com/pfsense/FreeBSD-ports/pull/265 to hack around this stupidity, since this has been going on for way too long… Obviously not a real solution, as noted here and here.
So just to clarify, if you kill quagga without -9 it will remove the routes from the kernel until it starts back up and re-learns the routes, correct? So it basically creates a brief outage, which is not great either.
It would be nice to hear from someone at pfSense about what our options are to get a long-term solution. From my understanding there are two options:
- Prevent quagga from restarting, by using VTY for configuration changes instead of generating new configuration files and restarting.
- Add the code back to quagga (zebra) that was removed that filters out the kernel routes put there by itself.
I think #2 would be easiest, but I'm not sure if the quagga community will be open to that, as I can't find out why the code was removed/commented out to begin with.
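To make option 1 concrete, here is roughly what a configuration change through vtysh looks like, as opposed to rewriting ospfd.conf and restarting. This is a sketch under assumptions: it presumes vtysh is installed and the daemons are running, the wrapper name add_ospf_network is hypothetical, and the prefix and area in the usage line are made-up values.

```shell
# add_ospf_network PREFIX AREA: add PREFIX to AREA in the running ospfd
# configuration via vtysh, without restarting any daemon.
# VTYSH can be overridden (for example VTYSH=echo) to preview the call.
add_ospf_network() {
    ${VTYSH:-vtysh} \
        -c 'configure terminal' \
        -c 'router ospf' \
        -c "network $1 area $2"
}

# Preview only, nothing is changed:
#   VTYSH=echo add_ospf_network 10.0.10.0/24 0.0.0.0
```

A change made this way only touches the running configuration, so the package would still have to keep its generated config files in sync. That is part of why this option is more work than it first appears.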
-
Did https://github.com/pfsense/FreeBSD-ports/pull/265 to hack around this stupidity, since this has been going on for way too long… Obviously not a real solution, as noted here and here.
So just to clarify, if you kill quagga without -9 it will remove the routes from the kernel until it starts back up and re-learns the routes, correct? So it basically creates a brief outage, which is not great either.
I'd figure that dealing with a ~1-2 second outage would be a whole lot better than having bogus "kernel" routes picked up by zebra and getting routing broken. You of course are welcome to provide a better solution. So far, for ~1 year, no one has provided any better ideas for the upstream regression.
Also, this thread is not about "pfSense should not restart routing packages". I'd guess that the summary provided by jimp is pretty accurate:
Preventing it from restarting is a hackish workaround no matter what signal is used. It will get restarted at some point and failing to recover gracefully is a regression in quagga's behavior in 1.x.
Restarting the package is required at minimum on upgrades, not avoidable.
-
- Add the code back to quagga (zebra) that was removed that filters out the kernel routes put there by itself.
I think #2 would be easiest, but I'm not sure if the quagga community will be open to that, as I can't find out why the code was removed/commented out to begin with.
That is what needs to happen. Quagga needs to recognize its own routes by the flags in the routing table. There's no reason they should have removed that code that I can see.
-
I'd figure that dealing with a ~1-2 second outage would be a whole lot better than having bogus "kernel" routes picked up by zebra and getting routing broken. You of course are welcome to provide a better solution. So far, for ~1 year, no one has provided any better ideas for the upstream regression.
I'm not knocking your efforts, I think in most cases (including mine) your pull request would be better than the current situation. However, I'm not sure it would be better in all cases.
Also, this thread is not about "pfSense should not restart routing packages". I'd guess that the summary provided by jimp is pretty accurate:
Preventing it from restarting is a hackish workaround no matter what signal is used. It will get restarted at some point and failing to recover gracefully is a regression in quagga's behavior in 1.x.
Restarting the package is required at minimum on upgrades, not avoidable.
I agree, however I'm pretty sure the quagga community disagrees, at least with the first sentence. According to the quagga community the proper way to handle configuration changes (from what I can tell) is to use the VTY or VTYSH to make changes like you would with a router, not by re-writing the configuration files, killing with -9, and restarting the daemons. For reference I was discussing this on their list here: https://lists.quagga.net/pipermail/quagga-users/2016-November/014557.html, and then here (one guy was replying off-list and I tried to add it back to the list, but nobody else chimed in): https://lists.quagga.net/pipermail/quagga-users/2016-November/014571.html. If you're doing an upgrade you can kill it without -9 and it will recover fine, and in that case an outage of a few seconds isn't a big deal.
- Add the code back to quagga (zebra) that was removed that filters out the kernel routes put there by itself.
I think #2 would be easiest, but I'm not sure if the quagga community will be open to that, as I can't find out why the code was removed/commented out to begin with.
That is what needs to happen. Quagga needs to recognize its own routes by the flags in the routing table. There's no reason they should have removed that code that I can see.
I can try to discuss with them again (there has been turmoil on the quagga lists lately), and even submit a pull request reverting the changes. If they refuse to allow that code back in, what is the plan going forward for OSPF support in pfSense?
-
If they won't fix it I'm not sure what the best path is. Maybe adding a port for the old version, or adding that code back in as a patch on the port.
If FreeBSD's route command would let us flush based on -proto1/RTF_PROTO1 then we could clear out its old routes before restarting, but that also seems harsh.
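Even though route flush can't filter on that flag, the routes can at least be listed. On FreeBSD, netstat -rn shows RTF_PROTO1 as a "1" in the Flags column (for example UG1). A small sketch, assuming the usual column layout with Flags as the third field; the function name proto1_routes is made up for illustration.

```shell
# proto1_routes: read `netstat -rn` style output on stdin and print the
# destination of every route whose Flags column contains "1" (RTF_PROTO1).
# Header lines are skipped because their third field does not start with "U".
proto1_routes() {
    awk 'NF >= 4 && $3 ~ /^U/ && $3 ~ /1/ { print $1 }'
}

# On FreeBSD:
#   netstat -rn -f inet | proto1_routes
```

This only lists the candidates. Whether deleting them before a restart is a good idea is the same "seems harsh" question as above.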
-
Hi,
Does anyone know if this issue has been fixed? I just noticed that Quagga 0.6.17 is available in the package manager. I will try it myself, obviously, but I'm wondering if anyone can confirm.
Thanks.
EDIT: Can confirm that 0.6.17 solves the issue.
-
The killall -9 is gone, yes. https://github.com/pfsense/FreeBSD-ports/pull/265
-
Can anyone else confirm?
-
Hi All,
I'm having the same issue, but when I tried to revert using the following command: pkg add -f http://pkg.freebsd.org/freebsd:10:x86:64/release_3/All/quagga-0.99.24.1_2.txz
the OSPF and Zebra services no longer started.
If I ran the following command via SSH, I received this error:
Exec format error
Does anyone have an idea of what I may be doing wrong, or of a configuration incompatibility that I must remove? I tried uninstalling the packages, rebooting, then reinstalling, but that didn't help. I tried removing all the interfaces from the configuration, but the services still didn't start.
This is a MAJOR issue for us because we rely on OSPF for redundancy. At the moment, without it working, if a link goes down we have to manually reboot the pfSense units so that the new routes are written.
I've attached my ospfd.conf and zebra.conf files with some of the IPs and passwords changed.