Major issue with QUAGGA-OSPF and VLANs (pfSense 2.3.0)
-
Still not solved?
-
I don't know, man… I almost want to switch to WatchGuard for my multi-site OSPF deployments now.
Knowing that support for a major component like a routing package is nonexistent (not pfSense's fault; it seems Quagga doesn't want to acknowledge the problem) is worrisome. I don't have any more hours to invest in troubleshooting this, as I have to catch up on projects.
-
I can confirm that removing the -9 has resolved the problem of OSPF-learned routes getting stuck as kernel routes. I have been attempting a seamless voice failover setup using two OpenVPN tunnels, with OSPF running on those interfaces, and this was the only issue preventing it from working. After several tests in my lab, everything appears to be working without issue.
-
@reqlez, could you report to your contacts at Quagga that removing -9 works around the issue?
We need a permanent fix that also works with -9.
-
Did https://github.com/pfsense/FreeBSD-ports/pull/265 to hack around this stupidity, since this has been going on for way too long… Obviously not a real solution, as noted here and here.
So just to clarify: if you kill Quagga without -9, it will remove its routes from the kernel until it starts back up and relearns them, correct? So it basically creates a brief outage, which is not great either.
It would be nice to hear from someone at pfSense about what our options are for a long-term solution. From my understanding, there are two options:
1. Prevent Quagga from restarting by using the VTY for configuration changes, instead of generating new configuration files and restarting.
2. Add back the code that was removed from Quagga (zebra) that filtered out the kernel routes it had installed itself.
I think #2 would be easiest, but I'm not sure if the Quagga community will be open to that, as I can't find out why the code was removed/commented out to begin with.
-
I'd figure that dealing with a ~1-2 second outage would be a whole lot better than having bogus "kernel" routes picked up by zebra and routing getting broken. You are, of course, welcome to provide a better solution. So far, in ~1 year, no one has provided any better ideas for the upstream regression.
Also, this thread is not about "pfSense should not restart routing packages". I'd guess that the summary provided by jimp is pretty accurate:
Preventing it from restarting is a hackish workaround no matter what signal is used. It will get restarted at some point and failing to recover gracefully is a regression in quagga's behavior in 1.x.
Restarting the package is required at minimum on upgrades, not avoidable.
-
Adding that code back is what needs to happen. Quagga needs to recognize its own routes by the flags in the routing table. I can't see any reason they should have removed that code.
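The filtering being asked for can be modelled as one decision zebra makes at startup for each route it reads back from the kernel. This is a toy sketch, not Quagga's code: it assumes zebra marks every route it installs with RTF_PROTO1 (the flag mentioned later in the thread), and `KernelRoute` and `classify` are hypothetical names.

```python
from dataclasses import dataclass

# Flag bits from FreeBSD's <net/route.h>; only the ones this sketch needs.
RTF_UP = 0x1
RTF_GATEWAY = 0x2
RTF_STATIC = 0x800
RTF_PROTO1 = 0x8000  # assumption: zebra sets this on every route it installs


@dataclass(frozen=True)
class KernelRoute:
    prefix: str
    gateway: str
    flags: int


def classify(route: KernelRoute) -> str:
    """What a freshly started zebra should do with a route it finds in the kernel."""
    if route.flags & RTF_PROTO1:
        # Left behind by a previous zebra (e.g. one killed with -9): remove it
        # and let ospfd relearn the prefix, instead of adopting it as a bogus
        # "kernel" route.
        return "sweep"
    # Installed by something else (a static route, another tool): keep it.
    return "import"


if __name__ == "__main__":
    table = [
        KernelRoute("10.10.20.0/24", "10.10.99.2", RTF_UP | RTF_GATEWAY | RTF_PROTO1),
        KernelRoute("0.0.0.0/0", "192.0.2.1", RTF_UP | RTF_GATEWAY | RTF_STATIC),
    ]
    for r in table:
        print(r.prefix, classify(r))
```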
-
I'm not knocking your efforts; I think in most cases (including mine) your pull request would be better than the current situation. However, I'm not sure it would be better in all cases.
I agree with jimp's summary; however, I'm pretty sure the Quagga community disagrees, at least with its first point, that preventing restarts is a hackish workaround. According to the Quagga community, the proper way to handle configuration changes (from what I can tell) is to use the VTY or vtysh to make changes as you would on a router, not to rewrite the configuration files, kill the daemons with -9, and restart them. For reference, I was discussing this on their list here: https://lists.quagga.net/pipermail/quagga-users/2016-November/014557.html, and then here (one person was replying off-list; I tried to bring it back to the list, but nobody else chimed in): https://lists.quagga.net/pipermail/quagga-users/2016-November/014571.html. If you're doing an upgrade, you can kill it without -9 and it will recover fine, and in that case an outage of a few seconds isn't a big deal.
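For comparison, the VTY approach means sending configuration commands to the running daemons instead of rewriting files and restarting. A sketch that only builds the command line, assuming vtysh accepts several -c options in one invocation and runs them in order in one session; `vtysh_argv` is a hypothetical helper and the OSPF network statement is a made-up example.

```python
import shlex


def vtysh_argv(config_lines: list[str]) -> list[str]:
    """Build a vtysh invocation that applies config lines to the running daemons.

    Each line is passed as its own -c option after entering config mode, so the
    daemons are reconfigured in place instead of being restarted.
    """
    argv = ["vtysh", "-c", "configure terminal"]
    for line in config_lines:
        argv += ["-c", line]
    argv += ["-c", "end"]
    return argv


if __name__ == "__main__":
    # Hypothetical change: advertise one more network into area 0.
    argv = vtysh_argv(["router ospf", "network 10.10.30.0/24 area 0.0.0.0"])
    print(shlex.join(argv))
    # On the firewall itself this could then be run with:
    #   subprocess.run(argv, check=True)
```

Persisting such changes back into pfSense's own configuration would still be a separate problem; the sketch only covers applying them without a restart.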
I can try to discuss it with them again (there has been turmoil on the Quagga lists lately), and even submit a pull request reverting the change that removed the route-filtering code. If they refuse to allow that code back in, what is the plan going forward for OSPF support in pfSense?
-
If they won't fix it, I'm not sure what the best path is. Maybe add a port for the old version, or add that code back in as a patch on the port.
If FreeBSD's route command would let us flush based on -proto1/RTF_PROTO1, then we could clear out Quagga's old routes before restarting, but that also seems harsh.
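A dry-run sketch of that idea, assuming zebra's routes carry RTF_PROTO1, which FreeBSD's `netstat -rn` shows as `1` in the Flags column: it lists the IPv4 routes with that flag and builds `route delete` commands for them without running anything. The sample output is invented, and the column set varies between FreeBSD versions, so only the first three columns are used; `proto1_routes` and `delete_commands` are hypothetical names.

```python
import shlex

# Made-up `netstat -rn` output in the FreeBSD layout; the exact columns vary by version.
SAMPLE_NETSTAT = """Routing tables

Internet:
Destination        Gateway            Flags      Netif Expire
default            192.0.2.1          UGS         em0
10.10.20.0/24      10.10.99.2         UG1      ovpnc1
127.0.0.1          link#4             UH          lo0

Internet6:
Destination                       Gateway                       Flags      Netif Expire
::1                               link#4                        UH          lo0
"""


def proto1_routes(netstat_output: str) -> list[str]:
    """Return IPv4 destinations whose Flags column contains '1' (RTF_PROTO1)."""
    dests = []
    in_ipv4 = header_seen = False
    for line in netstat_output.splitlines():
        stripped = line.strip()
        if stripped == "Internet:":          # start of the IPv4 table
            in_ipv4, header_seen = True, False
            continue
        if stripped.endswith(":"):           # any other family, e.g. "Internet6:"
            in_ipv4 = False
            continue
        if not in_ipv4 or not stripped:
            continue
        fields = stripped.split()
        if fields[0] == "Destination":       # column header line
            header_seen = True
            continue
        # Destination, Gateway and Flags are the first three columns in the
        # layouts this sketch assumes.
        if header_seen and len(fields) >= 3 and "1" in fields[2]:
            dests.append(fields[0])
    return dests


def delete_commands(dests: list[str]) -> list[str]:
    """Build (but do not run) FreeBSD `route delete` commands for the destinations."""
    return [shlex.join(["route", "delete", d]) for d in dests]


if __name__ == "__main__":
    for cmd in delete_commands(proto1_routes(SAMPLE_NETSTAT)):
        print(cmd)
```

Printing the commands instead of executing them keeps the "harsh" part a deliberate, reviewable step.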
-
Hi,
Does anyone know if this issue has been fixed? I just noticed that Quagga 0.6.17 is available in the package manager. I will obviously try it myself, but I'm wondering if anyone can confirm.
Thanks.
EDIT: I can confirm that 0.6.17 solves the issue.
-
The killall -9 is gone, yes. https://github.com/pfsense/FreeBSD-ports/pull/265
-
Can anyone else confirm?
-
Hi All,
I'm having the same issue, but after I tried to revert using the following command, the OSPF and Zebra services no longer started:
pkg add -f http://pkg.freebsd.org/freebsd:10:x86:64/release_3/All/quagga-0.99.24.1_2.txz
If I run the following command via SSH, I receive this error:
Exec format error
Does anyone have an idea of what I may be doing wrong, or of a configuration incompatibility that I must remove? I tried uninstalling the packages, rebooting, and then reinstalling, but that didn't help. I tried removing all the interfaces from the configuration, but the services still didn't start.
This is a MAJOR issue for us because we rely on OSPF for redundancy. At the moment, without it working, if a link goes down we have to manually reboot the pfSense units so that the new routes are written.
I've attached my ospfd.conf and zebra.conf files with some of the IPs and passwords changed.
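"Exec format error" means the kernel refused to load the binary. One possible cause, offered here only as an assumption to rule out, is an architecture mismatch, for example an amd64 package (the x86:64 ABI in the URL above) installed on a 32-bit i386 system. A small sketch that reads the ELF header to report a binary's word size and CPU type; `describe_elf` and `describe_file` are hypothetical helpers.

```python
import sys

ELF_MAGIC = b"\x7fELF"
# e_machine values from the ELF specification (EM_386 and EM_X86_64).
MACHINES = {3: "i386", 62: "amd64 (x86-64)"}


def describe_elf(header: bytes) -> str:
    """Summarise word size and CPU type from the first 20 bytes of an ELF file."""
    if len(header) < 20 or header[:4] != ELF_MAGIC:
        return "not an ELF binary"
    bits = {1: "32-bit", 2: "64-bit"}.get(header[4], "unknown class")  # EI_CLASS
    order = "little" if header[5] == 1 else "big"                      # EI_DATA
    machine = int.from_bytes(header[18:20], order)                     # e_machine
    return f"{bits}, {MACHINES.get(machine, f'machine {machine}')}"


def describe_file(path: str) -> str:
    with open(path, "rb") as f:
        return describe_elf(f.read(20))


if __name__ == "__main__":
    for path in sys.argv[1:]:
        print(path, "->", describe_file(path))
```

On a machine with Python available, passing a known-good system binary such as /bin/sh together with the reinstalled zebra binary (its install path is an assumption here) and comparing both with `uname -m` would show whether the package matches the system.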