Netgate Discussion Forum

    Major issue with QUAGGA-OSPF and VLANs (pfsense 2.3.0)

    Routing and Multi WAN
    • Spydre13

      @reqlez:

      So apparently -9 is a really nasty way of stopping Quagga, as per Martin from Quagga, and he thinks this is not letting it flush routing tables before exit. Maybe there is new code in new version of Quagga that takes a bit more time to flush those routes ? and maybe that is why it was not an issue in 0.99 version but it is with 1.0 ?

      I see Martin's reply to you on Oct. 10, but I don't see anything after that.  Are you emailing him off-list?

      I was looking through the Quagga code last night and found something that I suspect could be the problem. Quagga (the zebra daemon) puts routes into the kernel with flag "1" (RTF_PROTO1, see the netstat man page). When zebra starts up, it's supposed to ignore (filter out) any kernel routes with flag "1", because it should assume it put them there to begin with. I think this worked before Quagga version 1, and in versions >= 1 it pulls those kernel routes into the zebra RIB.

      If I reboot a firewall and go to OSPF -> Status -> Zebra routes, I see a bunch of OSPF routes but barely any K (kernel) routes. If I make any change on the Global Settings or Interface Settings tab, Quagga restarts, and afterwards the Zebra routes list is filled with kernel routes (one for each OSPF route).

      Can you ask Martin to look at this:
      Commit: https://github.com/Quagga/quagga/commit/0d0686f98e64017415071e590bde262f0ab5a4c9
      File: zebra/zebra_rib.c
      Function: rib_sweep_table

      This function is commented out starting in version 1, but it was used in version 0.99.24.  There is a block of code in it:

      
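      /* From rib_sweep_table() in 0.99.24: a kernel route that carries
         ZEBRA_FLAG_SELFROUTE is removed from the kernel and, if that
         succeeds, deleted from the RIB. */
      if (rib->type == ZEBRA_ROUTE_KERNEL &&
          CHECK_FLAG (rib->flags, ZEBRA_FLAG_SELFROUTE))
        {
          ret = rib_uninstall_kernel (rn, rib);
          if (! ret)
            rib_delnode (rn, rib);
        }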

      The rib_weed_tables function that is still being used doesn't seem to do this same thing, from what I can tell.  This URL shows them side-by-side: https://fossies.org/diffs/quagga/0.99.24.1_vs_1.0.20160315/zebra/zebra_rib.c-diff.html

      If you can point me to the thread where you are discussing this with Martin, I can pass this along to him if you prefer.

      • reqlez

        Sorry, I'm a mailing-list noob, and I only realized when you told me that this stuff wasn't going to the list. I'll post this and include the list this time. Yes, you are correct: I've just been emailing him directly.

        • reqlez

          @Spydre13

          I just posted your comment on the same list:  https://lists.quagga.net/pipermail/quagga-users/2016-October/014476.html

          This time I'm being less of a noob and actually e-mailing the list. We can continue the discussion there if you subscribe to it.

          • bgibson

            Good morning,
            Has there been any update regarding this issue? Is there another forum or notes I can follow to see when this is resolved? This is causing a huge problem within our company and if not fixed soon - we will have to change routing. I'm on the latest version of pfsense.

            • echu2016

              Hi bgibson,
              Meanwhile, I suggest you follow heper's and my recommendations:

              https://forum.pfsense.org/index.php?topic=111108.msg620733#msg620733
              https://forum.pfsense.org/index.php?topic=111108.msg654483#msg654483

              • bgibson

                Thanks - I will look into the links.

                • Trey

                  Hi,

                  The new version 1.1 of Quagga was released a couple of days ago:

                  http://mirror.yannic-bonenberger.com/nongnu/quagga/quagga-1.1.0.changelog.txt

                  As the problems started with version 1.0, and having looked at the changelog, I hope Quagga runs smoothly again after the update.

                  It would be great to see the package updated to Quagga 1.1.

                  Thanks!

                  • reqlez

                    The problem is that I still haven't heard back from Martin since my last post, so I don't think anybody is working on a solution. The pfSense developers haven't commented either on their use of -9 when restarting packages, or on why the packages are restarted in the first place. So I wouldn't assume the new release fixed anything… it probably didn't.

                    • reqlez

                      Hmm… maybe I'm wrong. I do see something here:

                      commit 7e73eb740f3c52a5b7c0ae9c2cd33b486d885552
                      Author: Timo Teräs <timo.teras@iki.fi>
                      Date:   Sat Apr 9 17:22:32 2016 +0300

                          zebra: handle multihop nexthop changes properly

                          The rib entries are normally added and deleted when they are
                          changed. However, they are modified in placae when the nexthop
                          reachability changes. This fixes to:
                           - properly detect nexthop changes from nexthop_active_update()
                             calls from rib_process()
                           - rib_update_kernel() to not reset FIB flags when a RIB entry
                             is being modifed (old and new RIB are same)
                           - improves the "show ip route <prefix>" output to display
                             both ACTIVE and FIB flags for each nexthop

                          Fixes: 325823a5 "zebra: support FIB override routes"
                          Signed-off-by: Timo Teräs <timo.teras@iki.fi>
                          Reported-By: Igor Ryzhov <iryzhov@nfware.com>
                          Tested-by: NetDEF CI System <cisystem@netdef.org>

                      Not sure if something here would help: "rib_update_kernel() to not reset FIB flags when a RIB entry is being modifed (old and new RIB are same)". But maybe I'm not understanding the problem properly.

                      • Spydre13

                        I looked at the changelog too, and didn't see anything that would fix this.  The main problem is that when Quagga restarts, it doesn't recognize the routes that it previously put in there, so it pulls them in as "kernel" routes and they will always take precedence.  That's why it works fine until Quagga is restarted (which is basically kill & start, there is no graceful restart in Quagga).  Since the rib_sweep_table() function isn't used anymore, when it starts up it doesn't remove routes from the list of kernel routes that it previously put there (which it flags as RTF_PROTO1, or "1" in netstat -r).  I don't see how they aren't having more issues with this, unless the common scenario is that Quagga never gets restarted unless the whole OS is restarted.

                        I don't see why kill -9 matters here, because it worked fine before v1.0, and there is no graceful restart capability in Quagga.  Ideally pfSense could use the Quagga VTY to make changes live without restarting, and then write changes to the config files for the next time it starts up, but I doubt anyone wants to take on a project like that.

                        If you want more details let me know, but it would probably make more sense to discuss on the Quagga list instead of here.

                        • jimp (Rebel Alliance, Developer, Netgate)

                          That sounds like the issue. Preventing it from restarting is a hackish workaround no matter what signal is used. It will get restarted at some point and failing to recover gracefully is a regression in quagga's behavior in 1.x.

                          It needs to recognize the flags it sets on routes in the table, and it doesn't. Hopefully someone at Quagga can pick that up and run with it on their list.

                          • reqlez

                            Okay. So basically I will forward your comments again and see if anybody replies… I can't believe this bug has been out for something like 8 months already, with still no fix and nobody really engaged with it... unless it somehow specifically targets pfSense or FreeBSD, but based on what you guys are saying this should affect all platforms, no?

                            • Spydre13

                              I posted a new topic on the quagga-dev list as well, and haven't seen any responses to that.  I don't see how they wouldn't be having this issue on any platform, as soon as quagga/zebra restarts.

                              • reqlez

                                Does this sound familiar? https://lists.quagga.net/pipermail/quagga-dev/2016-February/014777.html

                                • Spydre13

                                  I saw that thread a while ago, I'm not sure it's exactly the same issue, but it might help our situation.  It seems like they are talking about the zebra (OSPF/BGP) routes not being removed from the kernel when the process is stopped/killed.  I'm not sure what their expected behavior is, but that might not be ideal.  Ideally (in my opinion) it would be good to remove the routes when stopping the process, but not remove the routes when restarting the process.  Otherwise while restarting zebra the routes would be removed for a short time, which I don't think would be ideal, although it would probably be minor compared to the issue we are having.  If you follow that thread, it's unclear whether they ended up doing anything with it or not, because their patches kept failing the CI tests.

                                  Assuming the routes zebra inserts are not removed when stopping/restarting, the code I was referring to previously was supposed to prevent kernel routes from being inserted into zebra as "kernel" routes when they originally were put there by zebra.  I'm amazed that there's no response to this on their lists.  I'm not sure if it's because so few people are working on it or what, it seems like it would be a big deal unless there's some other code handling this that works on other OS but not on FreeBSD.  I'm not sure what the best option is going forward, if they are unresponsive it seems it would be best for pfSense to lock in the last version <1.0 (short-sighted fix) or fork it and correct the issue just for pfSense, but then it wouldn't continue to be updated.  It seems like there are plenty of people who want to use OSPF but not many who are working on Quagga or other OSPF projects.  I would be willing to contribute towards paying pfSense, Martin Winter (www.opensourcerouting.org), or someone else to fix this.

                                  @jimp - can you give your opinion on this?  Would it be an option to use a fork of Quagga or specify to use the version before 1.0?  OSPF really is broken right now in pfSense as soon as the service restarts (which is triggered by almost any change, and other things).

                                  • reqlez

                                    Hi… I understand it's a hacky workaround, but do you think a button could be added in pfSense, somewhere under advanced options, to not restart network packages, just like that script you made? The issue right now is that, whether or not this is a pfSense issue, pfSense users cannot use the product. And while I agree this is not a long-term fix, we need something until maybe a year or two from now, or who the f*** knows when, somebody looks at this bug and fixes it. Then we would just have an extra option in pfSense that we'll look back on and think "Man... I hope there's not another stupid a*** bug in OSPF again, but if there is... we have a magic button."

                                    I'm just about ready to learn development to fix this garbage bug... I coded in C++ back in the day, maybe I can pick it up fast. :( :( :( :( :(

                                    • reqlez

                                      @Spydre13

                                      Let's get a fundraiser going, collect something like $10,000, and offer the 10K to whoever can fix the OSPF bug and integrate Quagga VTY support into pfSense lol (oh, and add TCP/DNS support for gateway monitoring instead of just ping, because every ISP now drops ICMP under high load, and gateway monitoring sucks without DNS/TCP port support)…. I'm willing to pitch in $1000... if all 3 conditions are met lol. Who else wants to donate for a good cause???

                                      • Soyokaze

                                        @Moderators
                                        Please rename this topic to 'Major issue with QUAGGA-OSPF and VLANs or VPNs (pfsense 2.3.0)'.
                                        I have this problem on a setup with multiple OpenVPN tunnels, but I never checked this thread because the topic only mentions VLANs. :'(

                                        • Spydre13

                                          First you need to find someone willing to fix the problem; otherwise the money doesn't help. I've already pointed out where the bug is (fairly confidently, anyway), and could fix it just by reverting the change they made. However, there's no guarantee that they will accept the fix. I can't get a response on why that change was made or what the intention was. If they're not going to be responsive, it seems like pfSense should either revert to the older version or use a fork that corrects this issue.

                                          • reqlez

                                            Yes, but… who is going to be developing the fork? lol

                                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.