Quagga OSPF route gets deleted periodically



  • I just upgraded the pfSense router at one of our remote offices to the latest 2.1 release.

    Now I'm having a problem with our OSPF routes.

    In the main office's Quagga OSPF, the route to the remote office periodically disappears for a few seconds and then reappears.

    In the "Quagga Zebra Routes" section, the route to this specific office shows a very low timer (the rightmost field of the route entry) because the route keeps getting reset (deleted, then re-added). On average it lasts only about 15 minutes. This didn't happen before the upgrade.

    I added logging to Quagga and get a lot of this:

    2014/10/04 09:15:37 OSPF: nsm_change_state(192.168.2.0, Exchange -> Full): scheduling new router-LSA origination
    2014/10/04 09:15:41 OSPF: nsm_change_state(192.168.2.0, Full -> Init): scheduling new router-LSA origination
    2014/10/04 09:15:41 OSPF: nsm_change_state(192.168.2.0, Full -> Init): scheduling new router-LSA origination
    2014/10/04 09:15:47 OSPF: Packet[DD]: Neighbor 192.168.2.0 Negotiation done (Slave).
    2014/10/04 09:15:47 OSPF: nsm_change_state(192.168.2.0, Exchange -> Full): scheduling new router-LSA origination
    2014/10/04 09:15:47 OSPF: Packet[DD]: Neighbor 192.168.2.0 Negotiation done (Slave).
    2014/10/04 09:15:47 OSPF: nsm_change_state(192.168.2.0, Exchange -> Full): scheduling new router-LSA origination
    

    What is causing the state change and route reset? This is over an OpenVPN link and the VPN connection isn't disconnected (I can see the long connection time).



  • Is there something causing packages to be reloaded frequently? You'll see that in the system log. How much time goes by between the repeating instances of those logs you're seeing? It seems like something is causing a restart of the OSPF service, but not sure what that would be. I can't think of any circumstance offhand like that where it wouldn't also restart the VPNs, and that apparently isn't happening.
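
    To put a number on how long the adjacency is down and how often it flaps, a small script over the Quagga log can help. The following is a minimal Python sketch, not part of Quagga or pfSense; the function name `adjacency_outages` and the regular expression are my own assumptions, based only on the log format shown in the first post.

```python
import re
from datetime import datetime

# Matches lines such as:
# 2014/10/04 09:15:41 OSPF: nsm_change_state(192.168.2.0, Full -> Init): ...
NSM_RE = re.compile(
    r"^\s*(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) OSPF: "
    r"nsm_change_state\(([\d.]+), ([\w-]+) -> ([\w-]+)\)"
)

def adjacency_outages(lines):
    """Return (neighbor, lost_at, restored_at, seconds) for each loss of Full state."""
    lost = {}      # neighbor -> time it first left the Full state
    outages = []
    for line in lines:
        m = NSM_RE.match(line)
        if not m:
            continue  # skip DD packets and any other log lines
        ts = datetime.strptime(m.group(1), "%Y/%m/%d %H:%M:%S")
        neighbor, old, new = m.group(2), m.group(3), m.group(4)
        if old == "Full" and new != "Full":
            lost.setdefault(neighbor, ts)  # keep the first drop, ignore duplicate lines
        elif new == "Full" and neighbor in lost:
            start = lost.pop(neighbor)
            outages.append((neighbor, start, ts, (ts - start).total_seconds()))
    return outages

# Sample taken from the log excerpt posted above.
sample = [
    "2014/10/04 09:15:37 OSPF: nsm_change_state(192.168.2.0, Exchange -> Full): scheduling new router-LSA origination",
    "2014/10/04 09:15:41 OSPF: nsm_change_state(192.168.2.0, Full -> Init): scheduling new router-LSA origination",
    "2014/10/04 09:15:41 OSPF: nsm_change_state(192.168.2.0, Full -> Init): scheduling new router-LSA origination",
    "2014/10/04 09:15:47 OSPF: Packet[DD]: Neighbor 192.168.2.0 Negotiation done (Slave).",
    "2014/10/04 09:15:47 OSPF: nsm_change_state(192.168.2.0, Exchange -> Full): scheduling new router-LSA origination",
]

for neighbor, start, end, secs in adjacency_outages(sample):
    print(f"{neighbor}: adjacency lost {start}, restored {end} ({secs:.0f}s)")
```

    Run against a full log file, the gaps between successive outages should show whether the flaps line up with the package restarts in the system log.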



  • @cmb:

    I don't think the package is being reloaded because I have 3 remote offices and only this one route gets reset periodically.

    Also when I check the OSPF on this remote office, the route to the main office isn't getting reset. So it's only the route from main office to this specific remote office that is getting reset.



  • @cmb:

    Okay, after more investigation, you are right. Services are getting restarted, here are the logs:

    Oct 18 08:53:16 	php: rc.newwanip: pfSense package system has detected an ip change 192.168.102.2 -> 192.168.102.2 ... Restarting packages.
    Oct 18 08:53:16 	check_reload_status: Starting packages
    Oct 18 08:53:18 	php: rc.start_packages: Restarting/Starting all packages.
    

    That part gets repeated once in a while. I'm not sure why pfSense detects an IP change, since the old and new addresses in the log are identical.
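
    To see how often this happens with an unchanged address, the system log can be filtered for these messages. This is a minimal Python sketch under my own assumptions: the regular expression is derived only from the two `rc.newwanip` messages quoted in this thread, and `ip_change_events` is a hypothetical helper name, not anything shipped with pfSense.

```python
import re

# Derived from messages like:
#   ... detected an ip change 192.168.102.2 -> 192.168.102.2 ... Restarting packages.
#   ... detected an ip change -> 10.10.99.1 ... Restarting packages.
IP_CHANGE_RE = re.compile(
    r"rc\.newwanip: pfSense package system has detected an ip change\s+"
    r"(\S*)\s*->\s*(\S+)"
)

def ip_change_events(lines):
    """Extract old/new addresses and flag events where nothing actually changed."""
    events = []
    for line in lines:
        m = IP_CHANGE_RE.search(line)
        if m:
            old, new = m.group(1), m.group(2)
            events.append({"old": old or None, "new": new, "same_ip": old == new})
    return events

# Sample lines copied from the logs in this thread.
sample = [
    "Oct 18 08:53:16 php: rc.newwanip: pfSense package system has detected an ip change 192.168.102.2 -> 192.168.102.2 ... Restarting packages.",
    "Oct 18 08:53:16 check_reload_status: Starting packages",
    "Nov 20 18:55:49 php: rc.newwanip: pfSense package system has detected an ip change -> 10.10.99.1 ... Restarting packages.",
]

for event in ip_change_events(sample):
    print(event)
```

    Events with `same_ip` set are the ones where packages were restarted although the address stayed the same.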

    When that happens, though, apinger reports the VPN link as down, even though it isn't really down. When I restart the apinger service, the gateway status goes back to up.

    Oct 18 08:53:21 	php: rc.filter_configure_sync: MONITOR: VPN_DSL is down, removing from routing group FailoverINT
    Oct 18 08:53:26 	check_reload_status: updating dyndns WAN
    Oct 18 08:53:26 	check_reload_status: Restarting ipsec tunnels
    Oct 18 08:53:26 	check_reload_status: Restarting OpenVPN tunnels/interfaces
    Oct 18 08:53:28 	php: rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed its IP. Reloading endpoints that may use WAN.
    Oct 18 08:53:28 	php: rc.openvpn: OpenVPN: Resync client1 CEB-GSC VPN
    Oct 18 08:53:28 	php: rc.openvpn: MONITOR: VPN_DSL is down, removing from routing group FailoverINT
    Oct 18 08:53:28 	php: rc.openvpn: MONITOR: VPN_DSL is down, removing from routing group FailoverINT
    Oct 18 08:53:28 	php: rc.dyndns.update: MONITOR: VPN_DSL is down, removing from routing group FailoverINT
    Oct 18 08:53:28 	kernel: ovpnc1: link state changed to DOWN
    

    For now, I've just disabled monitoring on that gateway so it's always detected as up.



  • This "restart packages" behavior after a "new IP" change is very annoying…

    Nov 20 18:55:49	php: rc.newwanip: pfSense package system has detected an ip change -> 10.10.99.1 ... Restarting packages.
    Nov 20 18:55:49	php: rc.newwanip: rc.newwanip: on (IP address: 10.10.99.1) (interface: []) (real interface: ovpns6).
    Nov 20 18:55:49	php: rc.newwanip: rc.newwanip: Informational is starting ovpns6.
    

    So here is the scenario:

    I have a perfectly good WAN1 and a mediocre WAN2. So what happens? The OpenVPN connection on WAN2 goes down (since it's a crap backup connection), the OSPF routes get deleted because of this "new IP detected" event, and my perfectly good WAN1 link to the VPN server on the other side breaks.

    So basically I get better service with a single WAN line than with two, because my WAN2 keeps dropping the connection on WAN1 through package restarts. Can't we add some kind of advanced option NOT to restart OSPF after a WAN change, and just let OSPF figure out by itself that there is a new WAN IP? I don't know about the programming side, but this is not reliable :(

    Just to add: apparently an OpenVPN connection is also treated as a WAN interface, so as soon as an OpenVPN connection that is part of OSPF comes up, it restarts the packages on me :(



  • @reqlez:

    Exactly. That's why I disabled gateway monitoring on my WAN2 so it doesn't get detected as down. But it doesn't even help because when a new IP is detected, it restarts all packages anyway.

    I also thought it was related to the "disable state killing upon gateway failure" option under System>Advanced as described in: https://forum.pfsense.org/index.php?topic=63052.0

    But that didn't help either. Packages are still being restarted.

    Hope there's a better solution for this.



  • I found so many posts about this, it's not even funny.

    Not all of them are related to OSPF specifically, but some of them say the only solution is to edit the code and take out the line that restarts packages.

    https://forum.pfsense.org/index.php?topic=80262.0
    https://forum.pfsense.org/index.php?topic=63052.0

    "As per the above mentioned thread I have commented out the call to restart_packages() in rc.newwanip to see if that resolves my issue without introducing any other issues."

    I think the way to go here is a "packages list" with a checkbox next to each package that says "restart this package when the IP changes or the connection goes down". Basically, only the packages that NEED to be restarted on a WAN change would be selected (and since I'm not the developer, I don't even know which packages those are).
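
    The checkbox idea boils down to filtering the restart list by a per-package flag. Here is a rough Python sketch of that logic only; pfSense itself is PHP, and the function name, the flag dictionary, and the second package name are all hypothetical, made up to illustrate the proposal.

```python
def packages_to_restart(installed, restart_on_ip_change):
    """Return the installed packages that are flagged for restart on an IP change.

    Packages without an explicit flag default to True, which would preserve
    the current restart-everything behavior for anything not configured.
    """
    return [pkg for pkg in installed if restart_on_ip_change.get(pkg, True)]

# Hypothetical example: the admin unticks the checkbox for the OSPF package only.
installed = ["quagga_ospf", "some_other_package"]
flags = {"quagga_ospf": False}

print(packages_to_restart(installed, flags))
```

    Defaulting to "restart" for unflagged packages is a deliberate choice in this sketch, so that nothing changes for setups that never touch the option.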

    I'm assuming that OSPF checks periodically for new connections/IPs and doesn't need to be restarted. If you make a configuration change to OSPF and it has to be restarted because of that, I understand. But for an IP change? That sucks. Imagine you have an OpenVPN/OSPF network of 50 routers, and some routers have crappy WAN2 lines: something will be restarting practically 100% of the time and you will never get any service at all.

    Can a developer comment on this from a programming/OSPF perspective?



  • @reqlez:

    So, I investigated rc.newwanip further and compared the old behavior with the new one.

    In the old one:

    
    if (is_ipaddr($oldip) && $curwanip == $oldip) {
            // Still need to sync VPNs on PPPoE and such, as even with the same IP the VPN software is unhappy with the IP disappearing.
            if (in_array($config['interfaces'][$interface]['ipaddr'], array('pppoe', 'pptp', 'ppp'))) {
                    /* reconfigure IPsec tunnels */
                    vpn_ipsec_force_reload();
    
                    /* start OpenVPN server & clients */
                    openvpn_resync_all($interface);
            }
            exit;
    }
    
    ...
    
    restart_packages();
    
    

    This means that if the IP has not changed, packages are not restarted. If the interface is a PPP type (pppoe, pptp, ppp), IPsec and OpenVPN are still resynced before the script exits.

    In the new one:

    
    /*
     * We need to force sync VPNs on such even when the IP is the same for dynamic interfaces.
     * Even with the same IP the VPN software is unhappy with the IP disappearing, and we
     * could be failing back in which case we need to switch IPs back anyhow.
     */
    if (!is_ipaddr($oldip) || $curwanip != $oldip || !is_ipaddrv4($config['interfaces'][$interface]['ipaddr'])) {
            /* reconfigure static routes (kernel may have deleted them) */
            system_routing_configure($interface);
    
            /* reconfigure our gateway monitor */
            setup_gateways_monitor();
    
            if (is_ipaddr($curwanip))
                    @file_put_contents("{$g['vardb_path']}/{$interface}_cacheip", $curwanip);
    
            /* perform RFC 2136 DNS update */
            services_dnsupdate_process($interface);
    
            /* signal dyndns update */
            services_dyndns_configure($interface);
    
            /* reconfigure IPsec tunnels */
            vpn_ipsec_force_reload($interface);
    
            /* start OpenVPN server & clients */
            if (substr($interface_real, 0, 4) != "ovpn")
                    openvpn_resync_all($interface);
    
            /* reload graphing functions */
            enable_rrd_graphing();
    
            /* reload igmpproxy */
            services_igmpproxy_configure();
    
            /* restart snmp */
            services_snmpd_configure();
    
            restart_packages();
    }
    
    

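    Similarly, an interface with a static IPv4 address that hasn't changed skips this block. But any interface whose configured ipaddr is not a literal IPv4 address (a dynamic interface) always enters it, so everything is restarted, restart_packages() included, even when the IP is the same.

    The comment they put before the code describes why they made it this way, but I don't think a full package restart is required.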
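
    To make the difference concrete, here is a small Python model of the two conditions. It only mirrors the boolean logic in the excerpts above and is not pfSense code. The helpers approximate `is_ipaddr()` and `is_ipaddrv4()` with Python's `ipaddress` module, and the sample interface settings (including `dhcp` as the configured type and the address `192.168.102.9`) are assumptions for illustration.

```python
import ipaddress

def is_ipaddr(value: str) -> bool:
    """Approximation of is_ipaddr(): True for a literal IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(value)
        return True
    except ValueError:
        return False

def is_ipaddrv4(value: str) -> bool:
    """Approximation of is_ipaddrv4(): True only for a literal IPv4 address."""
    try:
        return ipaddress.ip_address(value).version == 4
    except ValueError:
        return False

def old_restarts(oldip: str, curip: str) -> bool:
    # Old excerpt: exits early when a valid cached IP equals the current IP.
    return not (is_ipaddr(oldip) and curip == oldip)

def new_restarts(oldip: str, curip: str, cfg_ipaddr: str) -> bool:
    # New excerpt: also enters the block when the configured ipaddr is not IPv4.
    return (not is_ipaddr(oldip)) or curip != oldip or (not is_ipaddrv4(cfg_ipaddr))

# (label, cached old IP, current IP, configured ipaddr value) - illustrative values
cases = [
    ("static, same IP", "192.168.102.2", "192.168.102.2", "192.168.102.2"),
    ("dynamic, same IP", "192.168.102.2", "192.168.102.2", "dhcp"),
    ("dynamic, new IP", "192.168.102.2", "192.168.102.9", "dhcp"),
    ("no cached IP", "", "10.10.99.1", ""),
]

for label, oldip, curip, cfg in cases:
    print(f"{label:18} old: {old_restarts(oldip, curip)!s:5}  new: {new_restarts(oldip, curip, cfg)!s:5}")
```

    The second case is the one that matters here: same IP on a dynamic interface, where only the new condition leads to a package restart.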



  • Okay, so how about this:

    Restart packages if they really need to be restarted for OSPF to recognize a "new" link, but somehow make it so that the routes don't get erased (an option to remember the routes from before OSPF restarted?) and the OpenVPN connections don't get reset unless they need to be.



  • I wonder if this is also somehow related: https://forum.pfsense.org/index.php/topic,39995.0.html

    A "workaround" where you create manual routes that OSPF cannot delete?

    I mean, there has to be a way to just save the routes before OSPF restarts, or maybe have OSPF "review" the saved routes once it syncs up to see if any have to go.



  • No OSPF developers out there? I just wanted to know how realistic a fix is and how much time it would take (if it's even fixable).



  • Okay, since nobody has commented on a solution: what if I comment out the restart_packages() call? Is that safe to do? Why would the packages need to be restarted if all I'm using is OSPF and OpenVPN and no other packages?



  • You can try it. The problem is that the service must be restarted after the tunnel comes up before it will bind to that interface and function correctly, which is why it does what it does. There are definitely some circumstances that will break if you take that out. It could probably be done more gracefully, with a good deal of work and widespread testing across a wide range of potential circumstances.



  • Thanks for the reply.

    Honestly, that issue alone is probably the showstopper for people who have a crappy connection as a second WAN.

    I will try it and report back.

    But what you are really saying is that it's an issue with how the OSPF package works? If the OSPF package automagically detected new tunnel interfaces every so often, it wouldn't have to be restarted?


  • Rebel Alliance Developer Netgate

    No guarantees on the results, but you could try this patch with the System Patches package:

    http://files.pfsense.org/jimp/patches/skip_restart_for_routing_packages.patch



  • thanks for the patch!

    I will test.



  • @jimp:

    Hi

    Will this patch work with 2.2-RELEASE?

