FRR OSPF: a point-to-point interface keeps resetting
I have a multi-site network that uses point-to-point circuits to connect back to our main colo.
Here's a rough drawing...
The point-to-point circuits are mostly Metro E type circuits from Level 3.
Each site has a local Internet connection as well.
In the normal routine of operations, internal company traffic should flow over the red P2P circuits and it does quite well with Static routes.
However, I wanted to create a failover scenario that sends the internal traffic through an OpenVPN site-to-site tunnel (Peer to Peer, Shared Key). I set up an interface for each VPN.
So far so good!
I set up FRR following the Hangout from Dec 2017, and the VPN tunnels are working great!
However, the dedicated Ethernet circuits are the problem. In order to get them to come up, I had to mark the Interfaces as Point to Point in the OSPF Interfaces tab.
show ip ospf neighbor looks good...
Neighbor ID     Pri State           Dead Time Address         Interface              RXmtL RqstL DBsmL
10.10.10.51       1 Full/DROther    39.917s   192.168.254.141 igb2:192.168.254.138       0     0     0
10.10.10.51       1 Full/DROther    39.925s   172.16.101.25   ovpnc2:172.16.101.26       0     0     0
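For anyone who wants to watch this output over time rather than eyeball it, here is a minimal Python sketch that parses the two neighbor rows above into records. The helper names (`parse_neighbors`, `not_full`) are made up for illustration and are not part of FRR, and the parser assumes the nine-column layout and the `39.917s` dead-time format shown in this post; other FRR versions may print these differently.

```python
# Sample rows copied from the `show ip ospf neighbor` output in this post.
SAMPLE = """\
Neighbor ID Pri State Dead Time Address Interface RXmtL RqstL DBsmL
10.10.10.51 1 Full/DROther 39.917s 192.168.254.141 igb2:192.168.254.138 0 0 0
10.10.10.51 1 Full/DROther 39.925s 172.16.101.25 ovpnc2:172.16.101.26 0 0 0
"""


def parse_neighbors(text):
    """Parse neighbor rows into dicts (hypothetical helper, not an FRR API)."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # A neighbor row has exactly nine fields and a numeric priority;
        # the header row splits into eleven fields, so it is skipped here.
        if len(fields) != 9 or not fields[1].isdigit():
            continue
        state, _, role = fields[2].partition("/")
        ifname, _, local_ip = fields[5].partition(":")
        rows.append({
            "neighbor_id": fields[0],
            "priority": int(fields[1]),
            "state": state,
            "role": role,
            # Assumes the dead time is printed as seconds with an "s" suffix.
            "dead_time": float(fields[3].rstrip("s")),
            "peer_address": fields[4],
            "interface": ifname,
            "local_address": local_ip,
        })
    return rows


def not_full(rows):
    """Return the interfaces whose adjacency is not in the Full state."""
    return [r["interface"] for r in rows if r["state"] != "Full"]


if __name__ == "__main__":
    for row in parse_neighbors(SAMPLE):
        print(row["interface"], row["state"], row["dead_time"])
```

Feeding this repeated captures of the command (collected however you like) would show whether igb2 ever drops out of Full between samples, which a single snapshot like the one above cannot show.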
I noticed traffic would fluctuate between the two paths even though the links would stay up. Of course, I want the primary link to be the point-to-point fiber circuit.
I went into vtysh and added "log-adjacency-changes detail" to the running config.
This showed the problem but I'm not sure where to go from here...
It seems the point-to-point circuit (igb2) is losing its peer every 5 to 8 seconds.
Here's the config on the peer.
interface igb2
 ip ospf cost 1
 ip ospf network point-to-point
!
interface ovpnc2
 ip ospf cost 50
!
router ospf
 ospf router-id 192.168.130.2
 log-adjacency-changes detail
 redistribute static route-map DNR
 passive-interface igb0
 network 172.16.101.24/30 area 0.0.0.0
 network 192.168.130.0/24 area 0.0.0.0
 network 192.168.254.136/29 area 0.0.0.0
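As a quick sanity check on the config above, here is a small Python sketch that confirms the link addresses from the neighbor output fall inside the configured `network` statements, and that the interface costs favor igb2. The function names are hypothetical, and the cost comparison assumes both links lead to the same destination with no other costs along the path.

```python
import ipaddress

# `network` statements from the `router ospf` block in this post.
NETWORKS = ["172.16.101.24/30", "192.168.130.0/24", "192.168.254.136/29"]

# Addresses on each link (from the neighbor output) and the cost set on
# each interface in the config.
LINKS = {
    "igb2": {"addresses": ["192.168.254.138", "192.168.254.141"], "cost": 1},
    "ovpnc2": {"addresses": ["172.16.101.26", "172.16.101.25"], "cost": 50},
}


def covering_network(address, networks=NETWORKS):
    """Return the first configured network containing address, or None."""
    ip = ipaddress.ip_address(address)
    for net in networks:
        if ip in ipaddress.ip_network(net):
            return net
    return None


def preferred_interface(links=LINKS):
    """Return the interface with the lowest configured OSPF cost."""
    return min(links, key=lambda name: links[name]["cost"])


if __name__ == "__main__":
    for name, link in LINKS.items():
        for addr in link["addresses"]:
            print(name, addr, "->", covering_network(addr))
    print("lowest cost:", preferred_interface())
```

Under those assumptions the costs already prefer igb2, so the path fluctuation looks more consistent with the igb2 adjacency dropping than with a cost problem, though that is an inference from the post and not something the logs here prove.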
I'm stuck. It's something with the Point to Point circuit, but I'm not seeing errors on the interface itself.
How did you go with this Brian? Any luck?
I was able to get it working. In fact, we built it out to cover sites in the US, Europe and the Asia Pacific areas. The failover works quite nicely.
It's been some time since I made the original post, so my memory is a little foggy. I can't remember the exact setting I changed to get it to work. But it can be done.
That client had some internal personnel shake-ups so I don't work on their systems anymore, but the last automated report I saw indicated they were still using it through the pandemic.
DM me and I'll see what I can do to help.