FRR OSPF: a point-to-point interface keeps resetting

bw-linux

I have a multi-site network that uses point-to-point circuits to connect back to our main colo.

Here's a rough drawing...
[image: rough network drawing]

The Point to Point circuits are mostly Metro E type circuits from Level 3.
Each site has a local Internet connection as well.

In normal operation, internal company traffic should flow over the red P2P circuits, and it does quite well with static routes.

However, I wanted to create a failover scenario and send the internal traffic through an OpenVPN site-to-site tunnel (Peer to Peer, Shared Key). I set up an interface for each VPN.

So far so good!

After setting up FRR by following the Hangout from Dec 2017, the VPN tunnels are working great!

However, the dedicated Ethernet circuits are the problem. In order to get them to come up, I had to mark the Interfaces as Point to Point in the OSPF Interfaces tab.

show ip ospf neighbor looks good...

Neighbor ID     Pri State           Dead Time Address         Interface            RXmtL RqstL DBsmL
10.10.10.51       1 Full/DROther      39.917s 192.168.254.141 igb2:192.168.254.138     0     0     0
10.10.10.51       1 Full/DROther      39.925s 172.16.101.25   ovpnc2:172.16.101.26     0     0     0

I noticed traffic would fluctuate between the two paths even though both links stayed up. Of course, I want the primary link to be the point-to-point fiber circuit.

I went into vtysh and added "log-adjacency-changes detail" to the running config.
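
For anyone following along, that step looks roughly like the session below. This is a sketch of standard FRR vtysh commands, not a capture from this system; the final command is only there to confirm the line landed under "router ospf".

```
vtysh
configure terminal
router ospf
 log-adjacency-changes detail
end
show running-config
```

This only touches the running config, as described above.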

This showed the problem but I'm not sure where to go from here...
[image: log output]

It seems the Point to Point circuit (igb2) is losing its peer every 5 to 8 seconds.
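
A reasonable next step for a flapping adjacency like this is to compare what each router reports for the link. The commands below are standard FRR show commands, given as a sketch; igb2 is the interface name from this post, and nothing here is known to be the cause in this case.

```
show ip ospf interface igb2
show ip ospf neighbor detail
```

Run them on both ends and compare the network type and the Hello and Dead timers reported for the interface.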

Here's the config on the peer.

interface igb2
 ip ospf cost 1
 ip ospf network point-to-point
!
interface ovpnc2
 ip ospf cost 50
!
router ospf
 ospf router-id 192.168.130.2
 log-adjacency-changes detail
 redistribute static route-map DNR
 passive-interface igb0
 network 172.16.101.24/30 area 0.0.0.0
 network 192.168.130.0/24 area 0.0.0.0
 network 192.168.254.136/29 area 0.0.0.0
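
With cost 1 on igb2 and cost 50 on ovpnc2, traffic leaving this router should prefer routes learned over igb2 for as long as that adjacency stays Full. A sketch of how to check which path is actually in use, using standard FRR show commands:

```
show ip ospf route
show ip route ospf
```

If the next hop for the remote networks keeps switching between the igb2 and ovpnc2 addresses, that matches the fluctuation described above.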

I'm stuck. It's something with the Point to Point circuit, but I'm not seeing errors on the interface itself.

~Brian

Gcon

How did you go with this Brian? Any luck?

bw-linux

@Gcon, YES!

I was able to get it working. In fact, we built it out to cover sites in the US, Europe and the Asia Pacific areas. The failover works quite nicely.

It's been some time since I made the original post, so my memory is a little foggy. I can't remember the exact setting I changed to get it to work. But it can be done.

That client had some internal personnel shake-ups so I don't work on their systems anymore, but the last automated report I saw indicated they were still using it through the pandemic.

DM me and I'll see what I can do to help.