I have two pfSense 2.2.1 boxes on separate sites that each have an Internet connection. The sites are also connected to each other by two microwave links. There are multiple networks at each site and some remote sites connected via site-to-site OpenVPN. I have configured Quagga for all the networks, including the OpenVPN links, and this is all working well and as expected.
For the two microwave-connected sites I wanted to use the other site's Internet connection in case of a local Internet link failure. To test this I just unplug the WAN link. OSPF reroutes everything as expected (mainly meaning the OpenVPN connections) via the other site, except for traffic which uses the default route.
The issue seems to be that even though the WAN link is unplugged and the gateway has been detected as being down, the kernel default route pointing out the unplugged interface remains in the route table. I have selected "Redistribute default route" in the Quagga configuration and the OSPF default route is present in the route table, but the kernel route is always selected ahead of it.
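To illustrate what I think is happening, here is a minimal sketch of the selection logic. It is not Quagga's code. It assumes zebra picks the candidate with the lowest administrative distance and that kernel routes have distance 0 while OSPF routes have distance 110. The `Route` type, the `ASSUMED_DISTANCE` table and `select_route` are names made up for this example.

```python
from dataclasses import dataclass

# Assumed default administrative distances (lower wins). These values are
# an assumption for illustration, not read from the running system.
ASSUMED_DISTANCE = {"kernel": 0, "connected": 0, "static": 1, "ospf": 110}


@dataclass(frozen=True)
class Route:
    prefix: str
    source: str      # "kernel", "ospf", ...
    nexthop: str
    interface: str


def select_route(candidates):
    """Return the candidate with the lowest assumed distance."""
    return min(candidates, key=lambda r: ASSUMED_DISTANCE[r.source])


candidates = [
    Route("0.0.0.0/0", "ospf", "10.100.2.71", "igb5"),
    Route("0.0.0.0/0", "kernel", "10.21.2.254", "igb0"),
]

# While the kernel default route exists it wins, even with igb0 unplugged.
best = select_route(candidates)
print(best.source, best.interface)

# With the kernel route removed, only the OSPF default is left to select.
remaining = [r for r in candidates if r.source != "kernel"]
print(select_route(remaining).interface)
```

If that model is right, the OSPF default can never be selected for as long as the kernel default stays in the table, regardless of the state of the WAN link.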
I am probably missing something obvious here, but I am stuck. Can someone point me in the right direction to solve this, please?
Are you sure it's a routing issue and not a NAT issue?
Does a traceroute show traffic going out the default WAN instead of the intended WAN?
Please, post a network diagram. Describing complicated setups just utterly fails.
Heper: Yes I am sure it is routing and not NAT.
If I delete the default route using route del, then the OSPF-provided route is used and everything works as expected.
doktornotor: Diagram attached.
Some explanation to go with the diagram:
- The Office pfSense is not part of the Lab other than providing a gateway to the world.
- None of the WAN interfaces are in OSPF.
- The "connecting" interfaces (OpenVPN and "microwave" links) are all in OSPF
- All the LANs (not shown in the diagram) are in OSPF as passive interfaces.
All of this works well, except for the WAN failure scenario, where failover is only partial.
For example, if the WAN on pfSense #1 is unplugged, all the hosts on all the LANs remain contactable to/from each other and the LANs on pfSense #2 and #3 can still get to the "Internet". However, the LANs on pfSense #1 cannot get to the "Internet" because the default route still points out the unplugged WAN interface. If I manually delete the default route on pfSense #1 then everything works as expected, with pfSense #1 hosts/LANs using pfSense #2 for "Internet" connectivity via the OSPF-provided default route.
The output below is from pfSense #1 with the WAN port unplugged. igb0 is the WAN port and igb5 is one of the "microwave" links to pfSense #2.
Quagga Zebra Routes
Codes: K - kernel route, C - connected, S - static, R - RIP,
O - OSPF, I - IS-IS, B - BGP, A - Babel,
> - selected route, * - FIB route
O 0.0.0.0/0 [110/10] via 10.100.2.71, igb5, 00:58:47
K>* 0.0.0.0/0 via 10.21.2.254, igb0
O 10.10.10.2/32 [110/100] is directly connected, ovpns2, 00:59:09
C>* 10.10.10.2/32 is directly connected, ovpns2
O>* 10.10.10.6/32 [110/220] via 10.100.2.71, igb5, 00:58:48
O> 10.10.10.9/32 [110/320] via 10.100.2.71, igb5, 00:00:16
O 10.10.10.10/32 [110/100] is directly connected, ovpns3, 00:59:09
C>* 10.10.10.10/32 is directly connected, ovpns3
O>* 10.10.10.13/32 [110/420] via 10.100.2.71, igb5, 00:00:16
O>* 10.10.10.14/32 [110/220] via 10.100.2.71, igb5, 00:58:48
O 10.10.10.18/32 [110/100] is directly connected, ovpns4, 00:59:09
C>* 10.10.10.18/32 is directly connected, ovpns4
O>* 10.10.10.22/32 [110/220] via 10.100.2.71, igb5, 00:58:48
C>* 10.10.10.24/30 is directly connected, igb7
C>* 10.21.2.0/24 is directly connected, igb0
O 10.100.1.0/24 [110/30] is directly connected, igb4, 00:58:54
C>* 10.100.1.0/24 is directly connected, igb4
O 10.100.2.0/24 [110/20] is directly connected, igb5, 00:58:54
C>* 10.100.2.0/24 is directly connected, igb5
O 10.100.3.0/24 [110/30] is directly connected, igb6, 00:59:09
C>* 10.100.3.0/24 is directly connected, igb6
C>* 127.0.0.0/8 is directly connected, lo0
O 172.17.1.0/24 [110/10] is directly connected, igb1, 00:59:09
C>* 172.17.1.0/24 is directly connected, igb1
O 172.18.2.0/24 [110/10] is directly connected, igb2_vlan150, 00:59:09
C>* 172.18.2.0/24 is directly connected, igb2_vlan150
O 172.18.3.0/24 [110/10] is directly connected, igb2_vlan152, 00:59:09
C>* 172.18.3.0/24 is directly connected, igb2_vlan152
O 172.18.4.0/24 [110/10] is directly connected, igb2_vlan153, 00:59:09
C>* 172.18.4.0/24 is directly connected, igb2_vlan153
O>* 172.18.5.0/24 [110/30] via 10.100.2.71, igb5, 00:58:48
O>* 172.18.6.0/24 [110/30] via 10.100.2.71, igb5, 00:58:48
O>* 172.18.7.0/24 [110/30] via 10.100.2.71, igb5, 00:58:48
O 172.18.14.0/24 [110/10] is directly connected, igb2_vlan160, 00:59:09
C>* 172.18.14.0/24 is directly connected, igb2_vlan160
O>* 172.18.16.0/24 [110/30] via 10.100.2.71, igb5, 00:58:48
C>* 172.18.17.0/24 is directly connected, igb2_vlan112
K>* 172.18.21.0/24 via 172.18.21.2, ovpns1
O 172.18.21.2/32 [110/10] is directly connected, ovpns1, 00:59:09
C>* 172.18.21.2/32 is directly connected, ovpns1
O>* 172.18.23.0/24 [110/230] via 10.100.2.71, igb5, 00:00:16
O 192.168.2.0/24 [110/10] is directly connected, igb3, 00:59:09
C>* 192.168.2.0/24 is directly connected, igb3
O>* 192.168.3.0/24 [110/30] via 10.100.2.71, igb5, 00:58:48
O>* 192.168.4.0/24 [110/30] via 10.100.2.71, igb5, 00:58:48
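In a table this long the problem entry is easy to miss, so here is a rough sketch that scans output in the format above and lists prefixes where an OSPF route exists but a route of another type is the selected one. The regular expression and the function names (`parse`, `ospf_shadowed_by`) are my own and only cover the line shapes shown here. `SAMPLE` is a few lines copied from the output above.

```python
import re
from collections import defaultdict

# Matches e.g. "K>* 0.0.0.0/0 via ..." or "O 10.100.2.0/24 [110/20] ...".
# Groups: route code, ">" if selected, "*" if in the FIB, IPv4 prefix.
LINE = re.compile(r"^([KCSROIBA])(>?)(\*?)\s+(\d+\.\d+\.\d+\.\d+/\d+)")

SAMPLE = """\
Codes: K - kernel route, C - connected, S - static, R - RIP,
O 0.0.0.0/0 [110/10] via 10.100.2.71, igb5, 00:58:47
K>* 0.0.0.0/0 via 10.21.2.254, igb0
O 10.100.2.0/24 [110/20] is directly connected, igb5, 00:58:54
C>* 10.100.2.0/24 is directly connected, igb5
O>* 172.18.5.0/24 [110/30] via 10.100.2.71, igb5, 00:58:48
"""


def parse(text):
    """Map each prefix to a list of (code, is_selected) tuples."""
    table = defaultdict(list)
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if m:
            code, selected, _fib, prefix = m.groups()
            table[prefix].append((code, selected == ">"))
    return table


def ospf_shadowed_by(text, winner="K"):
    """Prefixes with an unselected OSPF route and a selected `winner` route."""
    shadowed = []
    for prefix, entries in parse(text).items():
        if ("O", False) in entries and (winner, True) in entries:
            shadowed.append(prefix)
    return shadowed


print(ospf_shadowed_by(SAMPLE))       # kernel routes beating OSPF
print(ospf_shadowed_by(SAMPLE, "C"))  # connected routes beating OSPF
```

On the sample lines, the only prefix where a kernel route beats OSPF is 0.0.0.0/0. Connected routes beating OSPF for directly attached networks is normal and harmless.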
All this detail aside, I think that the question boils down to why is there a default route pointing out an interface that is down?
Is this just the way it has to be on pfSense and/or is there something fundamental that I am missing?
What if you remove all the default gateways from the interfaces and rely only on the OSPF routes?
Did you try creating a failover group with the WAN and the microwave links as members?
Yes I had thought of that, but was unsure of the side effects.
Each of these boxes only has one gateway defined, which is marked in the GUI as a default gateway. The GUI will not let you change this gateway to not being a default. Well actually it lets you deselect the "Default Gateway" option and then silently does not save it. Because of this I figured that there was maybe something fundamental to the way that pfSense was strung together that requires a default gateway configured in this way and had not taken this thought any further.
I haven't tried deleting the gateway altogether though. This would mean that there were no gateways defined on these machines. I would then need to figure out how to inject the default routes into OSPF correctly.
I will put this on the list of things to try.
Thanks for the thought
Yes thought of that, tried it and dismissed it.
But, you mentioning it just made me think about what I actually configured (and did not configure) and I am now thinking that this might be a workable approach.
I will revisit this today.
Thanks for triggering the rethink :)
Just to close this thread out.
I tried the failover group approach again and ran into the issue of services on the pfSense box (Squid and unbound) continuing to use the default route no matter what I did. I see that there are many threads raising the same issue.
While trying to understand/resolve that I came across the "Enable default gateway switching" option, of which I had previously been unaware.
I now have this option enabled, two defined gateways (WAN and one of the microwave links), no gateway groups and no policy routing. This achieves what I wanted to accomplish very simply.
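For anyone landing here later, this is how I picture the option behaving. It is only a conceptual sketch, assuming pfSense moves the default route to another available gateway when the current default is marked down. It is not pfSense's code, and the gateway names and the first-available ordering are hypothetical.

```python
def pick_default_gateway(gateways, status):
    """Return the first gateway, in configured order, that is up (else None)."""
    for name in gateways:
        if status.get(name) == "up":
            return name
    return None


# Hypothetical gateway names: the local WAN first, then a microwave link.
gateways = ["WAN_GW", "MICROWAVE_GW"]

# Normal operation: the local WAN carries the default route.
print(pick_default_gateway(gateways, {"WAN_GW": "up", "MICROWAVE_GW": "up"}))

# WAN unplugged: the default route moves to the microwave link.
print(pick_default_gateway(gateways, {"WAN_GW": "down", "MICROWAVE_GW": "up"}))
```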