Default route changing randomly
-
I have a config with two WAN ports: one goes to the Internet with IPV6 and IPV4, the other to a bunch of internal private 10.0.0.0 networks with IPV4. Both are NAT, and both get their IP addresses and gateways via DHCP. The Internet WAN gateway is set as the default, and gateway monitoring is disabled. A static route sends 10.0.0.0/8 traffic to the private WAN.
At random intervals, sometimes once a week and sometimes a couple of times a day, the default route switches from the Internet gateway to the private WAN gateway, and I lose Internet connectivity. Rebooting pfSense sets things back to normal. I can't figure out what triggers the change or how to prevent it.
I've tried replacement hardware and got the same result, so it isn't a hardware problem.
I have resorted to a script that monitors the route table every 5 minutes and reboots the device when it sees the default route change. This works, but, ugh.
What am I doing wrong?
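The route-monitoring workaround described above can be sketched as follows. This is a minimal sketch, assuming a Python 3 interpreter is available on the firewall (pfSense's own tooling is PHP and shell) and assuming FreeBSD's `netstat -rn -f inet` column layout (Destination, Gateway, Flags, Netif), as in the outputs posted later in this thread. `EXPECTED_IFACE` is a hypothetical setting; `em0` is simply the WAN NIC in those outputs. Instead of rebooting, it prints a timestamp, which makes it easier to line the event up with the system logs.

```python
import subprocess
import sys
import time

# Hypothetical setting: the NIC the IPv4 default route should leave through.
# em0 is the WAN interface in the netstat output posted in this thread.
EXPECTED_IFACE = "em0"


def default_route(netstat_text):
    """Return (gateway, netif) of the first "default" row, or None.

    Assumes FreeBSD `netstat -rn -f inet` columns:
    Destination Gateway Flags Netif [Expire]
    """
    for line in netstat_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[0] == "default":
            return fields[1], fields[3]
    return None


def route_ok(netstat_text, expected_iface=EXPECTED_IFACE):
    """True if a default route exists and uses expected_iface."""
    route = default_route(netstat_text)
    return route is not None and route[1] == expected_iface


def main():
    out = subprocess.run(
        ["netstat", "-rn", "-f", "inet"],
        capture_output=True, text=True, check=True,
    ).stdout
    if route_ok(out):
        return 0
    # Only report; whether to reboot or repair the route is left to the caller.
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    print(f"{stamp} default route is now {default_route(out)}", file=sys.stderr)
    return 1


if __name__ == "__main__":
    sys.exit(main())
```

Run from cron every 5 minutes; a non-zero exit status means the default route has moved, and the printed timestamp marks when to look in the logs.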
-
Not enough detail to work with.
Please provide a network schematic, including the IP subnets.
Also check the system logs around the time the route changes.
-
"the other to a bunch of internal private 10.0.0.0 networks with IPV4. Both are NAT,"
Why would you be natting to internal rfc1918 networks? Post up your gateway section. To route to other networks you would not normally use a "WAN" connection with NAT; you would just set up a gateway in pfSense and add routes to it.
Do you have both gateways you get via dhcp as "default"?
-
"Why would you be natting to internal rfc1918 networks?"
Because I connect to a wireless network that I don't manage, and it uses RFC 1918 addresses. Each wireless node (router) in the network is assigned a random 10.x.x.x/29 network during its initial setup. Routing between these nodes is handled by OLSR on the wireless network. pfSense apparently used to have an OLSR package but no longer does, so I cannot advertise routes for my internal LAN into OLSR. Nodes can come and go without notice or coordination, so I can't realistically maintain an accurate list of static routes; instead I have a single generic static 10.0.0.0/8 route out that interface to cover all the wireless networks. I'm only allocated a /29 on the wireless network, and I provide services from multiple internal LAN IPs, so I have NAT configured so that everything consumes only one wireless IP. This is on my OPT1 interface, and its IP and gateway are provided via DHCP from the wireless node.
I'm open to suggestions for better ways to do this, but this is the only way I could see getting it to work with the restrictions I have.
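The NAT arrangement above, where several LAN hosts share the single OPT1 address, is many-to-one NAT (port address translation). The toy model below only illustrates the idea; it is not how pf's state table works, and the starting port (50000) and the LAN host addresses in the usage lines are hypothetical. Each distinct (LAN IP, LAN port) source gets its own port on the shared address, and a reverse map sends replies back to the right host.

```python
import itertools

# OPT1 address from the diagram in this thread; the shared "outside" IP.
OPT1_ADDR = "10.117.100.157"


class ToyPat:
    """Toy many-to-one NAT (port address translation) table.

    Illustrative only: pf's real state tracking and port selection differ.
    """

    def __init__(self, public_ip=OPT1_ADDR, first_port=50000):
        self.public_ip = public_ip
        self._ports = itertools.count(first_port)  # hypothetical port pool
        self._out = {}   # (lan_ip, lan_port) -> public_port
        self._back = {}  # public_port -> (lan_ip, lan_port)

    def outbound(self, lan_ip, lan_port):
        """Translate a LAN source to (public_ip, public_port)."""
        key = (lan_ip, lan_port)
        if key not in self._out:
            port = next(self._ports)
            if port > 65535:
                raise RuntimeError("toy port pool exhausted")
            self._out[key] = port
            self._back[port] = key
        return self.public_ip, self._out[key]

    def inbound(self, public_port):
        """Map a reply arriving on public_port back to the LAN source."""
        return self._back.get(public_port)


if __name__ == "__main__":
    nat = ToyPat()
    # Hypothetical LAN hosts in 10.10.6.0/24 using the same source port.
    print(nat.outbound("10.10.6.20", 40000))
    print(nat.outbound("10.10.6.21", 40000))
```

Both LAN hosts appear on the mesh as 10.117.100.157 and are told apart only by the translated port, which is why one address from the /29 is enough.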
My internal LAN is 10.10.6.0/24. This works fine because the LAN interface's /24 route is more specific than the wireless /8, so things route properly.
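The "more specific route wins" behaviour mentioned above is longest-prefix matching. A minimal sketch using Python's standard `ipaddress` module, with a routing table trimmed to four entries from the `netstat -nr` output posted later in the thread (`default` written as `0.0.0.0/0`), shows why LAN traffic is unaffected while anything that matches only the default entry follows whichever gateway that entry currently points at. The destination addresses in the usage loop are arbitrary examples.

```python
import ipaddress

# Four entries from the normal routing table in this thread: prefix -> netif.
ROUTES = {
    "0.0.0.0/0": "em0",          # default route via the WAN (73.x.x.1)
    "10.0.0.0/8": "em2",         # static route to the mesh via OPT1
    "10.10.6.0/24": "em1",       # LAN
    "10.117.100.152/29": "em2",  # OPT1's directly connected /29
}


def lookup(dst, routes=ROUTES):
    """Return (prefix, netif) of the longest prefix containing dst, or None."""
    addr = ipaddress.ip_address(dst)
    best = None
    for prefix, netif in routes.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, netif)
    return None if best is None else (str(best[0]), best[1])


if __name__ == "__main__":
    for dst in ("10.10.6.50", "10.44.1.9", "10.117.100.153", "8.8.8.8"):
        print(dst, "->", lookup(dst))
```

If the `0.0.0.0/0` entry is switched to `em2`, LAN and mesh lookups are unchanged, but every Internet destination now leaves through OPT1, matching the symptom described in the question.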
The WAN port connects to my ISP and gets a 73.x.x.x/24 address via DHCP from my cable modem.
So to recap:

              To internet
           73.x.x.1 (gateway)
                   |
              73.x.x.x/24
                  WAN
             +-----------+
             |  pfsense  |  LAN--10.10.6.1/24----To internal LAN
             +-----------+
                  OPT1
           10.117.100.157/29
                   |
        10.117.100.153 (gateway)
To a couple dozen or so random 10.x.x.x/29 networks routed by OLSR

"Do you have both gateways you get via dhcp as "default"?" "Post up your gateway section"
The only gateway set as default is the WAN (Internet) side. This is from my /conf/config.xml file:
<gateways>
	<gateway_item>
		<interface>opt1</interface>
		<gateway>dynamic</gateway>
		<name>MESH_NMT_DHCP</name>
		<weight>1</weight>
		<ipprotocol>inet</ipprotocol>
		<monitor_disable></monitor_disable>
	</gateway_item>
	<gateway_item>
		<interface>wan</interface>
		<gateway>dynamic</gateway>
		<name>WAN_DHCP</name>
		<weight>1</weight>
		<ipprotocol>inet</ipprotocol>
		<monitor_disable></monitor_disable>
		<defaultgw></defaultgw>
		<latencyhigh>1500</latencyhigh>
		<losshigh>100</losshigh>
	</gateway_item>
	<gateway_item>
		<interface>wan</interface>
		<gateway>dynamic</gateway>
		<name>WAN_DHCP6</name>
		<weight>1</weight>
		<ipprotocol>inet6</ipprotocol>
		<monitor_disable></monitor_disable>
		<defaultgw></defaultgw>
	</gateway_item>
</gateways>
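To answer the "both gateways default?" question mechanically, a short sketch with Python's standard `xml.etree.ElementTree` can list which `<gateway_item>` entries carry the `defaultgw` and `monitor_disable` flags. It relies only on the element names visible in the snippet above; `gateway_flags` and the script name `gwflags.py` are hypothetical, and the `.//` searches also match a flag that ended up nested inside another element in a pasted copy. Run against the snippet above, it reports `defaultgw` on WAN_DHCP and WAN_DHCP6 only, and monitoring disabled on all three gateways.

```python
import sys
import xml.etree.ElementTree as ET


def gateway_flags(config_xml):
    """Map each gateway name to (is_default, monitoring_disabled).

    Accepts str or bytes; works on a bare <gateways> fragment or a whole
    config.xml, because iter() searches all descendants of the root.
    """
    root = ET.fromstring(config_xml)
    flags = {}
    for item in root.iter("gateway_item"):
        name = item.findtext("name")
        # ".//" matches the flag element at any depth inside the item.
        is_default = item.find(".//defaultgw") is not None
        no_monitor = item.find(".//monitor_disable") is not None
        flags[name] = (is_default, no_monitor)
    return flags


if __name__ == "__main__":
    # Usage (hypothetical script name): python3 gwflags.py /conf/config.xml
    with open(sys.argv[1], "rb") as fh:
        for name, (dflt, nomon) in gateway_flags(fh.read()).items():
            print(f"{name}: default={dflt} monitor_disabled={nomon}")
```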
And, for reference, the static route:
<staticroutes>
	<route>
		<network>10.0.0.0/8</network>
		<gateway>MESH_NMT_DHCP</gateway>
	</route>
</staticroutes>

Normally, netstat -nr shows this:
Internet:
Destination         Gateway           Flags  Netif  Expire
default             73.x.x.1          UGS    em0
10.0.0.0/8          10.117.100.153    UGS    em2
10.10.6.0/24        link#2            U      em1
10.10.6.1           link#2            UHS    lo0
10.117.100.152/29   link#3            U      em2
10.117.100.153      10.117.100.153    UGHS   em2
10.117.100.157      link#3            UHS    lo0
73.x.x.0/24         link#1            U      em0
73.x.x.x            link#1            UHS    lo0
75.75.75.75         73.x.x.1          UGHS   em0
75.75.76.76         73.x.x.1          UGHS   em0
127.0.0.1           link#8            UH     lo0
172.16.0.0/12       10.117.100.153    UGS    em2

When it goes bad, I see this:
Internet:
Destination         Gateway           Flags  Netif  Expire
default             10.117.100.153    UGS    em2
10.0.0.0/8          10.117.100.153    UGS    em2
10.10.6.0/24        link#2            U      em1
10.10.6.1           link#2            UHS    lo0
10.117.100.152/29   link#3            U      em2
10.117.100.153      10.117.100.153    UGHS   em2
10.117.100.157      link#3            UHS    lo0
73.x.x.0/24         link#1            U      em0
73.x.x.x            link#1            UHS    lo0
75.75.75.75         73.x.x.1          UGHS   em0
75.75.76.76         73.x.x.1          UGHS   em0
127.0.0.1           link#8            UH     lo0
172.16.0.0/12       10.117.100.153    UGS    em2

I've looked through the various logs when the problem happens, and I don't see anything obviously wrong.
I've played with various values and ultimately disabled gateway monitoring to make sure that isn't causing the problem.
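The two `netstat -nr` listings above differ only in the `default` row. A small Python sketch can make that comparison mechanical, for example on snapshots saved each time a cron check runs, so that any other route that moves at the same moment stands out. It assumes the same whitespace-separated columns as the posted output; `parse_routes`, `diff_routes` and the file names in the usage comment are hypothetical, and keying on the destination means a repeated destination keeps only its last row.

```python
import sys


def parse_routes(netstat_text):
    """Map destination -> (gateway, netif) from `netstat -nr` IPv4 output."""
    routes = {}
    for line in netstat_text.splitlines():
        fields = line.split()
        # Skip the "Internet:" banner, blank lines, and the column header.
        if len(fields) < 4 or fields[0] == "Destination":
            continue
        routes[fields[0]] = (fields[1], fields[3])
    return routes


def diff_routes(before, after):
    """Return sorted (destination, old, new) tuples for routes that differ.

    old or new is None when the route exists on only one side.
    """
    a, b = parse_routes(before), parse_routes(after)
    return [
        (dest, a.get(dest), b.get(dest))
        for dest in sorted(a.keys() | b.keys())
        if a.get(dest) != b.get(dest)
    ]


if __name__ == "__main__":
    # Usage (hypothetical file names): python3 routediff.py good.txt bad.txt
    with open(sys.argv[1]) as f1, open(sys.argv[2]) as f2:
        for change in diff_routes(f1.read(), f2.read()):
            print(change)
```

Applied to the two listings in this thread, the only change reported is the `default` row moving from 73.x.x.1 on em0 to 10.117.100.153 on em2.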