Routing on standby pfSense stops working after a while
I'm having a strange issue here and have been fighting it for a while…
The primary always functions normally. On the backup pfSense, however, routing stops working after anywhere from several hours to a day or so, and the LAN interface becomes unreachable across the internal gateway. The LAN interface stays up and can be reached from the local subnet only. A simple filter reload or system restart brings routing back to life, but that is obviously no good in a failover scenario.
I have scoured the system logs and come up empty-handed.
I'm looking for some tips to troubleshoot this further. Instead of a full filter reload, I would like to try bouncing just the routing service, to see whether the routing service itself is the culprit, or just the interface, to see whether the problem is interface-related.
Of course, having to wait up to 24 hours between attempts is a little painful. Still, for troubleshooting's sake, I am holding out before resorting to band-aiding the issue with a periodic filter reload task.
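Since each failure takes hours to reproduce, it may help to capture the routing table once while the backup is healthy and again right after routing breaks, then compare the two. Below is a minimal Python sketch, assuming each snapshot is the plain-text output of `netstat -rn -f inet` on the backup box; the function names `parse_netstat` and `diff_routes` are hypothetical helpers, not part of pfSense.

```python
"""Compare two saved `netstat -rn -f inet` snapshots.

Assumption: each snapshot is the plain-text output of that command, taken on
the backup firewall once while routing works and once after it breaks.
"""
from typing import Dict, Tuple

Route = Tuple[str, str]  # (gateway, interface)


def parse_netstat(text: str) -> Dict[str, Route]:
    """Map each IPv4 destination to its (gateway, interface) pair."""
    routes: Dict[str, Route] = {}
    columns: Dict[str, int] = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "Internet6:":
            break  # stop at the IPv6 section; only IPv4 is compared here
        if fields[0] == "Destination":
            # Locate columns from the header, since some FreeBSD releases
            # also print Refs/Use columns between Flags and Netif.
            columns = {name: i for i, name in enumerate(fields)}
            continue
        if not columns or len(fields) <= columns["Netif"]:
            continue  # banner lines such as "Routing tables" or "Internet:"
        routes[fields[columns["Destination"]]] = (
            fields[columns["Gateway"]],
            fields[columns["Netif"]],
        )
    return routes


def diff_routes(
    before: Dict[str, Route], after: Dict[str, Route]
) -> Tuple[Dict[str, Route], Dict[str, Route], Dict[str, Tuple[Route, Route]]]:
    """Return (added, removed, changed) routes between two snapshots."""
    added = {d: r for d, r in after.items() if d not in before}
    removed = {d: r for d, r in before.items() if d not in after}
    changed = {
        d: (before[d], after[d])
        for d in before.keys() & after.keys()
        if before[d] != after[d]
    }
    return added, removed, changed
```

For example, save `netstat -rn -f inet` to one file while routing works and to a second file after a failure (file names are up to you), copy both to any machine with Python 3, and pass their contents through `parse_netstat` and then `diff_routes`. A destination whose gateway or interface was swapped shows up in `changed`, which points at whatever rewrote it.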
Update: it just happened again.
Tried restarting routed: no effect, so it doesn't appear to be the routed daemon.
Tried restarting the interfaces: no effect, so it doesn't appear to be the interfaces.
Reloaded the filter (using /etc/rc.reload_all), and it started working again.
Waiting for it to happen again...
This time I issued /etc/rc.reload_interfaces, and routing came back.
There are two functions inside this script.
On the next failure I am going to run each of them one at a time to see which one brings routing back.
Further digging revealed that an OpenVPN P2P tunnel was inserting routes when it refreshed, overwriting the static routes. Only the secondary was affected.
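To catch this kind of overwrite sooner, one option is to compare the static routes you configured against what the kernel actually has and flag any destination that is missing or points at a different gateway. The Python sketch below is hypothetical: `find_overridden_statics` is an illustrative name, `expected` is assumed to hold your configured static routes (destination to gateway IP), and `current` is assumed to be the live IPv4 table already parsed into destination to (gateway, interface), for instance from `netstat -rn -f inet` output. Destination strings must be written exactly as netstat prints them for the lookup to match.

```python
"""Report configured static routes that the kernel no longer honors.

Assumptions (hypothetical, for illustration): `expected` holds the static
routes as configured (destination -> gateway IP), and `current` is the live
IPv4 routing table already parsed into destination -> (gateway, interface).
"""
from typing import Dict, List, Tuple


def find_overridden_statics(
    expected: Dict[str, str],
    current: Dict[str, Tuple[str, str]],
) -> List[str]:
    """Return one message per static route that is missing or repointed."""
    problems: List[str] = []
    for dest, gateway in sorted(expected.items()):
        entry = current.get(dest)
        if entry is None:
            problems.append(f"{dest}: missing (expected via {gateway})")
        elif entry[0] != gateway:
            # Same destination, different next hop: something (for example
            # a route installed by a VPN tunnel) has replaced the static entry.
            problems.append(
                f"{dest}: via {entry[0]} on {entry[1]}, expected {gateway}"
            )
    return problems
```

Run periodically on the secondary, an empty list means every static route is still in place; a message such as `192.168.50.0/24: via 10.8.0.2 on ovpnc1, expected 10.0.0.254` (addresses illustrative) would point straight at a tunnel interface having taken over the route.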