Updating to pfSense+ 24.3 breaks routing - kernel routes now gone
-
The update from pfSense+ 23.09.1 to 24.3 immediately broke my network. Note that the update completed successfully.
Upon investigation, I saw that the kernel routes are missing from FRR 9.1, which is included with 24.3, but were present in FRR 9.0.2, which is included in pfSense+ 23.09.1.
These kernel routes are visible to the system in both old and new versions of pfSense, when viewed with "netstat" (netstat -rn4). Note that for this output, the v23.09.1 and v24.03 outputs are identical, so no need to include a diff.
The reason why my network broke was that there was another route in the FRR "show ip route" - an OSPF-learned default route - which is not shown here as I have since removed it. It was a default route on another router in another part of the network, which had a high OSPF cost, and is used as an alternative path in case the whole firewall dies. It is a low-bandwidth link and should only be used in an emergency. When I upgraded the firewall to pfSense+ 24.3 from 23.09.1, it would then direct all traffic to this other low-bandwidth gateway, instead of sending it straight out its own WAN IP to the ISP.
I have now taken this alternative path's OSPF-learned default route out of the network, so that the firewall can function again.
I suspect that these kernel routes disappearing from FRR is why pfSense preferred some distant OSPF default route over its directly connected default route. I am now relying on the functionality that if FRR doesn't have a route in its routing table, then it defers to the system's routing table, and you can see with the "netstat -rn4" that it does have all the kernel routes.
But relying on this functionality is not robust. It should be fine to have a backup default route in OSPF, costed so that it does not break anything but is there to take over in an emergency. I shouldn't have to remove that. With the age-old concept of "administrative distance", the firewall should always prefer a directly connected default route over a high-cost OSPF route.
We really do need these kernel routes back in FRR, so that FRR has a proper and complete view of the system's routing logic. Therefore, I consider this a bug, and not a feature.
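To illustrate the administrative-distance argument, here is a minimal Python sketch of per-prefix route selection. It is not FRR's actual code. The distances are the defaults listed in the FRR zebra documentation, while the `Route` class, the `best_route` helper and the 203.0.113.1 gateway (a stand-in for the ISP's WAN gateway) are assumptions made up for this example.

```python
from dataclasses import dataclass

# Default administrative distances per the FRR zebra documentation.
DEFAULT_DISTANCE = {"kernel": 0, "connected": 0, "static": 1, "ospf": 110}


@dataclass(frozen=True)
class Route:
    """Hypothetical, simplified view of one candidate route."""
    prefix: str
    protocol: str
    nexthop: str
    metric: int = 0

    @property
    def distance(self) -> int:
        return DEFAULT_DISTANCE[self.protocol]


def best_route(candidates):
    """Pick the candidate with the lowest (distance, metric) pair."""
    return min(candidates, key=lambda r: (r.distance, r.metric))


# 203.0.113.1 is a documentation address standing in for the ISP gateway.
kernel_default = Route("0.0.0.0/0", "kernel", "203.0.113.1")
ospf_default = Route("0.0.0.0/0", "ospf", "10.254.40.2", metric=5)

# With the kernel route visible, it wins on distance (0 vs 110).
print(best_route([kernel_default, ospf_default]).protocol)
# If the kernel route is missing from the table, OSPF is the only candidate.
print(best_route([ospf_default]).protocol)
```

The second call mirrors what seems to happen on 24.3: once the kernel default is no longer a candidate, the OSPF default is selected regardless of its cost.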
-
I have been doing some tests on my GNS3 lab, where I've replicated the production network.
On 23.09.1, no matter whether I advertise the secondary default route or not, the "netstat -rn" only shows one default route out the WAN1 link, and the "show ip route" in FRR only shows the one default route, and there are the 4 kernel routes.
On 24.3, I've had either none, three or all four kernel routes show up, but I have to bounce the WAN links and shake things up a bit to get things to change. There does seem to be some kind of race condition in the learning and installing of routes.
When I have the secondary remote default route advertised, I end up seeing it in the "show ip route" display on 24.3, but not on 23.09.1, which has the kernel default route instead. An example of the route that shows up on 24.3 is:
O>* 0.0.0.0/0 [110/5] via 10.254.40.2, vmx1.40, weight 1, 00:07:47
And in netstat, I see two default routes.
Destination        Gateway            Flags   Nhop#    Mtu    Netif Expire
default            <ISP's WAN IP>     UGS         0   1500     vmx2
default            10.254.40.2        UG1         0   1500  vmx1.40
It looks like it's load balancing, because some traceroutes work, and others ping-pong towards the firewall and back:
GT_Data> trace 1.2.3.4
trace to 1.2.3.4, 8 hops max, press Ctrl+C to stop
1 192.168.27.1 0.703 ms 0.554 ms 0.605 ms
2 10.254.40.1 1.323 ms 1.261 ms 1.767 ms
3 192.168.27.1 1.953 ms 1.248 ms 1.116 ms
4 10.254.40.1 2.337 ms 2.688 ms 1.876 ms
5 192.168.27.1 1.814 ms 2.516 ms 1.852 ms
6 10.254.40.1 3.325 ms 2.818 ms 2.670 ms
7 192.168.27.1 2.532 ms 3.280 ms *
8 10.254.40.1 4.716 ms 4.612 ms 2.889 ms
etc. That's where the network breaks, as per the title of this thread. This doesn't happen in 23.09.1.
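The ping-pong pattern in that trace can be spotted mechanically. The following is a rough Python sketch, assuming output in the same "hop address time ms ..." layout as the trace above. The helper names `hop_addresses` and `looks_like_loop` are made up for this example.

```python
import re

# Matches lines such as "3 192.168.27.1 1.953 ms 1.248 ms 1.116 ms".
HOP_RE = re.compile(r"^\s*(\d+)\s+(\d{1,3}(?:\.\d{1,3}){3})\s")


def hop_addresses(trace_output):
    """Return the responding address of each numbered hop, in order."""
    hops = []
    for line in trace_output.splitlines():
        match = HOP_RE.match(line)
        if match:
            hops.append(match.group(2))
    return hops


def looks_like_loop(hops):
    """True if any address answers at more than one hop."""
    return len(set(hops)) < len(hops)


sample = """trace to 1.2.3.4, 8 hops max, press Ctrl+C to stop
1 192.168.27.1 0.703 ms 0.554 ms 0.605 ms
2 10.254.40.1 1.323 ms 1.261 ms 1.767 ms
3 192.168.27.1 1.953 ms 1.248 ms 1.116 ms
4 10.254.40.1 2.337 ms 2.688 ms 1.876 ms
"""

print(looks_like_loop(hop_addresses(sample)))
```

A repeated address is only a hint of a forwarding loop, not proof, but it is enough to flag the behaviour seen here when running the same trace repeatedly in the lab.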
On 23.09.1, the routing always converges to the same view no matter how or when I bounce the various interfaces, but on 24.3, the "show ip route" output can differ depending on when the various interfaces are bounced.
What's not satisfactory is how pfSense+ is load-balancing a directly connected default route, with one learned via OSPF. The connected one should always take precedence due to lower administrative distance.
https://docs.frrouting.org/en/latest/zebra.html#administrative-distance
System, kernel and connected routes should win out over OSPF routes.
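For anyone wanting to check their own box for the same symptom, here is a small Python sketch that looks for more than one default route in "netstat -rn4" output. It assumes the FreeBSD column order shown earlier (Destination, Gateway, Flags, Nhop#, Mtu, Netif). The function names are invented for this example, and 203.0.113.1 is a placeholder for the ISP's WAN IP.

```python
def default_routes(netstat_output):
    """Return (gateway, netif) for every 'default' line in the output."""
    routes = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expect at least: Destination Gateway Flags Nhop# Mtu Netif
        if len(fields) >= 6 and fields[0] == "default":
            routes.append((fields[1], fields[5]))
    return routes


def has_competing_defaults(netstat_output):
    """True if the kernel table holds more than one default route."""
    return len(default_routes(netstat_output)) > 1


# 203.0.113.1 is a documentation address, not the real ISP gateway.
sample = """Destination Gateway Flags Nhop# Mtu Netif Expire
default 203.0.113.1 UGS 0 1500 vmx2
default 10.254.40.2 UG1 0 1500 vmx1.40
"""

for gateway, netif in default_routes(sample):
    print(gateway, netif)
print(has_competing_defaults(sample))
```

On a healthy 23.09.1 system this check should find a single default route. Two entries, as in the sample, match the broken state described above.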
Just looking at FRR source files, there are no build instructions for FreeBSD 15 - only up to FreeBSD 14. Is FreeBSD 15 too experimental for third party packages?!
The FRR package installed is "pfSense-pkg-frr-2.0.2_3". Is there a beta package containing FRR 9.1.1 or even 10.0 that I could test, to see if this issue is still present in the newer versions?