Found 6 issues with FRR/OSPF in pfSense 2.5.1
-
@fireodo said in Found 6 issues with FRR/OSPF in pfSense 2.5.1:
@viktor_g said in Found 6 issues with FRR/OSPF in pfSense 2.5.1:
fix is ready: https://redmine.pfsense.org/issues/11768
The Link in the Fix does not work!
Try this patch: 80.diff
see https://docs.netgate.com/pfsense/en/latest/development/system-patches.html
-
@viktor_g Hi Viktor. I have also logged the ACCEPTFILTER prefix-list issue here:
https://redmine.pfsense.org/issues/11836
I will now investigate more thoroughly the default behavior of access control lists (ACLs). In my previous lab testing they seemed to have switched from an implicit deny at the end to an implicit accept at the end. If true (which seemed to be the case), that would be another massive source of upgrade headaches.
-
@viktor_g said in Found 6 issues with FRR/OSPF in pfSense 2.5.1:
@fireodo said in Found 6 issues with FRR/OSPF in pfSense 2.5.1:
@viktor_g said in Found 6 issues with FRR/OSPF in pfSense 2.5.1:
fix is ready: https://redmine.pfsense.org/issues/11768
The Link in the Fix does not work!
Try this patch: 80.diff
see https://docs.netgate.com/pfsense/en/latest/development/system-patches.html
Thank you!
Kind regards,
fireodo
-
@fireodo Phew! I've logged the last bug, for my "issue #3" in this thread: https://redmine.pfsense.org/issues/11841. I hammered away at it and confirmed that access lists now behave very differently in 2.5.x, defaulting to an implicit "permit any" rather than an implicit "deny any".
This has huge ramifications for upgraders. It hit me like the proverbial tonne of bricks. I'd say it's another reason why people have griped to me about 2.5.x and routing issues.
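The difference described here comes down to what happens when a route falls off the end of a list. Below is a minimal Python sketch, not FRR's implementation: it assumes zebra-style access-list entries, where a permit/deny for a network matches any route inside that network, and it models the reported 2.5.x behaviour with a hypothetical `default` argument. The prefix-list function follows the usual `ge`/`le` matching rules and always ends in an implicit deny; prefix lists are the workaround adopted later in this thread.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass
class AclEntry:
    action: str   # "permit" or "deny"
    network: str  # e.g. "10.24.194.0/24"; matches any route inside this network


@dataclass
class PrefixListEntry:
    action: str               # "permit" or "deny"
    network: str              # e.g. "10.24.0.0/16"
    ge: Optional[int] = None  # minimum prefix length, if given
    le: Optional[int] = None  # maximum prefix length, if given


def acl_decision(entries, route, default="deny"):
    """First matching entry wins; `default` applies when nothing matches.

    default="deny" is the classic implicit deny; default="permit" models the
    implicit "permit any" reported for 2.5.x (an assumption, for illustration).
    """
    prefix = ipaddress.ip_network(route)
    for entry in entries:
        if prefix.subnet_of(ipaddress.ip_network(entry.network)):
            return entry.action
    return default


def prefix_list_decision(entries, route):
    """First matching entry wins; a prefix list always ends in an implicit deny."""
    prefix = ipaddress.ip_network(route)
    for entry in entries:
        net = ipaddress.ip_network(entry.network)
        if not prefix.subnet_of(net):
            continue
        if entry.ge is None and entry.le is None:
            # Without ge/le, only the exact prefix length matches.
            matched = prefix.prefixlen == net.prefixlen
        else:
            low = entry.ge if entry.ge is not None else net.prefixlen
            high = entry.le if entry.le is not None else prefix.max_prefixlen
            matched = low <= prefix.prefixlen <= high
        if matched:
            return entry.action
    return "deny"


if __name__ == "__main__":
    acl = [AclEntry("permit", "10.24.194.0/24")]
    # An unrelated route falls off the end of the access list:
    print(acl_decision(acl, "192.168.1.0/24"))                    # deny
    print(acl_decision(acl, "192.168.1.0/24", default="permit"))  # permit

    plist = [PrefixListEntry("permit", "10.24.0.0/16", le=24)]
    print(prefix_list_decision(plist, "10.24.194.0/24"))  # permit
    print(prefix_list_decision(plist, "192.168.1.0/24"))  # deny
```

With a fall-through of "permit", a list written to allow one network quietly allows everything else too, which is why the change matters for redistribution filters written under the old assumption.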
-
@gcon said in Found 6 issues with FRR/OSPF in pfSense 2.5.1:
I will do more lab testing on those with the intention of logging those as well.
I have this issue with a client running FRR OSPF and peering over IPsec VTI. Some routes stop working for no apparent reason: they are present in the FRR daemon but do not populate the routing table under Diagnostics > Routes. Resetting the OSPF daemon fixes the issue. I also checked "Ignore IPsec restart events", to no avail.
Did you ever figure this out?
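When routes sit in FRR but never reach Diagnostics > Routes, one way to narrow it down is to compare what FRR has selected with what the kernel actually holds. The Python sketch below does that comparison on text you have already captured. It assumes the `show ip route` line layout seen in this thread (codes column, then the prefix) and that you have normalised the kernel table (for example from `netstat -rn`) into CIDR strings yourself. It also flags routes FRR marks as selected (`>`) but not installed (`*` is FRR's FIB flag). The interface name and uptimes in the sample are hypothetical.

```python
import ipaddress
import re

# Route lines start with the codes column (e.g. "O>*") followed by the prefix.
ROUTE_LINE = re.compile(r"^(?P<codes>[A-Za-z>*]+)\s+(?P<prefix>\d+\.\d+\.\d+\.\d+/\d+)\s")


def parse_selected_routes(show_ip_route_output):
    """Map each route FRR has selected ('>' in the codes) to its codes string."""
    routes = {}
    for line in show_ip_route_output.splitlines():
        match = ROUTE_LINE.match(line.strip())
        if match and ">" in match.group("codes"):
            prefix = ipaddress.ip_network(match.group("prefix"), strict=False)
            routes[prefix] = match.group("codes")
    return routes


def selected_not_in_fib(routes):
    """Selected by FRR but lacking the '*' FIB flag."""
    return sorted(p for p, codes in routes.items() if "*" not in codes)


def missing_from_kernel(routes, kernel_prefixes):
    """Selected by FRR but absent from the kernel table you supplied."""
    kernel = {ipaddress.ip_network(p, strict=False) for p in kernel_prefixes}
    return sorted(p for p in routes if p not in kernel)


if __name__ == "__main__":
    sample = "\n".join([
        "O>* 10.24.194.0/24 [110/20] via 10.255.195.2, ovpns2 onlink, weight 1, 01:07:33",
        "C>* 192.168.10.0/24 is directly connected, igb1, 3d02h11m",
        "O   192.168.10.0/24 [110/10] is directly connected, igb1, 3d02h11m",
    ])
    routes = parse_selected_routes(sample)
    print(missing_from_kernel(routes, ["192.168.10.0/24"]))  # [IPv4Network('10.24.194.0/24')]
    print(selected_not_in_fib(routes))                       # []
```

A prefix that FRR shows with `>*` yet is missing from the kernel list points at the FRR-to-kernel handoff (zebra) rather than at OSPF itself.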
-
@hempfieldtech
I logged about 3 or 4 FRR-related issues. I saw that the ACCEPTFILTER bug already had a bug entry; I didn't know at the time that the packages used a separate bug tracking system.
For your issue, do the routes ever come back on their own if left long enough, or is resetting OSPF/FRR the only fix? My redistributed connected routes that disappear do come back on their own, but they should never have dropped to begin with. It sounds like you might be dealing with a different issue.
If you have a lab setup (GNS3 perhaps), you could try replicating it there and substituting a VyOS or OPNsense device to see whether it happens there as well, or even just a generic FreeBSD (or Linux) setup with FRR installed.
Since I'm not seeing any urgency on the issues I logged, I have already moved to OPNsense, as the routing issues I faced with pfSense are all fixed there. I will keep checking these issues in pfSense occasionally to see if and when Netgate engineers address them.
-
@hempfieldtech
Did you discover any more info?
How frequently did you encounter this?
I think I saw this today on a site (Italy).
The routes were present in the OSPF table.
One was missing under Diagnostics / Routes for at least 10 hours before restarting FRR brought it back. Restarting FRR ospfd didn't bring the route back; restarting FRR Core (zebra) did.
The spoke off London was present, but the route to London (the hub) itself was missing.
2.5.2-RELEASE
Package version frr net 1.1.0_15
About 3 months since last config change
-
@ay Hi there. It's been 8 months since my deep dive into dynamic routing on pfSense. I do recommend putting a GNS3 lab together if you can, though; it's great for testing. I'll probably get back into pfSense testing this month when the new version comes out.
-
I set up a cron job to restart ospfd on a schedule every morning. It was the only way to overcome the route issue, although I have not investigated further since. It's actually a smooth way to do it and doesn't cause much of a blip in traffic.
-Favyan
-
@gcon I'm experiencing #4 as well, and I can reproduce it consistently. I paid for support and opened a case with Netgate TAC. After they looked at things for a couple of days, turned on extended logging, and had me reproduce it multiple times, the answer I got was: "Can you try upgrading to the 2.6 release candidate?" I can, and I will next week, since I set up a whole test site to work with them on this. But it doesn't seem like they have any idea, and given the lack of response to your Redmine bug reports, I'm not confident anything will have changed. We've used pfSense for years and been pretty happy, but they don't seem to be treating critical FRR issues with any urgency, even though this issue started with 2.5.x over 8 months ago.
-
@mdomnis Unfortunately, I am able to reproduce the undesirable behavior (similar to your #4, it would seem) on 2.6 RC as well. :( I'm waiting for TAC to give me something else to try or to have someone dig in further.
-
Summarising the initial 6 things I raised:
#1. SPF algorithm firing causes OSPF "redistribute connected" routes to flush.
This was raised in #11835.
I can see that no one has worked on this critical bug. I have tested it, and this is still an issue (!!)
#2. OSPF protocol filtering (FRR GUI - Global Settings / Route Handling) causes FRR to do strange things (making OSPF routes invalid, crashing FRR, etc.)
I avoid filtering via "FRR GUI - Global Settings / Route Handling" because I found it too unstable, and I haven't tested it since finding the problem; I do the filtering on my MikroTik routers instead. I did raise 11836 for a related issue, and some things improved there, but I'm not sure whether this actual issue is fixed. Since I don't use the "Route Handling" features, I stopped looking at it.
#3. ACLs no longer have an implicit deny at the end.
I did raise 11841, but I am not pursuing this issue: I found that prefix lists weren't affected, so I swapped from access lists (ACLs) to prefix lists for my needs (redistributing specific connected routes into OSPF).
#4. OpenVPN links re-establishing can cause "onlink" routes to become inactive.
@mdomnis How did you end up going with this? I didn't actually raise a ticket for this, but I see you've been working with pfSense on it. I'm not seeing it in pfSense 2.6, but my test lab might be different from when I last had it. Solved?
Issues #5 and #6: ACCEPTFILTER prefix-list entries being duplicated, and interface descriptions being cumulative.
These got fixed; I'm not seeing these issues in pfSense 2.6. They would have been pretty trivial to sort out.
======
So, in short: #5 and #6 are fixed. #4 seems to be fixed (to be confirmed). #2 and #3 I have worked around (I avoid those features, so I'm not affected).
The only issue I am affected by right now (and cannot avoid) is #1, and it's still really bad. Here's one of my connected routes dropping the moment a backup link comes back up:
O>* 10.24.194.0/24 [110/20] via 10.255.195.2, ovpns2 onlink, weight 1, 01:07:33
pfsense01.it.somecompany.com.au# show ip route | include 10.24.1
O>* 10.24.194.0/24 [110/20] via 10.255.195.2, ovpns2 onlink, weight 1, 01:07:35
pfsense01.it.somecompany.com.au# show ip route | include 10.24.1
O>* 10.24.194.0/24 [110/20] via 10.255.195.2, ovpns2 onlink, weight 1, 01:07:37
pfsense01.it.somecompany.com.au# show ip route | include 10.24.1
O>* 10.24.194.0/24 [110/20] via 10.255.195.2, ovpns2 onlink, weight 1, 01:07:38
pfsense01.it.somecompany.com.au# show ip route | include 10.24.1
O>* 10.24.194.0/24 [110/20] via 10.255.195.2, ovpns2 onlink, weight 1, 01:07:40
pfsense01.it.somecompany.com.au# show ip route | include 10.24.1
pfsense01.it.somecompany.com.au# show ip route | include 10.24.1
pfsense01.it.somecompany.com.au# show ip route | include 10.24.1
pfsense01.it.somecompany.com.au# show ip route | include 10.24.1
pfsense01.it.somecompany.com.au# show ip route | include 10.24.1
pfsense01.it.somecompany.com.au# show ip route | include 10.24.1
O>* 10.24.194.0/24 [110/20] via 10.255.195.2, ovpns2 onlink, weight 1, 00:00:01
pfsense01.it.somecompany.com.au# show ip route | include 10.24.1
O>* 10.24.194.0/24 [110/20] via 10.255.195.2, ovpns2 onlink, weight 1, 00:00:03
pfsense01.it.somecompany.com.au# show ip route | include 10.24.1
O>* 10.24.194.0/24 [110/20] via 10.255.195.2, ovpns2 onlink, weight 1, 00:00:04
pfsense01.it.somecompany.com.au#
10.255.195.2 is the far end of the primary (p2p) link. The backup p2p link re-establishing should not cause this route, learned over the primary link, to be flushed and relearned. I'm testing pfSense 2.6.0-RELEASE, which is built on FreeBSD 12.3-STABLE and has FRR version 7.5.1.
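The output above is easiest to read via the last column, the route's age: it counts up to 01:07:40, the route vanishes for several polls, and it comes back at 00:00:01. A minimal Python sketch for spotting that pattern in saved `show ip route | include ...` polls follows. It assumes you collect the polls yourself, one string per poll, and it only parses the HH:MM:SS age format; FRR prints older routes in a day/week format, which this sketch would wrongly report as missing.

```python
import re
from datetime import timedelta

# Matches lines like:
# "O>* 10.24.194.0/24 [110/20] via 10.255.195.2, ovpns2 onlink, weight 1, 01:07:33"
ROUTE_LINE = re.compile(
    r"^(?P<codes>\S+)\s+(?P<prefix>\d+\.\d+\.\d+\.\d+/\d+)\s.*,\s*"
    r"(?P<uptime>\d{2}:\d{2}:\d{2})\s*$"
)


def uptime_of(output, prefix):
    """Return the route's age from one poll, or None if the prefix is absent."""
    for line in output.splitlines():
        match = ROUTE_LINE.match(line.strip())
        if match and match.group("prefix") == prefix:
            hours, minutes, seconds = map(int, match.group("uptime").split(":"))
            return timedelta(hours=hours, minutes=minutes, seconds=seconds)
    return None


def find_flushes(polls, prefix):
    """List (poll_index, event) where the route was missing or its age went backwards."""
    events = []
    last_seen = None
    for index, output in enumerate(polls):
        age = uptime_of(output, prefix)
        if age is None:
            events.append((index, "missing"))
            continue
        if last_seen is not None and age < last_seen:
            events.append((index, "reset"))  # withdrawn and reinstalled between polls
        last_seen = age
    return events


if __name__ == "__main__":
    line = "O>* 10.24.194.0/24 [110/20] via 10.255.195.2, ovpns2 onlink, weight 1, {}"
    polls = [line.format("01:07:38"), line.format("01:07:40"), "", "",
             line.format("00:00:01"), line.format("00:00:03")]
    print(find_flushes(polls, "10.24.194.0/24"))
    # [(2, 'missing'), (3, 'missing'), (4, 'reset')]
```

Tracking the age rather than just presence also catches a flush that happens entirely between two polls, since the route would reappear with a smaller age.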
Update: I cloned my lab and updated pfSense to 2.7.0:
2.7.0-DEVELOPMENT (amd64)
built on Mon Oct 17 06:04:34 UTC 2022
FreeBSD 14.0-CURRENT
It is still happening there. FRR on 2.7 is still only 7.5.1. Why so old? Per https://frrouting.org/release/, that release is from March 7, 2021. FRR is up to 8.3.1 now, 5 releases on from that. I'd really like to see what happens with a later version of FRR, and I'm hoping the devs can update the FRR package to the latest release soon.