FreeBSD bug - default route changes unexpectedly
This is just a heads-up, since this bug seems to be affecting both FreeBSD 8.x and 9.x.
It seems to be misclassified as ipfw-only, since some of the people experiencing it use only PF+ALTQ or PF+dummynet.
Interesting, I've never seen nor heard of that happening, and I've seen quite a few very busy systems.
It's possible that it's something we've patched away over the years, or that we haven't yet encountered the type of traffic that is capable of triggering such a problem.
According to a post by FreeBSD developer Adrian Chadd, it's a known issue:
Default route changes unexpectedly
Adrian Chadd adrian at freebsd.org
Wed Mar 6 07:32:42 UTC 2013
It's a known problem; it just seems that it doesn't overlap/intersect
the day to day activities of any network focused freebsd developers.
If you guys want it fixed then you may have to find a developer to
hire on contract to fix it, or find some kind of ruleset/traffic
generation setup that reliably triggers the bug.
On 5 March 2013 09:39, Nick Rogers <ncrogers at gmail.com> wrote:
I am attempting to create awareness of a serious issue affecting users
of FreeBSD 9.x and PF. There appears to be a bug that allows the
kernel's routing table to be corrupted by traffic routing through the
system. Under heavy traffic load, the default route can seemingly
randomly change to an IP address that is not directly connected to the
network (i.e., is not configured anywhere). Dhclient is not in the
mix, nor is routed, bgpd, etc. Running
route monitor shows no
evidence of the change in the default route. The one commonality
between all the systems experiencing this problem seems to be the use of PF.
Obviously this is a serious problem as it causes all Internet-bound
traffic to stop routing until the default route is corrected. Some
users, including myself, are working around this problem by installing
a script that runs multiple times a second to check if the default
route is incorrect and fixing it if necessary, which mitigates the
amount of downtime caused by the bug.
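The workaround described above (a check that runs several times a second and repairs the default route) could look roughly like the C sketch below. This is not the script those users actually run; it is an illustration that assumes FreeBSD's route(8), parses the "gateway:" line printed by route -n get default, and uses a hypothetical EXPECTED_GW placeholder for the correct upstream gateway. The functions parse_gateway() and watch_default_route() are likewise hypothetical names.

```c
/*
 * Hypothetical default-route watchdog sketching the workaround above.
 * Assumes FreeBSD route(8); EXPECTED_GW is a placeholder, not a real value.
 */
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define EXPECTED_GW "192.0.2.1"   /* hypothetical: set to your real gateway */

/* Copy the value of the "gateway:" line from `route -n get default`
 * output into gw. Returns 0 on success, -1 if no usable line was found. */
int parse_gateway(const char *output, char *gw, size_t gwlen)
{
    const char *p = strstr(output, "gateway:");
    if (p == NULL || gwlen == 0)
        return -1;
    p += strlen("gateway:");
    while (*p == ' ' || *p == '\t')
        p++;
    size_t n = strcspn(p, " \t\r\n");
    if (n == 0 || n >= gwlen)
        return -1;
    memcpy(gw, p, n);
    gw[n] = '\0';
    return 0;
}

/* Run `route -n get default` and store its (truncated) output in buf. */
static int read_default_route(char *buf, size_t len)
{
    FILE *fp = popen("route -n get default", "r");
    if (fp == NULL)
        return -1;
    size_t used = fread(buf, 1, len - 1, fp);
    buf[used] = '\0';
    return pclose(fp) == 0 ? 0 : -1;
}

/* Poll four times a second and reset the default route if it drifted. */
void watch_default_route(void)
{
    const struct timespec interval = { 0, 250L * 1000 * 1000 }; /* 250 ms */
    char out[4096], gw[64];

    for (;;) {
        if (read_default_route(out, sizeof out) == 0 &&
            parse_gateway(out, gw, sizeof gw) == 0 &&
            strcmp(gw, EXPECTED_GW) != 0) {
            fprintf(stderr, "default route is %s, resetting to %s\n",
                    gw, EXPECTED_GW);
            if (system("route change default " EXPECTED_GW) != 0)
                fprintf(stderr, "route change failed\n");
        }
        nanosleep(&interval, NULL);
    }
}
```

Polling keeps the sketch short. A real daemon might prefer the PF_ROUTE routing socket, but the report above notes that route monitor showed no evidence of the change, so polling the table may be the only reliable signal here.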
Please refer to these past posts for more examples and evidence of
other users experiencing this problem:
There is also a PR that was incorrectly labeled as an IPFW issue.
Myself and others believe this issue is not restricted to the use of
IPFW and that the PR should be relabeled. I am inclined to think it is
strictly a PF issue since I am not using IPFW, however there is
evidence of the default route changing on people using IPFW for past
versions of FreeBSD (7.x/8.x), so perhaps this is related.
Another PR for the same problem but specific to IPFW and 8.2-RELEASE
I am hoping someone reading this can give the problem the attention it
deserves. Thank you.
It seems amazing then that nobody has observed it on pfSense that I know of. We usually uncover or encounter those sorts of things before others hit them.
I have been following that thread as well.
To me it looks like a flowtable issue, with some mixture of other things in between.
Without proper analysis of that you cannot say much about it.
We might have patched it away in one of the many patches we carry, and flowtable is not used in pfSense, but it is hard to tell without any analysis behind it.
There has been a commit in relation to this bug:
Date: Wed Apr 24 18:30:32 2013
New Revision: 249848
This fixes the issue with the "randomly changing" default
route. What it was is there are two places in ip_output.c
where we do a goto again. One place was fine, it
copies out the new address and then resets dst = ro->rt_dst;
But the other place does not do that, which means earlier
when we found the gateway, we have dst pointing there
aka dst = ro->rt_gateway is done.. then we do a
goto again.. bam now we clobber the default route.
The fix is just to move the again so we are always
doing dst = &ro->rt_dst; in the again loop.
MFC after: 1 week
--- head/sys/netinet/ip_output.c Wed Apr 24 18:00:28 2013 (r249847)
+++ head/sys/netinet/ip_output.c Wed Apr 24 18:30:32 2013 (r249848)
@@ -196,8 +196,8 @@ ip_output(struct mbuf *m, struct mbuf *o
         hlen = ip->ip_hl << 2;
-    dst = (struct sockaddr_in *)&ro->ro_dst;
 again:
+    dst = (struct sockaddr_in *)&ro->ro_dst;
     ia = NULL;
     /*
      * If there is a cached route,
[...]
-        dst = (struct sockaddr_in *)&ro->ro_dst;
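The mechanism described in the commit message can be illustrated with a small user-space C model. This is a sketch, not kernel code: struct addr, struct rtentry, struct route and toy_ip_output() are hypothetical, heavily simplified stand-ins, and the single destination rewrite stands in for whatever sends ip_output() back through goto again (a packet-filter destination rewrite is one plausible trigger). With fixed set to 0 (the old ordering), dst still points at the default route's gateway storage on the second pass, so writing the lookup key overwrites the gateway. With fixed set to 1 (the r249848 ordering), dst is reset to &ro->ro_dst after the label first.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-ins for the kernel structures. */
struct addr { uint32_t s_addr; };

struct rtentry {                /* one routing-table entry */
    struct addr rt_gateway;     /* next hop; the default route's lives here */
};

struct route {                  /* route cache, like ro in ip_output() */
    struct addr ro_dst;         /* lookup key */
    struct rtentry *ro_rt;      /* cached route, NULL if none */
};

/*
 * Toy model of the ip_output() control flow. fixed == 0 mimics the code
 * before r249848 (dst set only above the label); fixed == 1 mimics the
 * fix (dst reset on every pass through again:).
 */
void toy_ip_output(struct route *ro, struct rtentry *default_rt,
                   struct addr first_dst, struct addr rewritten_dst,
                   int fixed)
{
    struct addr ip_dst = first_dst;   /* destination in the packet header */
    int rewritten = 0;
    struct addr *dst = &ro->ro_dst;   /* old code: the only assignment */

again:
    if (fixed)
        dst = &ro->ro_dst;            /* r249848: reset on every pass */

    /* No cached route: build the lookup key through dst. */
    memset(dst, 0, sizeof *dst);
    dst->s_addr = ip_dst.s_addr;

    /* The lookup hits the default route, which has a gateway, so dst is
     * repointed at that route entry's own gateway storage. */
    ro->ro_rt = default_rt;
    dst = &default_rt->rt_gateway;

    /* The destination is rewritten once; drop the cached route and redo
     * the lookup, as the goto again path does. */
    if (!rewritten) {
        rewritten = 1;
        ip_dst = rewritten_dst;
        ro->ro_rt = NULL;
        goto again;
    }
}
```

Running the model once with fixed set to 0 leaves the route entry's gateway equal to the rewritten packet destination, an address that is not configured anywhere, which matches the symptom reported in the mailing-list post.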
One of the lessons hammered into my brain by CS profs was just how evil GOTO is… nice to see we're still encountering GOTO issues.
/Yes, I know, "Used properly ... "
//I still hate it.
Interestingly, this (long-standing) FreeBSD bug has been fixed by this guy:
whose other work seems very impressive, and who is affiliated with Cisco.
I wonder if any recent Cisco products are using FreeBSD "under the hood" …
PS: I know Juniper's JunOS is built on FreeBSD, but afaik JunOS isn't using the native FreeBSD TCP/IP stack.
Hrm, the bug seems to affect pfSense as well, even though it has never been seen here, probably because that goto again path is rarely (if ever) taken.
I will merge the patch just for correctness.