FreeBSD bug - default route changes unexpectedly



  • This is just a heads-up, since this bug seems to be affecting both FreeBSD 8.x and 9.x:

    http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/174749
    http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/157796

    It seems to be misclassified as ipfw-only, since some of the people experiencing it use only PF+ALTQ or PF+dummynet.


  • Rebel Alliance Developer Netgate

    Interesting, I've never seen nor heard of that happening, and I've seen quite a few very busy systems.

    It's possible that it's something we've patched away over the years, or that we haven't yet encountered the type of traffic that is capable of triggering such a problem.



  • According to a post by FreeBSD developer Adrian Chadd, it's a known issue:

    http://lists.freebsd.org/pipermail/freebsd-net/2013-March/034784.html
    Default route changes unexpectedly
    Adrian Chadd adrian at freebsd.org
    Wed Mar 6 07:32:42 UTC 2013

    It's a known problem; it just seems that it doesn't overlap/intersect
    the day to day activities of any network focused freebsd developers.

    If you guys want it fixed then you may have to find a developer to
    hire on contract to fix it, or find some kind of ruleset/traffic
    generation setup that reliably triggers the bug.

    Adrian

    On 5 March 2013 09:39, Nick Rogers <ncrogers at gmail.com> wrote:

    Hello,

    I am attempting to create awareness of a serious issue affecting users
    of FreeBSD 9.x and PF. There appears to be a bug that allows the
    kernel's routing table to be corrupted by traffic routing through the
    system. Under heavy traffic load, the default route can seemingly
    randomly change to an IP address that is not directly connected to the
    network (i.e., is not configured anywhere). Dhclient is not in the
    mix, nor is routed, bgpd, etc. Running route monitor shows no
    evidence of the change in the default route. The one commonality
    between all the systems experiencing this problem seems to be the use
    of PF.

    Obviously this is a serious problem as it causes all Internet-bound
    traffic to stop routing until the default route is corrected. Some
    users, including myself, are working around this problem by installing
    a script that runs multiple times a second to check if the default
    route is incorrect and fixing it if necessary, which mitigates the
    amount of downtime caused by the bug.

    Please refer to these past posts for more examples and evidence of
    other users experiencing this problem:

    http://forums.freebsd.org/showthread.php?p=211610#post211610

    http://freebsd.1045724.n5.nabble.com/Default-route-quot-random-quot-gateway-modification-bug-td5750820.html

    http://lists.freebsd.org/pipermail/freebsd-net/2012-March/031879.html

    http://lists.freebsd.org/pipermail/freebsd-ipfw/2010-September/004361.html

    There is also a PR that was incorrectly labeled as an IPFW issue.
    Myself and others believe this issue is not restricted to the use of
    IPFW and that the PR should be relabeled. I am inclined to think it is
    strictly a PF issue since I am not using IPFW, however there is
    evidence of the default route changing on people using IPFW for past
    versions of FreeBSD (7.x/8.x), so perhaps this is related.

    http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/174749

    Another PR for the same problem but specific to IPFW and 8.2-RELEASE

    http://www.freebsd.org/cgi/query-pr.cgi?pr=157796

    I am hoping someone reading this can give the problem the attention it
    deserves. Thank you.

    -Nick
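
The workaround mentioned in the mail (a script that polls the default route and repairs it) boils down to a comparison like the one below. This is only a minimal sketch in C, not the script any of the posters actually run: the function names are hypothetical, and it assumes the text being checked is shaped like the output of `route -n get default`, which prints a `gateway:` line.

```c
#include <assert.h>
#include <string.h>

/*
 * Extract the gateway field from text shaped like the output of
 * `route -n get default` (assumed format: a line containing "gateway: <addr>").
 * Returns 1 on success, 0 if no usable gateway line is found.
 */
static int parse_gateway(const char *text, char *out, size_t outlen)
{
    const char *p;
    size_t n;

    assert(text != NULL && out != NULL && outlen > 0);

    p = strstr(text, "gateway:");
    if (p == NULL)
        return 0;
    p += strlen("gateway:");
    while (*p == ' ' || *p == '\t')
        p++;                          /* skip padding before the address */
    n = strcspn(p, " \t\r\n");        /* address ends at whitespace */
    if (n == 0 || n >= outlen)
        return 0;
    memcpy(out, p, n);
    out[n] = '\0';
    return 1;
}

/* Returns 1 when the default route should be repaired, 0 when it is fine. */
static int default_route_is_wrong(const char *route_output,
                                  const char *expected_gw)
{
    char gw[64];

    if (!parse_gateway(route_output, gw, sizeof gw))
        return 1;                     /* no gateway line also counts as broken */
    return strcmp(gw, expected_gw) != 0;
}
```

A watchdog built on this would read the `route -n get default` output a few times per second and, whenever `default_route_is_wrong()` returns 1, run something like `route change default <expected gateway>`. It only shortens the outage; it does not address the cause.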


  • Rebel Alliance Developer Netgate

    It seems amazing then that nobody has observed it on pfSense that I know of. We usually uncover or encounter those sorts of things before others hit them.



  • I have been following that thread as well.
    To me it looks like a flowtable issue, with some mixture of other things in between.

    Without proper analysis you cannot say much about it.
    We might have patched it away in one of our many patches, and flowtable is not used in pfSense, but it is hard to tell without any analysis behind it.



  • There has been a commit in relation to this bug:

    http://lists.freebsd.org/pipermail/svn-src-all/2013-April/067954.html

    Author: rrs
    Date: Wed Apr 24 18:30:32 2013
    New Revision: 249848
    URL: http://svnweb.freebsd.org/changeset/base/249848

    Log:
      This fixes the issue with the "randomly changing" default
      route. What it was is there are two places in ip_output.c
      where we do a goto again. One place was fine, it
      copies out the new address and then resets dst = ro->rt_dst;
      But the other place does not do that, which means earlier
      when we found the gateway, we have dst pointing there
      aka dst = ro->rt_gateway is done.. then we do a
      goto again.. bam now we clobber the default route.
     
      The fix is just to move the again so we are always
      doing dst = &ro->rt_dst; in the again loop.
     
      PR: 174749,157796
      MFC after: 1 week

    Modified:
      head/sys/netinet/ip_output.c

    Modified: head/sys/netinet/ip_output.c

    --- head/sys/netinet/ip_output.c Wed Apr 24 18:00:28 2013 (r249847)
    +++ head/sys/netinet/ip_output.c Wed Apr 24 18:30:32 2013 (r249848)
    @@ -196,8 +196,8 @@ ip_output(struct mbuf *m, struct mbuf *o
             hlen = ip->ip_hl << 2;
         }

    -    dst = (struct sockaddr_in *)&ro->ro_dst;
     again:
    +    dst = (struct sockaddr_in *)&ro->ro_dst;
         ia = NULL;
         /*
          * If there is a cached route,

  • Rebel Alliance Developer Netgate

    One of the lessons hammered into my brain by CS profs was just how evil GOTO is… nice to see we're still encountering GOTO issues.

    /Yes, I know, "Used properly ... "
    //I still hate it.



  • Interestingly, this (long-standing) FreeBSD bug has been fixed by this guy:
    http://people.freebsd.org/~rrs/
    whose other work seems very impressive and who is affiliated with Cisco.

    I wonder if any recent Cisco products are using FreeBSD "under the hood" …

    PS: I know Juniper's JunOS is built on FreeBSD, but afaik JunOS isn't using the native FreeBSD TCP/IP stack.



  • Hrm, the bug seems to affect pfSense too, even though we have never seen it, probably because that "again" loop is rarely (maybe never) used.

    I will merge the patch just for correctness.

