There is definitely a rare, screwy bug with static routes

  • I already had a screwy situation with static routes before, which I was not able to solve without reinstalling pfsense:

    Now, I have just set up a new instance of pfsense at a new site, running version 2.3.2p1.  It has been running fine for about 3 weeks.

    This site has several static routes, as it connects to several remote sites via a VPN gateway (a separate box).

    Today, I set up a point-to-point antenna to two sites that are nearby.  So I changed the static route for the first site from the VPN gateway to the antenna gateway.  Everything worked great!

    Then I changed the second static route for the second site from the VPN gateway to the antenna gateway and…

    My entire local network died.  Everything went offline, and I could barely communicate with other devices on the local network, much less on the internet or at the remote sites.  It was as if my entire network was being internally DDOSed.

    I experienced a similar symptom when messing with static routes in the same post I referenced above.

    But the worst thing about this one is: even after completely deleting the two static routes in question, and rebooting pfsense, the internal flood continued.

    The log shows a bunch of dpinger errors with code 55 to the antenna gateway in question, but I've tested with pfsense off, and the antenna gateway is fine.  And also all my other devices come back online when the pfsense box is off.

    This is a really terrible bug because it seems it could strike at any time without warning, and bring down an entire network.  It would be catastrophic if this occurred while I was making changes to pfsense from a remote location.

    Looks like I'm going to have to reinstall pfSense from scratch, again.

  • LAYER 8 Netgate

    Sounds like you somehow created a loop.

    Probably not a bug. Probably a mistake.

    Without more details it is impossible to know.

    Please create a comprehensive network diagram detailing the starting state of the network, desired ending state of the network, and the exact steps you took.

  • It's not a loop.

    Site 1:
    pfsense router interfaces:
    internet gateway -> internet connection
    vpn gateway -> vpn router
    antenna gateway -> antenna to Site 2

    VPN router -> VPN router at Site 2

    Site 2:
    pfsense router interfaces:
    internet gateway -> internet connection
    vpn gateway -> vpn router
    antenna gateway 1 -> antenna to Site 1
    antenna gateway 2 -> antenna to Site 3

    VPN router -> VPN router at Site 1

    Site 3:
    pfsense router interfaces:
    antenna gateway -> antenna to Site 2

    Static Routes:
    Site 1: to Site 2 via vpn gateway, to Site 3 via vpn gateway
    Site 2: to Site 1 via vpn gateway, to Site 3 via antenna gateway 2
    Site 3: to Site 1 via antenna gateway, to Site 2 via antenna gateway

    I changed Site 1 Static Routes to:
    to Site 2 via antenna gateway <- worked great!
    to Site 3 via antenna gateway <- network death

    Did you even check my link in my original post? I also experienced a network storm with a perfectly valid static route config at a different site also using pfsense.  I had to reinstall pfsense completely to get it working again, with the exact same config.

    Also, everything is working fine now.  Guess what I did?

    Well, I left the machine on all night, network storm and all.  Didn't change any config on the pfsense box, or anywhere else on the network for that matter.

    I added some NICs to the virtual machine (my intent was to switch the existing VMware E1000 NICs to VMXNET 3 NICs, as I've read on the forums that E1000 can be problematic).

    So I shut down the machine, added the VMXNET NICs to the VM, restarted and … before I could even change anything in the pfsense config, everything was working fine this morning.

    Again, the configuration as is works just fine as long as pfsense decides to behave.  And this is the second time I have experienced a network storm when using a straightforward routing table.

    So, I'm sure there is no loop.  I was able to ping just fine from Site 1 to both Site 2 and Site 3 using the same configuration that borked my network last night.  I don't think it is specifically a configuration problem. I think it is a pfsense bug, somewhere deep in the network code.

    I have since changed all the interfaces from E1000 to VMXNET 3, despite the fact that everything was working, just in case.  Note again that I did not make any other configuration changes and yet it all works fine.  We'll see if any problems return after the change of adapter type.
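
The claim that the changed static routes contain no Layer 3 loop can be checked mechanically. The following is a minimal sketch, assuming the topology in the post above: the Site 1 and Site 3 antennas both lead to Site 2, and the Site 1/Site 2 VPN routers connect to each other. The names `NEXT_HOP`, `trace`, and `find_loops` are hypothetical and exist only for this illustration; nothing here queries pfSense itself.

```python
# Hypothetical model of the static routes after the change at Site 1.
# Key: (current site, destination site). Value: the neighbouring site that
# the chosen gateway leads to, as read from the topology in the post.
NEXT_HOP = {
    ("site1", "site2"): "site2",  # via antenna gateway
    ("site1", "site3"): "site2",  # via antenna gateway (antenna ends at Site 2)
    ("site2", "site1"): "site1",  # via vpn gateway
    ("site2", "site3"): "site3",  # via antenna gateway 2
    ("site3", "site1"): "site2",  # via antenna gateway
    ("site3", "site2"): "site2",  # via antenna gateway
}


def trace(src, dst, next_hop=NEXT_HOP):
    """Follow next hops from src to dst; raise ValueError on a routing loop."""
    path = [src]
    while path[-1] != dst:
        hop = next_hop[(path[-1], dst)]
        if hop in path:
            raise ValueError(f"routing loop: {path + [hop]}")
        path.append(hop)
    return path


def find_loops(next_hop=NEXT_HOP):
    """Return every (src, dst) pair whose route revisits a site."""
    loops = []
    for src, dst in next_hop:
        try:
            trace(src, dst, next_hop)
        except ValueError:
            loops.append((src, dst))
    return loops


if __name__ == "__main__":
    print(trace("site1", "site3"))
    print(find_loops())
```

Under these assumptions the table is loop-free at Layer 3, which is consistent with the poster's claim. It says nothing about Layer 2, where switches and VLANs sit below anything a routing table describes.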

  • LAYER 8 Netgate

    Layer 3 routing loops generally do not bring down networks thanks to the TTL.

    Layer 2 loops do.

    When you keep getting the same bad results nobody else is getting, it's probably something unique you are doing.

    Do it again and take a packet capture.

    What you posted is not a diagram. What is in my sig is a diagram.
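
The difference between the two kinds of loop described in this reply can be illustrated with a toy model. This is a simplified sketch, not a simulation of real equipment: it assumes switches flood a broadcast frame out of every port except the one it arrived on, with no spanning tree protocol and no storm control. The function names and the example topologies are hypothetical.

```python
def l3_routers_visited(initial_ttl):
    """Count the routers that handle a packet stuck in a Layer 3 loop."""
    ttl = initial_ttl
    visited = 0
    while ttl > 0:
        visited += 1
        ttl -= 1  # every router decrements the TTL; at 0 the packet is discarded
    return visited


def l2_frames_in_flight(links, start, rounds):
    """Flood one broadcast frame from `start`; return copies in flight per round.

    `links` maps each switch to its neighbouring switches. Ethernet frames
    carry no TTL, so nothing in this model ever removes a copy.
    """
    # Each copy is (switch it is arriving at, switch it came from).
    in_flight = [(nbr, start) for nbr in links[start]]
    counts = [len(in_flight)]
    for _ in range(rounds - 1):
        forwarded = []
        for at, came_from in in_flight:
            for nbr in links[at]:
                if nbr != came_from:  # flood out of every port except ingress
                    forwarded.append((nbr, at))
        in_flight = forwarded
        counts.append(len(in_flight))
    return counts


# Three switches cabled in a ring, and four switches fully meshed.
RING = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"]}
MESH = {s: [t for t in "ABCD" if t != s] for s in "ABCD"}

if __name__ == "__main__":
    print(l3_routers_visited(64))
    print(l2_frames_in_flight(RING, "A", 5))
    print(l2_frames_in_flight(MESH, "A", 4))
```

In this model the routed packet is gone after a bounded number of hops, while the broadcast copies in the ring circulate indefinitely and the copies in the mesh double every round, which is the kind of behaviour that makes an entire LAN feel flooded.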

  • I think, possibly, the problem was arising from an error in my VLAN configuration.  Thank you for encouraging me to examine my network in more detail.  In the end, I don't think it was a problem with the routes themselves, but rather with an error on my switch with the VLANs.

    I'm not 100% sure yet, but everything is working for now and I will see how it goes.
