Working on getting OpenVPN server bridging to fly.



  • I know this isn't currently supported, but there is a nice writeup on how to get this working on FreeBSD:

    http://www.mired.org/home/mwm/papers/FreeBSD-OpenVPN-Bridging.html

    I'm poking around in my pfSense config as we speak to set this up.  So far no luck, primarily because openvpn.inc insists on having an ifconfig line regardless of what I put in.  I'm trying a manual config file for now to see if I can get it working.  Just thought I'd toss it out there.



  • Wow, that was almost TOO easy.  The only thing that sucks is that my powerbook won't do Bonjour/Rendezvous/Zeroconf across the tap0 interface. :(

    All we need to do is the following:

    Get your tunnel working so you can connect from a remote location.  Don't bother setting up routing to other networks, just make sure you can get into the vpn, and that traffic flows as expected.

    Up to this point, the IP range you're using on the VPN has, by definition, been different from the network you want to bridge to.  We are going to take tap0 and make it, for lack of a better term, a switch port connected to another interface, so it will assume the IP address identity of that other interface.  DO NOT change the IP address settings you've already put in, or you'll mess up the rest of your network (a router that can find the same network on two interfaces is not a happy router).

    Check "Use Static IP's" (if you haven't already)

    In "custom options", add the following to whatever you already have:

    dev tap0; float; server-bridge 172.16.10.1 255.255.255.0 172.16.10.64 172.16.10.191

    Okay, now 172.16.10.1 should be replaced with the IP address that will be used as the server end of the openvpn tunnel.  I chose that IP because it matched up well with the rest of my configuration, and of course it should be on the same network as the one you're bridging to.  Replace 255.255.255.0 with the correct netmask of the network you're bridging to.

    The last two IPs are the beginning and end of your OpenVPN IP assignment range.  The range shown, .64 through .191, is a 128-address block.
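
For anyone adapting the line above to their own network, here is a minimal sketch that assembles the custom-options string and counts the addresses in the pool. The function names are made up for illustration, and the pool counter assumes both ends sit in the same /24 (only the last octet differs).

```shell
#!/bin/sh
# Build the pfSense "custom options" string for a bridged OpenVPN server.
# Arguments: server-side bridge IP, netmask, first pool IP, last pool IP.
server_bridge_opts() {
    printf 'dev tap0; float; server-bridge %s %s %s %s\n' "$1" "$2" "$3" "$4"
}

# Count the addresses in the pool, inclusive of both ends.
# Assumption: both addresses are in the same /24, so only the
# last octet needs to be compared.
pool_size() {
    first=${1##*.}
    last=${2##*.}
    echo $(( $last - $first + 1 ))
}
```

Calling `server_bridge_opts 172.16.10.1 255.255.255.0 172.16.10.64 172.16.10.191` reproduces the line used in this post, and `pool_size 172.16.10.64 172.16.10.191` prints how many clients that range can serve.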

    Now here's the kicker.

    We have to create bridge0.  In my original config, I went to Interfaces, tap0, and chose to bridge to my other interface (was an opt interface, sis0).  bridge0 gets created, but does not actually add the members tap0 and sis0.  ????

    On the page I linked above, it suggests creating a script that does this:

    ifconfig bridge0 addm sis0 addm tap0 up

    For whatever reason pfSense's code doesn't do this.  I also tried that document's method of creating the script, giving it execute permissions, and then adding it to my "custom options" line, but when I reboot the firewall, the tap0 interface comes up, but it never adds the two interfaces to bridge0.  ???

    So right now, I temporarily added that line to /etc/rc, right before "bootup complete".  Ta da!  Bridged openvpn!  My laptop is now on my opt network no matter where I go! :D

    Now if we can figure out how to get this to work right using the pfSense interface, I'm a happy man.



  • On the Bonjour/Rendezvous/ZeroConf side of things, per this page I should have a workaround:

    http://www.section6.net/wiki/index.php/Setting_up_a_Secure_Bridged_(Wireless)_Network_with_OpenVPN#A_note_on_OS_X_and_Bonjour

    echo "pass in quick on sis0 dup-to tap0 inet proto udp from any to 224.0.0.251 port = 5353" | pfctl -mf -
    echo "pass in quick on tap0 dup-to sis0 inet proto udp from any to 224.0.0.251 port = 5353" | pfctl -mf -

    HOWEVER, when I do that, the entire system seems to stop answering any requests at all, and nothing gets put into filter.log for debugging. :\

    Anyone with a clue on that one please let me know.



  • Hmm.  I've had to resort to turning off bridging in the webui and creating the bridge in /etc/rc.  So I now have this:

    ifconfig bridge0 create
    ifconfig bridge0 addm sis0 addm tap0 up
    #echo "pass in quick on sis0 dup-to tap0 inet proto udp from any to 224.0.0.251 port = 5353" | pfctl -mf -
    #echo "pass in quick on tap0 dup-to sis0 inet proto udp from any to 224.0.0.251 port = 5353" | pfctl -mf -

    (commented out the dup-to rules until I figure out why they kill things)



  • 1. Edit /tmp/rules.debug
    2. Add your custom items
    3. pfctl -f /tmp/rules.debug
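
Since a bad hand-edited rule can take the whole box offline, it may be worth parse-checking the file before loading it. A minimal sketch, assuming pfctl lives at /sbin/pfctl; the function name is made up. `pfctl -n` parses the ruleset without loading it.

```shell
#!/bin/sh
# Sketch: parse-check an edited ruleset before loading it.
# PFCTL can be overridden; /sbin/pfctl is the assumed default location.
PFCTL=${PFCTL:-/sbin/pfctl}

# Load the ruleset only if "pfctl -n" (parse, do not load) accepts it.
load_rules_checked() {
    rules=${1:-/tmp/rules.debug}
    if "$PFCTL" -n -f "$rules"; then
        "$PFCTL" -f "$rules"
    else
        echo "syntax error in $rules, not loaded" >&2
        return 1
    fi
}
```

Run it as `load_rules_checked /tmp/rules.debug`. Keep in mind that pfSense generates that file itself, so manual edits are likely to disappear the next time the filter is reloaded from the webui.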



  • Hmm.

    It appears that bridge0 needs to be an interface that pfSense recognizes for rules creation.  The moment I enable the bridge and connect to openvpn, no one on either the openvpn or the opt interface I've bridged to can go anywhere, and I'm getting blocks on bridge0 showing up in the filter logs. Yay.

    interfaces.inc looks a lot different from back in December, btw.  Can't figure out where to hack in a quick allowance for bridge0 so I can add it as an opt.



  • /etc/inc/filter.inc … Search for "outgoing".



  • Here we go.  The problem I have right now appears to be that since I'm running carp on the same interface I'm bridging to, it is causing hiccups.  Wonder if I can bridge to the carp interface instead?



  • It's more than hiccups.  It's a packet storm, kids.

    I think I'm creating a loop on the LAN that causes a storm.  As soon as I turn off the bridge, life is well again.

    Not sure precisely how it's happening.  Someone want to try this on a pfsense box without CARP and tell me what kind of luck you have?



  • If you are bridging there is no need for the dup-to portions.  Simply allow the traffic.



  • Yeah, I think you're right.  The problem is actually that the mDNSResponder and tap driver on OSX need to be patched.  In case anyone stumbles onto this later, here's a link:

    http://tunnelblick.net/alpha/Tunnelblick-Tiger-3.0a2-bonjour-patched.dmg

    Still slowly wading through this mess, as CARP+Bridged OpenVPN don't appear to like one another much.



  • I'd really like to see someone without a carp setup try this to see what kind of success they have.  Any volunteers?  I just want to narrow down the cause.



  • Some quick observations about all of this:

    1.  We really need OpenVPN to not assign an IP at all to tap0, and just do an ifconfig tap0 up.  Assigning an IP futzes things up, but the current pfSense code insists on an IP address assignment in the webui.

    2.  Bridging works really really well….for all of about 5 minutes at a time.  Then it all goes straight to hell.  Routing just utterly and completely dies.  If you don't connect to the vpn, things stay up marginally longer, then (at least my system does) the system crashes.  The system will attempt to shut down (ACPI) if I do a ctrl-alt-del, but it never does.  It has to be hard rebooted.  It won't take console input, all interfaces stop answering, etc.  ???

    3.  Despite not setting up bridging from the webui, the interface detection code still picks up bridge0 and "knows" that sis0 and tap0 are on the bridge: when I look at rules.debug, they are included in the script variables at the top.

    4.  Finally, per many OpenBSD docs, you really should only filter on one interface on a bridge, not both.  That said, I take that to mean we shouldn't have a "block all" rule at all on tap0.  Just an allow all statement, and any filter rules placed on sis0 would then apply to tap0 as well.  Sound correct?

    EDIT -

    I think I've made a breakthrough here.  We need to add this to sysctl.conf:

    net.link.bridge.pfil_onlyip=0

    Apparently without this set, it doesn't want to pass non-IP packets from interface to interface, and this totally hoses up our bridged environment with a tap interface, given that tap is a layer2 thing, and not a layer3.  It looks like carp is now working, and tap works.  Bridge works.  Life seems to be good.  We'll see how it holds up after a reboot though. ;)

    Here's where I got a clue about it.  Note that this was in regards to a wireless interface and layer-2 traffic, but it seems to have cleared things up here.

    http://lists.freebsd.org/pipermail/freebsd-net/2006-April/010375.html
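
To apply the setting immediately, `sysctl net.link.bridge.pfil_onlyip=0` does it on a running system. For the sysctl.conf side, here is a small sketch that writes the line without piling up duplicates on repeated runs. The helper name is made up, and whether pfSense rewrites /etc/sysctl.conf on its own is something I have not checked.

```shell
#!/bin/sh
# Sketch: persist a sysctl setting without duplicating the line.
# Arguments: key, value, and optionally the config file
# (defaults to /etc/sysctl.conf).
persist_sysctl() {
    key=$1
    value=$2
    conf=${3:-/etc/sysctl.conf}
    if [ -f "$conf" ]; then
        # Keep every line whose key differs from the one being set.
        awk -F= -v k="$key" '$1 != k' "$conf" > "${conf}.tmp"
        mv "${conf}.tmp" "$conf"
    fi
    echo "${key}=${value}" >> "$conf"
}
```

Usage would be `persist_sysctl net.link.bridge.pfil_onlyip 0`. Lines with spaces around the `=` are not recognised by this sketch and would be left in place.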



  • Alright, I committed the net.link.bridge.pfil_onlyip=0 change.

    What do you mean by CARP works correctly with the bridge?



  • Well….

    My config is like this:

    Two pfsense boxes, identical hardware.  sis0 on each is running carp.  So we have 172.16.10.2 on the first, and 172.16.10.3 on the second one.  They share 172.16.10.1.  This is our internal lan (although it is really an opt interface, I need to change that "someday real soon"), and I want to bridge that to tap0.

    Originally when I set this up (and it just happened again...grrr), it would work for about 5 mins with tap0 and sis0 bridged, then all of a sudden 172.16.10.1 would just stop answering.  CARP wouldn't fail over, as sis0 was still up and still had its IP, but it would stop replying to ICMP.  I get to about icmp_seq=288, then nothing.  Nothing shows up in tcpdump -i bridge0, sis0, or tap0.  Nothing.  It was really weird.

    What's bizarre is that it's almost EXACTLY 4 minutes.  I would get a ping response about once per second.  It's almost as if there is some sort of scheduled task that is killing the bridge.  If I do ifconfig bridge0 deletem sis0, it immediately comes back, then I can do ifconfig bridge0 addm sis0, things work for ~4-5 minutes, then we're back to square 1.

    Is there something here that rings a bell that perhaps wouldn't be immediately obvious to me?  Something that runs as a background agent?  I suppose it's possible that something odd happens in the state table that times out, or maybe a buffer is consistently filling up, but I'm having a hard time placing my finger on what would cause this kind of behavior.



  • @Numbski:

    Is there something here that rings a bell that perhaps wouldn't be immediately obvious to me?  Something that runs as a background agent?  I suppose it's possible that something odd happens in the state table that times out, or maybe a buffer is consistently filling up, but I'm having a hard time placing my finger on what would cause this kind of behavior.

    Strange, I cannot think of anything that runs in the background that would change anything.



  • I just timed it.  5 minutes on the dot.  I can cron an ifconfig bridge0 deletem sis0/addm sis0 once every 4 mins to mitigate the problem, but that would sorta kill any kind of long-term constant-state communications. :P

    Really have to ponder this.  Doesn't appear to be a pf thing though.



  • Do a killall cron just to make sure it's nothing in there stepping on it.



  • Okay, done.  Did an addm/deletem sis0 at 4:59:10 central time per my nice little mobile phone here.  It's on the clock.  We'll see how long it lasts. :D



  • Died at 5:04:20 pm central with no crons.  Hmm….

    addm/deletem sis0 of course revived it.



  • I'm out of time to work on this for now.  I added a crontab to run the deletem/addm every 4 mins.  It's a terrible, awful, dirty hack, but I'm hoping that the robustness of tcp/ip and associated apps will be able to resend and life will go on until I can figure out what is actually causing the issue to begin with.  Any thoughts on debugging please post up! ;)
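
For reference, the stopgap amounts to something like the sketch below. The function name is made up, and bridge0/sis0 are just the defaults from this thread. It is a workaround, not a fix, and every run briefly interrupts traffic across the bridge.

```shell
#!/bin/sh
# Sketch of the stopgap: drop a member from the bridge and re-add it.
# IFCONFIG can be overridden; /sbin/ifconfig is the assumed default.
IFCONFIG=${IFCONFIG:-/sbin/ifconfig}

# Arguments: bridge name and member name (default bridge0 and sis0).
rejoin_member() {
    bridge=${1:-bridge0}
    member=${2:-sis0}
    "$IFCONFIG" "$bridge" deletem "$member"
    "$IFCONFIG" "$bridge" addm "$member"
}
```

Saved as a script that ends with a call to `rejoin_member bridge0 sis0`, it can be run from a crontab entry whose minute field is `*/4`.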



  • Couple things.

    When it drops again, check ifconfig and look at the bridge status.  Does it show blocking?



  • ifconfig bridge0 says - UP,BROADCAST,RUNNING,MULTICAST

    To be fair, I'm not sure what causes an interface to go into BLOCKING mode, because I never (intentionally) use it. :\

    I'm looking in the right place, right?  Did my deletem/addm, came back.  Shows the same thing.



  • Look underneath that, there is a blocking / forwarding / listening entry for each interface in the bridge.



  • When working, they read: learning, discover.  Waiting for the next failure….

    Failure happened.  Same thing.  LEARNING, DISCOVER. For grins I've enabled STP on both, although I really don't think this is a packet storm problem anymore, since I'm not seeing broadcasts coming across the bridge0, sis0, or tap0 interfaces.  Probably a good measure anyway since at some point I need to duplicate this config on the other firewall.
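
To watch the per-member state without eyeballing the full ifconfig output each time, a filter along these lines may help. It is a sketch that assumes member lines look like "member: sis0 flags=3<LEARNING,DISCOVER>", which can differ between FreeBSD versions; the function name is made up.

```shell
#!/bin/sh
# Sketch: list each bridge member and its flags.
# Reads "ifconfig bridgeN" output on stdin.
# Assumed line format: "member: sis0 flags=3<LEARNING,DISCOVER>"
member_flags() {
    awk '$1 == "member:" {
        flags = $3
        sub(/^flags=[0-9a-fA-F]*</, "", flags)   # strip "flags=N<"
        sub(/>$/, "", flags)                     # strip trailing ">"
        print $2, flags
    }'
}
```

Use it as `ifconfig bridge0 | member_flags`. If the format assumption holds, a member with spanning tree enabled should show STP among its flags.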



  • You may be interested in this commit:

    http://pfsense.com/cgi-bin/cvsweb.cgi/pfSense/usr/local/www/status_interfaces.php?rev=1.29.2.7;only_with_tag=RELENG_1

    Shows the bridge status now under Status -> Interfaces



  • Hmm.  Is it safe for me to grab that one file and plug it in, or is there something more formal I should do? (ie, cvs?)



  • Yeah, it's safe.  Simply replace /usr/local/www/status_interfaces.php with that new one.



  • Cool.  They both show learning.  Of course, after 5 mins it still dies, but they both show learning. ;)

    Seriously, have to put this to rest for now.  I'll come back to it later. :)



  • Testing remotely.  Quick note - works great, except for a minor detail.

    If you intend to use STP, DO NOT, I repeat, DO NOT, enable STP on the tap interface.  Your actual hardware interface is fine, but doing so on the tap interface creates a really odd situation where traffic hits the endpoint tap interface, and gets to your bridge, but nothing ever returns.  Disabling STP on the tap interface resolves that problem.

    Otherwise all is well.  Just need to figure out why CARP chokes after 5 mins.



  • Another update.  Looked like all was working just fine, until the firewall ground to a halt.  Same behavior as before too.  It responds to ctrl-alt-del by trying to shut down, but fails to actually do so.  Has to be hard rebooted.

    When I get a chance to power cycle it, I'll see if I can set up watchdog to mitigate this side effect until I can find the root cause.  Again, if you have any speculations as to the cause, post up and I'll try it.  Also, anyone without carp that wants to try this, see what happens, let me know.



  • For the sake of discussion, I think I left off an option that might be causing an issue.  Dunno yet:

    dev-node tap-bridge

    Here's the official OpenVPN docs on the matter.  Surprised that I overlooked that directive.

    http://openvpn.net/bridge.html

    It claims that directive is only required under Windows though.  Another comment is this:

    A common mistake that people make when manually configuring an Ethernet bridge is that they add their primary ethernet adapter to the bridge before they have set the IP and netmask of the bridge interface. The result is that the primary ethernet interface "loses" its settings, but the equivalent bridge interface settings have not yet been defined, so the net effect is a loss of connectivity on the ethernet interface.

    So, despite what I was reading elsewhere, it appears that the openvpn folks would prefer we do this:

    ifconfig sis0 up
    ifconfig tap0 up
    ifconfig bridge0 create
    ifconfig bridge0 addm sis0 addm tap0
    ifconfig bridge0 172.16.10.2 netmask 255.255.255.0

    The problem here of course is the impact this would have on CARP.  I have sis0 in carp3, and I cannot do addm carp3.  I don't know (and can't easily test at this moment) whether I can ifconfig bridge0 instead of sis0, and still have it able to join a carp cluster.  If anyone wants to speak up on that point as well, please do.  It will be about a week before I can safely test that (I think?).  I might have an opportunity while in Montreal.

    If this is indeed correct, then from pfSense's point of view, we need to be able to change the lan interface (or in my case, opt interface) to be bridge0 and not sis0.  That way all rules are being applied to the bridge and not to the physical interface, unless someone wants to step up with more information to say otherwise.  I'm honestly just not finding much info in regards to FreeBSD, bridging, and rules re: pf, only that you should only create rules for one interface and not both, as it screws things up.  I haven't found any documentation on whether rules should be applied specifically to the bridge, or to the physical ints.

    Also, I'm puzzled by STP hosing things up on tap0.  Doesn't make sense to me.



  • In Montreal now.  Noticed that I can't actually set up a watchdog timer, as it requires kernel support (and it isn't in GENERIC), so oops. :)

    Have to find another way for now.

    Might I suggest we officially enable watchdog in the kernel?  Seems like a very logical, sane thing to have in a firewall.  If the kernel stops responding for x seconds, reboot the system.



  • We already support the GEOD watchdog but I do not plan on adding the SW_WATCHDOG as it may interfere with systems this late in the testing cycle.

    We may be able to add it to 1.1.



  • Ah, cool.  Thanks.  Hopefully I'll have time later to rebuild with SW_WATCHDOG for my own purposes.  Doesn't really fix the problem at hand, but makes me feel better to know the system will kick itself. ;)



  • New observations. :D

    I had the opportunity to do an openvpn bridge on a pfSense RC2i box without CARP.  Worked 99% flawlessly with the current code.

    • Added server-bridge directive.
    • Assigned tap0 as an opt.
    • Bridged that opt to lan using the webui.
    • Set an any-any rule on the opt.

    The only thing that didn't work?  STP was enabled on the tap interface by default!  ifconfig bridge0 -stp tap0, and all was well.

    Really, REALLY screwy stuff here.  Wonder if I should just re-load my firewalls when I get back and start clean?  ???

    Would help if someone could verify my findings.



  • Been up for a couple of days, completely stable on bridging on everyone's pfSense boxes but my own.

    Go fig.  ;D

    So yeah.  Put in a statement to check if an interface is a tap interface, and if it is, don't enable STP.  Do that, OpenVPN bridging is good to go.  Works quite nicely with CARP too, despite my initial experiences.  Just do "local (CARP IP)" on both boxes, and presuming you've used the same server crt, ca cert, server key, and dh, it will fail over gracefully.

    Good stuff guys.  Sorry I made a three page thread on it.  At least if someone else has issues that match mine, they'll have something to go on.  When I return I'll try doing a fresh load on my boxes and see what my results are.  I think you can safely say that OpenVPN bridging works though.

    NOTE: The change needs to be made ~ line 144 in /etc/inc/interfaces.inc.  Currently it looks like this:

    
    if(!is_interface_wireless($lancfg['if']) and
                       !is_interface_wireless($config['interfaces'][$lancfg['bridge']]['if']))
                            mwexec("/sbin/ifconfig bridge{$bridges_total} stp {$config['interfaces'][$lancfg['bridge']]['if']} stp {$lancfg['if']}");
    
    

    I'm thinking on that first line we need to add something along the lines of a negated regex, maybe !/tap/?  Don't know precisely how that goes in PHP.  Then after the mwexec line, we add an elseif block that says the same thing, only don't negate the regex, and on the mwexec line, leave off the stp part.  Make sense?
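
The decision itself is easy to sketch in shell, even if the real change has to be written in PHP inside interfaces.inc: enable STP only on bridge members whose name does not start with "tap". This is only an illustration of the logic, with a made-up function name. It prints the ifconfig command rather than running it, and it leaves out the wireless check from the existing code.

```shell
#!/bin/sh
# Sketch of the proposed check: build an "ifconfig ... stp" command
# that skips tap interfaces.
# Arguments: bridge name, then one or more member interface names.
stp_command() {
    bridge=$1
    shift
    args=""
    for member in "$@"; do
        case "$member" in
            tap*) ;;                            # never enable STP on tap
            *)    args="$args stp $member" ;;   # enable it everywhere else
        esac
    done
    # Print nothing if no member qualifies.
    [ -n "$args" ] && echo "/sbin/ifconfig $bridge$args"
    return 0
}
```

For the setup in this thread, `stp_command bridge0 sis0 tap0` prints `/sbin/ifconfig bridge0 stp sis0`. In PHP the equivalent test would presumably be a preg_match against the interface name.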



  • Good to hear.  If you want to create a patch using diff -rub I'll get it committed.



  • I actually don't know how to handle the php code for that situation.  :'(



  • Grrr.

    The bridge doesn't hold after a reboot.  bridge0 gets created, but the interfaces don't get added.  I have to do it manually.  At what point in the boot process do the bridges get brought up?  Is it possible that it's being attempted prior to openvpn being launched, thus tap0 doesn't exist and the interfaces don't get added?
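
If the ordering guess is right, one way to test it is a late boot script that waits for tap0 to appear before touching the bridge. A minimal sketch follows. The interface names come from this thread, while the 30-try limit, the WAIT_STEP variable and the function names are assumptions. It also turns STP off on the tap member, per the finding earlier in the thread.

```shell
#!/bin/sh
# Sketch: wait for OpenVPN to create the tap device, then build the bridge.
IFCONFIG=${IFCONFIG:-/sbin/ifconfig}
BRIDGE=${BRIDGE:-bridge0}
PHYS=${PHYS:-sis0}
TAP=${TAP:-tap0}

# Return 0 once the interface exists, 1 if it never shows up.
# Arguments: interface name, number of tries (default 30).
wait_for_if() {
    tries=${2:-30}
    while [ "$tries" -gt 0 ]; do
        "$IFCONFIG" "$1" >/dev/null 2>&1 && return 0
        tries=$((tries - 1))
        sleep "${WAIT_STEP:-1}"
    done
    return 1
}

# Create the bridge if it is missing, add both members, bring it up,
# and keep STP disabled on the tap side.
bring_up_bridge() {
    wait_for_if "$TAP" || return 1
    "$IFCONFIG" "$BRIDGE" >/dev/null 2>&1 || "$IFCONFIG" "$BRIDGE" create
    "$IFCONFIG" "$BRIDGE" addm "$PHYS" addm "$TAP" up
    "$IFCONFIG" "$BRIDGE" -stp "$TAP"
}
```

Calling `bring_up_bridge` late in the boot sequence should show whether timing is the culprit: if the members stick when added after tap0 exists, the webui bridge code is probably running before OpenVPN starts.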

