Netgate Discussion Forum

    Working on getting OpenVPN server bridging to fly.

      Numbski

      Yeah, I think you're right.  The problem is actually that the mDNSResponder and tap driver on OSX need to be patched.  In case anyone stumbles onto this later, here's a link:

      http://tunnelblick.net/alpha/Tunnelblick-Tiger-3.0a2-bonjour-patched.dmg

      Still slowly wading through this mess, as CARP+Bridged OpenVPN don't appear to like one another much.

        Numbski

        I'd really like to see someone without a carp setup try this to see what kind of success they have.  Any volunteers?  I just want to narrow down the cause.

          Numbski

          Some quick observations about all of this:

          1.  We really need OpenVPN to not assign an IP at all to tap0, and just do an ifconfig tap0 up.  Assigning an IP futzes things up, but the current pfSense code insists on an IP address assignment in the webui.

          2.  Bridging works really really well….for all of about 5 minutes at a time.  Then it all goes straight to hell.  Routing just utterly and completely dies.  If you don't connect to the vpn, things stay up marginally longer, then (at least my system does) the system crashes.  The system will attempt to shut down (ACPI) if I do a ctrl-alt-del, but it never does.  It has to be hard rebooted.  It won't take console input, all interfaces stop answering, etc.  ???

          3.  Despite not setting up bridging from the webui, the interface detection code still picks up bridge0 and "knows" that sis0 and tap0 are on the bridge: when I look at rules.debug, they are included in the script variables at the top.

          4.  Finally, per many OpenBSD docs, you really should only filter on one interface on a bridge, not both.  That said, I take that to mean we shouldn't have a "block all" rule at all on tap0.  Just an allow all statement, and any filter rules placed on sis0 would then apply to tap0 as well.  Sound correct?

          EDIT -

          I think I've made a breakthrough here.  We need to add this to sysctl.conf:

          net.link.bridge.pfil_onlyip=0

          Apparently without this set, it doesn't want to pass non-IP packets from interface to interface, and this totally hoses up our bridged environment with a tap interface, given that tap is a layer2 thing, and not a layer3.  It looks like carp is now working, and tap works.  Bridge works.  Life seems to be good.  We'll see how it holds up after a reboot though. ;)

          Here's where I got a clue about it.  Note that this was in regards to a wireless interface and layer-2 traffic, but it seems to have cleared things up here.

          http://lists.freebsd.org/pipermail/freebsd-net/2006-April/010375.html
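For anyone applying this by hand, a minimal sketch of the setting follows. It assumes a stock FreeBSD-style /etc/sysctl.conf; on pfSense the file may be generated by the system, so a build that already includes the change would not need it.

```
# /etc/sysctl.conf (assumed location; read at boot)
# 0 = let the bridge pass non-IP Ethernet frames instead of dropping them
net.link.bridge.pfil_onlyip=0
```

To try it without a reboot, the same name=value pair can be passed to sysctl as root, for example `sysctl net.link.bridge.pfil_onlyip=0`.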

            sullrich

            Alright, I committed the net.link.bridge.pfil_onlyip=0 change.

            What do you mean by CARP works correctly with the bridge?

              Numbski

              Well….

              My config is like this:

              Two pfsense boxes, identical hardware.  sis0 on each is running carp.  So we have 172.16.10.2 on the first, and 172.16.10.3 on the second one.  They share 172.16.10.1.  This is our internal lan (although it is really an opt interface, I need to change that "someday real soon"), and I want to bridge that to tap0.

              Originally when I set this up (and it just happened again...grrr), it would bridge tap0 and sis0 and work for about 5 mins, then all of a sudden 172.16.10.1 would just stop answering.  CARP wouldn't fail over, as sis0 was still up and still had its IP, but it would stop replying to pings.  I get to about icmp_seq=288, then nothing.  No mention in tcpdump -i bridge0, sis0, or tap0.  Nothing.  It was really weird.

              What's bizarre is that it's almost EXACTLY 4 minutes.  I would get a ping response about once per second.  It's almost as if there is some sort of scheduled task that is killing the bridge.  If I do ifconfig bridge0 deletem sis0, it immediately comes back, then I can do ifconfig bridge0 addm sis0, things work for ~4-5 minutes, then we're back to square 1.

              Is there something here that rings a bell that perhaps wouldn't be immediately obvious to me?  Something that runs as a background agent?  I suppose it's possible that something odd happens in the state table that times out, or maybe a buffer is consistently filling up, but I'm having a hard time placing my finger on what would cause this kind of behavior.

                sullrich

                @Numbski:

                Is there something here that rings a bell that perhaps wouldn't be immediately obvious to me?  Something that runs as a background agent?  I suppose it's possible that something odd happens in the state table that times out, or maybe a buffer is consistently filling up, but I'm having a hard time placing my finger on what would cause this kind of behavior.

                Strange, I cannot think of anything that runs in the background that would change anything.

                  Numbski

                  I just timed it.  5 minutes on the dot.  I can cron an ifconfig bridge0 deletem sis0/addm sis0 once every 4 mins to mitigate the problem, but that would sorta kill any kind of long-term constant-state communications. :P

                  Really have to ponder this.  Doesn't appear to be a pf thing though.

                    sullrich

                    Do a killall cron just to make sure it's nothing in there stepping on it.

                      Numbski

                      Okay, done.  Did an addm/deletem sis0 at 4:59:10 central time per my nice little mobile phone here.  It's on the clock.  We'll see how long it lasts. :D

                        Numbski

                        Died at 5:04:20 pm central with no crons.  Hmm….

                        addm/deletem sis0 of course revived it.
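For reference, the two clock readings in this test (bridge re-added at 4:59:10, dead at 5:04:20) are 310 seconds apart, which fits the roughly five-minute pattern. A small sketch of that arithmetic follows; the helper names to_seconds and elapsed are made up for illustration and assume both timestamps fall on the same day.

```shell
# Convert H:MM:SS to seconds since midnight.
# awk reads "04" as the number 4, so leading zeros are harmless.
to_seconds() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# Seconds between two same-day timestamps: elapsed START END
elapsed() {
    echo $(( $(to_seconds "$2") - $(to_seconds "$1") ))
}

elapsed 4:59:10 5:04:20   # prints 310
```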

                          Numbski

                          I'm out of time to work on this for now.  I added a crontab to run the deletem/addm every 4 mins.  It's a terrible, awful, dirty hack, but I'm hoping that the robustness of tcp/ip and associated apps will be able to resend and life will go on until I can figure out what is actually causing the issue to begin with.  Any thoughts on debugging please post up! ;)
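The stopgap described above might look like this as a FreeBSD-style /etc/crontab entry. The /sbin/ifconfig path, the root user field, and the exact schedule are assumptions, and pfSense normally manages cron through its own configuration, so this is a sketch of the idea, not the entry that was actually installed.

```
# min  hour  mday  month  wday  who   command
*/4    *     *     *      *     root  /sbin/ifconfig bridge0 deletem sis0 && /sbin/ifconfig bridge0 addm sis0
```

Each run briefly removes sis0 from the bridge, so anything holding a long-lived connection across it will see a short interruption every four minutes.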

                            sullrich

                            Couple things.

                            When it drops again, check ifconfig and look at the bridge status.  Does it show blocking?

                              Numbski

                              ifconfig bridge0 says - UP,BROADCAST,RUNNING,MULTICAST

                              To be fair, I'm not sure what causes an interface to go into BLOCKING mode, because I never (intentionally) use it. :\

                              I'm looking in the right place, right?  Did my deletem/addm, came back.  Shows the same thing.

                                sullrich

                                Look underneath that, there is a blocking / forwarding / listening entry for each interface in the bridge.

                                  Numbski

                                  When working, they read: learning, discover.  Waiting for the next failure….

                                  Failure happened.  Same thing.  LEARNING, DISCOVER. For grins I've enabled STP on both, although I really don't think this is a packet storm problem anymore, since I'm not seeing broadcasts coming across the bridge0, sis0, or tap0 interfaces.  Probably a good measure anyway since at some point I need to duplicate this config on the other firewall.

                                    sullrich

                                    You may be interested in this commit:

                                    http://pfsense.com/cgi-bin/cvsweb.cgi/pfSense/usr/local/www/status_interfaces.php?rev=1.29.2.7;only_with_tag=RELENG_1

                                    Shows the bridge status now under Status -> Interfaces

                                      Numbski

                                      Hmm.  Is it safe for me to grab that one file and plug it in, or is there something more formal I should do? (ie, cvs?)

                                        sullrich

                                        Yeah, it's safe.  Simply replace /usr/local/www/status_interfaces.php with that new one.

                                          Numbski

                                          Cool.  They both show learning.  Of course, after 5 mins it still dies, but they both show learning. ;)

                                          Seriously, have to put this to rest for now.  I'll come back to it later. :)

                                            Numbski

                                            Testing remotely.  Quick note - works great, except for a minor detail.

                                            If you intend to use STP, DO NOT, I repeat, DO NOT, enable STP on the tap interface.  Your actual hardware interface is fine, but doing so on the tap interface creates a really odd situation where traffic hits the endpoint tap interface, and gets to your bridge, but nothing ever returns.  Disabling STP on the tap interface resolves that problem.

                                            Otherwise all is well.  Just need to figure out why CARP chokes after 5 mins.
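The STP advice above could be applied roughly as follows, run as root on the firewall. The interface names (bridge0, sis0, tap0) are the ones used in this thread; stp and -stp are the per-member spanning tree switches in FreeBSD's ifconfig, and whether pfSense keeps these settings across a reboot or filter reload is not something this sketch covers.

```shell
# Enable spanning tree on the physical bridge member only.
ifconfig bridge0 stp sis0

# Keep spanning tree off on the tap member, since enabling it there
# is what stopped return traffic in the test described above.
ifconfig bridge0 -stp tap0
```

Running `ifconfig bridge0` afterwards should list each member with its current flags, which is a quick way to confirm that only sis0 has STP set.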

                                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.