Working on getting OpenVPN server bridging to fly.

Numbski

I just e-mailed him asking if a sanitized version of the config.xml would suffice. I would really prefer not to go giving out password hashes and IP addresses. :(

sullrich

Sanitize the passwords, but your fear about IP addresses is kinda silly.

If you trust the code that we put into this product, then I don't see why you can't trust someone knowing your IP address.

Numbski

Sent.

Numbski

Just made an observation.

These hangups seem to occur consistently when I'm sending a whole lot of traffic through the firewalls, such as a cvsup. It doesn't have to be traffic across the VPN, just traffic in general.

Numbski

Heh, sullrich. You're not going to believe this.

I fully understand what you told me on IRC about you guys not doing anything to or with tun/tap interfaces, and that everything is done via OpenVPN.

That said, after setting sysctl net.link.tap.user_open to 1, I've had the most uptime since I've started this whole debugging fiasco. Totally odd. Just thought I'd point it out in case someone might have an explanation for it.

To bring anyone reading this up to speed: net.link.tap.user_open is set to 0 by default, which means only root (or similarly privileged users) can open the tap device nodes (/dev/tapN) and so control a tap interface. When it is set to 1, non-privileged users can open them too, subject to the device nodes' file permissions. This might be construed as a security concern, but for testing purposes there's no harm. If this really does "fix" my problem, it raises more questions than it answers: OpenVPN runs as root right now, so either something else is touching the tap interface, OR OpenVPN is somehow dropping privileges at some point.
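For anyone who wants to try the same thing, a minimal sketch of checking and changing the setting at runtime on the pfSense (FreeBSD) shell. Run it as root; a value set this way does not survive a reboot.

```sh
# Print the current value: 0 = only root may open /dev/tapN, 1 = other users may too
sysctl net.link.tap.user_open

# Allow non-root users to open tap devices until the next reboot
sysctl net.link.tap.user_open=1
```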

Numbski

With the sysctl set, uptime is now over a day on the box that used to kick the bucket about once every three hours. (Crosses fingers and prays….) I'm putting a pretty solid load on it too.

sullrich

I'll commit a change to force this sysctl for OpenVPN.

Update: committed to /etc/sysctl.conf
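On a box built before that commit, the same setting can be made persistent by hand. A minimal sketch that appends the line only if the key isn't already present; the function name ensure_sysctl_conf is made up for this example, and the file path is a parameter so it can be tried on a scratch copy first.

```sh
#!/bin/sh
# Hypothetical helper: append "key=value" to a sysctl.conf-style file unless a
# line for that key is already present. The file defaults to /etc/sysctl.conf
# but can be pointed at a scratch copy for testing.
ensure_sysctl_conf() {
    key=$1
    value=$2
    file=${3:-/etc/sysctl.conf}

    # Already set (to any value)? Leave the file alone. The dots in the key are
    # regex wildcards, which is harmless for a lookup like this.
    if grep -q "^${key}=" "$file" 2>/dev/null; then
        return 0
    fi
    printf '%s=%s\n' "$key" "$value" >> "$file"
}

# Usage (as root): ensure_sysctl_conf net.link.tap.user_open 1
```

Running it twice leaves a single line, so it is safe to put in a setup script.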

Numbski

Thanks!

Numbski

Just updating the status on this.

The watchdog daemon still has to kill the machine about once every 24 hours if it has any active OpenVPN sessions. If no one connects, it stays up indefinitely.

There is definitely a difference between a pfSense box that is bridged to a CARP-enabled interface and one that is not. I have one with an uptime of over a month, running the exact same config, with traffic flowing through it pretty consistently. The difference is that neither its WAN nor its LAN runs CARP, whereas on the configs where the hangups occur, both WAN and the bridged interface are part of a CARP cluster. The fact that I'm not all that familiar with how CARP really works underneath doesn't help matters much. All I know is that it sends out multicast advertisements, and pfSense passes all bridge traffic by default, so CARP advertisements are getting onto the OpenVPN tap interface. I don't see how that would cause harm, though.
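One way to test the theory that CARP traffic is reaching the tap side is to watch for it there. A minimal sketch, assuming the OpenVPN interface is tap0 and the bridged physical NIC is em2 (substitute your own names). CARP advertisements are IPv4 protocol 112 sent to 224.0.0.18, the same values VRRP uses, so a protocol filter catches them.

```sh
# Run as root. -n: no name lookups; -e: also print MAC addresses.
# CARP advertisements seen on the OpenVPN tap interface:
tcpdump -n -e -i tap0 'ip proto 112'

# For comparison, the same traffic on the bridged physical interface (assumed em2):
tcpdump -n -e -i em2 'ip proto 112'
```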

Numbski

is a glutton for punishment, I kid you not. :P

Doing some research on CARP and OpenVPN, I came across this document:

http://openvpn.net/archive/openvpn-devel/2005-10/msg00017.html

The thought occurs to me: we synchronize states across firewalls in a CARP cluster. I'm just speculating on how this happens, but it's possible that OpenVPN on system A tries to synchronize to system B and somehow fails.

(This is mostly a note to myself to look into after I get back into the country, feel free to ignore me!)

Numbski

Since this thread is turning more into a blog and less into a support thread, I figured I should update it. :)

I've posted a doc topic on how to get things running as I have them currently here:

http://doc.pfsense.org/index.php/Setting_up_OpenVPN_with_pfSense#OpenVPN_Client_Bridging

Now, what has changed for me since the last time I posted? Well, until sullrich beat it into me that I should not have tap0 assigned as an OPT interface (heh), I had been trying to bridge from the UI. I have since scrapped that, and the bridge is now brought up at boot time using shellcmd/earlyshellcmd. Also, my uptime is at a new record since doing this: 1 1/2 days. :)

We may finally have hit stability on this. Crossing my fingers. I'll update if my good luck continues, and if so, I'd like someone to volunteer to do a similar config. If we have this licked, I'll start petitioning to have the config merged into the OpenVPN webui pages.

Numbski

Okie, I have 3 full days of uptime without a kernel hang condition. I think we have this licked folks.

Any volunteers to duplicate my config to make sure? I'd like to get this into the webui sometime soon.

Numbski

I was wrong. Problem remains. Repeat - problem remains.

There is definitely a collision of some sort between bridging of tap interfaces and CARP. I have a little script watching the connectivity of the bridge, and all of a sudden the CARP address on the physical interface just stops answering. If I remove the physical interface from the bridge, wait a few seconds, and put it back, all is fine again. ???
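The manual recovery step can be scripted. A minimal sketch of a watchdog that pings an address and, if it stops answering, removes the physical interface from the bridge, waits, and adds it back. BRIDGE, MEMBER, TARGET and the function names are placeholders invented for this example (192.0.2.1 is a documentation address), and `ping -t` is the overall timeout on FreeBSD, not the TTL as on Linux.

```sh
#!/bin/sh
# Hypothetical bridge watchdog. All names below are placeholders.
BRIDGE=${BRIDGE:-bridge0}     # the if_bridge interface
MEMBER=${MEMBER:-em2}         # physical NIC that is a bridge member
TARGET=${TARGET:-192.0.2.1}   # address that should answer through the bridge;
                              # not one local to this box (that answers via lo0)
SETTLE=${SETTLE:-5}           # seconds to wait between removing and re-adding

# One echo request; -t 2 is FreeBSD ping's overall timeout in seconds.
target_alive() {
    ping -c 1 -t 2 "$TARGET" >/dev/null 2>&1
}

# Same steps as doing it by hand: remove the member, wait, put it back.
bounce_member() {
    ifconfig "$BRIDGE" deletem "$MEMBER"
    sleep "$SETTLE"
    ifconfig "$BRIDGE" addm "$MEMBER"
}

# One check: do nothing if the target answers, otherwise log and bounce.
check_once() {
    if target_alive; then
        return 0
    fi
    logger -t bridge-watch "$TARGET not answering; bouncing $MEMBER on $BRIDGE"
    bounce_member
}

# Poll every 30 seconds (uncomment to run as a loop):
# while :; do check_once; sleep 30; done
```

This only papers over the symptom; it gives no insight into why the CARP address stops answering.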

Really just don't know where to go with this anymore. When it works, it works great. It just doesn't stay working.

rajl

Why not just turn CARP off? Is it a service that everyone needs?

Icidic

I hate to bump an already huge topic, but, can I confirm that pfSense with OpenVPN Bridge Mode ONLY appears to kernel hang when CARP is involved? Or does it hang regardless of whether the pfSense machine is CARP aware or not?

Thanks :).

razor2000

@Icidic:

I will add that without CARP on, I have no stability problems or kernel hangs with OpenVPN bridging enabled. My pfSense-based ALIX board currently has an uptime of 8 days and 2 hours. Hope this helps…

valnar

Is there a set of instructions from start to finish that will accomplish this Layer-2 bridge over the Internet on pfSense boxes?

Thanks.

bviper47

I would also like to know if there is a full set of instructions for this.

valnar

I managed to get an L2 bridge working with DD-WRT on a pair of old Linksys WRT54G routers by following this:
http://www.dd-wrt.com/wiki/index.php/OpenVPN_-_Site-to-Site_Bridged_VPN_Between_Two_Routers
I will try it on a couple of pfSense boxes next, but I assume it would operate the same way.

I also wonder whether it would have the same limitation I discovered: a low maximum frame or packet size. If you ping across the tunnel with the don't-fragment bit set, the largest packet that gets through is 1342 bytes; 1343 fails. IP can normally handle this by fragmenting, of course, but I need L2 connectivity for non-IP protocols that have no knowledge of fragments.
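Finding that 1342/1343 boundary doesn't require trying sizes by hand; a binary search over DF pings does it in about a dozen probes. A minimal sketch, assuming a FreeBSD/pfSense shell where `ping -D` sets the don't-fragment bit and `-s` sets the ICMP payload size. The helper names and the far-end address in the usage line are hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: one ping with the don't-fragment bit set.
# FreeBSD ping: -D sets DF, -s sets the ICMP payload size, -c 1 sends one packet.
# (Linux ping spells DF as "-M do".)
df_ping() {
    ping -D -c 1 -s "$1" "$2" >/dev/null 2>&1
}

# Binary search for the largest payload that gets through with DF set.
# LOW must be a size known to pass and HIGH a size known to fail.
largest_df_payload() {
    host=$1
    low=${2:-0}
    high=${3:-1500}
    while [ $((high - low)) -gt 1 ]; do
        mid=$(( (low + high) / 2 ))
        if df_ping "$mid" "$host"; then
            low=$mid
        else
            high=$mid
        fi
    done
    echo "$low"
}

# Usage, with a hypothetical address on the far side of the tunnel:
# largest_df_payload 192.0.2.10
```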

Since my underlying protocol can't fragment its own frames, is there a way to make pfSense fragment the packet after encapsulation with this OpenVPN/bridge method? Once you add all the L3 and VPN overhead, it's quite easy to exceed the MTU allowed over the Internet, resulting in dropped packets (frames) at the source.
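To see how quickly the overhead adds up, here is the size arithmetic for one of those DF pings as a small POSIX shell sketch. It assumes IPv4 without options, untagged Ethernet on the tap interface (FCS not counted), and that the 1342 figure is the ICMP payload size set by ping's size option. OpenVPN's own per-packet overhead depends on cipher and HMAC settings, so it is left as a parameter rather than guessed.

```sh
#!/bin/sh
# Header sizes in bytes (IPv4 without options, untagged Ethernet, no FCS)
ICMP_HDR=8
IPV4_HDR=20
UDP_HDR=8
ETH_HDR=14

# ICMP payload -> inner IPv4 packet -> inner Ethernet frame carried by tap0
inner_ip_len()    { echo $(( $1 + ICMP_HDR + IPV4_HDR )); }
inner_frame_len() { echo $(( $(inner_ip_len "$1") + ETH_HDR )); }

# The inner frame plus OpenVPN's framing ($2, setup-dependent) is then sent
# in a UDP datagram inside an outer IPv4 packet.
outer_ip_len() { echo $(( $(inner_frame_len "$1") + $2 + UDP_HDR + IPV4_HDR )); }
```

For a 1342-byte payload that is a 1370-byte inner IP packet and a 1384-byte frame on the tap interface, before OpenVPN's framing and the outer UDP and IPv4 headers (28 bytes together) are added.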

tekkon

I am on pfSense 1.2.1 final. I tried the OpenVPN bridging instructions at this link:
http://doc.pfsense.org/index.php/VPN_Capability_OpenVPN#OpenVPN_Client_Bridging

The step of entering

"server-bridge 172.16.11.1 255.255.255.0 172.16.11.128 172.16.11.150"

in the 'Custom Option' box within OpenVPN's server settings didn't work.

I got the following error in my OpenVPN log:

"openvpn[15315]: Options error: --server and --server-bridge cannot be used together"

Since the '--server' option cannot coexist with the '--server-bridge' option, which part of '/etc/inc/openvpn.inc' should I edit manually to remove the '--server' option?

Another part of the instructions that didn't work in 1.2.1 is the step that says to enter

<earlyshellcmd>ifconfig bridge0 create</earlyshellcmd>
<earlyshellcmd>ifconfig bridge0 addm em2 up</earlyshellcmd>
<shellcmd>ifconfig bridge0 addm tap0</shellcmd>

in '/conf/config.xml'. Those lines didn't take effect after a reboot; I had to run them manually from the CLI to get the result.
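Until the boot-time entries take effect, a minimal sketch of an idempotent version of those three commands that can be run by hand without erroring when part of the bridge already exists. The interface names are the ones from the config.xml lines; if_exists, is_member and bring_up_bridge are hypothetical helper names, and the member check assumes FreeBSD ifconfig prints "member: em2 flags=" style lines for a bridge.

```sh
#!/bin/sh
# Interface names as in the config.xml lines; override via the environment.
BRIDGE=${BRIDGE:-bridge0}
PHYS=${PHYS:-em2}
TAP=${TAP:-tap0}

# True if the named interface exists (ifconfig fails for unknown names).
if_exists() {
    ifconfig "$1" >/dev/null 2>&1
}

# True if $2 is already a member of bridge $1. Relies on the
# "member: <ifname> flags=..." lines FreeBSD's ifconfig shows for a bridge.
is_member() {
    ifconfig "$1" 2>/dev/null | grep -q "member: $2 "
}

# Create the bridge if needed, add whichever members exist and aren't
# already attached, then bring the bridge up. Safe to run repeatedly.
bring_up_bridge() {
    if_exists "$BRIDGE" || ifconfig "$BRIDGE" create
    for m in "$PHYS" "$TAP"; do
        # tap0 appears only once OpenVPN has opened it, so skip it until then
        if if_exists "$m" && ! is_member "$BRIDGE" "$m"; then
            ifconfig "$BRIDGE" addm "$m"
        fi
    done
    ifconfig "$BRIDGE" up
}

# Usage (as root): bring_up_bridge
```

Running it again after OpenVPN has started picks up tap0 without touching em2.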