Working on getting OpenVPN server bridging to fly.
-
Grrr.
The bridge doesn't hold after a reboot. bridge0 gets created, but the interfaces don't get added. I have to do it manually. At what point in the boot process do the bridges get brought up? Is it possible that it's being attempted prior to openvpn being launched, thus tap0 doesn't exist and the interfaces don't get added?
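If the ordering guess is right, one way to check it is to make the bridge setup wait until tap0 actually exists before adding members. This is only a minimal sketch, assuming a plain POSIX sh: `wait_for_if` and `bridge_up` are hypothetical helper names, sis0 stands in for the LAN NIC, and the `IFCONFIG` variable is there only so the logic can be dry-run against a stub.

```sh
#!/bin/sh
# Sketch only: wait for an interface to appear, then build the bridge.
# IFCONFIG may be overridden for a dry run; by default it is the real ifconfig.
IFCONFIG="${IFCONFIG:-ifconfig}"

# wait_for_if NAME [TRIES]: poll once per second until NAME exists.
wait_for_if() {
    name="$1"
    tries="${2:-30}"
    while :; do
        # "ifconfig NAME" exits non-zero when the interface does not exist.
        if $IFCONFIG "$name" >/dev/null 2>&1; then
            return 0
        fi
        tries=$((tries - 1))
        [ "$tries" -gt 0 ] || return 1
        sleep 1
    done
}

# bridge_up BRIDGE LAN_IF TAP_IF [TRIES]: add members only once the tap exists.
bridge_up() {
    bridge="$1"; lan="$2"; tap="$3"
    wait_for_if "$tap" "${4:-30}" || { echo "$tap never appeared" >&2; return 1; }
    # Create the bridge if it is not there yet.
    $IFCONFIG "$bridge" >/dev/null 2>&1 || $IFCONFIG "$bridge" create
    $IFCONFIG "$bridge" addm "$lan" addm "$tap" stp "$lan" up
}
```

Something like `bridge_up bridge0 sis0 tap0` would then be run after openvpn has started. If it reports that tap0 never appeared, the boot-order theory looks a lot more likely.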
-
You will most likely need to use <shell_cmds> and friends to do all your custom commands.
I wonder if Fernando can write in a bridging module at some point. I'll ask him.
-
Care to expound? I've never seen that tag before. Can I place that manually into config.xml or something?
Also…boxes are randomly hanging after a period of openvpn use. These aren't low-end systems either. Some are custom boxes with 1.4GHz-2.0GHz P4s and lots and lots of RAM, and the others are from vendors listed on this site with plenty of RAM.
Hmm. Behavior is consistent across the board in that regard too. Nothing logged points to the failure though. Tends to make me think it is a FreeBSD issue and not pfSense-specific. I'll keep digging.
-
See http://faq.pfsense.com/index.php?action=artikel&cat=10&id=38&artlang=en&highlight=hidden for additional tags in the config.xml that can be used to trigger some additional actions for example on bootup or filter reload.
-
Thank you!
(BTW, have I mentioned how frustrating it is to hunt down random lockups???)
-
erm…hmmm
Here's what I added in the system section of config.xml:
<shellcmd>ifconfig bridge0 create</shellcmd>
<shellcmd>ifconfig bridge0 addm sis0 addm tap0 stp sis0 up</shellcmd>
The commands appear to be ignored at bootup though. I know you posted shell_cmd, but the faq says it is just one word, shellcmd. Is the faq wrong?
The default config.xml from cvs doesn't help much either:
http://pfsense.com/cgi-bin/cvsweb.cgi/pfSense/conf.default/config.xml?rev=1.19;content-type=text%2Fplain
:)
-
<shellcmd> is correct. Are you putting it inside the <system> tags?
-
I'm about to start debugging this. Any chance you could send your config to me at pf@fud.org.nz?
-
Yup. They're definitely enclosed in the system tags.
I'm having one of those weeks. :P
So far as sending my config, I can't right now, perhaps tomorrow though. What part were you wanting to debug? The shellcmd issue, the random lockups, or the bridging in general?
-
thompsa is the FreeBSD bridge committer. He wants to check out the tap interface stp issue.
-
Ah, gotcha.
NP, will send it along as soon as reasonable.
-
I'm going to try to rebuild my firewalls today, and I'll send along the config after that.
-
I think I may have had an epiphany on the random-lockup thing with OpenVPN and CARP. This doesn't affect the bridge problem itself (bridge just stops working).
I've been having openvpn listen on the WAN carp interface's IP address.
So visualize this - you have two firewalls listening on the CARP IP, so only one can answer. You make a change or do something that causes a filter reload. Temporarily, the CARP IP address becomes unavailable, so the second box takes over. Now, you have OpenVPN set up on a "keep state", so it has a state going on the first box, but now suddenly the second box answers. The CARP IP becomes available on the first box again. So now you flip back, all the while, we're tunneling layer two traffic, both boxes bridged onto the LAN.
Something tells me that this exchange is far from graceful, and in fact we're really hacking off OpenVPN, and causing an exception that the kernel just can't deal with. I have to keep reminding myself that tap is a kernel driver, so making tap unhappy makes the kernel unhappy. Thus the unresponsive kernel.
So the way to deploy this is probably to have both boxes listen on the "real" WAN interface, and have two remote statements on the clients.
http://openvpn.net/howto.html#client
If you look, it has provisions for load balancing between OpenVPN servers. I'll give this a try, see if it resolves our issues.
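For reference, here is roughly what I mean by two remote statements. This is just a sketch that prints the relevant client config lines; `emit_remotes` is a made-up helper, the hostnames in the usage line are placeholders, and port 1194 is simply what my server binds to. `remote`, `remote-random` and `resolv-retry` are the stock OpenVPN client directives from the howto.

```sh
#!/bin/sh
# Sketch only: print the failover part of an OpenVPN client config.
# Each argument is the real WAN address of one firewall, not the CARP IP.
emit_remotes() {
    port=1194
    for host in "$@"; do
        # The client works through its "remote" entries until one answers.
        printf 'remote %s %s\n' "$host" "$port"
    done
    # Optional: randomize the order of the list, as basic load balancing.
    printf 'remote-random\n'
    # Keep retrying name resolution rather than giving up.
    printf 'resolv-retry infinite\n'
}
```

Usage would be along the lines of `emit_remotes fw1.example.com fw2.example.com >> client.conf`, with each firewall's own WAN address substituted in.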
-
Update - this does not fix the arbitrary system lockup problem. :(
-
Ah ha!!!
http://www.sigmasoft.com/~openbsd/archives/html/openbsd-bugs/2005-08/msg00018.html
Just running out of time to dig into this right now….but this looks like we might have a winner. (Or loser, depending.)
Just realized I hadn't explained myself. I had remote syslogging going, and for the first time I caught the openvpn logging right at the time of the crash. The last line read:
Sep 14 19:10:46 lbfw1 openvpn[367]: event_wait : Interrupted system call (code=4)
Then nothing until I rebooted, which is what led me to this article. This happens to be a triple gigabit Hacom box.
More complete log:
Sep 14 11:23:58 lbfw1 openvpn[96002]: TUN/TAP device /dev/tap0 opened
Sep 14 11:23:58 lbfw1 openvpn[96002]: /sbin/ifconfig tap0 192.168.168.169 netmask 192.168.168.170 mtu 1500 up
Sep 14 11:23:58 lbfw1 openvpn[96002]: UDPv4 link local (bound): x.x.x.x:1194
Sep 14 11:23:58 lbfw1 openvpn[96002]: UDPv4 link remote: [undef]
Sep 14 11:23:58 lbfw1 openvpn[96002]: Initialization Sequence Completed
Sep 14 11:24:02 lbfw1 openvpn[96002]: 208.231.66.99:52385 Re-using SSL/TLS context
Sep 14 11:24:02 lbfw1 openvpn[96002]: 208.231.66.99:52385 LZO compression initialized
Sep 14 11:24:03 lbfw1 openvpn[96002]: 208.231.66.99:52385 [Tony_Shadwick] Peer Connection Initiated with 208.231.66.99:52385
Sep 14 11:56:39 lbfw1 openvpn[96002]: 208.231.66.99:52398 Re-using SSL/TLS context
Sep 14 11:56:39 lbfw1 openvpn[96002]: 208.231.66.99:52398 LZO compression initialized
Sep 14 11:56:40 lbfw1 openvpn[96002]: 208.231.66.99:52398 [Tony_Shadwick] Peer Connection Initiated with 208.231.66.99:52398
Sep 14 18:32:41 lbfw1 openvpn[371]: 208.231.66.99:52637 Re-using SSL/TLS context
Sep 14 18:32:41 lbfw1 openvpn[371]: 208.231.66.99:52637 LZO compression initialized
Sep 14 18:32:42 lbfw1 openvpn[371]: 208.231.66.99:52637 [Tony_Shadwick] Peer Connection Initiated with 208.231.66.99:52637
Sep 14 18:36:00 lbfw1 openvpn[371]: Tony_Shadwick/208.231.66.99:52637 [Tony_Shadwick] Inactivity timeout (--ping-restart), restarting
Sep 14 18:48:32 lbfw1 openvpn[367]: 208.231.66.99:52663 Re-using SSL/TLS context
Sep 14 18:48:32 lbfw1 openvpn[367]: 208.231.66.99:52663 LZO compression initialized
Sep 14 18:48:33 lbfw1 openvpn[367]: 208.231.66.99:52663 [Tony_Shadwick] Peer Connection Initiated with 208.231.66.99:52663
Sep 14 18:51:09 lbfw1 openvpn[367]: Tony_Shadwick/208.231.66.99:52663 [Tony_Shadwick] Inactivity timeout (--ping-restart), restarting
Sep 14 19:10:46 lbfw1 openvpn[367]: event_wait : Interrupted system call (code=4)
Sep 14 21:00:31 lbfw1 openvpn[371]: 208.231.66.99:52809 Re-using SSL/TLS context
Sep 14 21:00:31 lbfw1 openvpn[371]: 208.231.66.99:52809 LZO compression initialized
Sep 14 21:00:32 lbfw1 openvpn[371]: 208.231.66.99:52809 [Tony_Shadwick] Peer Connection Initiated with 208.231.66.99:52809
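In case anyone else wants to look for the same thing in their remote syslog, a rough sketch of how the line can be pulled out along with its neighbours, so the gap in timestamps is visible. `find_interrupts` is a made-up name and the log path in the usage line is only a placeholder for wherever your syslog server writes.

```sh
#!/bin/sh
# Sketch only: show each openvpn "event_wait : Interrupted system call"
# entry together with the line before and the line after it.
find_interrupts() {
    grep -E -B 1 -A 1 \
        'openvpn\[[0-9]+\]: event_wait : Interrupted system call' "$@"
}
```

Run it as something like `find_interrupts /var/log/remote/lbfw1.log`. A long jump between the matching line and the next one is the hang plus the reboot.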
-
We run FreeBSD. Are you seeing this exact error?
-
Now that you mention it, no. :( kif == NULL probably is shorthand for kernel interface equals null. So likely not it.
That event_wait: Interrupted system call line bugs me though. It's almost as though something tried to interrupt openvpn, failed to do so, and the entire system just sits there in an endless loop waiting for some event to occur that never will. All interfaces stop responding. Occasionally I can hit ctrl-alt-del and the system will attempt a halt, but it never manages to fully halt itself.
-
I really should quit this and start winding down for bed, but it's driving me nuts.
Okay, we're dealing with two enigmas here: 1, a tap interface that the system may or may not know what to do with, and 2, a bridge utilizing that tap interface.
Now I seem to recall that every time there is a change to config.xml, the openvpn process gets killed and relaunched. Is it possible that some condition, seemingly random to me, comes along and tries to reap bridges or individual interfaces, or even the openvpn process itself, fails to do so, and then chases its tail until there are no more resources available to consume?
-
/etc/rc.bootup, line 181.
See any harm in moving that down two commands so it comes after openvpn_resync_all();? Theoretically it would mean openvpn would be up and tap0 would be created prior to bridges being brought online, right?
Only thing that comes to mind that runs all the time is /usr/local/sbin/check_reload_status, which is a binary daemon, not php, not a cron job. It appears that it just keeps checking /tmp/check_reload_status, which usually says "sleeping", unless something more interesting is going on. I don't know what it does if there's something more interesting going on though.
-
Promise, last post for the night.
The shellcmd tags DO work, but they require not one, but two reboots to take effect. I haven't the slightest idea why that is, but on the first reboot, nothing happens. Reboot again, it works. ???
Really. Going to go rest now.
Really.