2.3.1 / site-to-site: routing/pf issue after upgrade from 2.2.6

brevilo

The only thing I noticed is the default setting of "Topology", which is now subnet and was net30 before. And pfSense changed this setting during the upgrade.

To be precise, the default changed, not any existing setting. There was a bug in 2.3.0 (AFAIK) which also changed existing configs on upgrade, but that's fixed in 2.3.1 and shouldn't apply here. Anyhow, I already tried that, as noted above.

Another cause could be deprecated settings in the advanced options. Do you have any?

Only on the client, presumably unrelated since establishing the connection isn't the issue here:

verify-x509-name HOSTNAME name;
ns-cert-type server;

Have you more than one OpenVPN instance running?

What do you mean? More than one instance on one node?

Thanks

viragomann

@brevilo:

Have you more than one OpenVPN instance running?

What do you mean? More than one instance on one node?

Yes. If multiple instances are running on one node, you have to assign an interface to each VPN instance for correct routing; otherwise pfSense handles the virtual VPN interfaces as an interface group.
But I don't think that's the case in your installation.

@brevilo:

When I'm on the tunnel client node, I can ping hosts in the server's LAN as before - good. However, this only works as long as I use the automatic source IP or the virtual OpenVPN client IP. As soon as I use the client node's LAN IP it doesn't work

Looks like the server has no route for the client's LAN, so double-check the routes on the server side.
Packets originating from the client node itself are not affected by the client side's filter rules, only by those on the server side.

@brevilo:

as the packets don't even reach the server's virtual tunnel interface anymore (they do reach the client's tunnel interface though).

Have you checked this by packet capture or is this just your assumption?

brevilo

@viragomann:

Yes. If multiple instances are running on one node, you have to assign an interface to each VPN instance for correct routing; otherwise pfSense handles the virtual VPN interfaces as an interface group.
But I don't think that's the case in your installation.

It's just one instance and it's assigned to the virtual WAN interface (CARP) as before.

Looks like the server has no route for the client's LAN, so double-check the routes on the server side.
Packets originating from the client node itself are not affected by the client side's filter rules, only by those on the server side.

Hm, I don't think that's the problem here as the client->server ping gets lost depending on the source IP (see above/below). Also, the routing table looks the same as with 2.2.6 and includes a route to the client LAN via the tunnel network via the ovpnc1 interface.

@brevilo:

as the packets don't even reach the server's virtual tunnel interface anymore (they do reach the client's tunnel interface though).

Have you checked this by packet capture or is this just your assumption?

The former, of course. Thus it appears to be an OpenVPN-internal issue. I even diff'ed the effective pf rules (2.2.6 vs 2.3.1) but they too do not show significant differences.

brevilo

Update: I just updated client and server from 2.3.1_1 to 2.3.1_5 and things do work again! Since the changelog of 2.3.1_5 doesn't contain anything related I presume the internal updates 2 to 4 did the trick somehow. Go figure…

brevilo

Too bad, the issue is back, without any config changes. I believe this is an issue between CARP and OpenVPN. So for me 2.3.1 is still broken :(

viragomann

So please check the routes on the server node (Diagnostics > Routes).
If OpenVPN is configured correctly, there has to be a route to the client's LAN using the client's OpenVPN IP as gateway.

brevilo

I know, and I did all that (see above, just confirmed it again). I'm currently experimenting with a third pair of nodes, not using CARP. I also think that I'm affected by https://redmine.pfsense.org/issues/6499…

brevilo

Ok, here's a concrete example on the test pair (no CARP, currently with subnet topology) I just mentioned.

Routing table on the server (relevant excerpt):


Destination        Gateway         Flags  Use   Mtu    Netif
192.168.10.0/24    192.168.100.2   UGS    1687  1500   ovpns1
192.168.100.0/24   192.168.100.1   UGS    0     1500   ovpns1
192.168.100.1      link#9          UHS    0     16384  lo0
192.168.100.2      link#9          UH     24    1500   ovpns1

When I capture ICMP on the server's and the client's OpenVPN interfaces respectively, and ping the client from the server, I get the following:

server-src: LAN / client-dst: OpenVPN: ok (request/reply seen on server and client)
server-src: LAN / client-dst: LAN: fails (request seen on server, no request seen on client)
server-src: OpenVPN / client-dst: LAN: fails (request seen on server, no request seen on client)

That means for some reason all packets (requests) targeting the client LAN are sent via the server's OpenVPN interface but they never appear at the client's OpenVPN interface. This is not a routing issue on the server itself, as the packets seem to leave the server as expected. That's why I described this as an "OpenVPN-internal issue" above…
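To double-check the claim that the server's kernel routing is fine, here is a minimal Python sketch that runs a longest-prefix lookup over the routing table excerpt above. It only models the lookup, not pfSense or FreeBSD internals. The address 192.168.10.1 is a hypothetical host chosen for illustration, and I'm assuming 192.168.10.0/24 is the client LAN.

```python
import ipaddress

# Routing table excerpt from the server, copied from the post above:
# (destination, gateway, interface). Host routes are written as /32.
ROUTES = [
    ("192.168.10.0/24", "192.168.100.2", "ovpns1"),
    ("192.168.100.0/24", "192.168.100.1", "ovpns1"),
    ("192.168.100.1/32", "link#9", "lo0"),
    ("192.168.100.2/32", "link#9", "ovpns1"),
]


def lookup(dst):
    """Return the most specific (longest-prefix) route matching dst, or None."""
    addr = ipaddress.ip_address(dst)
    matches = [
        (ipaddress.ip_network(net), gw, iface)
        for net, gw, iface in ROUTES
        if addr in ipaddress.ip_network(net)
    ]
    if not matches:
        return None
    net, gw, iface = max(matches, key=lambda m: m[0].prefixlen)
    return str(net), gw, iface


# 192.168.10.1 is a hypothetical address inside the (assumed) client LAN.
print(lookup("192.168.10.1"))   # matched by the /24 route via 192.168.100.2
print(lookup("192.168.100.2"))  # matched by the host route on ovpns1
```

Both destinations resolve to ovpns1, which fits the captures: the requests do leave through the server's OpenVPN interface, so whatever drops them happens after the kernel's routing decision.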

viragomann

@brevilo:

server-src: LAN / client-dst: LAN: fails (request on server, no request on client)

Did you take this capture at the client node or at the server?

brevilo

Both, as described above. The results are in parentheses.

viragomann

And you're sure that the client node is the default gateway for the hosts behind it?

brevilo

That's not the issue here. When I said "client-dst: LAN" in the tests above I meant the VPN client's LAN address/interface, so the client's LAN nodes behind that NIC aren't of interest here (but yes, they do have proper routes). Also, how should client LAN routes affect echo requests "getting lost" between VPN server and client, in the tunnel itself…?

Again, the whole setup worked just fine until I upgraded to 2.3.1.

Soyokaze

Which interface are the OpenVPN servers bound to?
Which NAT setting do you use (default, hybrid, etc.)?
Do you use dynamic routing (e.g. OSPF), or are all links statically routed?

brevilo

@pan_2:

Which interface are the OpenVPN servers bound to?
Which NAT setting do you use (default, hybrid, etc.)?
Do you use dynamic routing (e.g. OSPF), or are all links statically routed?

WAN (of course?)
Hybrid, with source = remote LAN and NAT = local LAN/CARP interface (on client and server respectively)
Not sure what you mean. There's no special routing apart from the OpenVPN site-to-site settings.

brevilo

Update: ok, my third test pair is working again. That particular setup got screwed up when I disabled its iroute for testing purposes :-[ So I'm now back at reply #9 (https://forum.pfsense.org/index.php?topic=113151.msg636321#msg636321), where my test rig works with 2.3.1_5 but my production rig using CARP needs further debugging.

Stay tuned…
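
For anyone wondering why a missing iroute produces exactly the symptom seen on the test pair (tunnel address reachable, client LAN not), here is a rough Python model of it. As far as I understand it, an OpenVPN server in multi-client mode keeps its own internal table mapping subnets to clients, and the kernel route alone only gets the packet as far as the tun device. This is a simplified sketch, not OpenVPN's actual code; the client name "site-b" and the function `pick_client` are made up for illustration.

```python
import ipaddress


def pick_client(dst, iroutes, tunnel_ips):
    """Model which client the server forwards a packet to.

    iroutes:    {client name: [subnets declared via iroute]}
    tunnel_ips: {client name: tunnel IP assigned to that client}
    Returns the client name, or None if the packet would be dropped.
    """
    addr = ipaddress.ip_address(dst)

    # A client's own tunnel address is always known to the server.
    for name, ip in tunnel_ips.items():
        if addr == ipaddress.ip_address(ip):
            return name

    # Otherwise only subnets declared via iroute are known; take the most specific.
    best = None
    for name, subnets in iroutes.items():
        for subnet in subnets:
            net = ipaddress.ip_network(subnet)
            if addr in net and (best is None or net.prefixlen > best[0]):
                best = (net.prefixlen, name)
    return best[1] if best else None


# Addresses from the test pair; "site-b" is a hypothetical client name.
tunnel_ips = {"site-b": "192.168.100.2"}

print(pick_client("192.168.10.1", {"site-b": ["192.168.10.0/24"]}, tunnel_ips))
print(pick_client("192.168.10.1", {}, tunnel_ips))   # no iroute: dropped
print(pick_client("192.168.100.2", {}, tunnel_ips))  # tunnel IP still works
```

Under this model, removing the iroute leaves the server's routing table untouched but makes packets for the client LAN vanish inside the tunnel, which is what my captures on the test pair showed.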

Soyokaze

@brevilo:

WAN (of course?)

Hybrid, with source = remote LAN and NAT = local LAN/CARP interface (on client and server respectively)

Not sure what you mean. There's no special routing apart from the OpenVPN site-to-site settings.

1. Not always. Sometimes I bind it to localhost, sometimes to a VIP.
3. Then it is "static".
2. >>with source = remote LAN and NAT = local LAN/CARP interface
Err? Are you NATing your OpenVPN network to the local LAN? Or did I misunderstand something?

Could you draw a network diagram? It would help to understand your topology.

NB: I have plenty of S2S links, with outbound NAT (traffic redirection) and inbound NAT (D/PNAT), and everything works OK on 2.3+.

brevilo

Err? Are you NATing your OpenVPN network to the local LAN?

No, why would I? I source NAT the respective remote LAN on each side, that is the client LAN on the server side and vice versa.

The topology is a straightforward site-to-site with HA/failover: ClientLAN---Client===Server---ServerLAN (with client and server using CARP, so two nodes each with virtual IPs on LAN and WAN).

Cheers

brevilo

Darn, I think I finally found the culprit: I had an old IPsec config on those boxes (my preferred VPN solution, but I never got it to work reliably) which was inactive but not disabled. It seems this messes up some internal routing (not reflected in the routing table). It also seems to be a regression in 2.3.?, since 2.2.6 still doesn't have this problem! IIRC, the enable/disable implementation of IPsec changed in 2.3, so that would explain it.

That was a mean one…

Cheers