2.3.1 / site-to-site: routing/pf issue after upgrade from 2.2.6

brevilo

Hi,

I've been running an OpenVPN site-to-site installation in an HA/CARP setup on 2.2.6 (four identical hardware nodes) for quite a while now. To prepare for the production upgrade, I upgraded the secondary nodes from 2.2.6 to 2.3.1 and subsequently updated them to 2.3.1_1. Unfortunately, the tunnel doesn't work anymore, and it seems to be a strange routing or packet filtering issue.

When I'm on the tunnel client node, I can ping hosts in the server's LAN as before - good. However, this only works as long as I use the automatic source IP or the virtual OpenVPN client IP. As soon as I use the client node's LAN IP it doesn't work, as the packets don't even reach the server's virtual tunnel interface anymore (they do reach the client's tunnel interface though). Again, the only thing I did was the upgrade: the routing table is the same, and so are the packet filter rules. I even tried both net30 and subnet topology, to no avail.

Any idea?

Supay

I am using a very basic OpenVPN setup. I just have it configured to allow me and a few friends to connect. I have a client-specific override set to assign each user their own static IP, and this is then set up as an alias in the firewall rules. I have access to all areas of my network, while my friends are blocked from everything except my NAS, so they can access shared files there and other utilities running on it, such as my TS server, and they can also get back out to the internet through my WAN device. It has worked perfectly for ages; I haven't had to do anything to the system.

However, since I ran the upgrade to 2.3.1, it has completely broken. We can all connect to the VPN and I can see everyone being assigned their static IPs as expected; however, that is where it ends. No one, including me with full access, can see any of the devices on my network. We get full packet loss even on a basic ping. I have been looking through everything trying to work out what is wrong, but as far as I can tell my firewall rules are all correct and everything looks as if it should work, especially as it is exactly the same as before 2.3.1. Has there been some change in how pfSense handles OpenVPN or firewall rules in that version update?

If you solve your issue, please let me know; this is driving me mad!

viragomann

The only thing I noticed is the default setting of "Topology", which is now subnet and was net30 before. And pfSense changed this setting during the upgrade.

So check this setting and check the clients' IPs. If they have changed, the aliases in your firewall rules won't match anymore.
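To illustrate why a topology change can shift client addresses, here is a small Python sketch of how dynamically assigned addresses differ between net30 and subnet. It assumes OpenVPN's default pool layout for a tun server and a hypothetical 10.8.0.0/24 tunnel network; static addresses from client-specific overrides are not modelled, so treat it as an illustration rather than a description of any particular pfSense config.

```python
import ipaddress


def client_ip(tunnel_network: str, index: int, topology: str) -> ipaddress.IPv4Address:
    """Address handed to the index-th (0-based) dynamically assigned client.

    Assumes OpenVPN's default pool layout for a tun server; static addresses
    from client-specific overrides are not modelled.
    """
    net = ipaddress.IPv4Network(tunnel_network)
    base = int(net.network_address)
    if topology == "net30":
        # The server uses the first /30 (.1 local, .2 peer); every client gets
        # its own /30 and uses the second host address in it (.6, .10, ...).
        addr = ipaddress.IPv4Address(base + 4 * (index + 1) + 2)
    elif topology == "subnet":
        # The server uses .1; clients get consecutive addresses from .2 on.
        addr = ipaddress.IPv4Address(base + 2 + index)
    else:
        raise ValueError(f"unknown topology: {topology!r}")
    if addr not in net or addr == net.broadcast_address:
        raise ValueError("address pool exhausted")
    return addr


if __name__ == "__main__":
    # 10.8.0.0/24 is a hypothetical tunnel network.
    for i in range(3):
        print(i, client_ip("10.8.0.0/24", i, "net30"), client_ip("10.8.0.0/24", i, "subnet"))
```

Running it gives 10.8.0.6, 10.8.0.10, 10.8.0.14 under net30 and 10.8.0.2, 10.8.0.3, 10.8.0.4 under subnet, so an alias built from one scheme's addresses would not match clients addressed under the other.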

Another cause could be deprecated settings in the advanced options. Do you have any?
Do you have more than one OpenVPN instance running?

brevilo

@viragomann:

The only thing I noticed is the default setting of "Topology", which is now subnet and was net30 before. And pfSense changed this setting during the upgrade.

To be precise, the default changed, not any existing setting. There was a bug in 2.3.0 (AFAIK) which also changed existing configs on upgrade, but that's fixed in 2.3.1 and shouldn't apply here. Anyhow, I already tried that, as noted above.

Another cause could be deprecated settings in the advanced options. Do you have any?

Only on the client, and presumably unrelated, since establishing the connection isn't the issue here:

verify-x509-name HOSTNAME name;
ns-cert-type server;

Do you have more than one OpenVPN instance running?

What do you mean? More than one instance on one node?

Thanks

viragomann

@brevilo:

Do you have more than one OpenVPN instance running?

What do you mean? More than one instance on one node?

Yes. If multiple instances are running on one node, you have to assign an interface to each VPN instance for correct routing; otherwise pfSense treats the virtual VPN interfaces as an interface group.
But I don't think that's the case in your installation.

@brevilo:

When I'm on the tunnel client node, I can ping hosts in the server's LAN as before - good. However, this only works as long as I use the automatic source IP or the virtual OpenVPN client IP. As soon as I use the client node's LAN IP it doesn't work

It looks like the server has no route to the client's LAN, so double-check the routes on the server side.
Packets originating from the client node itself are not affected by the client side's filter rules, only by those on the server side.

@brevilo:

as the packets don't even reach the server's virtual tunnel interface anymore (they do reach the client's tunnel interface though).

Have you checked this with a packet capture, or is this just your assumption?

brevilo

@viragomann:

Yes. If multiple instances are running on one node, you have to assign an interface to each VPN instance for correct routing; otherwise pfSense treats the virtual VPN interfaces as an interface group.
But I don't think that's the case in your installation.

It's just one instance and it's assigned to the virtual WAN interface (CARP) as before.

It looks like the server has no route to the client's LAN, so double-check the routes on the server side.
Packets originating from the client node itself are not affected by the client side's filter rules, only by those on the server side.

Hm, I don't think that's the problem here, as whether the client->server ping gets lost depends on the source IP (see above/below). Also, the routing table looks the same as with 2.2.6 and includes a route to the client LAN through the tunnel network via the ovpnc1 interface.

@brevilo:

as the packets don't even reach the server's virtual tunnel interface anymore (they do reach the client's tunnel interface though).

Have you checked this with a packet capture, or is this just your assumption?

The former, of course. Thus it appears to be an OpenVPN-internal issue. I even diff'ed the effective pf rules (2.2.6 vs. 2.3.1), but they don't show any significant differences either.

brevilo

Update: I just updated client and server from 2.3.1_1 to 2.3.1_5 and things work again! Since the changelog of 2.3.1_5 doesn't mention anything related, I presume the internal updates _2 to _4 did the trick somehow. Go figure…

brevilo

Too bad, the issue is back, without any config changes. I believe this is an issue between CARP and OpenVPN. So for me 2.3.1 is still broken :(

viragomann

So please check the routes on the server node (Diagnostics > Routes).
If OpenVPN is configured correctly, there has to be a route to the client's LAN with the client's OpenVPN IP as the gateway.

brevilo

I know, and I did all that (see above; I just confirmed it again). I'm currently experimenting with a third pair of nodes, not using CARP. I also think that I'm affected by https://redmine.pfsense.org/issues/6499…

brevilo

OK, here's a concrete example from the test pair I just mentioned (no CARP, currently with subnet topology).

Routing table on the server (relevant excerpt):


Destination       Gateway          Flags  Use   Mtu    Netif
192.168.10.0/24   192.168.100.2    UGS    1687  1500   ovpns1
192.168.100.0/24  192.168.100.1    UGS    0     1500   ovpns1
192.168.100.1     link#9           UHS    0     16384  lo0
192.168.100.2     link#9           UH     24    1500   ovpns1
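The kernel's half of the forwarding decision can be checked against this excerpt with a longest-prefix match. The Python sketch below encodes only these four routes (the excerpt has no default route, so anything else returns None) and assumes 192.168.10.1 as an example host address in the client LAN. It shows traffic for the client LAN being handed to 192.168.100.2 on ovpns1, i.e. the server's routing table does send it into the tunnel.

```python
import ipaddress
from typing import NamedTuple, Optional, Sequence


class Route(NamedTuple):
    destination: ipaddress.IPv4Network
    gateway: str
    netif: str


# The four routes from the server's table excerpt (host routes written as /32).
SERVER_ROUTES = (
    Route(ipaddress.IPv4Network("192.168.10.0/24"), "192.168.100.2", "ovpns1"),
    Route(ipaddress.IPv4Network("192.168.100.0/24"), "192.168.100.1", "ovpns1"),
    Route(ipaddress.IPv4Network("192.168.100.1/32"), "link#9", "lo0"),
    Route(ipaddress.IPv4Network("192.168.100.2/32"), "link#9", "ovpns1"),
)


def lookup(dst: str, routes: Sequence[Route] = SERVER_ROUTES) -> Optional[Route]:
    """Most specific route covering dst (longest-prefix match), or None."""
    addr = ipaddress.IPv4Address(dst)
    matches = [r for r in routes if addr in r.destination]
    return max(matches, key=lambda r: r.destination.prefixlen, default=None)


if __name__ == "__main__":
    # 192.168.10.1 is an assumed host address in the client LAN.
    for dst in ("192.168.10.1", "192.168.100.2", "192.168.100.1"):
        route = lookup(dst)
        print(dst, "->", route.gateway, "via", route.netif)
```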

When I capture ICMP on the server's and the client's OpenVPN interfaces respectively, and ping the client from the server, I get the following:

server-src: LAN / client-dst: OpenVPN: ok (request/reply seen on server and client)
server-src: LAN / client-dst: LAN: fails (request seen on server, no request seen on client)
server-src: OpenVPN / client-dst: LAN: fails (request seen on server, no request seen on client)

That means that, for some reason, all packets (requests) targeting the client LAN are sent via the server's OpenVPN interface, but they never appear at the client's OpenVPN interface. This is not a routing issue on the server itself, as the packets seem to leave the server as expected. That's why I described this as an "OpenVPN-internal issue" above…
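One way to picture an "OpenVPN-internal" drop: the OpenVPN HOWTO describes `route` as getting packets from the kernel into the OpenVPN server and `iroute` as getting them from the server to a specific client. The toy Python model below (class, method and client names are hypothetical; this is not OpenVPN's code) sketches that second stage. Without a matching internal route a packet that has already left ovpns1 is dropped inside OpenVPN, and packets from a client are only accepted if their source is covered by that client's routes. 192.168.10.1 is an assumed client-LAN host.

```python
import ipaddress
from typing import Dict, List, Optional, Sequence


class ToyOpenVPNServer:
    """Toy model of a multi-client OpenVPN server's internal routing.

    Not OpenVPN's implementation: the kernel routing table only gets a packet
    into the tun interface; this class models the second step, where the
    server needs an internal route (client tunnel IP or iroute) to pick a client.
    """

    def __init__(self) -> None:
        # client name -> networks the server associates with that client
        self.internal_routes: Dict[str, List[ipaddress.IPv4Network]] = {}

    def connect(self, client: str, tunnel_ip: str, iroutes: Sequence[str] = ()) -> None:
        # The client's own tunnel address is always known to the server;
        # networks behind the client must be added via iroute.
        nets = [ipaddress.IPv4Network(f"{tunnel_ip}/32")]
        nets.extend(ipaddress.IPv4Network(n) for n in iroutes)
        self.internal_routes[client] = nets

    def deliver(self, dst: str) -> Optional[str]:
        """Client a packet read from the tun device goes to, or None if dropped."""
        addr = ipaddress.IPv4Address(dst)
        best: Optional[str] = None
        best_len = -1
        for client, nets in self.internal_routes.items():
            for net in nets:
                # This model picks the most specific matching network.
                if addr in net and net.prefixlen > best_len:
                    best, best_len = client, net.prefixlen
        return best

    def accept_from(self, client: str, src: str) -> bool:
        """Packets from a client are accepted only if their source is one of its routes."""
        addr = ipaddress.IPv4Address(src)
        return any(addr in net for net in self.internal_routes.get(client, []))


if __name__ == "__main__":
    plain = ToyOpenVPNServer()
    plain.connect("site-b", "192.168.100.2")
    with_iroute = ToyOpenVPNServer()
    with_iroute.connect("site-b", "192.168.100.2", iroutes=["192.168.10.0/24"])
    for dst in ("192.168.100.2", "192.168.10.1"):
        print(dst, plain.deliver(dst), with_iroute.deliver(dst))
```

Without the iroute, the tunnel address is still delivered but the client-LAN address is not, which is the same pattern as the ok/fails results in the captures.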

viragomann

@brevilo:

server-src: LAN / client-dst: LAN: fails (request on server, no request on client)

Have you taken this capture at the client node or at the server?

brevilo

Both, as described above. The results are in parentheses.

viragomann

And you're sure that the client node is the default gateway for the hosts behind it?

brevilo

That's not the issue here. When I said "client-dst: LAN" in the tests above, I meant the VPN client's LAN address/interface, so the client's LAN nodes behind that NIC aren't of interest here (but yes, they do have proper routes). Also, how could client LAN routes affect echo requests "getting lost" between VPN server and client, in the tunnel itself…?

Again, the whole setup worked just fine until I upgraded to 2.3.1.

Soyokaze

Which interface are the OpenVPN servers bound to?
Which NAT setting do you use (default, hybrid, etc.)?
Do you use dynamic routing (OSPF, for example), or are all links statically routed?

brevilo

@pan_2:

Which interface are the OpenVPN servers bound to?
Which NAT setting do you use (default, hybrid, etc.)?
Do you use dynamic routing (OSPF, for example), or are all links statically routed?

WAN (of course?)
Hybrid, with source = remote LAN and NAT = local LAN/CARP interface (on client and server, respectively)
Not sure what you mean. There's no special routing apart from the OpenVPN site-to-site settings.

brevilo

Update: OK, my third test pair is working again. That particular setup got screwed up when I disabled its iroute for testing purposes :-[ So I'm now back at reply #9 (https://forum.pfsense.org/index.php?topic=113151.msg636321#msg636321), where my test rig works with 2.3.1_5 but my production rig using CARP needs further debugging.

Stay tuned…

Soyokaze

@brevilo:

WAN (of course?)

Hybrid, with source = remote LAN and NAT = local LAN/CARP interface (on client and server, respectively)

Not sure what you mean. There's no special routing apart from the OpenVPN site-to-site settings.

1. Not always. Sometimes I bind it to localhost, sometimes to a VIP.
3. Then it is "static".
2. >>with source = remote LAN and NAT = local LAN/CARP interface
Err? You're NATing your OpenVPN network to the local LAN? Or did I misunderstand something?

Maybe you could draw a network diagram? It would help a little in understanding your topology.

NB: I have plenty of S2S links with outbound NAT (traffic redirection) and inbound NAT (D/PNAT), and everything works OK on 2.3+.

brevilo

Err? You're NATing your OpenVPN network to the local LAN?

No, why would I? I source NAT the respective remote LAN on each side, that is, the client LAN on the server side and vice versa.
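A rough model of that hybrid outbound NAT rule, as I read the description: on each side, traffic from the remote LAN leaving via the local LAN interface gets its source rewritten to the local LAN CARP address, while the tunnel network itself is left alone. The Python sketch is not pf syntax; the interface name "LAN", the server LAN VIP 192.168.20.1 and host 192.168.20.10 are hypothetical, and 192.168.10.0/24 is the client LAN from the test pair's routing table.

```python
import ipaddress
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str
    egress: str  # name of the interface the packet leaves on


@dataclass(frozen=True)
class OutboundNatRule:
    interface: str    # rule only applies to packets leaving this interface
    source: str       # network matched against the packet's source address
    translation: str  # address the source is rewritten to (e.g. the LAN CARP VIP)

    def apply(self, pkt: Packet) -> Packet:
        """Return the packet with its source translated if the rule matches."""
        matches = (
            pkt.egress == self.interface
            and ipaddress.ip_address(pkt.src) in ipaddress.ip_network(self.source)
        )
        return replace(pkt, src=self.translation) if matches else pkt


if __name__ == "__main__":
    # Server side: hide the (remote) client LAN behind the server's LAN VIP.
    rule = OutboundNatRule(interface="LAN", source="192.168.10.0/24", translation="192.168.20.1")
    print(rule.apply(Packet("192.168.10.50", "192.168.20.10", "LAN")))
    print(rule.apply(Packet("192.168.100.2", "192.168.20.10", "LAN")))
```

The first packet comes from the client LAN and is translated; the second comes from the client's tunnel address and passes unchanged, since the rule only matches the remote LAN.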

The topology is a straightforward site-to-site with HA/failover: ClientLAN---Client===Server---ServerLAN (with client and server both using CARP, so two nodes each, with virtual IPs on LAN and WAN).

Cheers