Site-to-Site OpenVPN problem on 2.7.0, possibly affected by Outbound NAT

frater

@nazelus said in No Site-to-Site VPN after upgrading CE from 2.6.0 to 2.7.0:

ERROR: FreeBSD route add command failed: external program exited with error status: 1

Might be this

It wouldn't be the first time that some feature isn't working while the expert/developer insists that it should work without testing it with the current version.

@jimp Can you set up a VPN between two pfSense 2.7 boxes and test LAN-device <=> LAN-device connectivity?

It's not as if I know the other people who are also having this problem.
It's too easy to call it user error if you can't show that it works NOW.

frater

@nazelus

I see you did not change your VPN to use certificates, as described in the tutorial that was posted.

Like you, I also used "Peer to Peer Shared key" before, but when something is not working I prefer to follow a tutorial to the letter. This is why I completely set up the server and client using "Peer to Peer SSL/TLS".

I admit it doesn't change one bit in the end where it counts, but for debugging purposes I think it's best to stay with the script.

Are you willing to set everything up from scratch?
I think I will now try to set up the 2nd client. Maybe that one does work.

frater

Well, well.....

I have a little progress....

I just configured a 2nd oVPN-client, and a LAN-device behind that one was able to ping all the other LAN-devices.
LAN-devices on the other client and on the server are still unable to reach devices on the other networks.

For completeness I would like to write that I'm using /23 networks instead of /24 networks.
For 2 locations I changed the /24 to /23 because other IT companies were putting devices in the LAN with a static address.
By switching to /23 I could move a lot of the DHCP-clients out of the way of these static devices and have less chance that they put new ones there.

192.168.1.1/24 became 192.168.1.1/23 turning the network to 192.168.0.0/23
192.168.17.1/24 became 192.168.17.1/23 turning the network to 192.168.16.0/23

192.168.18.1/23 was a /23 from the start...

Of course I'm using /23 networks in my oVPN setup as well.
I'm mentioning this because 192.168.18.1/23 is the one that works, and it's the only router whose network address is the same under /24 and /23 (192.168.18.0).
Maybe there's some awkward bug where a /24 is hardcoded somewhere.
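
To make the /24-to-/23 change concrete, here is a small Python sketch using the standard-library ipaddress module. It only does the subnet arithmetic for the three router addresses mentioned above; it is an illustration and says nothing about how pfSense or OpenVPN treat these networks. The helper name lan_network is made up for this example.

```python
import ipaddress

# Router LAN addresses from this thread, written as address/prefix.
routers = ["192.168.1.1/23", "192.168.17.1/23", "192.168.18.1/23"]

def lan_network(interface_cidr):
    """Return the network that an interface address belongs to."""
    return ipaddress.ip_interface(interface_cidr).network

for r in routers:
    net = lan_network(r)
    print(f"{r:18} -> {net}  ({net.network_address} - {net.broadcast_address})")
```

Only 192.168.18.1 keeps the same network address (192.168.18.0) under both /24 and /23; the other two routers end up in networks that start one /24 block lower.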

Anyhow....
I will now be focusing on the differences between the 2 clients.
Maybe I will remove the first client and set it up again..

frater

SUCCESS

I solved it on my box and it was indeed something of a misconfiguration...

I still need to test some more, but I already have several LAN-devices that can ping other LAN-devices on remote networks.

Because the newly configured oVPN-client was (partially) working and the other oVPN-client was not, I started to focus on the network that didn't have a connection.

That's the 192.168.0.0/23 network

I started with a grep -C5 '192.168.1.' /cf/conf/config.xml and noticed some outbound NAT rules to 192.168.1.0/24

There are no networks like that on that server.

That router was configured with hybrid outbound NAT and when I set it up I used an existing configuration of another router, deleting everything I didn't need.
I think this outbound rule was created for a Vigor bridged modem, which this location didn't have.
When I set the router up 2 years ago I deleted that interface.
It now seems that this rule in the outbound NAT settings should have been deleted manually as well.
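
For anyone who wants to check for leftovers like this, here is a rough Python sketch of the idea. It does not parse config.xml; it assumes you have already copied the networks from your outbound NAT rules and the networks actually configured on the box into two plain lists. The function name and the 172.16.5.0/24 entry are hypothetical and only there for illustration.

```python
import ipaddress

def classify_rule_network(rule_cidr, configured_cidrs):
    """Classify a NAT rule network against the networks configured on the box.

    Returns "match" if it equals a configured network, "partial" if it only
    overlaps one, and "orphan" if it overlaps none of them.
    """
    rule_net = ipaddress.ip_network(rule_cidr, strict=False)
    configured = [ipaddress.ip_network(c, strict=False) for c in configured_cidrs]
    if rule_net in configured:
        return "match"
    if any(rule_net.overlaps(c) for c in configured):
        return "partial"
    return "orphan"

# LAN of the affected router in this thread.
configured = ["192.168.0.0/23"]

# 192.168.1.0/24 is the leftover rule network from this thread;
# 172.16.5.0/24 is a made-up example of a network that no longer exists.
for rule in ["192.168.0.0/23", "192.168.1.0/24", "172.16.5.0/24"]:
    print(rule, "->", classify_rule_network(rule, configured))
```

Note that the leftover 192.168.1.0/24 is not a clean orphan here: it covers half of the 192.168.0.0/23 LAN, so it comes out as "partial". Anything that is not a "match" deserves a second look.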

After I deleted the 2 outbound rules and restarted all the oVPN instances it still didn't work,
so I rebooted the whole router and.... tadaaa..... it worked.

I haven't checked everything, but it feels good

frater

I didn't create a screenshot of the outbound rules for 192.168.1.0/24 and because I removed those entries I can't make one now, but my config still has this orphaned outbound ruleset

Another network which doesn't exist anymore on this box.

I have no reason to return to the old config as the "shared key" seems to be deprecated, so I will leave it like this.

I wonder what the culprit is on your boxes.
Do take a peek at the outbound NAT rules and see if there are any orphans..

frater

@nazelus

You are now using the certificates instead of shared key?
Not that it should matter, but best is to start out with a recommended configuration.

server 192.168.17.1/23
client 192.168.18.1/23
client 192.168.1.1/23

Here I'm pinging 192.168.1.4 from an access point at 192.168.19.20:

# ifconfig | grep 192
          inet addr:192.168.19.20  Bcast:192.168.19.255  Mask:255.255.254.0
# ping -c2 192.168.1.4
PING 192.168.1.4 (192.168.1.4): 56 data bytes
64 bytes from 192.168.1.4: seq=0 ttl=61 time=18.857 ms
64 bytes from 192.168.1.4: seq=1 ttl=61 time=21.602 ms

--- 192.168.1.4 ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 18.857/20.229/21.602 ms

frater

I previously reported that site-to-site was working after I removed an outbound NAT-rule.
This turned out to be not entirely true.

To test this I logged into a device on the site "clientC" and pinged a device on "clientB"
This worked...

Device on clientC:

# ifconfig | grep 192
          inet addr:192.168.19.20  Bcast:192.168.19.255  Mask:255.255.254.0
# ping -c2 192.168.1.4
PING 192.168.1.4 (192.168.1.4): 56 data bytes
64 bytes from 192.168.1.4: seq=0 ttl=61 time=18.857 ms
64 bytes from 192.168.1.4: seq=1 ttl=61 time=21.602 ms

--- 192.168.1.4 ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 18.857/20.229/21.602 ms

I then did the same in the other direction: from that same pingable device on "clientB" I tried to ping the device on "clientC". This did NOT work.

Device on ClientB

[~] # ifconfig eth3 | grep 192.168.
          inet addr:192.168.1.4  Bcast:192.168.1.255  Mask:255.255.254.0
[~] # ping -c2 192.168.19.20
PING 192.168.19.20 (192.168.19.20): 56 data bytes

--- 192.168.19.20 ping statistics ---
2 packets transmitted, 0 packets received, 100% packet loss

ClientB itself:

ifconfig igb0 | grep 192
        inet 192.168.1.1 netmask 0xfffffe00 broadcast 192.168.1.255
[2.7.0-RELEASE][root@pfSense.filmhallen.lan]/root: ping -c2 192.168.19.20
PING 192.168.19.20 (192.168.19.20): 56 data bytes
92 bytes from 10.0.16.1: Redirect Host(New addr: 10.0.16.2)
Vr HL TOS  Len   ID Flg  off TTL Pro  cks      Src      Dst
 4  5  00 0054 65ca   0 0000  3f  01 2820 10.0.16.3  192.168.19.20

64 bytes from 192.168.19.20: icmp_seq=0 ttl=62 time=20.493 ms
92 bytes from 10.0.16.1: Redirect Host(New addr: 10.0.16.2)
Vr HL TOS  Len   ID Flg  off TTL Pro  cks      Src      Dst
 4  5  00 0054 7c93   0 0000  3f  01 1157 10.0.16.3  192.168.19.20

64 bytes from 192.168.19.20: icmp_seq=1 ttl=62 time=19.468 ms

--- 192.168.19.20 ping statistics ---
2 packets transmitted, 2 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 19.468/19.980/20.493/0.512 ms

I'm getting a reply from 10.0.16.3
This is not the IP from the device on clientC (that's 192.168.19.20), but the oVPN-address of router clientB.

That source address should be translated by the router, I would think.

jimp

I forked this off into a new thread so it would all be together since it's likely a different issue than the post it was on before.

What do the state table entries on each firewall look like when you try those ping tests?

frater

@jimp

serverA = 192.168.17.1/23
clientB = 192.168.1.1/23
clientC = 192.168.18.1/23

From device 192.168.1.4/23 I'm unsuccessfully pinging 192.168.19.209/23

My guess is that the WAN-interface shouldn't be there.

From device 192.168.19.209/23 I'm successfully pinging 192.168.1.4/23

jimp

That WAN interface state definitely shouldn't be there, which means the two most likely causes are:

1. There is no route in the table on that firewall for 192.168.18.0/23 (the network that contains 192.168.19.x), so it falls through to the default route and out WAN
2. The LAN firewall rules on there have a gateway set and are forcing the traffic out WAN

As an extra protection against 1, consider adding reject rules on the Floating tab, quick, outbound, on your WAN(s), matching a destination of private networks (either an alias or a large enough mask to catch them all, such as 192.168.0.0/16). That will stop potentially private traffic from attempting to exit the WAN. Having that set to log is probably also a good idea.
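
The first cause above can be illustrated with a small longest-prefix-match sketch in Python. This is not how pfSense is implemented; it is a toy model, and the routing tables and interface labels ("WAN", "LAN", "OpenVPN") are assumptions made up for the example.

```python
import ipaddress

def lookup_route(destination, routes):
    """Longest-prefix match: return (network, next_hop) for the most specific route."""
    dest = ipaddress.ip_address(destination)
    candidates = [
        (ipaddress.ip_network(net), hop)
        for net, hop in routes
        if dest in ipaddress.ip_network(net)
    ]
    if not candidates:
        return None
    # The route with the longest prefix (most specific network) wins.
    return max(candidates, key=lambda c: c[0].prefixlen)

# Hypothetical routing tables for clientB.
without_vpn_route = [("0.0.0.0/0", "WAN"), ("192.168.0.0/23", "LAN")]
with_vpn_route = without_vpn_route + [("192.168.18.0/23", "OpenVPN")]

print(lookup_route("192.168.19.209", without_vpn_route)[1])  # WAN
print(lookup_route("192.168.19.209", with_vpn_route)[1])     # OpenVPN
```

Without a route for the remote /23, only the default route matches and the packet heads for WAN. The second cause is different: a gateway set on a LAN rule forces the traffic out that gateway regardless of what the routing table says, so this lookup does not model it.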

frater

@jimp

There was/is indeed a gateway rule on the LAN, but I disabled it just now....

frater

@frater

Removing the gateway rule on the LAN tab was sufficient to get rid of that WAN state.
I still can't ping 192.168.19.209 from 192.168.1.4, though.

jimp

Since it appears to be making it to the VPN there, now you'd check the states, rules, and routing on the other nodes. Make sure the OpenVPN rules allow it on both the serverA and clientC firewalls, and check the states along each leg.

frater

@jimp

I checked the LAN firewall again and noticed an autocreated rule pfB_PRI1_v4 of pfBlockerNG.
I removed it from pfBlockerNG and it started to work.

The clientC network was able to ping other networks itself, but couldn't be pinged....

I'm still seeing the redirects when I ping from pfSense clientB to pfSense clientC.
If I don't give a source address, I get an answer from the oVPN IP:

/root: ping -c2 -S 192.168.1.1 192.168.18.1
PING 192.168.18.1 (192.168.18.1) from 192.168.1.1: 56 data bytes
64 bytes from 192.168.18.1: icmp_seq=0 ttl=63 time=11.856 ms
64 bytes from 192.168.18.1: icmp_seq=1 ttl=63 time=13.479 ms

--- 192.168.18.1 ping statistics ---
2 packets transmitted, 2 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 11.856/12.668/13.479/0.812 ms
[2.7.0-RELEASE][root@pfSense.filmhallen.lan]/root: ping -c2 192.168.18.1
PING 192.168.18.1 (192.168.18.1): 56 data bytes
92 bytes from 10.0.16.1: Redirect Host(New addr: 10.0.16.2)
Vr HL TOS  Len   ID Flg  off TTL Pro  cks      Src      Dst
 4  5  00 0054 f430   0 0000  3f  01 9acd 10.0.16.2  192.168.18.1

64 bytes from 192.168.18.1: icmp_seq=0 ttl=63 time=14.299 ms
92 bytes from 10.0.16.1: Redirect Host(New addr: 10.0.16.2)
Vr HL TOS  Len   ID Flg  off TTL Pro  cks      Src      Dst
 4  5  00 0054 83fc   0 0000  3f  01 0b02 10.0.16.2  192.168.18.1

64 bytes from 192.168.18.1: icmp_seq=1 ttl=63 time=11.461 ms

--- 192.168.18.1 ping statistics ---
2 packets transmitted, 2 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 11.461/12.880/14.299/1.419 ms

jimp

The redirect is sort of expected due to how OpenVPN handles its routing. It usually stuffs a dummy address in the table as a destination just to make sure the traffic gets handed off to OpenVPN and then OpenVPN deals with it from there, but depending on what is hitting what it may end up getting that kind of response.

As long as the traffic goes through it's not a huge concern.

frater

@jimp

I still have a problem pinging 192.168.19.209 even though I can ping it from the network itself.
It's a Windows machine, so I think the problem is its firewall not accepting connections from other LANs.

I moved an AP from 192.168.19.20 to 192.168.19.210 and I was able to ping it....
I will revisit this thread if I find out it has to be something else...

jimp

That sounds like a local network config issue on the target system. There are some cases where Windows will only accept inbound traffic from its own subnet unless it thinks it's on a certain type of network. For example, it may depend on whether the network is set to public or private, though maybe not exactly that.

If you need to fudge that you could set up a hybrid outbound NAT rule on the LAN to make the source of traffic appear to be the local network, but that can break or complicate certain protocols. It's best to fix the local network config on the client system.
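
As a rough illustration of why the NAT workaround would help, here is a Python sketch that models a host accepting traffic only from its own subnet. That behaviour is an assumption for the sake of the example, not a description of the actual Windows firewall logic, and the function name is made up.

```python
import ipaddress

def accepted_by_local_only_host(source_ip, host_interface):
    """Model a host that only accepts traffic from its own subnet (assumption)."""
    local_net = ipaddress.ip_interface(host_interface).network
    return ipaddress.ip_address(source_ip) in local_net

target = "192.168.19.209/23"

# Source is a device on the clientB LAN: outside the target's subnet.
print(accepted_by_local_only_host("192.168.1.4", target))   # False

# Source rewritten by outbound NAT to clientC's LAN address: inside the subnet.
print(accepted_by_local_only_host("192.168.18.1", target))  # True
```

With the NAT rule in place the target only ever sees the router's LAN address as the source, which is also why it can complicate protocols that care about the real client address.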