WG not routing or sending traffic

xxGBHxx

@dma_pf
OK, I've spent an hour going through this this morning. What I have found is that simply adding the third interface breaks it.

As I mentioned, I've had it running for 3 days with only 2 interfaces and it has continued to work without issue. This is what is shown in the various screenshots for the WORKING config. I will follow these with the changes that broke the config. (By the way, there's no port forwarding, as the remote is the VPN provider, who never initiates an inbound connection.)

Architecture Network Diagram
Network Diagram.jpg

Working DNS
Working DNS.JPG

Working Gateway
Working Gateway.JPG

Working LAN FW Rules
Working LAN FW Rule.JPG

Working VPN Firewall Rule
Working VPN FW Rule.JPG

Working WAN Firewall Rule
Working WAN FW Rule.JPG

Working Outbound NAT
Working Outbound NAT.JPG

Working WG Tunnel Rule
Working WG Tunnel Info 1.JPG

Working WG Peer Rule
Working WG Tunnel Info 2.JPG

So this is where it has been for the past 72 hours. Everything has been working perfectly fine (though to be fair I only have a single machine testing this, but every time I've gone to check it's worked without a problem).

I then did the following

In VMWARE
Add new "physical" Interface on the network of my other Internet connection (VMX2)

In PF

Interfaces add new interface VMX2
Interfaces OPT2
-- Description "CleanWAN"
-- Static IPv4
-- Address 192.168.2.52/24
-- Add a new gateway
-- Gateway Name "CleanWAN_GW"
-- Gateway IP 192.168.2.254
-- Gateway Description "Clean Internet Feed"
Enable Interface

Then "Apply Changes"

Without touching or doing anything else at this point I reboot to check everything is working and it no longer does.

NOT Working Gateway
NOT Working Gateway.JPG

NOT Working CleanWAN Interface

Absolutely nothing else in the setup has changed or has been touched at this point. Unless I'm missing something fundamental (which I might be!) there is absolutely no reason this should stop working at this point.

I can always instantly tell if it's stopped working because the dashboard shows that it is unable to check for an update.

Not Working Front Page.JPG

Another point to note is that it continues to work right up until the point I re-boot. So the changes I apply to enable VMX2 and the CleanWAN interface work until I reboot which I think is why I've been confused/unsure what's been causing it.

So where do I go from here?

G

dma_pf

@xxgbhxx Thanks for your last post. I'm a little bit confused so hopefully you can answer a couple of questions. So here they are:

1. In the Architecture Network Diagram you have ISP1 and ISP2. Are these 2 actually different ISPs with unique IP addresses out to the internet? Can you describe their connection out to the net (cable, fiber, wireless, etc.)?
2. In the Architecture Network Diagram are Vmx1 and Vmx2 both virtual adapters? Are they bound to the same physical NIC? If so, are they bound to the same port on the NIC? How many ports are on that NIC?
3. Is pfSense installed virtually and hosted by VMware? Are both Vmx1 and Vmx2 devices configured within the pfSense virtual machine settings?
4. How are you planning to determine what needs to go out to either Vmx1 or Vmx2? (certain devices, particular destinations, failover, etc.)

xxGBHxx

@dma_pf

1. Yes, they are completely different ISPs on completely different equipment (actually two completely different companies) with completely unique IPs on the internet. They're both Fibre to the Cabinet (FTTC) connections that come into the house on copper. They are even completely different pieces of copper.
2. Yes, Vmx0, 1 and 2 are three completely different virtual adapters. They are bound to the same NIC but it's a quad port NIC, so each virtual network is on its own physical port. It's a direct 1 to 1 connection.
3. Yes, pfSense is VMware hosted (and I've been running it in this config for 10 years). Vmx1 and 2 are virtual NICs assigned to this VM in vSphere. Those two NICs are on two virtual switches in VMware. Each switch is assigned a different physical uplink. One uplink is plugged into one ISP router, the other uplink is plugged into the other router.
4. As you can see from the rule above, it's set to send all LAN traffic to that interface. For the CleanWAN interface/connection, within pfBlocker I create an alias which maps the ASNs for Netflix and Amazon Prime to a single alias called "Clear_Connect". Once that's done I then put a new PBR rule above that LAN one for all IPs in those ASNs to route on the clear interface. This is how I've been using PF for the past 10 years with OpenVPN on the same ISP.

G

AB5G

@xxgbhxx Your default gateway IPv4 cannot be WIREGUARD_IVPN_WGV4. Set it to WAN and reboot / restart the WireGuard tunnel. It should start working then.

xxGBHxx

@ab5g

The VPN providers own guide says to set it to the WG interface as do many other guides.

I can't look past the fact that it was working with 2 interfaces set to that. Are you suggesting that adding that third interface means I can't leave it set to the WG interface?

Either way I've just tried it and it made no difference so I don't think it's that.

Thanks for the suggestion though.

G

AB5G

@xxgbhxx - The default gateway is for your entire system. I'm comfortable having it set to the WAN links and then using policy based routing for the source IPs that I want to use the tunnel.
With the default gateway for the system set as 'WIREGUARD_IVPN_WGV4', you are telling the system to use it for everything. When the tunnel goes down, it may render things unreachable. Anyway, I've seen more predictable performance when not setting the default gateway to 'WIREGUARD_IVPN_WGV4'.

Also from the Netgate docs
"Before assigning the interface, make sure default gateway for the firewall is not set to Automatic or the firewall may end up using the wg interface as the default gateway, which is unlikely to be the desired outcome. "

You can delete the WG interface, reboot once - then set the default GW to WAN and then add the WG interface. After this add a route in LAN to match source host that you want to send over the tunnel.

xxGBHxx

As I see it you have 2 choices:

1. EVERYTHING goes VPN and PBR stuff on WAN that doesn't
2. EVERYTHING goes WAN and PBR stuff on VPN that doesn't

You're suggesting 2; I've configured 1.

Either way it shouldn't matter, both should work. Option 1 DID work when I only had 2 interfaces.
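
To make the two options concrete, here is a rough sketch of how I picture the decision being made. It is a simplified model, not how pfSense is actually implemented: the `egress` function, the `from_lan` flag and the 203.0.113.0/24 range (a documentation range standing in for the streaming services) are all made up for illustration.

```python
from ipaddress import ip_address, ip_network

def egress(dst, lan_rules, default_gateway, from_lan=True):
    """First matching LAN rule wins; anything else follows the default gateway."""
    if from_lan:
        for network, gateway in lan_rules:
            if ip_address(dst) in ip_network(network):
                return gateway
    return default_gateway

STREAMING = "203.0.113.0/24"  # hypothetical stand-in for the streaming ranges

# Option 1: default gateway is the VPN, exceptions are policy-routed to WAN.
option1 = {"lan_rules": [(STREAMING, "WAN")], "default_gateway": "VPN"}
# Option 2: default gateway is WAN, LAN traffic is policy-routed to the VPN.
option2 = {"lan_rules": [(STREAMING, "WAN"), ("0.0.0.0/0", "VPN")], "default_gateway": "WAN"}

# LAN clients end up on the same path under either option.
for dst in ("203.0.113.10", "8.8.8.8"):
    print(dst, egress(dst, **option1), egress(dst, **option2))

# Traffic from the firewall itself skips the LAN rules and follows the default gateway.
print(egress("8.8.8.8", from_lan=False, **option1), egress("8.8.8.8", from_lan=False, **option2))
```

In this model LAN clients see the same result either way. The only difference is traffic that originates on the firewall itself (update checks, DNS lookups and so on), which follows the default gateway.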

For my system everything DOES go down the VPN except for a small subset of streaming services that don't allow VPN. I've not set up that PBR yet (but it's what @dma_pf was asking in question 4 above). Right now I'm not worried about that PBR bit.

Anyway, I did as you suggested. I deleted the WG interface and rebooted. I set the default GW to WAN and then re-added the WG interface. I then re-added the WG FW rule any/any and I've left the same LAN rule as above that tells all LAN traffic to use the tunnel. That didn't work.

I don't think it's anything to do with the way I've done it. I just don't think it's working as intended. There are indications that there's something "wrong" with PF and WG, as there's another thread on PBR with two interfaces not working as intended/expected.

Keep the suggestions coming; I'm happy to try anything at this point.

G

dma_pf

@xxgbhxx said in WG not routing or sending traffic:

As I see it you have 2 choices

EVERYTHING goes VPN and PBR stuff on WAN that doesn't

EVERYTHING goes WAN and PBR stuff on VPN that doesn't

I've always used option 2. I've got interfaces for 2 native networks, 4 VLANs, a site-to-site OpenVPN tunnel, and 3 WireGuard connections to IVPN. My Default Gateway IPv4 is set to WAN_DHCP and the Default Gateway IPv6 is set to None. I then policy route my traffic either out the WAN or through IVPN (WireGuard) in rules per each interface.

In PF

Interfaces add new interface VMX2
Interfaces OPT2
-- Description "CleanWAN"
-- Static IPv4
-- Address 192.168.2.52/24
-- Add a new gateway
-- Gateway Name "CleanWAN_GW"
-- Gateway IP 192.168.2.254
-- Gateway Description "Clean Internet Feed"
Enable Interface

Then "Apply Changes"
Without touching or doing anything else at this point I reboot to check everything is working and it no longer does.

It seems very strange that just adding an interface would make the others not work. I'm assuming you've tried to ping out to the internet (8.8.8.8 and google.com) with a Source Address of WAN, Wireguard_IVPN and CleanWAN? Do you get any responses?

xxGBHxx

It's just a consequence of my simple setup. 99% of my traffic goes via VPN so it makes sense to me that's the default.

Yes, it's strange, which is why I'm so confused. Right now, in its broken state, these are the ping results when I go to a command prompt on the firewall:

No Interface Specified
8.8.8.8 no response
1.1.1.1 no response
192.168.1.1 (Internet gateway) responds
google.com (doesn't resolve as the DNS is the other side of the WG connection)

Specify WAN (192.168.1.52)
8.8.8.8 responds
1.1.1.1 responds
192.168.1.1 responds
google.com no response (can't resolve)

Specify Wireguard_IVPN (172.26..)
8.8.8.8 no response
1.1.1.1 no response
192.168.1.1 responds
google.com no response (can't resolve)

Specify CleanWAN (192.168.2.52)
8.8.8.8 responds
1.1.1.1 responds
google.com no response (can't resolve)

Perhaps the interesting one for me is the fact that using wg0 as the source interface it gets a response from the WAN gateway. I wouldn't expect the tunnel to be able to see that IP as it's on a completely different subnet range. The firewall has to be routing somehow but I'm not quite sure how/why.
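
One likely explanation is that picking a source address for ping only sets the source IP on the packet; the route is still chosen from the destination, and a directly connected subnet beats the default route. A minimal sketch of that lookup, assuming a simplified routing table where the default route points at wg0 (the `ROUTES` list and `lookup` function are illustrative, not pfSense internals):

```python
from ipaddress import ip_address, ip_network

# Simplified routing table: (destination network, outgoing interface).
# The two connected subnets are the ones in this thread; the default route
# via wg0 is an assumption about the state of the box at the time.
ROUTES = [
    ("0.0.0.0/0", "wg0"),
    ("192.168.1.0/24", "WAN"),
    ("192.168.2.0/24", "CleanWAN"),
]

def lookup(dst, routes=ROUTES):
    """Longest-prefix match on the destination only; the source plays no part."""
    matches = [
        (ip_network(net), ifname)
        for net, ifname in routes
        if ip_address(dst) in ip_network(net)
    ]
    return max(matches, key=lambda match: match[0].prefixlen)[1]

print(lookup("192.168.1.1"))  # WAN: the connected /24 wins over the default route
print(lookup("8.8.8.8"))      # wg0: only the default route matches
```

If that is what is happening, the reply from 192.168.1.1 says nothing about the tunnel itself; the packet simply left via the WAN interface with the tunnel address as its source.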

But apart from that it's pretty much responding as I'd expect. DNS doesn't work as it's not contactable through the VPN. Bypassing the VPN means everything you'd expect to be contactable is contactable.

I'm really at a loss. I am looking at a number of the other posts on this forum though and it's clear that WG probably isn't production ready yet. There are a number of people seemingly having a lot of "odd" behaviour, especially with multiple connections. If I hadn't built from complete scratch I'd put it down to some legacy issue with my existing firewall, but this is a brand new 2.5 build with absolutely zero other config on it beforehand. You're absolutely right, it shouldn't break just by adding an interface and yet, here we are.

Thanks again for your effort it really is hugely appreciated.

G

dma_pf

@xxgbhxx said in WG not routing or sending traffic:

Specify Wireguard_IVPN (172.26..)
8.8.8.8 no response
1.1.1.1 no response
192.168.1.1 responds
google.com no response (can't resolve)

Try changing the Endpoint Address of the IVPN Peer from gb2.wg.ivpn.net to its actual IP address of 185.59.221.225 and repeat this test.

xxGBHxx

@dma_pf

Rebooted after changing and exactly the same result on all interfaces.

G

dma_pf

@xxgbhxx Something is keeping the Wireguard_IVPN from getting to the IVPN server. The reason I suggested changing the Endpoint Address to the IP address from the FQDN is because your DNS is set to use the DNS servers of IVPN. So it would make sense that they would not connect to the FQDN as it would not be able to resolve the FQDN and therefore fail to make the connection. But that did not resolve the issue. So I would think at this point it would be a NAT or Firewall rule issue.

For the time being I think it would be helpful to try the following:

Verify that the settings for the Wireguard Tunnel are as follows:

The Interface wg0 Address needs to match the IP address tied to your key in the Key Management section of your IVPN account. The CIDR mask has to be /32. Make sure that Public Key for the interface matches exactly the key you used in the Key Management section to generate the IP address in IVPN.
In the PEER section of the tunnel continue to use the IP address of 185.59.221.225. Make sure that the complete public key, x0BTRaxsdxAd58ZyU2YMX4bmuj+Eg+8/urT2F3Vs1n8= is entered exactly. The Peer WireGuard Address needs to be 172.16.0.1. Port 2049 and keep alive 25 are OK.
Save and test as above.
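
If you want a quick sanity check of those values outside the GUI, something like the sketch below would do. It only checks the two things mentioned above that can be verified mechanically: the /32 mask and that the key is a complete 32-byte base64 value (the size of a WireGuard key). The function name and the 192.0.2.10 address are placeholders, not your real settings.

```python
import base64
from ipaddress import ip_interface

def check_wg_settings(interface_address, public_key):
    """Return a list of problems found; an empty list means both checks passed."""
    problems = []
    if ip_interface(interface_address).network.prefixlen != 32:
        problems.append("interface address should use a /32 mask")
    try:
        raw = base64.b64decode(public_key, validate=True)
    except ValueError:  # binascii.Error is a subclass of ValueError
        raw = b""
    if len(raw) != 32:
        problems.append("public key is not a complete 32-byte base64 value")
    return problems

# 192.0.2.10 is a placeholder; use the address shown in your IVPN key management page.
print(check_wg_settings("192.0.2.10/32", "x0BTRaxsdxAd58ZyU2YMX4bmuj+Eg+8/urT2F3Vs1n8="))
```

It cannot tell you whether the key is the right one for your account, only that it was not truncated or mangled when pasted.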

If 1 above didn't fix it then I'd suggest:

Checking your Firewall Rules for the Wireguard_IVPN interface. In my setup there are no rules...it is empty.
Checking your LAN Rules. For now I would create a rule as follows:
- Protocol: IPv4 Any
- Source, Port, Destination, Port: Any
- Gateway: Wireguard_IVPN
Make sure you place the rule high enough in your rule order so that packets hit it. I'd make sure the rule is set for logging for now.

Modify your NAT Rules shown in the Working Outbound NAT picture you posted above. For now I would create a rule as follows:

WAN Rule
- Interface: WAN
- Source: 172.17.10.0/24 and Source Port: Blank
- Destination and Destination Port: Blank
- NAT Address: WAN Address
Wireguard_IVPN Rule
- Interface: Wireguard_IVPN
- Source: 172.17.10.0/24 and Source Port: Blank
- Destination and Destination Port: Blank
- NAT Address: Wireguard_IVPN Address
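
As a rough mental model of what those two rules do (not pfSense's actual implementation), outbound NAT can be pictured as a lookup on the egress interface plus the source network. The `NAT_RULES` table and `translate` function are illustrative, the NAT addresses are left as placeholder strings, and 172.17.10.25 is a hypothetical LAN host:

```python
from ipaddress import ip_address, ip_network

# (egress interface, source network, address to translate to).
# The NAT addresses are placeholders for whatever IP each interface holds.
NAT_RULES = [
    ("WAN", "172.17.10.0/24", "WAN address"),
    ("Wireguard_IVPN", "172.17.10.0/24", "Wireguard_IVPN address"),
]

def translate(egress_interface, source_ip, rules=NAT_RULES):
    """Return the NAT address of the first matching rule, or None if nothing matches."""
    for interface, source_net, nat_address in rules:
        if interface == egress_interface and ip_address(source_ip) in ip_network(source_net):
            return nat_address
    return None  # no rule matched: the packet would leave untranslated

print(translate("Wireguard_IVPN", "172.17.10.25"))
print(translate("CleanWAN", "172.17.10.25"))
```

In this model a LAN host leaving via CleanWAN matches nothing, which suggests the third interface would eventually need a rule of its own.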

This will duplicate the NAT and Firewall rules in my set up. Test along the way.....I would suggest doing so from Diagnostics/Ping in the GUI. At this point the only thing that I can see so far that would be different would be that I use unbound for DNS, my Default Gateway is set to WAN and I don't use multi WAN.

The multi WAN has me a bit puzzled because I've just never worked with it. I can't think my way through it yet as to whether or not there may need to be some NAT rule for the 192.168.1.0/24 and 192.168.2.0/24 networks. But we may be getting ahead of ourselves on that. So for now, work on the above and let me know what happens.

Good luck!

xxGBHxx

@dma_pf said in WG not routing or sending traffic:

@xxgbhxx Something is keeping the Wireguard_IVPN from getting to the IVPN server. The reason I suggested changing the Endpoint Address to the IP address from the FQDN is because your DNS is set to use the DNS servers of IVPN. So it would make sense that they would not connect to the FQDN as it would not be able to resolve the FQDN and therefore fail to make the connection. But that did not resolve the issue. So I would think at this point it would be a NAT or Firewall rule issue.

Good luck!

Yeh I get that. I had tried that in one of my previous attempts but it's always good to try again.

The rules are exactly as I posted above and haven't been changed. I will read through your suggestion tomorrow when I'm on my main PC instead of my laptop.

We need to not lose sight that this worked perfectly until I added the third interface. If any of what you'd suggested was different it wouldn't have worked at all. I've even been through multiple reloads on the previous config without issues.

I tell you what I'll do over the next few days is build another FW and test that too. That way we can rule out that it's something in the original config that's causing it.

Thanks again for your help here.

G

dma_pf

@xxgbhxx I thought of 2 other things to check.

Look at the System Time in the dashboard of pfSense. You have all traffic locked down to only going out through IVPN. I had a similar configuration on a remote site years ago. It resulted in a chicken and egg scenario. The OpenVPN tunnel to IVPN could not connect because the time stamps were so far off from each other, and pfSense could not get the correct time because, without the tunnel, it could not resolve the FQDN to get to a time server. This caused all sorts of issues. I fixed the problem by making sure that I had some time servers defined by their actual IP addresses. My System/General Setup/Localization/Timeservers looks like this: us.pool.ntp.org 129.6.15.27 129.6.15.28 129.6.15.29
I'm not sure if the configuration you have been testing on is on your production environment or if you had a second virtual machine set up. If it's the latter you might need to reboot the modems when switching between the pfSense virtual machines. See this: https://forum.netgate.com/topic/97622/how-to-configure-wan-with-static-ip
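
On the time server point, a small sketch of how you could check a Timeservers value for entries that still work when DNS is unavailable. The function name is made up; the input string is my own setting from above.

```python
from ipaddress import ip_address

def split_time_servers(setting):
    """Split a space-separated Timeservers value into IP literals and hostnames."""
    literals, hostnames = [], []
    for entry in setting.split():
        try:
            ip_address(entry)
            literals.append(entry)
        except ValueError:
            hostnames.append(entry)
    return literals, hostnames

literals, hostnames = split_time_servers(
    "us.pool.ntp.org 129.6.15.27 129.6.15.28 129.6.15.29"
)
print("usable without DNS:", literals)
print("need DNS first:", hostnames)
```

If the first list comes back empty, the firewall has no way to set its clock until name resolution is working.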

I tell you what I'll do over the next few days is build another FW and test that too. That way we can rule out that it's something in the original config that's causing it.

If you do start up a fresh VM, I'd suggest going slow. Get the LAN & WAN up and working. Then get the IVPN WireGuard tunnel up, but leave the DNS to the default unbound and the default gateway to WAN. When IVPN is working correctly then add the 2nd interface and see what happens.

Try to leave pfSense as close to its default settings. Just build it one step at a time and see where/if it breaks by testing at each step.

xxGBHxx

Time is correct to within a couple of seconds. You're usually allowed a minute or more grace with most crypto/PKI.
This is a test server, not my production server. I'd already "broken" the production server by doing an in-situ 2.5 upgrade which broke everything. That was the primary reason why I decided to start from scratch. Unless I'm misreading, that problem is related to bridge mode, which I don't use. As far as the ADSL modems are concerned, these are just devices on the LAN.
I did go slow with this one already. I take snapshots of the config after each change as long as it's still working. As I mentioned, I already had a working config with two interfaces. I still have that snapshotted. I can load up that snapshot in seconds and it's all working again.

In fact, let's do just that.

Here are the screenshots I've literally just taken of that same system with 2 interfaces before I add the third. It's all working exactly how you/others describe. I have DNS set to a single server, 172.16.0.1, which is IVPN. I can use my connected PC exactly as I would expect. I can reboot and it reconnects without an issue.

Working Dual Interface and WG.JPG

As I mentioned above, I know instantly when it's working as you get this message on the front page.

If you like, I can go from this point and I can take screenshots every single time I make a change so you can see what changes I'm making and at what point it breaks?

G

xxGBHxx

@dma_pf

Actually I just did a test.

Just ADDING the third network card in VMWare is enough to kill it.

So fully working as per picture above. Works fine through reboots etc. Go into VMWare, add a third interface to the PF machine. Gets instantly detected in PF. I then simply reboot and I get this

NOT Working Third Card.JPG

You can see the WG is now down. I've not actually configured the IF in Interfaces>Assignments (though it appears, obviously). All I did was add the IF in VMWare and reboot.

SO...

Is this an incompatibility with VMWare or with the way it adds interfaces? About to go play some more.

G

xxGBHxx

@dma_pf

We have progress....

It occurred to me that it might be something that pf didn't like about me adding a "live" virtual adapter. So I powered down the fw, added the adapter with the fw powered down and powered back up.

Working

Added the Interface in Assignments

Working

Rebooted

Working

SOOOOOOOOOOOOOOOOOOOOOOOOOOOO.....

It seems so far that ALL my problems were to do with how/when I was adding in that third interface. If you add it in after you build with the system live it locks the system so hard nothing works.

I have no idea why but I have to assume this is a bug.

I will keep testing to see if I can break it. If nothing else this whole process has given me a far better understanding of the firewall.

Thanks

G

dma_pf

@xxgbhxx Glad to hear things are moving along!

xxGBHxx

@dma_pf said in WG not routing or sending traffic:

@xxgbhxx Glad to hear things are moving along!

Getting there.

I had not realised how important that 1412 MSS change would be. What I was finding was that I could happily browse to some websites (e.g. ivpn.net) but I couldn't browse to others (www.bbc.co.uk). I then made the change to MSS and everything I've tested is now working flawlessly.
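
For anyone wondering where a number like 1412 can come from, here is a rough sketch of the arithmetic. The header sizes are standard; the 1492 link MTU is an assumption (typical of PPPoE) and not something confirmed for this connection, so treat the result as one plausible derivation rather than the ISP's actual reasoning.

```python
IPV4_HEADER = 20
IPV6_HEADER = 40
UDP_HEADER = 8
TCP_HEADER = 20
WG_DATA_OVERHEAD = 32  # 16-byte data message header + 16-byte authentication tag

def tunnel_mtu(link_mtu, outer_ipv6=False):
    """Largest packet that fits inside the tunnel without fragmenting the outer packet."""
    outer_ip = IPV6_HEADER if outer_ipv6 else IPV4_HEADER
    return link_mtu - outer_ip - UDP_HEADER - WG_DATA_OVERHEAD

def tcp_mss(mtu):
    """TCP payload per segment for IPv4 traffic carried at the given MTU."""
    return mtu - IPV4_HEADER - TCP_HEADER

print(tunnel_mtu(1500, outer_ipv6=True))  # 1420, the usual WireGuard default MTU
print(tunnel_mtu(1492, outer_ipv6=True))  # 1412, with the assumed 1492 link MTU
print(tcp_mss(1412))                      # 1372
```

Sites that only partly load are a classic symptom of packets that are too big for the tunnel being dropped silently, which fits with some sites working and others not until the value was lowered.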

So now I'm going to continue building up the environment and start adding the additional layers of things like snort, pfblocker for my blocklists and so on and the conditional access for my streaming.

So far it's looking good.

Again can I say thank you @dma_pf for your patience in working on this. Sometimes you just need a sounding board to work through things and maybe this thread will help others as there's some good knowledge in it.

I am still debating if this should be raised as a bug or if it's normal behavior.

@jimp is this a bug with pf, FreeBSD, WG or is this expected behaviour? (you only need to read first and last post to get the gist though there's a post in the middle that has a network diagram too). I will raise a bug report if this isn't normal.

G

xxGBHxx

Been almost 2 weeks now so thought I'd make one last update to this thread.

I have everything working. Just a few headline PSAs from my perspective for people who just skim the rest of the posts:

Reducing the MSS on the LAN interface is vitally important, as without it certain URLs would not work. I reduced it to 1412 (per my ISP guidance) and it's been perfect ever since
You can't add a new network interface in VMWare while pfSense is running, as it breaks the networking when you reboot pfSense. Power the VM down first, then add the interface
pfSense seemingly has no easy way of monitoring what's going on with the WG tunnel, including no easy way to see if the tunnel is actually up or being negotiated, other than pinging the far side gateway
I suspect pfSense's WG implementation is not yet completely stable, as there are a lot of niggly issues being reported on this forum with "odd" behaviour
It doesn't matter whether you send everything down the VPN by default and PBR the unencrypted traffic (as I have mine) or use PBR to only encrypt certain traffic - both work
I never did get any word back from Netgate on whether this is a bug or known/expected behaviour
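
On the monitoring point, one workaround is to look at the time of the last handshake per peer. The sketch below assumes the `wg` command line tool is available on the firewall and that `wg show wg0 latest-handshakes` prints one "public key, Unix timestamp" pair per line; the 180 second threshold and the function name are my own choices, not anything pfSense provides.

```python
import time

def stale_peers(latest_handshakes_output, now=None, max_age=180):
    """Return the public keys whose last handshake is missing or older than max_age seconds."""
    now = time.time() if now is None else now
    stale = []
    for line in latest_handshakes_output.splitlines():
        if not line.strip():
            continue
        public_key, timestamp = line.split()
        last_handshake = int(timestamp)
        # A timestamp of 0 means the peer has never completed a handshake.
        if last_handshake == 0 or now - last_handshake > max_age:
            stale.append(public_key)
    return stale

# Made-up sample output with a fixed "now" so the result is predictable.
sample = "peerA=\t1000\npeerB=\t0\npeerC=\t1190\n"
print(stale_peers(sample, now=1200))
```

In practice you would feed it the real command output, for example from `subprocess.run(["wg", "show", "wg0", "latest-handshakes"], capture_output=True, text=True).stdout`. A handshake only refreshes while traffic is flowing, so a keepalive on the peer helps keep the reading meaningful.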

Now it's actually working it's rock stable and hasn't dropped a connection yet (which is something OpenVPN was doing on a daily basis for me). It's clearly a little faster than OpenVPN but that might just be my imagination.

Again a huge thanks to @dma_pf and others for their contribution.

G