PfSense on Proxmox can't ping internet after reboot, OVH hosting

Chluz

Hi Everyone,

let me first extend my thanks to the community and software team for this great firewall.

I've recently been noticing strange behaviour on my virtualised PfSense instance. I hope I am posting this in the right place, since I believe it has to do with routing and gateways.

The background
pfSense is virtualised under Proxmox. The host server is hosted by OVH (I mention this because they use an unusual routing method based on MAC addresses).
All other VMs on Proxmox use the virtualised pfSense instance as their gateway to the internet; some have 1:1 IP mapping, others don't.
pfSense is using virtio interfaces, and has Hardware Checksum Offloading, Hardware TCP Segmentation Offloading and Hardware Large Receive Offloading disabled (boxes ticked).

In terms of IP assignments, WAN (vtnet0) has IP 1.2.3.16/32 with gateway 1.2.3.254, and the gateway has the option 'Use non-local gateway through interface specific route.' active.
The MAC address of the WAN interface is set (directly in Proxmox) to the MAC assigned to the IPs of an OVH failover block, 1.2.3.16/29: 03:00:00:f9:c3:3b.
The first IP of the block, 1.2.3.16/32, is used for WAN, and the other IPs are used for 1:1 NAT for other VMs. This seems to be working fine.

The routing table seems OK, including:

default	1.2.3.254	UGS	7191	1500	vtnet0
1.2.3.254		03:00:00:f9:c3:3b	UHS	254	1500	vtnet0

(the MAC above has been changed for anonymity)

The Issue
I noticed after setting up IPsec VTI (although I do not think this is linked) that my pfSense instance couldn't ping 8.8.8.8.
I checked whether VMs behind pfSense could access the internet, and there was no issue there.
All incoming connections to pfSense (IPsec or OpenVPN) were also working fine.
The default gateway was also set up properly.

At that point I thought it might be some kind of strange incompatibility with the virtio interfaces. Since Proxmox interface hot-swapping was active, I hot-swapped my virtio interface with an e1000 interface. After reassigning the wan to that new interface, pfsense could ping 8.8.8.8 again.
Great, problem solved I thought, and went ahead and rebooted the pfsense VM to make sure everything was working ok.

After the reboot, pfSense could not ping 8.8.8.8. Everything else was still working fine. I then switched back to virtio, and once again, pings from pfSense to 8.8.8.8 started working. I have since confirmed that the type of interface is not the issue, but changing the interface seems to fix the underlying problem (probably by resetting something I haven't identified).

At the moment, my workaround is therefore to swap the interface after every reboot. This is obviously not very practical, and I would like to hear from the community whether anyone has thoughts on why this is happening.
I personally suspect something is going on during boot (a bad route assignment? although the routes look fine), but I'm not sure how I could verify this. I also suspected that a command on pfSense (toggling the WAN interface off and on) would be a better workaround, but it doesn't seem to work.

If anyone has any ideas about this, I'm all ears, and thanks in advance for your help.

Chluz

I'll add something I just noticed: after applying the workaround, I get lots of 'arprequest: cannot find matching address' messages in my log, pretty much every second.

A Former User

Install shellcmd via packages and configure the following under services / shellcmd :

Command: route add -net 1.2.3.254/32 -iface vtnet0
Shellcmd Type: shellcmd
Description: Default Gateway fix after Reboot

Uncheck the Option 'Use non-local gateway through interface specific route.' under the gateway configuration.

That's the way I'm dealing with pfSense on an OVH-hosted ESXi.

Chluz

Thanks for the reply Artes!
I don't seem to be able to deactivate the 'Use non-local gateway through interface specific route' option; I get the error message 'The gateway address 1.2.3.254 does not lie within one of the chosen interface's subnets.'

Shall I just try to delete the gateway? I'm afraid this might cause other issues, though.

A Former User

Well, leave the option checked for now. In my configuration it's unchecked, but if I recall correctly, that's because I did the initial IP configuration through the CLI and didn't touch it afterwards.
Did you try to add the route manually after a reboot? If it helps you can make it persistent through the shellcmd package.

I assume you have another virtual machine on the Proxmox cluster which you used to configure the firewall through the GUI from a directly connected network.

Chluz

Hi Artes,

thanks very much for the suggestion. I tested what you suggested, and although that did not work, it motivated me to try deleting and then re-adding the default gateway on a broken post-reboot system.
I therefore deleted it:

route delete default 1.2.3.254

I then checked the routing table and the entry was indeed gone.
Note that this default gateway appeared correctly in the routing table before deletion.
Then as I was about to re-add it, I waited a couple of seconds and the route was automatically re-added, and pinging 8.8.8.8 worked again.

So using your suggestion with the shell command, I changed the command as follows:

Command: route delete default 1.2.3.254
Shellcmd Type: shellcmd
Description: Default Gateway fix after Reboot

After a reboot, no more problems. I'm happy it works, but I still don't understand how a route can be showing in the routing table but not be functional. Could this be a bug, an issue with a boot race condition, or something similar?

EDIT: if developers are reading this, I'm using pfSense 2.4.5.

A Former User

@Chluz said in PfSense on Proxmox can't ping internet after reboot, OVH hosting:

I'm happy it works, but I still don't understand how a route can be showing on the routing table, but not be functional.

It's because the firewall has no directly connected route to the default gateway and doesn't know how to reach it. That's what the fix is supposed to accomplish, by telling the IP stack that 1.2.3.254/32 can be reached through interface vtnet0. It generates an entry in the routing table which looks like this:

1.2.3.254/32   1.2.3.254     UGS         vtnet0

Usually a gateway is within the same subnet as the router interface. Let's assume your WAN interface is vtnet0 with IP 1.2.3.1**/24** and your gateway has the IP 1.2.3.254**/24**. Under these circumstances your router would have a directly connected route in the table which says:

Destination        Gateway            Flags     Netif
1.2.3.0/24         link#n             U         vtnet0

With this entry in the routing table, the router knows that the default gateway is reachable through vtnet0. Since OVH's virtual MAC concept puts your firewall IP and gateway IP in different subnets, that entry is missing and your firewall has no clue how to reach the gateway.
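
The subnet argument above can be checked with a few lines of Python. This is only a minimal sketch using the standard `ipaddress` module; the function name `gateway_is_on_link` is made up for illustration and is not part of pfSense, and the addresses are the anonymised ones from this thread.

```python
import ipaddress


def gateway_is_on_link(interface_cidr: str, gateway: str) -> bool:
    """Return True if the gateway address lies inside the interface's own subnet."""
    network = ipaddress.ip_interface(interface_cidr).network
    return ipaddress.ip_address(gateway) in network


# Conventional setup: interface and gateway share a /24.
print(gateway_is_on_link("1.2.3.1/24", "1.2.3.254"))   # True

# OVH failover IP: the interface is a /32, so the gateway is outside its subnet.
print(gateway_is_on_link("1.2.3.16/32", "1.2.3.254"))  # False
```

In the second case no connected route covers the gateway, so it has to be made reachable by an explicit interface route.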

Krisbe

See this page about routing when the default gateway is not in the same subnet: https://docs.netgate.com/pfsense/en/latest/book/routing/gateway-settings.html#use-non-local-gateway
Maybe this can help?

Chluz

Thanks Krisbe for pointing that out. I did however have that option already checked.

@Artes, it's strange, as after a reboot, on a system that can't ping 8.8.8.8, I already have a route:

1.2.3.254		03:00:00:f9:c3:3b	UHS	254	1500	vtnet0

Running the command you suggested adds a slightly different route, with destination 1.2.3.254/32 instead of 1.2.3.254.
In both cases, the gateway for these destinations is always a MAC address, and not an IP as in your example.

A Former User

Okay, can you post or PM your entire routing table? Please use netstat -4rn on the command line.

Chluz

@Artes said in PfSense on Proxmox can't ping internet after reboot, OVH hosting:

netstat -4rn

Hi Artes, here is the output after a reboot, when I cannot ping 8.8.8.8:

Destination        Gateway            Flags     Netif Expire
default            1.2.3.254       UGS      vtnet0
2.205.127.125      1.2.3.254      UGHS     vtnet0
10.6.106.1         link#13            UHS         lo0
10.6.106.2         link#13            UH     ipsec100
10.6.107.1         link#14            UHS         lo0
10.6.107.2         link#14            UH     ipsec200
10.8.18.0/24       10.8.18.2          UGS      ovpns1
10.8.18.1          link#15            UHS         lo0
10.8.18.2          link#15            UH       ovpns1
10.8.19.0/24       10.8.19.2          UGS      ovpns2
10.8.19.1          link#16            UHS         lo0
10.8.19.2          link#16            UH       ovpns2
10.8.128.0/19      192.168.67.6       UGS      vtnet1
10.8.144.0/24      192.168.67.6       UGS      vtnet1
1.2.3.16        link#1             UHS         lo0
1.2.3.16/32     link#1             U        vtnet0
1.2.3.17        link#1             UHS         lo0
1.2.3.17/32     link#1             U        vtnet0
1.2.3.18        link#1             UHS         lo0
1.2.3.18/32     link#1             U        vtnet0
1.2.3.19        link#1             UHS         lo0
1.2.3.19/32     link#1             U        vtnet0
1.2.3.20        link#1             UHS         lo0
1.2.3.20/32     link#1             U        vtnet0
1.2.3.21        link#1             UHS         lo0
1.2.3.21/32     link#1             U        vtnet0
1.2.3.22        link#1             UHS         lo0
1.2.3.22/32     link#1             U        vtnet0
1.2.3.23        link#1             UHS         lo0
1.2.3.23/32     link#1             U        vtnet0
1.2.3.254      02:00:00:f8:c2:2b  UHS      vtnet0
1.2.4.35       link#4             UHS         lo0
1.2.4.35/32    link#4             U        vtnet3
1.2.4.254      02:00:00:06:4d:0d  UHS      vtnet3
1.2.5.127       link#3             UHS         lo0
1.2.5.127/32    link#3             U        vtnet2
1.2.5.254       02:00:00:e1:4e:f8  UHS      vtnet2
1.2.6.223     link#5             UHS         lo0
1.2.6.223/32  link#5             U        vtnet4
1.2.6.254     02:00:00:35:59:6e  UHS      vtnet4
109.0.200.133      1.2.3.254      UGHS     vtnet0
127.0.0.1          link#10            UH          lo0
1.2.7.247      link#6             UHS         lo0
1.2.7.247/32   link#6             U        vtnet5
1.2.7.254      02:00:00:59:a8:69  UHS      vtnet5
1.2.8.167      link#7             UHS         lo0
1.2.8.167/32   link#7             U        vtnet6
1.2.8.254      02:00:00:81:c9:07  UHS      vtnet6
192.168.0.0/24     10.6.107.2         UGS    ipsec200
192.168.8.0/21     10.8.18.4          UGS      ovpns1
192.168.27.0/24    192.168.67.6       UGS      vtnet1
192.168.48.0/21    10.6.106.2         UGS    ipsec100
192.168.67.0/24    link#2             U        vtnet1
192.168.67.8       link#2             UHS         lo0
192.168.72.0/21    10.6.107.2         UGS    ipsec200
192.168.80.0/22    10.6.107.2         UGS    ipsec200

I have run a file compare against a working routing table, taken after applying the fix I listed above and rebooting: they are identical! This is very suspicious.
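
A comparison of this kind can also be scripted so that it ignores column spacing. The following is only a sketch, assuming both `netstat -4rn` outputs have been saved as text; the function names are hypothetical, and the sample rows are taken from the anonymised table above.

```python
def parse_routes(netstat_output: str) -> set:
    """Turn netstat -rn style text into a set of whitespace-normalised rows."""
    rows = set()
    for line in netstat_output.splitlines():
        fields = tuple(line.split())
        # Skip blank lines and the column header.
        if not fields or fields[0] == "Destination":
            continue
        rows.add(fields)
    return rows


def compare_tables(broken: str, working: str):
    """Return (rows only in broken, rows only in working)."""
    b, w = parse_routes(broken), parse_routes(working)
    return sorted(b - w), sorted(w - b)


broken = """Destination        Gateway            Flags     Netif Expire
default            1.2.3.254       UGS      vtnet0
1.2.3.254      02:00:00:f8:c2:2b  UHS      vtnet0
"""
working = """default 1.2.3.254 UGS vtnet0
1.2.3.254 02:00:00:f8:c2:2b UHS vtnet0
"""
print(compare_tables(broken, working))
```

Two empty lists mean the tables contain the same entries, in which case the difference between the broken and working states has to be somewhere other than the routing table itself.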

A Former User

Interesting - I only know host routes with MAC addresses as gateway from setups where a directly connected route within the same subnet exists, but all your routes within 1.2.3.X have /32 masks. Did you compare the MAC addresses with those from a "working" routing table? Do the MACs in the routing table match the entries in the ARP cache? At the moment I have no clue what's going on at your end, nor what the root cause of your issue could be.
I must say I don't have much experience with pfSense 2.4.5, since I was hitting latency issues and instability with BGP routing when I installed it on my OVH VM. But general reachability of the default route was not an issue after reboots, so I didn't look closely into the routing table for that part. Also, I rolled back to 2.4.4_p3 pretty quickly due to the problems I had. All my installations are running 2.4.4_p3 at the moment.
If you have capacity on your Proxmox cluster and another free IP address, I would suggest you provision another pfSense installation for test purposes and see whether the situation stays the same when you configure it from scratch.
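
The MAC check suggested above (routing table versus ARP cache) could be done by eye, or with a small script along these lines. It is only a sketch: the function names are made up, and the `arp -an` line format in the regular expression is an assumption about how FreeBSD prints entries, so it may need adjusting. The sample ARP line is illustrative, not real output from this system.

```python
import re

MAC = r"(?:[0-9a-f]{2}:){5}[0-9a-f]{2}"
MAC_RE = re.compile(rf"^{MAC}$")
# Assumed arp -an line shape: "? (1.2.3.254) at 02:00:00:f8:c2:2b on vtnet0 ..."
ARP_RE = re.compile(rf"\((\d+\.\d+\.\d+\.\d+)\) at ({MAC}) on (\S+)")


def mac_routes(netstat_output: str) -> dict:
    """Map destination -> MAC for routes whose gateway column is a MAC address."""
    routes = {}
    for line in netstat_output.splitlines():
        fields = line.split()
        if len(fields) >= 4 and MAC_RE.match(fields[1].lower()):
            routes[fields[0]] = fields[1].lower()
    return routes


def arp_entries(arp_output: str) -> dict:
    """Map IP -> MAC for resolved ARP entries (incomplete entries are skipped)."""
    return {ip: mac for ip, mac, _ in ARP_RE.findall(arp_output.lower())}


def mismatches(netstat_output: str, arp_output: str) -> dict:
    """Return routes whose MAC is missing from, or differs from, the ARP cache."""
    arp = arp_entries(arp_output)
    return {
        ip: (mac, arp.get(ip))
        for ip, mac in mac_routes(netstat_output).items()
        if arp.get(ip) != mac
    }


netstat_sample = "1.2.3.254      02:00:00:f8:c2:2b  UHS      vtnet0"
arp_sample = "? (1.2.3.254) at 02:00:00:f8:c2:2b on vtnet0 expires in 1200 seconds [ethernet]"
print(mismatches(netstat_sample, arp_sample))
```

An empty result means every MAC-type host route agrees with the ARP cache; any entry that is reported is worth comparing between the broken and working states.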