pfSense on Proxmox can't ping the internet after reboot (OVH hosting)
Let me first extend my thanks to the community and the software team for this great firewall.
I've recently noticed strange behaviour on my virtualised pfSense instance. I hope I'm posting this in the right place; I believe it has to do with routing and gateways.
pfSense is virtualised under Proxmox. The host server is hosted by OVH (I mention this because OVH uses peculiar routing methods based on MAC addresses).
All other VMs on Proxmox use the virtualised pfSense instance as their gateway to the internet; some have 1:1 IP mapping, others don't.
pfSense uses virtio interfaces and has Hardware Checksum Offloading, Hardware TCP Segmentation Offloading and Hardware Large Receive Offloading disabled (the disable boxes are ticked).
In terms of IP assignments, WAN (vtnet0) has IP 184.108.40.206, and the gateway has the option 'Use non-local gateway through interface specific route.' enabled.
The MAC address of the WAN interface is set (directly in Proxmox) to the virtual MAC of the IPs from an OVH failover block.
The first IP of the block, 220.127.116.11/32, is used for the WAN, and the other IPs are used for 1:1 NAT to other VMs. This seems to be working fine.
The routing table looks OK, including:
Destination      Gateway            Flags  Use   Mtu   Netif
default          18.104.22.168      UGS    7191  1500  vtnet0
22.214.171.124   03:00:00:f9:c3:3b  UHS    254   1500  vtnet0
(The MAC above has been changed for anonymity.)
After setting up IPsec VTI, I noticed (although I don't think it's related) that my pfSense instance couldn't ping 126.96.36.199.
I checked whether VMs behind pfSense could access the internet, and they had no issues.
All incoming connections to pfSense (IPsec or OpenVPN) were also working fine.
The default gateway was also set up properly.
At that point I thought it might be some kind of strange incompatibility with the virtio interfaces. Since Proxmox interface hot-swapping was active, I hot-swapped my virtio interface for an e1000 interface. After reassigning the WAN to that new interface, pfSense could ping 188.8.131.52 again.
Great, problem solved, I thought, and went ahead and rebooted the pfSense VM to make sure everything was working OK.
After the reboot, pfSense could not ping 184.108.40.206. Everything else was still working fine. I then switched back to virtio, and once again, pings from pfSense to 220.127.116.11 started working. I have since confirmed that the interface type is not the issue; changing the interface seems to fix the underlying problem, probably by resetting something I haven't identified.
For now, my workaround is therefore to swap the interface after every reboot. This is obviously not very practical, and I'd like to hear whether the community has any thoughts on why this is happening.
I personally suspect something is going on during boot (a bad route assignment? although the routes look fine), but I'm not sure how I could verify this. I also expected that a command on pfSense (toggling the WAN interface down and up) would be a better workaround, but it doesn't seem to work.
If anyone has any ideas about this, I'm all ears. Thanks in advance for your help.
I'll add something I just noticed: after applying the workaround, I get lots of 'arprequest: cannot find matching address' messages in my log, pretty much every second.
Install the shellcmd package via the package manager and configure the following under Services / Shellcmd:
Command: route add -net 18.104.22.168/32 -iface vtnet0
Shellcmd Type: shellcmd
Description: Default Gateway fix after Reboot
Uncheck the option 'Use non-local gateway through interface specific route.' in the gateway configuration.
That's how I deal with pfSense on an OVH-hosted ESXi.
Thanks for the reply, Artes!
I don't seem to be able to deactivate the 'Use non-local gateway through interface specific route' option; I get the error message 'The gateway address 22.214.171.124 does not lie within one of the chosen interface's subnets.'
Should I just try deleting the gateway? I'm afraid this might cause other issues, though.
Well, leave the option checked for now. In my configuration it's unchecked, but if I recall correctly, that's because I did the initial IP configuration through the CLI and didn't touch it afterwards.
Did you try adding the route manually after a reboot? If that helps, you can make it persistent through the shellcmd package.
I assume you have another virtual machine on the Proxmox cluster that you used to configure the firewall through the GUI over a directly connected network.
Thanks very much for the suggestion. I tested it, and although it did not work, it motivated me to try deleting and then re-adding the default gateway on a system that was broken after a reboot.
I therefore deleted it:
route delete default 126.96.36.199
I then checked the routing table and the entry was indeed gone.
Note that this default gateway appeared correctly in the routing table before deletion.
Then, just as I was about to re-add it, I waited a couple of seconds: the route was automatically re-added, and pinging 188.8.131.52 worked again.
So, using your suggestion with the shellcmd package, I changed the command as follows:
Command: route delete default 184.108.40.206
Shellcmd Type: shellcmd
Description: Default Gateway fix after Reboot
After a reboot, no more problems. I'm happy it works, but I still don't understand how a route can show up in the routing table without being functional. Could this be a bug, a race condition at boot, or something similar?
EDIT: If developers are reading this, I'm using pfSense 2.4.5.
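The delete-and-wait workaround above can be sketched in Python as a small boot-time check. This is a minimal sketch, not a tested pfSense script: it assumes, as observed in this thread, that pfSense re-adds the default route by itself a few seconds after it is deleted. The helper names (reset_default_route, default_gateway) are hypothetical, and the command runner and sleep function are injectable so the logic can be tried without touching a real routing table.

```python
import subprocess
import time
from typing import Callable, Optional, Sequence

Runner = Callable[[Sequence[str]], str]


def run(cmd: Sequence[str]) -> str:
    """Run a command and return its stdout; raises CalledProcessError on failure."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout


def default_gateway(netstat_output: str) -> Optional[str]:
    """Return the gateway column of the 'default' row in `netstat -4rn` output."""
    for line in netstat_output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "default":
            return fields[1]
    return None


def reset_default_route(gateway: str, runner: Runner = run,
                        timeout: float = 30.0, interval: float = 1.0,
                        sleep: Callable[[float], None] = time.sleep) -> bool:
    """Delete the default route, then poll until a default route via `gateway` reappears.

    Assumption (from this thread): pfSense re-adds the default route on its own.
    This function does NOT re-add it; it only waits and reports the result.
    """
    runner(["route", "delete", "default", gateway])
    waited = 0.0
    while waited < timeout:
        sleep(interval)
        waited += interval
        if default_gateway(runner(["netstat", "-4rn"])) == gateway:
            return True
    return False
```

On the firewall this would be called once after boot, e.g. reset_default_route("192.0.2.254") with a hypothetical gateway address. It also assumes a Python 3 interpreter is available on the box, which may not be the case on a stock install; the one-line shellcmd command above remains the simpler option.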
I'm happy it works, but I still don't understand how a route can be showing on the routing table, but not be functional.
It's because the firewall has no directly connected route to the default gateway and doesn't know how to reach it. That's what the fix is supposed to accomplish: it tells the IP stack that 220.127.116.11/32 is reachable through interface vtnet0. It generates an entry in the routing table that looks like this:
18.104.22.168/32 22.214.171.124 UGS vtnet0
Usually a gateway is within the same subnet as the router interface. Let's assume your WAN interface is vtnet0 with IP 126.96.36.199/24 and your gateway has the IP 188.8.131.52/24. In that case your router would have a directly connected route in the table like this:
Destination        Gateway  Flags  Netif
184.108.40.206/24  link#n   U      vtnet0
With this entry in the routing table, the router knows that the default gateway is reachable through vtnet0. Since OVH's virtual MAC concept puts your firewall IP and gateway IP in different subnets, that entry is missing and your firewall has no clue how to reach the gateway.
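The subnet reasoning above can be checked with Python's standard ipaddress module. A minimal sketch: gateway_is_on_link and interface_route_cmd are hypothetical helper names, the addresses are documentation examples (192.0.2.0/24, 198.51.100.0/24) rather than the OVH ones, and the generated command simply mirrors the shellcmd fix suggested earlier in the thread.

```python
import ipaddress
from typing import List


def gateway_is_on_link(interface_cidr: str, gateway: str) -> bool:
    """True if `gateway` lies inside the subnet configured on the interface.

    This is the case where the kernel gets a connected route
    (e.g. '192.0.2.0/24 link#n U vtnet0') without any extra configuration.
    """
    iface = ipaddress.ip_interface(interface_cidr)
    return ipaddress.ip_address(gateway) in iface.network


def interface_route_cmd(gateway: str, netif: str) -> List[str]:
    """Build the interface route used as a workaround for a non-local gateway.

    Mirrors the shellcmd command from this thread: route add -net GW/32 -iface NETIF
    """
    return ["route", "add", "-net", f"{gateway}/32", "-iface", netif]


# Ordinary setup: WAN /24 with the gateway in the same subnet.
print(gateway_is_on_link("192.0.2.10/24", "192.0.2.1"))      # True
# OVH-style setup: WAN is a /32 failover IP, gateway is elsewhere.
print(gateway_is_on_link("198.51.100.7/32", "192.0.2.254"))  # False
print(" ".join(interface_route_cmd("192.0.2.254", "vtnet0")))
# route add -net 192.0.2.254/32 -iface vtnet0
```

With a /32 on the WAN, no gateway address other than the WAN IP itself can ever be on-link, which is why an explicit interface route (or the non-local gateway option) is needed.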
See this page about routing when the default gateway is not in the same subnet: https://docs.netgate.com/pfsense/en/latest/book/routing/gateway-settings.html#use-non-local-gateway
Maybe this can help?
Thanks, Krisbe, for pointing that out. However, I already had that option checked.
@Artes, it's strange: after a reboot, on a system that can't ping 220.127.116.11, I already have a route:
18.104.22.168 03:00:00:f9:c3:3b UHS 254 1500 vtnet0
Running the command you suggested adds a slightly different route with destination
In both cases, the gateway for these destinations is always a MAC address, not an IP as in your example.
Okay, can you post or PM your entire routing table? Please use netstat -4rn on the command line.
Hi Artes, here is the output after a reboot, when I cannot ping 22.214.171.124:
Destination        Gateway            Flags  Netif  Expire
default            126.96.36.199      UGS    vtnet0
188.8.131.52       184.108.40.206     UGHS   vtnet0
10.6.106.1         link#13            UHS    lo0
10.6.106.2         link#13            UH     ipsec100
10.6.107.1         link#14            UHS    lo0
10.6.107.2         link#14            UH     ipsec200
10.8.18.0/24       10.8.18.2          UGS    ovpns1
10.8.18.1          link#15            UHS    lo0
10.8.18.2          link#15            UH     ovpns1
10.8.19.0/24       10.8.19.2          UGS    ovpns2
10.8.19.1          link#16            UHS    lo0
10.8.19.2          link#16            UH     ovpns2
10.8.128.0/19      192.168.67.6       UGS    vtnet1
10.8.144.0/24      192.168.67.6       UGS    vtnet1
220.127.116.11     link#1             UHS    lo0
18.104.22.168/32   link#1             U      vtnet0
22.214.171.124     link#1             UHS    lo0
126.96.36.199/32   link#1             U      vtnet0
188.8.131.52       link#1             UHS    lo0
184.108.40.206/32  link#1             U      vtnet0
220.127.116.11     link#1             UHS    lo0
18.104.22.168/32   link#1             U      vtnet0
22.214.171.124     link#1             UHS    lo0
126.96.36.199/32   link#1             U      vtnet0
188.8.131.52       link#1             UHS    lo0
184.108.40.206/32  link#1             U      vtnet0
220.127.116.11     link#1             UHS    lo0
18.104.22.168/32   link#1             U      vtnet0
22.214.171.124     link#1             UHS    lo0
126.96.36.199/32   link#1             U      vtnet0
188.8.131.52       02:00:00:f8:c2:2b  UHS    vtnet0
184.108.40.206     link#4             UHS    lo0
220.127.116.11/32  link#4             U      vtnet3
18.104.22.168      02:00:00:06:4d:0d  UHS    vtnet3
22.214.171.124     link#3             UHS    lo0
126.96.36.199/32   link#3             U      vtnet2
188.8.131.52       02:00:00:e1:4e:f8  UHS    vtnet2
184.108.40.206     link#5             UHS    lo0
220.127.116.11/32  link#5             U      vtnet4
18.104.22.168      02:00:00:35:59:6e  UHS    vtnet4
22.214.171.124     126.96.36.199      UGHS   vtnet0
127.0.0.1          link#10            UH     lo0
188.8.131.52       link#6             UHS    lo0
184.108.40.206/32  link#6             U      vtnet5
220.127.116.11     02:00:00:59:a8:69  UHS    vtnet5
18.104.22.168      link#7             UHS    lo0
22.214.171.124/32  link#7             U      vtnet6
126.96.36.199      02:00:00:81:c9:07  UHS    vtnet6
192.168.0.0/24     10.6.107.2         UGS    ipsec200
192.168.8.0/21     10.8.18.4          UGS    ovpns1
192.168.27.0/24    192.168.67.6       UGS    vtnet1
192.168.48.0/21    10.6.106.2         UGS    ipsec100
192.168.67.0/24    link#2             U      vtnet1
192.168.67.8       link#2             UHS    lo0
192.168.72.0/21    10.6.107.2         UGS    ipsec200
192.168.80.0/22    10.6.107.2         UGS    ipsec200
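One way to check a table like this for the situation Artes described is to look up the default gateway and list the routes that cover it. A minimal Python sketch, assuming the four-column netstat -4rn layout shown above (Destination, Gateway, Flags, Netif) with destinations written out in full dotted form; the helper names and the sample addresses in the test are hypothetical, and "direct" here simply means a matching route on the same interface as the default route without the G (gateway) flag.

```python
import ipaddress
from typing import List, NamedTuple


class Route(NamedTuple):
    destination: str
    gateway: str
    flags: str
    netif: str


def parse_routes(text: str) -> List[Route]:
    """Parse `netstat -4rn` rows, skipping titles, the column header and blank lines."""
    routes = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[0] == "Destination":
            continue
        routes.append(Route(*fields[:4]))  # the (empty) Expire column is ignored
    return routes


def routes_covering_default_gateway(routes: List[Route]) -> List[Route]:
    """Return the non-default routes whose destination contains the default gateway."""
    default = next((r for r in routes if r.destination == "default"), None)
    if default is None:
        return []
    try:
        gw = ipaddress.ip_address(default.gateway)
    except ValueError:  # e.g. a link#N gateway instead of an IP address
        return []
    matches = []
    for r in routes:
        if r.destination == "default":
            continue
        try:
            net = ipaddress.ip_network(r.destination, strict=False)
        except ValueError:  # abbreviated or non-IP destinations are skipped
            continue
        if gw in net:
            matches.append(r)
    return matches


def has_direct_route_to_gateway(routes: List[Route]) -> bool:
    """True if the gateway is covered by a non-G route on the default route's netif."""
    default = next((r for r in routes if r.destination == "default"), None)
    if default is None:
        return False
    return any("G" not in r.flags and r.netif == default.netif
               for r in routes_covering_default_gateway(routes))
```

Feeding it netstat -4rn output saved before and after the workaround would show whether the host route to the gateway (the UHS entry with a MAC address as gateway) is present. In this thread it was present in both cases, which is exactly what made the problem puzzling.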
I ran a file compare against a working routing table (after applying the fix I listed above and rebooting): they are identical! This is very suspicious.
Interesting. I only know host routes with MAC addresses as the gateway when a directly connected route within the same subnet exists, but all your routes within 1.2.3.X have /32 masks. Did you compare the MAC addresses with those from a "working" routing table? Do the MACs in the routing table match the entries in the ARP cache? At the moment I have no clue what's going on at your end, nor what the root cause of your issue could be.
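The two questions above (do the MAC-address gateways match, and do they agree with the ARP cache?) can be answered mechanically. A minimal Python sketch: it assumes the netstat -4rn layout shown earlier and a FreeBSD-style arp -an line such as '? (192.0.2.254) at 02:00:00:aa:bb:cc on vtnet0 permanent [ethernet]'. The exact arp output format is an assumption worth checking on the box, the helper names are hypothetical, and all addresses and MACs in the test are made up.

```python
import re
from typing import Dict, List, Tuple

# A gateway column that is a MAC address, e.g. 02:00:00:aa:bb:cc
MAC_RE = re.compile(r"^[0-9a-f]{2}(:[0-9a-f]{2}){5}$", re.IGNORECASE)
# Assumed `arp -an` line shape: "? (IP) at MAC on NETIF ..."
ARP_RE = re.compile(r"\((?P<ip>[0-9.]+)\) at (?P<mac>[0-9a-fA-F:]+) on (?P<netif>\S+)")


def mac_routes(netstat_output: str) -> Dict[str, Tuple[str, str]]:
    """Map destination -> (mac, netif) for routes whose gateway column is a MAC."""
    result = {}
    for line in netstat_output.splitlines():
        fields = line.split()
        if len(fields) >= 4 and MAC_RE.match(fields[1]):
            result[fields[0]] = (fields[1].lower(), fields[3])
    return result


def arp_entries(arp_output: str) -> Dict[str, Tuple[str, str]]:
    """Map IP -> (mac, netif) from `arp -an` output; '(incomplete)' entries are skipped.

    Assumes both tools print MACs with two hex digits per octet.
    """
    result = {}
    for line in arp_output.splitlines():
        m = ARP_RE.search(line)
        if m:
            result[m["ip"]] = (m["mac"].lower(), m["netif"])
    return result


def route_arp_mismatches(netstat_output: str, arp_output: str) -> List[str]:
    """Describe every MAC-gateway route that has no matching ARP cache entry."""
    arp = arp_entries(arp_output)
    problems = []
    for dest, (mac, netif) in sorted(mac_routes(netstat_output).items()):
        seen = arp.get(dest)
        if seen is None:
            problems.append(f"{dest}: route via {mac} on {netif}, no ARP entry")
        elif seen != (mac, netif):
            problems.append(f"{dest}: route via {mac} on {netif}, "
                            f"ARP has {seen[0]} on {seen[1]}")
    return problems
```

Running it on output captured from a broken boot and again after the workaround would show whether the MAC entries differ even when the routing tables compare identical as text.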
I must say I don't have much experience with pfSense 2.4.5, since I hit latency issues and BGP routing instability when I installed it on my OVH VM. General reachability of the default route after reboots was not an issue, though, so I didn't look closely at that part of the routing table. I also rolled back to 2.4.4_p3 quite quickly because of those problems; all my installations are running 2.4.4_p3 at the moment.
If you have spare capacity on your Proxmox cluster and another free IP address, I would suggest provisioning another pfSense installation for test purposes and checking whether the behaviour is the same when you configure it from scratch.