2.3.4->2.4.4 Upgrade 100% Packet Loss on WAN Interface
I use pfSense for my home firewall (I am not an IT Pro). I have been using 2.3.4 successfully for quite some time but felt I ought to upgrade to the latest. I run it on Hyper-V. The upgrade seemed to go smoothly, I didn't see any issues reported and the admin portal comes up just fine etc.
However, the WAN_DHCP gateway is now showing offline, and the logs show 100% packet loss on the WAN interface (which I think is why the gateway is offline). netstat -r shows what seems to be a valid route to the default gateway. I have checked the firewall rules, there is nothing in there that will block anything, the firewall logs don't show traffic being blocked either.
I am not sure what the problem could be. Thankfully, as I run it on Hyper-V I can easily revert to my previous 2.3.4 installation, so I can post to this forum! Any ideas why the upgrade should cause all packets to be lost?
Two things to check for.
Make sure you have a default gateway set.
There is a new feature in 2.4.4 where you can set a gateway group as the default, and it will use Automatic to select one itself if there is not one set. Some edge cases are seeing issues with that (fixed in 2.4.5 snapshots).
In System > Routing > Gateways, select WAN_DHCP as the default IPv4 gateway if it is not already.
A change to the DHCP client in 2.4.4 means it now correctly respects an MTU setting given to it by an upstream server. If you have any custom options on the WAN it will take that value now where it previously ignored it.
Some DHCP servers seem to be handing out crazy values that were previously ignored.
Check the WAN MTU in Status > Interfaces.
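As a rough illustration of what a "crazy" value might look like, here is a minimal Python sketch that flags an MTU outside the usual range for an Ethernet WAN. The function name and the thresholds are my own assumptions, not anything pfSense uses internally: 576 is the IPv4 datagram size every host must be able to accept, and 1500 is the standard Ethernet MTU.

```python
# Hypothetical helper: the name and thresholds are illustrative assumptions,
# not pfSense code.
IPV4_MIN_DATAGRAM = 576   # every IPv4 host must accept datagrams of this size
ETHERNET_MTU = 1500       # standard Ethernet MTU

def mtu_looks_sane(mtu: int) -> bool:
    """Return True if the MTU falls in the range typical for an Ethernet WAN."""
    return IPV4_MIN_DATAGRAM <= mtu <= ETHERNET_MTU

# Try a few values: two common ones and two that deserve a second look.
for value in (1500, 1492, 68, 9000):
    print(value, "ok" if mtu_looks_sane(value) else "suspicious")
```

A value flagged as suspicious is not necessarily wrong (jumbo frames are legitimate on some links), but on a typical home WAN it is worth investigating.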
Thank you, I will try those when I get the next opportunity in a few days from now. I will report back.
I made the two suggested checks.
First on the gateway, WAN_DHCP seems to be selected as the default as per the screenshot below:
I checked the Gateway Groups and Static Routes pages, both are empty.
Is all that as it should be?
On the MTU it is showing 1500, so I think that is OK.
Did you try changing the gateway monitoring to a different IP?
With only one gateway though it will always be used even if it shows as off-line. I assume you cannot actually connect out at all?
It is pulling a DHCP lease though so there is a link of some sort.
Check the routing table in Diag > Routes just to be sure there is a default route. Although the gateway is marked default you still have the selection set to automatic.
I have tried another monitoring address but it doesn't make a difference (I used 188.8.131.52). I did try changing the default gateway not to be automatic, explicitly choosing WAN_DHCP, but again to no avail. I have checked Diagnostics->Routes and there is a default route. You are correct though, I cannot connect out at all, nothing I do results in any packets on the WAN interface, but it does manage to lease a DHCP address.
Is it giving you a rational address via DHCP? Gateway IP in the same subnet?
Check the DHCP logs match what you are seeing on the interface.
Check the system logs for errors.
I assume it's pulling a DHCP lease directly from your ISP?
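For the subnet question, a small Python sketch using the standard `ipaddress` module can confirm the answer rather than eyeballing it. The function name is hypothetical and the addresses are documentation-range placeholders, not values from this thread; substitute the leased IP, netmask and gateway shown in Status > Interfaces.

```python
import ipaddress

def gateway_in_subnet(leased_ip: str, netmask: str, gateway_ip: str) -> bool:
    """Return True if gateway_ip lies inside the network of the leased address."""
    # strict=False derives the network from a host address plus netmask.
    network = ipaddress.ip_network(f"{leased_ip}/{netmask}", strict=False)
    return ipaddress.ip_address(gateway_ip) in network

# Placeholder addresses from the RFC 5737 documentation ranges.
print(gateway_in_subnet("192.0.2.86", "255.255.255.0", "192.0.2.1"))
print(gateway_in_subnet("192.0.2.86", "255.255.255.0", "198.51.100.1"))
```

The first call should report the gateway as on-link and the second as outside the subnet.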
I did check for all these things before and they all seemed fine as I recall. I won't get another chance to check again until next weekend, I will report back then.
I checked the DHCP logs and the gateway IP is in the same subnet as the leased IP address.
I do see some IPv6 errors, but I am assuming they are benign as my ISP is IPv4.
I looked in the system logs. I am getting some errors relating to the time of day (there is an oddity with FreeBSD not seeming to get the time from Hyper-V correctly). The errors are like this one:
rc.bootup: The command '/usr/bin/nice -n20 /usr/local/bin/rrdtool update /var/db/rrd/ipsec-packets.rrd N:U:U:U:U:U:U:U:U' returned exit code '1', the output was 'ERROR: /var/db/rrd/ipsec-packets.rrd: illegal attempt to update using time 1541839764 when last update time is 1571980920 (minimum one second step)'
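To make sense of the two timestamps in that error, here is a short Python sketch that converts them from Unix epoch seconds to UTC dates. The variable names are mine; the two numbers are copied from the log line above.

```python
from datetime import datetime, timezone

clock_at_boot = 1541839764   # time rrdtool tried to write, from the log line
last_update = 1571980920     # last update time stored in the RRD file

# Convert both epoch values to readable UTC timestamps.
for label, ts in (("clock at boot", clock_at_boot), ("last RRD update", last_update)):
    print(label, datetime.fromtimestamp(ts, tz=timezone.utc).isoformat())

# Whole days by which the boot-time clock lags the last recorded update.
behind_days = (last_update - clock_at_boot) // 86400
print("clock is behind by about", behind_days, "days")
```

The first value works out to 10 November 2018 and the second to 25 October 2019, so at boot the clock appears to have been roughly 348 days behind the last RRD update, which would fit the Hyper-V time oddity.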
In the Gateways part of the system log I see lots of errors from dpinger, I assume they are symptoms rather than cause though. Here they are:
send_interval 500ms loss_interval 2000ms time_period 60000ms report_interval 0ms data_len 0 alert_interval 1000ms latency_alarm 500ms loss_alarm 20% dest_addr a.b.c.1 bind_addr a.b.c.86 identifier "WAN_DHCP "
WAN_DHCP a.b.c.1: Alarm latency 0us stddev 0us loss 100%
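For anyone wanting to pull the numbers out of alarm lines like the one above, here is a minimal Python sketch. The regular expression is based only on the single line shown in this post, so it is an assumption about the format; other dpinger messages may look different. The function name is hypothetical.

```python
import re

# Pattern derived from the one alarm line quoted above; treat it as a guess
# at the format rather than a complete dpinger log grammar.
ALARM_RE = re.compile(
    r"^(?P<name>\S+) (?P<addr>\S+): Alarm "
    r"latency (?P<latency_us>\d+)us stddev (?P<stddev_us>\d+)us "
    r"loss (?P<loss_pct>\d+)%$"
)

def parse_alarm(line: str):
    """Return the alarm fields as a dict, or None if the line does not match."""
    match = ALARM_RE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    # Convert the numeric fields from strings to integers.
    for key in ("latency_us", "stddev_us", "loss_pct"):
        fields[key] = int(fields[key])
    return fields

alarm = parse_alarm("WAN_DHCP a.b.c.1: Alarm latency 0us stddev 0us loss 100%")
print(alarm)
```

A latency of 0us together with 100% loss would be consistent with no replies coming back at all, rather than a slow or lossy link.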
When DHCP works and nothing else does, that's usually a bad firewall rule. Though here that would be somewhere upstream, unless you have a blocking floating OUT rule.
Try running a packet capture on WAN. Do you see the ping packets leaving? Do you see any packets coming back?
I tried a packet capture before, and I tried it again today. I don't see any packets originating from computers on the LAN going out on the WAN. I do see some DNS lookups that appear to be coming from pfSense itself. I have looked at the firewall rules and there doesn't really seem to be anything that would block traffic. Looking at the firewall logs I see traffic being blocked, but it all appears to be IPv6.
Are those DNS lookups actually working? Does Diag > DNS Lookup work?
Can you packet capture the DHCP exchange?
What is the MAC of the gateway? Maybe it's something odd. Though that should affect 2.3.4 just the same.
"gateway IP is in the same subnet as the leased IP address."
And is this a public IP or a private IP? This is a VM, right... Are we sure the interfaces are not moving about and changing order on update? If your pfSense cannot talk to your gateway, you're going to have a problem.
Your gateway and public IP should be the same on 2.3.4 as they are after you upgrade... I do not see your MAC changing on your VM... So if you can ping your gateway when you're on 2.3.4 and not when on 2.4.4, something really odd is going on.