Loss of WAN link after power failure and subsequent NTP sync issue

senior_tron

I wanted to report this and see if anyone else has seen this issue or knows a reliable way to mitigate it. I am currently running pfSense 2.4.4 on an ESXi/vSphere server, set to auto-start on vSphere startup. I had a power failure, and pfSense returned to operation as soon as the vSphere server booted back up; however, the WAN link did not come back. All LAN/local subnets worked, and on the front page the WAN appeared to be up, but there was no actual Internet connectivity.

The only indications of a WAN problem were the obvious lack of Internet and the gateway status page showing the gateway as offline. The WAN interface was able to release and renew its IP with no problem, but connectivity was not regained.

When I connected the pfSense box to another router that I own, connectivity came back up with no problem, but when I tried to reconnect directly to the ISP, connectivity was lost again. (Staying behind that router is not an option, because the VPN doesn't work behind the consumer-grade router.)
Eventually, while looking through the gateway logs, I found what looked like a latency issue on the WAN DHCP interface:

send_interval 500ms loss_interval 2000ms time_period 60000ms report_interval 0ms data_len 0 alert_interval 1000ms latency_alarm 500ms loss_alarm 20% dest_addr XX.XX.XX.XX bind_addr XX.XX.XX.XX identifier "WAN_DHCP "
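
For anyone reading along: the line is a flat list of name/value pairs, and the names read like monitor settings (intervals and alarm thresholds) rather than measured values. Below is a minimal Python sketch for splitting it up, assuming the format is always `name value` with an optional quoted value at the end. `parse_monitor_line` is a made-up helper name for illustration, not something pfSense provides.

```python
import re

# The log line quoted above, with the addresses still masked.
LINE = ('send_interval 500ms loss_interval 2000ms time_period 60000ms '
        'report_interval 0ms data_len 0 alert_interval 1000ms '
        'latency_alarm 500ms loss_alarm 20% dest_addr XX.XX.XX.XX '
        'bind_addr XX.XX.XX.XX identifier "WAN_DHCP "')


def parse_monitor_line(line):
    """Split a 'name value name value ...' log line into a dict (hypothetical helper)."""
    # A value is either a double-quoted string or a run of non-space characters.
    pairs = re.findall(r'(\w+) ("[^"]*"|\S+)', line)
    return {name: value.strip('"') for name, value in pairs}


settings = parse_monitor_line(LINE)
for name, value in settings.items():
    print(f"{name:16} {value!r}")
```

Read this way, `latency_alarm 500ms` and `loss_alarm 20%` look like the thresholds at which the gateway would be flagged, not the latency or loss actually observed.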

After discovering this, I started troubleshooting NTP, since I am pointed at a local Windows Server 2012 R2 server for NTP services and figured that might be the issue. What finally seemed to fix it was disabling NTP and then re-enabling it. After this, the WAN gateway logged no latency issue and Internet connectivity resumed.

Anything to address this issue would be great, as would any advice on how to avoid it altogether with a configuration change, since this is probably not the last unexpected power outage it will suffer. Thanks!

johnpoz

Set your gateway to always be considered up.. Problem solved..

I am not sure how you think NTP not being reachable has anything to do with the delay calculated from pinging your gateway..

The loss and latency of the ping replies aren't calculated from NTP time.. If they were, then someone whose pfSense time is a few seconds off would see a latency of a few seconds? Or 100% packet loss..
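
To put numbers on that, here is a small Python sketch. It assumes the monitor timestamps both the outgoing probe and the reply on the firewall's own clock, which is the usual way round-trip time is measured; the function name and the values are made up for illustration.

```python
def round_trip_ms(send_time_s, receive_time_s):
    """Round-trip time in ms from two readings of the same local clock."""
    return (receive_time_s - send_time_s) * 1000.0


# Hypothetical probe: sent at t = 100.000 s, reply arrives 12 ms later.
true_send, true_receive = 100.000, 100.012

for clock_offset_s in (0.0, 3.0, -7200.0):
    # A wrong clock shifts both readings by the same amount...
    send = true_send + clock_offset_s
    receive = true_receive + clock_offset_s
    # ...so the difference, and therefore the reported latency, does not change.
    print(clock_offset_s, round(round_trip_ms(send, receive), 3))
```

Whether the clock is right, 3 seconds fast, or two hours slow, the offset cancels out in the subtraction, so a bad NTP sync on its own should not show up as latency or loss.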

senior_tron

OK. So I went and got a UPS today to connect to the server for some further power stability. Putting this in place, of course, necessitated powering the server/pfSense host off and on again. Same issue. Cycling NTP did not seem to work this time, so I am betting the earlier fix was a coincidence. I also tried marking the gateway as up, but still had the same issue with no connectivity. In desperation, I finally went and simply unplugged the Cat 5 cable from the fiber modem and plugged it back in. Boom. Full connectivity came back up immediately. Makes me wonder if ESXi/the fiber modem is maintaining some sort of state between the two which pfSense can't break until I physically reconnect. At that point, I suppose, pfSense and the modem sync up and connectivity is restored. Looks like I am going to have to just make sure it gets reconnected physically every time the server is rebooted. Thanks for the response.