WAN Ports Down but ISP routers Up.
My system has been running well for several years with dual WAN: a cable modem (200 Mbps) and DSL (60 Mbps). Load balancing, failover, etc. have all been working fine.
Both WANs use a static IP to the relevant ISP router, although for various ISP reasons the system is running double NAT. This works fine for my setup.
In the last few days I have suddenly lost both WAN connections on the pfSense side, although the ISP routers remain active.
The only fix seems to be to reboot pfSense. On Tuesday I did this cleanly via the SSH menu, but today I had to get my wife to do it by cycling power (she cannot use SSH or the GUI).
I cannot find anything in system.log or the other logs, although they do cover the times of the failures.
Any ideas what to look for, or any extra debug settings I can enable?
How 'down' were they? Gateway status showing as down on both?
You might want to check the routing table if it happens again. Make sure you have a default gateway set rather than 'automatic'.
I would run a packet capture on both to see if gateway monitoring pings are actually leaving.
For both to go down at the same time seems like something more than an actual gateway failure.
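A rough sketch of the routing-table check suggested above, in Python. It only parses text you paste in from `netstat -rn` and looks for `default` entries. The function name `default_routes`, the sample table and the 192.0.2.1 address are made up for illustration (they are not from the poster's box), and the column order (Destination, Gateway, Flags, Netif) is assumed to match what FreeBSD prints.

```python
# Illustrative sample only: not captured from the system in this thread.
SAMPLE_ROUTES = """\
Routing tables

Internet:
Destination        Gateway            Flags     Netif Expire
default            192.0.2.1          UGS         em0
127.0.0.1          link#4             UH          lo0
"""


def default_routes(netstat_text):
    """Return every 'default' row found in `netstat -rn` style text.

    Assumes whitespace-separated columns in the order
    Destination, Gateway, Flags, Netif (hypothetical helper).
    """
    routes = []
    for line in netstat_text.splitlines():
        fields = line.split()
        # Skip headers, blank lines and non-default destinations.
        if len(fields) >= 4 and fields[0] == "default":
            routes.append(
                {"gateway": fields[1], "flags": fields[2], "netif": fields[3]}
            )
    return routes


if not default_routes(SAMPLE_ROUTES):
    print("no default route found")
```

An empty result while the WANs are down would point at a missing default route rather than at the gateways themselves.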
Yes, gateway status was showing down on both in the Dashboard. I was not able to ping either router from the pfSense SSH shell, but didn't do any other checks.
If (when) it happens again, I will see what else seems to be working or not. Also I will check the Interface status & maybe try re-plugging cables before resorting to reboot.
Gateways are set, plus I have a Gateway_Group containing both, which is then used in LAN Rules for default traffic.
You should not pick a Gateway Group with both gateways in the same Tier.
@rico Why can't both gateways be in the same tier? They have different weights to allow for the bandwidth difference.
It's noted in the 2.4.4 changelog here https://www.netgate.com/docs/pfsense/releases/2-4-4-new-features-and-changes.html
and also mentioned in the 2.4.4 short topics hangout by @jimp https://www.netgate.com/resources/videos/pfsense-244-short-topics.html (32:25 min).
Yes, you can't use a load-balancing gateway group as the default gateway. You can still use it on a LAN rule to load-balance client traffic as previously though.
Default gateway is on Wan-1 as shown above. The LAN rules route all outbound internet traffic to the Gateway Group. When I had the two WANs on different tiers, load balancing did not work, but failover did. I changed back to the same tier, but with 8:2 weights, and now the balancing works. If either WAN fails, the balancer simply uses the available WAN.
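For anyone wondering why different tiers gave failover only while the same tier with 8:2 weights gave balancing, here is a toy model in Python. It is not how pfSense or pf implements gateway groups, just a sketch of the selection rule as described in this thread: use only the lowest-numbered tier that has a live member, and split traffic by weight within that tier. The names `WAN1_GW`, `WAN2_GW`, `pick_gateway` and `share` are hypothetical.

```python
import random
from collections import Counter


def pick_gateway(gateways, rng=random):
    """Pick one gateway name, or None if every gateway is down.

    gateways: list of dicts with keys name, tier, weight, up
    (a made-up structure for this sketch).
    """
    live = [g for g in gateways if g["up"]]
    if not live:
        return None
    # Only the best (lowest-numbered) tier with a live member is used.
    best_tier = min(g["tier"] for g in live)
    candidates = [g for g in live if g["tier"] == best_tier]
    # Within that tier, choose in proportion to the weights.
    weights = [g["weight"] for g in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]["name"]


def share(gateways, n=10000, seed=0):
    """Fraction of n simulated connections that land on each gateway."""
    rng = random.Random(seed)
    counts = Counter(pick_gateway(gateways, rng) for _ in range(n))
    return {name: counts[name] / n for name in counts}


same_tier = [
    {"name": "WAN1_GW", "tier": 1, "weight": 8, "up": True},
    {"name": "WAN2_GW", "tier": 1, "weight": 2, "up": True},
]
print(share(same_tier))
```

With both gateways in tier 1 the split should come out near 80/20. Moving `WAN2_GW` to tier 2 sends everything to `WAN1_GW` until it is marked down, which matches the failover-only behaviour described above.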
As far as the original issue is concerned, it has not recurred. I did find that an old VPN route referring to an interface that no longer existed was creating loads of syslog entries, but I can't see this being the cause. I removed the route just in case.
When I get a minute I will connect monitor & keyboard and do a single-user disk check just in case.
Update on this issue. This morning both WAN ports were down again. Before reboot I checked console, and everything looked ok, but no connectivity between either WAN and ISP modems. Nothing was hogging CPU, and no errors in system.log.
This time, after reboot the WANs were still down, so I checked the "dmesg" logs. It looks like an Intel PRO/1000 Gigabit card failure. All ethernet parameters are ok, but ifconfig shows "no carrier". Did the usual Cat-5 re-plug & replacement, but no joy.
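In case it helps anyone checking for the same symptom, below is a small Python sketch that scans ifconfig output for interfaces reporting "no carrier". It works on text you pass in, so nothing is run on the firewall. The sample output is illustrative only (it is not the output from the box in this thread), the interface names `em0` and `em1` are assumptions, and `link_status` and `interfaces_without_link` are hypothetical helper names.

```python
import re

# Illustrative sample in the general shape of FreeBSD ifconfig output.
SAMPLE_IFCONFIG = """\
em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
\tmedia: Ethernet autoselect
\tstatus: no carrier
em1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
\tmedia: Ethernet autoselect (1000baseT <full-duplex>)
\tstatus: active
"""


def link_status(ifconfig_text):
    """Map interface name to the text of its 'status:' line."""
    statuses = {}
    current = None
    for line in ifconfig_text.splitlines():
        # An interface block starts at column 0 with "name: flags=...".
        header = re.match(r"^(\S+): flags=", line)
        if header:
            current = header.group(1)
            continue
        stripped = line.strip()
        if current and stripped.startswith("status:"):
            statuses[current] = stripped.split(":", 1)[1].strip()
    return statuses


def interfaces_without_link(ifconfig_text):
    """Sorted names of interfaces whose status is 'no carrier'."""
    return sorted(
        name
        for name, status in link_status(ifconfig_text).items()
        if status == "no carrier"
    )


print(interfaces_without_link(SAMPLE_IFCONFIG))
```

If both WAN ports show up here while the ISP routers' own link lights are on, the NIC or cabling is the more likely suspect than the gateways.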
So I have ordered a replacement dual-Gig card, and in the meantime fired up virtual pfsense on my DL160 ESXi box. Restored the config and now have full connectivity again.
I guess the Gig card was starting to fail, and has now gone into a hard fault in the Layer 1 interface chips.
Hmm, unusual failure in those cards. Assuming it's a genuine one.
Nice catch though.