Dual WAN behaviour if one WAN is down at startup
Last week I came across a strange issue at a client site that uses dual WAN with load balancing. They were reporting that the internet was down. I determined that one of the WAN links was still up, so I logged into the WebUI remotely. The primary WAN appeared to be down while the secondary WAN was up and working fine; however, the load balancer status showed both WANs as up, even though the primary WAN had no IP address assigned.
My hypothesis is that there was a power outage overnight (the box's uptime was only 8 hours when I checked) and that the cable modem (primary WAN) took longer to resync than pfSense took to boot. Because no address was assigned to that link, the routes that force the load balancer's monitor pings out that link were never set up, so the pings succeeded (presumably by leaving through the other WAN) even though the link wasn't operational.
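To make the suspected failure mode concrete, here is a minimal Python sketch of the check that seems to be missing: a monitor that refuses to call a WAN "up" unless the interface actually has an address, instead of trusting a ping that may have gone out the other link. `WanState`, `wan_is_up` and `active_wans` are hypothetical names invented for illustration; this is not pfSense's actual gateway-monitoring code.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class WanState:
    """Hypothetical snapshot of one WAN link as a monitor might see it."""
    name: str
    ip_address: Optional[str]  # None when no lease or static address is present
    ping_ok: bool              # result of the last monitor ping


def wan_is_up(wan: WanState) -> bool:
    # Without an address there is no route pinning the monitor ping to this
    # link, so a successful ping may have left via another WAN: don't trust it.
    if not wan.ip_address:
        return False
    return wan.ping_ok


def active_wans(wans: List[WanState]) -> List[str]:
    # Only links that pass the stricter check are eligible for load balancing.
    return [w.name for w in wans if wan_is_up(w)]
```

With the situation described above (primary WAN without an address but "passing" its ping, OPT1 healthy), `active_wans([WanState("WAN", None, True), WanState("OPT1", "203.0.113.7", True)])` returns `["OPT1"]`, so all traffic would go out the working link.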
Has anyone else encountered a similar situation? I did some quick testing, and it seems to be reproducible if you unplug or otherwise disable one of the WAN links while booting a dual-WAN pfSense configuration.
This also leads me to believe that pfSense never retries DHCP negotiation after it fails once. What happens if the link goes down long enough for the lease to expire and the renewal fails? I suspect that in some situations pfSense will never be able to obtain a lease again. Shouldn't pfSense retry DHCP periodically when the link is up but no lease has been obtained?
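As a sketch of the retry policy being asked for (not of how pfSense or its DHCP client actually behaves), the Python below keeps retrying while the link has carrier but no lease, backing off exponentially up to a cap instead of giving up after one failure. `retry_dhcp`, `link_up` and `request_lease` are hypothetical: the two callables stand in for a carrier check and for whatever DHCP client the firewall runs, and the 5 s / 300 s intervals are arbitrary assumptions.

```python
import time
from typing import Callable, Optional


def retry_dhcp(
    link_up: Callable[[], bool],                # hypothetical carrier check
    request_lease: Callable[[], Optional[str]], # hypothetical: address or None
    initial_delay: float = 5.0,                 # assumed first retry interval (s)
    max_delay: float = 300.0,                   # assumed cap on the backoff (s)
    max_attempts: Optional[int] = None,         # None means retry forever
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Retry DHCP while the link is up until a lease is obtained.

    Returns the leased address, or None once max_attempts is exhausted.
    """
    delay = initial_delay
    attempts = 0
    while max_attempts is None or attempts < max_attempts:
        if not link_up():
            # No carrier (e.g. modem still resyncing): nothing to negotiate
            # yet. Reset the backoff so we retry promptly once it comes up.
            delay = initial_delay
            sleep(initial_delay)
            continue
        attempts += 1
        address = request_lease()
        if address:
            return address
        # Carrier is present but no lease: wait, then double the delay,
        # never exceeding max_delay, and keep trying.
        sleep(delay)
        delay = min(delay * 2, max_delay)
    return None
```

Injecting `sleep` keeps the policy testable without real waiting; the key design choice is that a failed negotiation schedules another attempt rather than ending the process, which is the behaviour the question suggests is missing.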
You know, I have seen some funnies too when the WAN is unavailable and the backup (OPT1) remains up. Internally everything carried on working fine, and all outbound web traffic got diverted out the backup link, but inbound web traffic to the backup link was getting lost! I turned on logging on the inbound backup rules and could see hits on the firewall but not on the mail server or web server, so from outside it looked dead! Once the WAN link came back online, the backup (OPT1) interface started behaving itself again.
I have also seen what you describe: if you pull the cable out of the WAN port and the interface appears down, you can still ping things you shouldn't be able to.
Also, pfSense doesn't seem to like dynamic WAN IPs much, and it's a pain to have to change any static routes you've added. I've now added bridges between both my modems and pfSense and statically assigned the WAN interface addresses, and she's a lot happier now.
I'm wondering whether you ever found the cause of (or a solution to) this problem. We just had this happen here and I'm having trouble explaining it. WAN went down, and outbound traffic started flowing to OPT1 as it should have. Inbound connections to the failover connection were getting lost entirely, though; I couldn't even reach the web GUI remotely via OPT1. As soon as WAN came back up, I could reach the web GUI via OPT1 again. Odd… Any suggestions would be appreciated!