Load-balanced WAN strangeness with one WAN failure
-
I've been using pfSense with multi-WAN configurations for quite a while now and haven't really had any problems with it. It's always worked perfectly for me (with the minor exception that starting the pfSense box up while a WAN is down wreaks havoc), and I've never had problems with it failing over. Until today.
I have a fairly simple configuration at this site, a couple VLANs and two physical WAN interfaces, one to a cable modem and one on ADSL. The cable modem is pfSense's WAN interface, with ADSL on OPT1. My load balancer setup uses a DNS server at each ISP as the monitor IP (which are also configured as the pfSense DNS IPs), and weights the cable modem 4:1. I am running 1.2.3-RC1 at this site.
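For reference, here is a rough sketch of what a 4:1 weighting means for new connections. It assumes the pool simply picks a gateway in proportion to its weight and skips any member whose monitor is down. The names `POOL` and `share_of_new_connections` are made up for illustration, and this is not a description of how pfSense implements its pools internally.

```python
from fractions import Fraction

# Hypothetical pool description: gateway name -> weight (cable:ADSL = 4:1).
POOL = {"WAN_cable": 4, "OPT1_adsl": 1}


def share_of_new_connections(pool, down=()):
    """Return each live gateway's expected share of new connections.

    Assumes proportional selection among members whose monitor is up.
    """
    live = {gw: weight for gw, weight in pool.items() if gw not in down}
    total = sum(live.values())
    if total == 0:
        return {}  # every monitor is down, nothing to route through
    return {gw: Fraction(weight, total) for gw, weight in live.items()}


if __name__ == "__main__":
    print(share_of_new_connections(POOL))
    print(share_of_new_connections(POOL, down={"WAN_cable"}))
```

Under that assumption the ADSL link carries 1 in 5 new connections while both links are up, and all of them once the cable monitor is marked down.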
Today the cable modem link went down, and apparently the Motorola modem starts giving out 192.168.100.0/24 addresses when its link is down. This doesn't overlap with any of my LAN segments (I use 192.168.10.0/24 and 192.168.20-22.0/24 for LANs), so I wouldn't expect it to cause any problems beyond the monitor failing and traffic being routed only out the ADSL link, which was working fine.
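As a quick sanity check of the no-overlap claim, the subnets can be compared with Python's standard `ipaddress` module. This is only a sketch: it reads "192.168.20-22.0/24" as the three /24 networks 20, 21 and 22, and the names `MODEM_FALLBACK`, `LAN_SEGMENTS` and `overlapping` are illustrative.

```python
import ipaddress

# Range the modem hands out when its upstream link is down (from the post).
MODEM_FALLBACK = ipaddress.ip_network("192.168.100.0/24")

# LAN segments from the post; 192.168.20-22.0/24 is assumed to mean
# three separate /24 networks.
LAN_SEGMENTS = [
    ipaddress.ip_network(net)
    for net in (
        "192.168.10.0/24",
        "192.168.20.0/24",
        "192.168.21.0/24",
        "192.168.22.0/24",
    )
]


def overlapping(candidate, segments):
    """Return the segments that share any address with the candidate network."""
    return [seg for seg in segments if candidate.overlaps(seg)]


if __name__ == "__main__":
    clashes = overlapping(MODEM_FALLBACK, LAN_SEGMENTS)
    print("overlapping segments:", clashes)
```

An empty list means the modem's fallback range does not collide with any LAN, so an address clash is unlikely to be the cause here.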
Unfortunately my colleague onsite rebooted the pfSense box before I could take a look, so I don't know the exact state of the system after the cable link failed, but after the reboot it behaved strangely:
a) Load balancer status showed that all the cable modem monitors were 'down' while all ADSL monitors were 'up', as expected - but no traffic was routed to the Internet (as far as I could tell, not even the 1/5 of new connections that would normally go to ADSL with a 4:1 weighting)
b) Even my failover (not load balanced) gateway that gives the ADSL link as the first test wouldn't route traffic through my normal rules
c) Removing the cable modem monitors from all the pools sort of worked. Pings would go to the 'net and work properly, but actual connections were unreliable and mostly didn't work at all.
The interim solution is to change all my routes (quite a few :|) to use the OPT1 gateway instead of the load balancer pool.
This is all rather strange behaviour, as in the past whenever the load balancer has reported the monitors as down, I haven't had any issues. The only issues I've had are where the monitors are marked 'up' but the link is actually down (due to booting with a WAN down so static routes can't be created, or when the monitor is up but the actual internet link is down). So does anyone have any suggestions as to why this might have happened? I like to understand these sorts of issues so I can at least know when I need to use particular workarounds, and preferably prevent it from happening again.
-
Hi,
Have you ever found a solution for this problem?
-
No, but it hasn't appeared again either, so I don't know whether it's still an issue in 1.2.3 final.