Can't Get Failover (not Load Balance) To Work – Possible Hardware Issue?
-
I'm at my wits' end trying to get this to work. You wouldn't think it'd be too hard, so please feel free to smack me if I'm missing something really obvious.
I have three LANs and two WANs on a 1.2.3 box. Two of the LANs go out over our T1 line all the time, and the third LAN (a public WiFi) uses our cable modem's connection. That's all working just fine, so I know both WAN connections work (the RRD graphs show the same, and the public WiFi is using the cable modem connection). The T1 has a static IP and the cable modem is DHCP; both are plugged into a Sun 4-port NIC on an HP DL380G3 using the hme drivers.
I'd like to set it up so that if the T1 fails, all traffic from the two LANs that use the T1 will go over the cable modem (I don't want load balancing, as the various web apps the folks here use don't do well with changing IPs). So I've set up a failover load balancer pool (see attached image), and then set the LAN to use the failover pool (see attached). It works fine: all traffic goes out the T1 like it's supposed to, and the load balancer status page shows both connections green the entire time.
To test failover, I unplugged the ethernet connection on the T1 router, and then all heck broke loose. I looked at the load balancer status page in the web interface, and BOTH connections were red, not just the T1 line, and connectivity was gone. I plugged the cable back into the T1 router, but the pfSense box never re-detected the T1 line until I did a full reboot, after which things worked fine again.
I had similar problems with other NICs, but chalked it up to cheap Realtek NICs or poorly packed Intel NICs that I got cheap on eBay. The system works fine otherwise.
Ideas?
-
Bump – sorry to do this, but I'm at a loss here.
-
Maybe you have just misconfigured the pool. It has happened to me a few times when I selected the wrong monitor IP or interface.
Assuming the WAN is 66.39.178.10 and OPT1 is 216.87.224.12,
try setting up a new pool:
name: T1FailoverToCable
description: T1 1st, Cable 2nd
type: gateway
behavior: failover
monitor ip: select WAN
interface name: select WAN
click add to pool
monitor ip: select OPT1
interface name: select OPT1
click add to pool
click to save
click to apply
OK, now remove ALL your LAN rules and add a new rule:
action: PASS
interface: LAN
protocol: any
source: any
destination: any
gateway: T1FailoverToCable
description: T1 1st, Cable 2nd
click to save
click to apply
Try testing again; it should work.
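For what it's worth, here's roughly how I think of what a failover pool is doing, as a small Python sketch. This is not pfSense's actual code; `POOL`, `pick_gateway`, and the `monitor_up` dict are names I made up for illustration, and the addresses are just the example ones from above.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical pool: gateways in priority order as (interface name, monitor IP).
# The addresses are only the example values assumed earlier in this post.
POOL: List[Tuple[str, str]] = [
    ("WAN", "66.39.178.10"),    # T1, preferred
    ("OPT1", "216.87.224.12"),  # cable modem, backup
]


def pick_gateway(pool: List[Tuple[str, str]],
                 monitor_up: Dict[str, bool]) -> Optional[str]:
    """Return the first interface in the pool whose monitor IP is up.

    monitor_up maps a monitor IP to True (reachable) or False (down);
    a monitor IP missing from the dict is treated as down.
    Returns None if every gateway in the pool is down.
    """
    for name, monitor_ip in pool:
        if monitor_up.get(monitor_ip, False):
            return name
    return None
```

So with both monitors up you stay on WAN, with only the WAN monitor down you move to OPT1, and if both monitors show down (which is what the status page reported in the original test) there is nothing left to fail over to.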
BTW, you should check the load balancer logs too: Status > System Logs > Load Balancer
-
Sorry about the late reply on this. I set it up again just like you suggested, and it worked sporadically. I think, however, that's because I tested it differently than I did before. Before, to test it, I pulled the power to the T1 router. For some reason that knocked both WAN connections offline, according to the load balancer status page (both WANs are connected to the same NIC – a Sun 64-bit 4-port NIC). When I just unplugged the incoming T1 line from the T1 router (but left the network cable between the T1 router and the pfSense box), it failed over fine to the cable modem, and when I plugged it back in, it (eventually) went back to using the T1 line (it did take quite a while).
I'm going to experiment a bit more tonight, moving the T1 line to a separate NIC (as I have a couple free slots, was just trying to avoid using the hot-swap slots on this system as they seem to have issues with some cards) and seeing if that fixes the problem. But if anybody has any input, I'm all ears :)
-
Update: Kept the T1 on the same 4-port NIC (along with my public WiFi connection) and moved the backup cable modem to a separate Intel dual-port NIC (a dual-port was all I had handy – didn't want to use a single-port Realtek). Killed the power to the T1 router, and it failed over to the cable modem flawlessly within 20 seconds. Unplugged the ethernet cable from the T1 router, same deal. And then everything went back to the T1 line where appropriate.
Don't know what the deal was, but it's failing over fine now. However, I do see "apinger: command (/usr/bin/touch /tmp/filter_dirty) exited with status: 1" in my load balancer status logs, so I'll have to look into that.
In the future, if I'm going to use cheap NICs (that Sun 4-port was less than $10), I just need to make sure I don't put both my WAN connections on a single card, as that seems to freak the thing out.