CARP + Dual WAN connection failover

tritu

I set up the clusters today and found that whenever I configure my outbound rule to use connection failover with dual WAN, there seems to be a problem.

Without setting the outbound FW rule to use the failover pool as its gateway, it works great: I unplug the cable, the CARP VIP fails over to the other machine, and when I plug it back in it falls back to the master.

However, when I edit the FW rule to use the failover pool on my dual WAN as its gateway, the CARP VIP failed over and then stayed MASTER/MASTER on both machines. Here are the CARP interfaces on both machines at that time:

carp0: flags=49<UP,LOOPBACK,RUNNING> mtu 1500
inet 172.17.10.3 netmask 0xffff0000
carp: MASTER vhid 1 advbase 1 advskew 0

carp0: flags=49<UP,LOOPBACK,RUNNING> mtu 1500
inet 172.17.10.3 netmask 0xffff0000
carp: MASTER vhid 1 advbase 1 advskew 100
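Why MASTER/MASTER usually means lost advertisements rather than wrong advskew values can be shown with a deliberately simplified Python model of the CARP election. This is a sketch, not pfSense or kernel code. It assumes that a node advertises every advbase + advskew/256 seconds and that it only steps down to BACKUP while it is actually receiving advertisements from a peer with a shorter interval. Timers, preemption and tie-breaking are ignored, and the node names fw1/fw2 are made up.

```python
from dataclasses import dataclass


@dataclass
class CarpNode:
    name: str
    advbase: int  # seconds between advertisements
    advskew: int  # 0..254; lower skew -> shorter interval -> preferred master
    state: str = "INIT"

    def interval(self) -> float:
        # Assumed CARP rule: advertise every advbase + advskew/256 seconds.
        return self.advbase + self.advskew / 256.0


def elect(nodes, hears):
    """Simplified election: a node goes BACKUP only if it actually receives
    advertisements from a peer with a shorter interval.
    hears(a, b) is True when node a receives node b's advertisements."""
    for node in nodes:
        outranked = any(
            peer is not node
            and hears(node, peer)
            and peer.interval() < node.interval()
            for peer in nodes
        )
        node.state = "BACKUP" if outranked else "MASTER"
    return {n.name: n.state for n in nodes}


if __name__ == "__main__":
    master = CarpNode("fw1", advbase=1, advskew=0)    # hypothetical names
    slave = CarpNode("fw2", advbase=1, advskew=100)

    # Advertisements delivered normally: one MASTER, one BACKUP.
    print(elect([master, slave], hears=lambda a, b: True))
    # Advertisements never reach the peer: both nodes claim MASTER.
    print(elect([master, slave], hears=lambda a, b: False))
```

In this model the advskew values can be perfectly correct and both nodes still end up MASTER; all it takes is that neither node receives the other's advertisements.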

Also, if I reboot the machine, there is no outbound traffic from machines on the internal LAN subnet until I edit something (it doesn't matter what; I edited the VIP of type Other, which has nothing to do with the failover pool) and click Save Changes. Then it starts working again, but it is still MASTER/MASTER on both machines.

It seems to me that something has to be called to update the configuration when changes are reloaded. Does anybody have any idea why? Can CARP work with the load balancer/failover pools?

BTW, I'm using the 1.0.1-SNAPSHOT-01-24-2007.

tritu

It seems to me that there is a bug in CARP, since the advskew values are set correctly on each machine: advskew 0 (master) and advskew 100 (slave). tcpdump shows the advertisements going out correctly with the expected vrid and prio, but I don't know why the CARP interfaces are both MASTER.

14:47:06.318923 IP 172.17.10.2 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 100, authtype none, intvl 1s, length 36
14:47:06.414624 IP 172.17.10.1 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 0, authtype none, intvl 1s, length 36
14:47:07.417878 IP 172.17.10.1 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 0, authtype none, intvl 1s, length 36
14:47:07.711430 IP 172.17.10.2 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 100, authtype none, intvl 1s, length 36

14:47:06.376316 IP 172.17.10.2 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 100, authtype none, intvl 1s, length 36
14:47:06.958311 IP 172.17.10.1 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 0, authtype none, intvl 1s, length 36
14:47:07.761120 IP 172.17.10.2 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 100, authtype none, intvl 1s, length 36
14:47:07.956013 IP 172.17.10.1 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 0, authtype none, intvl 1s, length 36
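CARP reuses IP protocol 112, so the tcpdump used here prints the advertisements as VRRPv2. The Python sketch below assumes the 36-byte BSD `struct carp_header` layout (version/type, vhid, advskew, authlen, pad1, advbase, cksum, counter[2], md[20]) and reads it back with VRRPv2 field names. It illustrates that "vrid" is the CARP vhid, "prio" is the advskew and "intvl" is the advbase, so the capture only confirms that each box is sending its configured advskew; it says nothing about whether the peer receives it. The helper names are hypothetical, and the checksum, counter and digest are left as zeros.

```python
import struct

# Assumed BSD carp_header layout, 36 bytes, network byte order:
# ver/type, vhid, advskew, authlen, pad1, advbase, cksum, counter[2], md[20]
CARP_HDR = struct.Struct("!BBBBBBH8s20s")


def carp_advert(vhid: int, advskew: int, advbase: int) -> bytes:
    """Build an illustrative CARP advertisement (checksum/counter/digest zeroed)."""
    ver_type = (2 << 4) | 1  # CARP version 2, type 1 = advertisement
    authlen = 7              # illustrative value only
    return CARP_HDR.pack(ver_type, vhid, advskew, authlen, 0, advbase,
                         0, b"\0" * 8, b"\0" * 20)


def vrrp_view(pkt: bytes) -> dict:
    """Read a CARP packet the way a VRRPv2 decoder would label the fields."""
    ver_type, b1, b2, _b3, b4, b5, _cksum, _ctr, _md = CARP_HDR.unpack(pkt)
    return {
        "version": ver_type >> 4,
        "vrid": b1,       # CARP vhid
        "prio": b2,       # CARP advskew
        "authtype": b4,   # CARP pad1, 0 -> shown as "none"
        "intvl": b5,      # CARP advbase
        "length": len(pkt),
    }


if __name__ == "__main__":
    for src, skew in (("172.17.10.1", 0), ("172.17.10.2", 100)):
        print(src, vrrp_view(carp_advert(vhid=1, advskew=skew, advbase=1)))
```

The decoded values (vrid 1, prio 0 or 100, intvl 1, length 36) line up with the capture above.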

carp0: flags=49<UP,LOOPBACK,RUNNING> mtu 1500
inet 172.17.10.3 netmask 0xffff0000
carp: MASTER vhid 1 advbase 1 advskew 0

carp0: flags=49<UP,LOOPBACK,RUNNING> mtu 1500
inet 172.17.10.3 netmask 0xffff0000
carp: MASTER vhid 1 advbase 1 advskew 100

Hope that someone out there knows what the problem is or how to fix it.

sullrich

Generally, when this happens, it's switch-related.

I've got 10+ CARP installs, no issues.

tritu

Scott,

Thanks for replying to the message.

If it's the switches, why does it work when I don't set the FW rule to use connection failover as its gateway?

LAN: (doesn't work)
* LAN net * * * Failover Lan - Any

LAN: (works fine)
* LAN net * * * * Lan - Any

sullrich

Sorry, but you were talking about CARP. I responded to the CARP question.

tritu

I have found the root cause. It's not a problem with the switch; it was because of the firewall rules. When the outbound LAN FW rule was changed to use the failover pool, the default route was no longer used for traffic matching that rule. When the master and slave sent out their VRRP (CARP) multicast advertisements to 224.0.0.18, that traffic followed the failover pool instead of the routing table, and the failover pool only routes to either WAN1 or WAN2, neither of which has a route to the internal LAN subnet. That's why both machines were in the MASTER state. Once I created a new rule for the LAN subnet allowing traffic to 224.0.0.18 using the default gateway, the problem was fixed.
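The fix depends on rule order: pfSense evaluates interface rules top-down and the first match wins, so the 224.0.0.18 rule presumably has to sit above the failover rule. A minimal Python model of that ordering follows. The `Rule` class, `match()` helper and rule descriptions are hypothetical; only the 172.17.0.0/16 LAN (172.17.10.x with netmask 0xffff0000), the "Failover" gateway and 224.0.0.18 come from this thread. Source and destination matching is reduced to plain CIDR checks.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Optional

LAN_NET = "172.17.0.0/16"  # 172.17.10.x with netmask 0xffff0000


@dataclass
class Rule:
    source: str       # CIDR the packet source must fall in
    destination: str  # CIDR the packet destination must fall in
    gateway: str      # "default" = normal routing table, else a pool name
    description: str  # hypothetical label


def match(rules: List[Rule], src: str, dst: str) -> Optional[Rule]:
    """Return the first rule containing both addresses (first match wins)."""
    s, d = ipaddress.ip_address(src), ipaddress.ip_address(dst)
    for rule in rules:
        if (s in ipaddress.ip_network(rule.source)
                and d in ipaddress.ip_network(rule.destination)):
            return rule
    return None


# Before: every LAN packet, CARP advertisements included, goes to the pool.
before = [Rule(LAN_NET, "0.0.0.0/0", "Failover", "LAN -> any via failover")]

# After: CARP multicast is matched first and keeps the default gateway.
after = [
    Rule(LAN_NET, "224.0.0.18/32", "default", "CARP advertisements"),
    Rule(LAN_NET, "0.0.0.0/0", "Failover", "LAN -> any via failover"),
]

if __name__ == "__main__":
    print(match(before, "172.17.10.2", "224.0.0.18").gateway)   # pool
    print(match(after, "172.17.10.2", "224.0.0.18").gateway)    # default
    print(match(after, "172.17.10.50", "198.51.100.7").gateway)  # pool
```

If the two rules in `after` were swapped, the catch-all failover rule would match the advertisements first and the problem would come back.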

It's working great now. Disable on master --> switches over to slave. Enable again --> falls back to master.