CARP issues on WatchGuard x550e
-
I am tearing my hair out trying to get CARP to behave between my pair of x550e boxes with 2.2.4.
My systems have the WAN interface on sk0, LAN on sk1 and SYNC on sk3; sk2 is unused. At first everything seems fine, with my primary box as the master and the secondary in backup. However, at some point (usually within 30 minutes) the master stops responding to traffic on the LAN port and fails the LAN over to the secondary. Soon after (if it hasn't failed already), the secondary also stops responding to traffic on the LAN port and attempts to fail back over. In almost every case I quickly end up with both boxes stuck in INIT on the LAN and sometimes the WAN ports, neither one responding to traffic on either the CARP IP or the interface's own IP.
In trying to debug it from the console, I don't see anything odd. The ifconfig output for the interface looks correct, and if I run tcpdump I still see traffic on the LAN interface. However, the box won't respond to pings on that interface and cannot ping out of it.
Even stranger, if I power off the secondary and reboot the primary, I still see the same behaviour. This morning, for example, I was running only the primary and I lost all connectivity on the LAN interface. When I went in through a backdoor I have set up on the WAN interface for exactly this reason, I saw the LAN interface in INIT on the CARP dashboard, and I couldn't ping in or out on the LAN.
If I run a single device with just aliases on the interfaces instead of configuring it for CARP, it works fine.
I am running IPv6 as well, and I haven't yet tried removing the v6 alias from the LAN port to see if that is the problem. Since my IPv6 is over a tunnel, I don't have it configured for CARP, because I don't know of any way to have the tunnel fail over to the secondary. My LAN port also has a VLAN on it for access to my DSL router; I can drop that for testing as well if anyone thinks it may make a difference.
Here is my current ifconfig output with aliases on the master (IP addresses have been obscured). I can reconfigure it for CARP when I get home tonight and post that as well. With the CARP config all of the IPs have unique vhids: 1, 2 and 3 for the WAN interface and 10 for the LAN.
sk0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=8000b<RXCSUM,TXCSUM,VLAN_MTU,LINKSTATE>
        ether 00:90:7f:3e:38:29
        inet6 fe80::290:7fff:fe3e:3829%sk0 prefixlen 64 scopeid 0x1
        inet a.b.c.243 netmask 0xffffff00 broadcast a.b.d.255
        inet a.b.c.240 netmask 0xffffff00 broadcast a.b.d.255
        inet a.b.c.241 netmask 0xffffff00 broadcast a.b.d.255
        inet a.b.c.242 netmask 0xffffff00 broadcast a.b.d.255
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
sk1: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=80009<RXCSUM,VLAN_MTU,LINKSTATE>
        ether 00:90:7f:3e:38:28
        inet6 fe80::290:7fff:fe3e:3828%sk1 prefixlen 64 scopeid 0x2
        inet e.f.g.4 netmask 0xffffff00 broadcast e.f.g.255
        inet6 2001:x:y:z::4 prefixlen 64
        inet e.f.g.1 netmask 0xffffff00 broadcast e.f.g.255
        inet6 2001:x:y:z::1 prefixlen 64
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
sk2: flags=8802<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=8000b<RXCSUM,TXCSUM,VLAN_MTU,LINKSTATE>
        ether 00:90:7f:3e:38:27
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
        media: Ethernet autoselect (none)
        status: no carrier
sk3: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=8000b<RXCSUM,TXCSUM,VLAN_MTU,LINKSTATE>
        ether 00:90:7f:3e:38:26
        inet6 fe80::290:7fff:fe3e:3826%sk3 prefixlen 64 scopeid 0x4
        inet 172.16.4.1 netmask 0xffffff00 broadcast 172.16.4.255
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
        media: Ethernet autoselect (none)
        status: no carrier
pflog0: flags=100<PROMISC> metric 0 mtu 33172
pfsync0: flags=0<> metric 0 mtu 1500
        syncpeer: 224.0.0.240 maxupd: 128 defer: on
        syncok: 1
enc0: flags=0<> metric 0 mtu 1536
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
        options=600003<RXCSUM,TXCSUM,RXCSUM_IPV6,TXCSUM_IPV6>
        inet 127.0.0.1 netmask 0xff000000
        inet6 ::1 prefixlen 128
        inet6 fe80::1%lo0 prefixlen 64 scopeid 0x8
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
sk1_vlan192: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        ether 00:90:7f:3e:38:28
        inet6 fe80::290:7fff:fe3e:3828%sk1_vlan192 prefixlen 64 scopeid 0x9
        inet 192.168.42.4 netmask 0xffffff00 broadcast 192.168.42.255
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
        vlan: 192 vlanpcp: 0 parent interface: sk1
gif0: flags=8051<UP,POINTOPOINT,RUNNING,MULTICAST> metric 0 mtu 1280
        tunnel inet a.b.c.240 --> 208.201.234.221
        inet6 2001:5a8:0:1::e15 --> 2001:5a8:0:1::e14 prefixlen 128
        inet6 fe80::290:7fff:fe3e:3829%gif0 prefixlen 64 scopeid 0xa
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
bridge0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        ether 02:54:4e:4d:39:00
        nd6 options=1<PERFORMNUD>
        id 00:00:00:00:00:00 priority 32768 hellotime 2 fwddelay 15
        maxage 20 holdcnt 6 proto rstp maxaddr 2000 timeout 1200
        root id 00:00:00:00:00:00 priority 32768 ifcost 0 port 0
        member: ovpns2 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                ifmaxaddr 0 port 13 priority 128 path cost 2000000
        member: sk1 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                ifmaxaddr 0 port 2 priority 128 path cost 55
ovpns1: flags=8051<UP,POINTOPOINT,RUNNING,MULTICAST> metric 0 mtu 1500
        options=80000<LINKSTATE>
        inet6 fe80::290:7fff:fe3e:3829%ovpns1 prefixlen 64 scopeid 0xc
        inet i.j.k.17 --> i.j.k.18 netmask 0xfffffff0
        inet6 2001:x:y:z::43:1 prefixlen 64
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
        Opened by PID 21120
ovpns2: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=80000<LINKSTATE>
        ether 00:bd:f9:e1:00:02
        inet6 fe80::2bd:f9ff:fee1:2%ovpns2 prefixlen 64 scopeid 0xd
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
        media: Ethernet autoselect
        status: active
        Opened by PID 21616
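Since the failure shows up as a vhid sitting in INIT, it might help to log the CARP state from the console rather than relying on the dashboard. Once CARP is configured, ifconfig prints a "carp: STATE vhid N advbase N advskew N" line per vhid under each interface. Below is a rough Python sketch that pulls those lines out of saved ifconfig text and flags anything that is neither MASTER nor BACKUP. The function names and the sample text are made up for illustration; in particular the INIT entry on sk1 is a hypothetical stand-in for the symptom described above, not captured output.

```python
import re

# Interface header, e.g. "sk1: flags=8943<UP,...> metric 0 mtu 1500".
IFACE_RE = re.compile(r"^(\S+): flags=")
# CARP status line, e.g. "carp: MASTER vhid 10 advbase 1 advskew 0".
CARP_RE = re.compile(r"carp: (\w+) vhid (\d+) advbase (\d+) advskew (\d+)")


def parse_carp_states(ifconfig_text):
    """Return (interface, vhid, state) tuples for every carp: line found."""
    states = []
    current_iface = None
    for line in ifconfig_text.splitlines():
        header = IFACE_RE.match(line)
        if header:
            # Remember which interface the following indented lines belong to.
            current_iface = header.group(1)
            continue
        status = CARP_RE.search(line)
        if status and current_iface is not None:
            states.append((current_iface, int(status.group(2)), status.group(1)))
    return states


def stuck_vhids(states):
    """Return the entries whose state is neither MASTER nor BACKUP (e.g. INIT)."""
    return [entry for entry in states if entry[2] not in ("MASTER", "BACKUP")]


# Hypothetical sample: sk0 is healthy, sk1 has dropped to INIT.
SAMPLE = """\
sk0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        carp: MASTER vhid 2 advbase 1 advskew 0
        carp: MASTER vhid 3 advbase 1 advskew 0
        carp: MASTER vhid 1 advbase 1 advskew 0
sk1: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        carp: INIT vhid 10 advbase 1 advskew 0
"""

if __name__ == "__main__":
    for iface, vhid, state in stuck_vhids(parse_carp_states(SAMPLE)):
        print(f"{iface} vhid {vhid} is in {state}")
```

On the firewall itself the text would come from the real ifconfig output instead of SAMPLE (for example saved to a file by a cron job), so that the time the LAN vhid leaves MASTER can be lined up with the system log.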
-
Here is the ifconfig output with CARP re-configured. We'll see how long it lasts before the LAN interface hangs…
sk0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=8000b<RXCSUM,TXCSUM,VLAN_MTU,LINKSTATE>
        ether 00:90:7f:3e:38:29
        inet6 fe80::290:7fff:fe3e:3829%sk0 prefixlen 64 scopeid 0x1
        inet a.b.c.243 netmask 0xffffff00 broadcast a.b.c.255
        inet a.b.c.241 netmask 0xffffff00 broadcast a.b.c.255 vhid 2
        inet a.b.c.242 netmask 0xffffff00 broadcast a.b.c.255 vhid 3
        inet a.b.c.240 netmask 0xffffff00 broadcast a.b.c.255 vhid 1
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
        carp: MASTER vhid 2 advbase 1 advskew 0
        carp: MASTER vhid 3 advbase 1 advskew 0
        carp: MASTER vhid 1 advbase 1 advskew 0
sk1: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=80009<RXCSUM,VLAN_MTU,LINKSTATE>
        ether 00:90:7f:3e:38:28
        inet6 fe80::290:7fff:fe3e:3828%sk1 prefixlen 64 scopeid 0x2
        inet e.f.g.4 netmask 0xffffff00 broadcast e.f.g.255
        inet6 2001:5a8:4:70a0::4 prefixlen 64
        inet e.f.g.1 netmask 0xffffff00 broadcast e.f.g.255 vhid 10
        inet6 2001:5a8:4:70a0::1 prefixlen 64
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
        carp: MASTER vhid 10 advbase 1 advskew 0
sk2: flags=8802<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=8000b<RXCSUM,TXCSUM,VLAN_MTU,LINKSTATE>
        ether 00:90:7f:3e:38:27
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
        media: Ethernet autoselect (none)
        status: no carrier
sk3: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=8000b<RXCSUM,TXCSUM,VLAN_MTU,LINKSTATE>
        ether 00:90:7f:3e:38:26
        inet6 fe80::290:7fff:fe3e:3826%sk3 prefixlen 64 scopeid 0x4
        inet 172.16.4.1 netmask 0xffffff00 broadcast 172.16.4.255
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
        media: Ethernet autoselect (none)
        status: no carrier
pflog0: flags=100<PROMISC> metric 0 mtu 33172
pfsync0: flags=0<> metric 0 mtu 1500
        syncpeer: 224.0.0.240 maxupd: 128 defer: on
        syncok: 1
enc0: flags=0<> metric 0 mtu 1536
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
        options=600003<RXCSUM,TXCSUM,RXCSUM_IPV6,TXCSUM_IPV6>
        inet 127.0.0.1 netmask 0xff000000
        inet6 ::1 prefixlen 128
        inet6 fe80::1%lo0 prefixlen 64 scopeid 0x8
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
sk1_vlan192: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        ether 00:90:7f:3e:38:28
        inet6 fe80::290:7fff:fe3e:3828%sk1_vlan192 prefixlen 64 scopeid 0x9
        inet 192.168.42.4 netmask 0xffffff00 broadcast 192.168.42.255
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
        vlan: 192 vlanpcp: 0 parent interface: sk1
gif0: flags=8051<UP,POINTOPOINT,RUNNING,MULTICAST> metric 0 mtu 1280
        tunnel inet a.b.c.240 --> 208.201.234.221
        inet6 2001:5a8:0:1::e15 --> 2001:5a8:0:1::e14 prefixlen 128
        inet6 fe80::290:7fff:fe3e:3829%gif0 prefixlen 64 scopeid 0xa
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
bridge0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        ether 02:08:2c:92:bd:00
        nd6 options=1<PERFORMNUD>
        id 00:00:00:00:00:00 priority 32768 hellotime 2 fwddelay 15
        maxage 20 holdcnt 6 proto rstp maxaddr 2000 timeout 1200
        root id 00:00:00:00:00:00 priority 32768 ifcost 0 port 0
        member: ovpns2 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                ifmaxaddr 0 port 13 priority 128 path cost 2000000
        member: sk1 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                ifmaxaddr 0 port 2 priority 128 path cost 55
ovpns1: flags=8051<UP,POINTOPOINT,RUNNING,MULTICAST> metric 0 mtu 1500
        options=80000<LINKSTATE>
        inet6 fe80::290:7fff:fe3e:3829%ovpns1 prefixlen 64 scopeid 0xc
        inet i.j.k.17 --> i.j.k.18 netmask 0xfffffff0
        inet6 2001:5a8:4:70a0::43:1 prefixlen 64
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
        Opened by PID 34443
ovpns2: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=80000<LINKSTATE>
        ether 00:bd:2f:e2:00:02
        inet6 fe80::2bd:2fff:fee2:2%ovpns2 prefixlen 64 scopeid 0xd
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
        media: Ethernet autoselect
        status: active
        Opened by PID 20343