CARP Problems

NAmorim

Hello all!

I have repeated entries in the system log about CARP that might be the cause of some problems I'm having with failover.

The issue is that when I configure two boxes with CARP, they keep changing between the master and backup states, and sometimes end up with both nodes being master.

In the system log, I have these entries:

kernel: carp_input: received len 20 < sizeof(struct carp_header)
last message repeated 351 times

I think this is caused by the VRRP announcements from a pair of Alteons on one of the networks:

14:51:40.895688 IP 172.22.3.5 > 224.0.0.18: VRRPv2, Advertisement, vrid 3, prio 130, authtype none, intvl 1s, length 20
14:51:40.895691 IP 172.22.3.5 > 224.0.0.18: VRRPv2, Advertisement, vrid 2, prio 114, authtype none, intvl 1s, length 20
14:51:40.895694 IP 172.22.3.5 > 224.0.0.18: VRRPv2, Advertisement, vrid 4, prio 122, authtype none, intvl 1s, length 20
14:51:41.896614 IP 172.22.3.2 > 224.0.0.18: VRRPv2, Advertisement, vrid 253, prio 200, authtype none, intvl 1s, length 36

IP 172.22.3.2 is the firewall. What I also find strange is that the authtype is set to none, when I've configured a password.

This behaviour happens in Beta-1, snapshot 2-19-06, snapshot 2-20-06 and now in Beta-2.

Thank you in advance for the help.

billm

CARP and VRRP use the same protocol number. tcpdump decodes it as VRRP because VRRP is the IETF owner of that protocol number. However, the field layout is different, although similar up to the VRID field. Sooooo, if you are running VRRP and CARP on the same physical segment, you absolutely must use different VRIDs/VHIDs, or else they will stomp on each other. Also, some equipment tends to be unhappy with VRRP and CARP sitting on the same segment, but that's typically due to piss-poor VRRP implementations on that equipment.
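
To make the shared layout concrete, here is a minimal sketch (not taken from pfSense or tcpdump) that reads the first eight bytes of an IP protocol 112 payload both ways. The field order follows my reading of the VRRPv2 header and of the CARP header in the BSD ip_carp.h; treat the layout, the 36-byte size, and the names `decode_first_bytes` and `CARP_HEADER_LEN` as assumptions to check against your own kernel source. The sample payloads are made up.

```python
import struct

# Assumed sizeof(struct carp_header): 8 fixed bytes + 8 (counter) + 20 (SHA1 HMAC).
CARP_HEADER_LEN = 36


def decode_first_bytes(payload: bytes) -> dict:
    """Decode the 8 leading bytes, labelled once per protocol."""
    if len(payload) < 8:
        raise ValueError("need at least 8 bytes")
    ver_type, b1, b2, b3, b4, b5, cksum = struct.unpack("!BBBBBBH", payload[:8])
    return {
        "version": ver_type >> 4,
        "type": ver_type & 0x0F,
        # Same byte offsets, different meaning in each protocol.
        "vrrp": {"vrid": b1, "priority": b2, "addr_count": b3,
                 "auth_type": b4, "adver_int": b5},
        "carp": {"vhid": b1, "advskew": b2, "authlen": b3,
                 "pad": b4, "advbase": b5},
        "checksum": cksum,
        "long_enough_for_carp": len(payload) >= CARP_HEADER_LEN,
    }


# Made-up 20-byte payload shaped like a VRRP advertisement (vrid 3, prio 130).
vrrp_like = bytes([0x21, 3, 130, 1, 0, 1, 0, 0]) + bytes(12)
# Made-up 36-byte payload shaped like a CARP advertisement (vhid 253, advskew 200).
carp_like = bytes([0x21, 253, 200, 7, 0, 1, 0, 0]) + bytes(28)

for name, pkt in (("vrrp_like", vrrp_like), ("carp_like", carp_like)):
    decoded = decode_first_bytes(pkt)
    print(name, len(pkt), decoded["carp"], decoded["long_enough_for_carp"])
```

Read this way, a 20-byte VRRP advertisement is too short to hold a CARP header, which fits the "received len 20" kernel message. The byte tcpdump prints as authtype would be a padding byte in CARP, which would explain "authtype none" even with a password set.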

For the record, the two do coexist, quite happily. I run many firewalls w/ CARP on the same physical segments that VRRP (and HSRP - although different protocol I thought it worth mentioning) run on.

–Bill

aldo

Just to confirm what Bill said: same here. We have quite a few networks with CARP and VRRP and no issues.

NAmorim

OK… so those log entries are only informational. Is there a way to disable them?

The problem I'm having with CARP is reproducible in two VMware installs I've made. I create one master VIP on node 1. This is synchronised to node 2 with an advskew of 101. Then, after about 20 seconds, the advskew of node 1 goes to 200. If I disable CARP on node 2, the carp interfaces are destroyed. When I re-enable CARP, the interfaces are created but not brought up, so node 2 stays in the INIT state.

If I go to the shell and run "ifconfig carp0 up", sometimes both nodes stay as master. After some time, the carp interface on node 2 goes down, back to the INIT state.

I can't find out why this is happening. I found another forum post about this problem that was solved by removing some FC Intel card. I don't have one of those, and the same thing happens in virtual machines.
Looking at the CARP documentation on the OpenBSD site, there's one option that I don't have in pfSense: "carpdev". Could that be an issue?

rafael_r

Hi yall.

I've spent a couple of hours wondering why CARP ain't working on two boxes.
After creating the VIPs, their advskew values are different, the interfaces don't come up, etc.

Then I checked "/etc/inc/interfaces.inc" (I'm using 1.0BETA2, and I think this applies to other versions too).
I think this is the reason (lines 408 and 409):
fwrite($fd, "/sbin/ifconfig carp" . $carp_instances_counter . " " . $vip['subnet'] . "/" . $vip['subnet_bits'] . " broadcast " . $broadcast_address . " vhid " . $vip['vhid'] . "{$carpdev} advskew 200 " . $password . "\n");
mwexec("/sbin/ifconfig carp" . $carp_instances_counter . " " . $vip['subnet'] . "/" . $vip['subnet_bits'] . " broadcast " . $broadcast_address . " vhid " . $vip['vhid'] . "{$carpdev} advskew 200 " . $password);

The advskew is hard-coded to 200, no matter what is in the configuration (/conf/config.xml).

So, you can edit /etc/inc/interfaces.inc, go to lines 408 and 409, and change them to this:

fwrite($fd, "/sbin/ifconfig carp" . $carp_instances_counter . " " . $vip['subnet'] . "/" . $vip['subnet_bits'] . " broadcast " . $broadcast_address . " vhid " . $vip['vhid'] . "{$carpdev} advskew " . $vip['advskew'] . " " . $password . "\n");
mwexec("/sbin/ifconfig carp" . $carp_instances_counter . " " . $vip['subnet'] . "/" . $vip['subnet_bits'] . " broadcast " . $broadcast_address . " vhid " . $vip['vhid'] . "{$carpdev} advskew " . $vip['advskew'] . " " . $password);

It works. The correct advskew values now show up in /tmp/carp.sh and, of course, in the "ifconfig" output.
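
For context on why a fixed skew matters: as I understand the carp documentation, a node sends advertisements roughly every advbase + advskew/256 seconds, and the node that advertises most often is preferred as master. Below is a small sketch of that arithmetic. The function names are mine, the skew of 0 is just an example value, and 101 and 200 are the values mentioned in this thread.

```python
def advert_interval(advbase: int, advskew: int) -> float:
    """Seconds between CARP advertisements: advbase + advskew/256."""
    if not 0 <= advskew <= 254:
        raise ValueError("advskew must be in 0..254")
    return advbase + advskew / 256


def preferred_master(skews: dict, advbase: int = 1):
    """Return the node with the shortest interval, or None when the best two tie."""
    ranked = sorted(skews, key=lambda node: advert_interval(advbase, skews[node]))
    if len(ranked) > 1 and skews[ranked[0]] == skews[ranked[1]]:
        return None  # equal skew: the skew alone cannot pick a master
    return ranked[0]


for skew in (0, 101, 200):
    print(f"advbase 1, advskew {skew:3d} -> {advert_interval(1, skew):.3f} s")

print(preferred_master({"node1": 0, "node2": 101}))
print(preferred_master({"node1": 200, "node2": 200}))
```

With both nodes stuck at 200 the second call returns None, which is the situation the hard-coded value creates: neither box has a configured preference over the other.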

Another point is that on the slave unit, even though the command "ifconfig carpX up" is in the script, the interfaces come up for a while and then go down. As my knowledge of ifconfig with carp isn't that good, I don't know what it could be. I tried adding a 'sleep 1' after the lines I mentioned before, but it didn't help.

rafael_r

Good news for CARP problems!

After sending my previous message, I looked in interfaces.inc for the second routine called from "/etc/rc.interfaces_carp_configure".

Well, at line 466 in interfaces.inc, there's a call to mwexec("ifconfig carpx up") and after this a mwexec() to configure the interface. What I did was move the "ifconfig up" after that mwexec(), together w/ a "sleep 1".

By making this change, I realized why the carp interfaces went UP/DOWN so fast and never came up again, staying in INIT status.
/etc/rc.interfaces_carp_configure actually calls the interface configuration twice. So the interfaces come up, go down, come up again, and stay there.

I didn't change the "/etc/rc.inter…." file cuz I don't know the entire bootup sequence. BUT it works now. Graceful takeover for VIP!
:)

If someone knows how, please submit this to the author.

Best Regards.

sullrich

http://cvstrac.pfsense.com/chngview?cn=10447

rafael_r

I don't know if I'm missing something in the configuration, but I cannot ping the VIP, so nothing is routing through it.

An ifconfig on the carp0 shows:

ifconfig carp0

carp0: flags=49<UP,LOOPBACK,RUNNING> mtu 1500
inet 192.168.15.254 netmask 0xffffff00
carp: BACKUP vhid 1 advbase 1 advskew 0

At the workstation "arp -a"
192.168.15.254 00-00-5e-00-01-01 dynamic
I see it's the virtual MAC, so, someone is replying.
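
As far as I know, CARP (like VRRP) derives the virtual MAC from the VHID as 00:00:5e:00:01:XX, so the last octet shows which VHID answered. A tiny sketch of that mapping, with helper names of my own choosing:

```python
def carp_virtual_mac(vhid: int) -> str:
    """Build the virtual MAC for a VHID, assuming the 00:00:5e:00:01:XX scheme."""
    if not 1 <= vhid <= 255:
        raise ValueError("vhid must be in 1..255")
    return "00:00:5e:00:01:%02x" % vhid


def vhid_from_mac(mac: str) -> int:
    """Recover the VHID; accepts ':' or '-' separators as printed by arp."""
    octets = mac.lower().replace("-", ":").split(":")
    if len(octets) != 6 or octets[:5] != ["00", "00", "5e", "00", "01"]:
        raise ValueError("not a CARP/VRRP style virtual MAC")
    return int(octets[5], 16)


print(vhid_from_mac("00-00-5e-00-01-01"))
print(carp_virtual_mac(253))
```

For the arp entry above this yields VHID 1, which matches the "vhid 1" shown for carp0.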

But the carp0 is bound to the loopback?!?!

#netstat -rn
192.168.15.254 192.168.15.254 UH 0 0 carp0

I think this is the answer why in the system log this message appears:
kernel: arp_rtrequest: bad gateway 192.168.15.254 (!AF_LINK)

The carpdev parameter is not available in ifconfig.

Any suggestions?
Thank you,

Rafael Vitto Ruthes

sullrich

Did you add firewall rules permitting ICMP to the same interface with the IP that a VIP would bind to?

billm

@rafael_r:

I don't know if I'm missing something in the configuration, but I cannot ping the VIP, so nothing is routing through it.

An ifconfig on the carp0 shows:

ifconfig carp0

carp0: flags=49<UP,LOOPBACK,RUNNING> mtu 1500
inet 192.168.15.254 netmask 0xffffff00
carp: BACKUP vhid 1 advbase 1 advskew 0

It's in backup. It's only advertising on the network if it's in MASTER - check this box's peer.

–Bill

rafael_r

Sorry, at the moment I copied the output the interface was in backup state, but I'm definitely testing w/ it UP and MASTER.

Sullrich, yes, there's the default rule permit * * * * * …. and for testing purpose, I added another permit icmp * * * * *

The strange thing is that tcpdump doesn't show the ping request packets.
The ifconfig and arp commands don't show a virtual MAC address for the interface, although the MAC 00-00-5e-00-01-01 is learnt from the virtual interface...

Could a hidden rule be blocking requests to the virtual MAC (a hidden layer 2 pf rule)?

sullrich

Could be the block private ip option in WAN.

NAmorim

Well… my problem is solved. Now the carp interfaces behave how they should.

Thank you all

rafael_r

Hi yall,

By now everything is almost perfect.
Pinging the VIP is possible (for some reason the NIC wasn't replying when the VIP should….).

But there's a problem w/ preemption...
The two systems see each other and elect master and backup between themselves, but after some time they swap! The master becomes the standby and vice versa.

When I previously posted a possible correction for the advskew, it looked like the problem was solved.

I don't know what to do...
Tks for your attention.

sullrich

We need to see the ifconfig output.

Please show us so we know what we are talking about, otherwise we are pissing in the dark and nobody wants to piss on themselves.

NAmorim

well… I have just upgraded to RELENG_1_SNAPSHOT_04-03-2006.

I started having the same problems with CARP again, with the backup becoming master. I've checked the file /etc/inc/interfaces.inc, and on lines 409 and 410 the advskew is hard-coded to 200. This is an error that was fixed in past versions.

sullrich

It stays at 200 for 60-90 seconds on bootup then switches back.

NAmorim

It stays with advskew 200 all the time.

@rafael_r:

Hi yall.

I've spent a couple of hours wondering why CARP ain't working on two boxes.
After creating the VIPs, their advskew values are different, the interfaces don't come up, etc.

Then I checked "/etc/inc/interfaces.inc" (I'm using 1.0BETA2, and I think this applies to other versions too).
I think this is the reason (lines 408 and 409):
fwrite($fd, "/sbin/ifconfig carp" . $carp_instances_counter . " " . $vip['subnet'] . "/" . $vip['subnet_bits'] . " broadcast " . $broadcast_address . " vhid " . $vip['vhid'] . "{$carpdev} advskew 200 " . $password . "\n");
mwexec("/sbin/ifconfig carp" . $carp_instances_counter . " " . $vip['subnet'] . "/" . $vip['subnet_bits'] . " broadcast " . $broadcast_address . " vhid " . $vip['vhid'] . "{$carpdev} advskew 200 " . $password);

The advskew is hard-coded to 200, no matter what is in the configuration (/conf/config.xml).

So, you can edit /etc/inc/interfaces.inc, go to lines 408 and 409, and change them to this:

fwrite($fd, "/sbin/ifconfig carp" . $carp_instances_counter . " " . $vip['subnet'] . "/" . $vip['subnet_bits'] . " broadcast " . $broadcast_address . " vhid " . $vip['vhid'] . "{$carpdev} advskew " . $vip['advskew'] . " " . $password . "\n");
mwexec("/sbin/ifconfig carp" . $carp_instances_counter . " " . $vip['subnet'] . "/" . $vip['subnet_bits'] . " broadcast " . $broadcast_address . " vhid " . $vip['vhid'] . "{$carpdev} advskew " . $vip['advskew'] . " " . $password);

In the previous version I had this changed. I thought that it was already in cvs, but only the sleep issue was changed.

sullrich

Then you have a configuration issue. Check these issues:

Make sure you have a static address on each of the pfsync interfaces in the same subnet
Try pinging the other end of pfsync to ensure connectivity (if this doesn't work, then stop here and double-check everything)
Make sure each CARP IP uses the same VHID on every node in the cluster, one VHID per IP
Make sure each CARP pair has the same password
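
The first check can be done by eye, but a quick sketch with Python's standard ipaddress module shows what "same subnet" means when ifconfig prints the netmask in hex. The helper names are mine and the addresses are example values, not a recommendation.

```python
import ipaddress


def hex_mask_to_prefix(mask: str) -> int:
    """Convert an ifconfig-style mask such as 0xffffff00 to a prefix length."""
    value = int(mask, 16)
    prefix = bin(value).count("1")
    # A valid netmask is a run of 1 bits followed only by 0 bits.
    if value != (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF:
        raise ValueError("non-contiguous netmask")
    return prefix


def same_subnet(addr_a: str, addr_b: str, mask: str) -> bool:
    """True when both addresses fall in the same network for the given mask."""
    prefix = hex_mask_to_prefix(mask)
    net_a = ipaddress.ip_interface(f"{addr_a}/{prefix}").network
    net_b = ipaddress.ip_interface(f"{addr_b}/{prefix}").network
    return net_a == net_b


# Example pfsync addresses on the two nodes.
print(same_subnet("192.168.254.1", "192.168.254.2", "0xffffff00"))
print(same_subnet("192.168.254.1", "192.168.253.2", "0xffffff00"))
```

This only covers the addressing part of the list. Connectivity still needs the ping test, and the VHID and password have to be compared in the configuration itself.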

ane

@sullrich:

Then you have a configuration issue. Check these issues:

Make sure you have a static address on each of the pfsync interfaces in the same subnet

Try pinging the other end of pfsync to ensure connectivity (if this doesn't work, then stop here and double-check everything)

Make sure each CARP IP uses the same VHID on every node in the cluster, one VHID per IP

Make sure each CARP pair has the same password

I have checked all of the above, but…

        Master
       ___________   ~~~~~
       |     sis2|----DMZ
---WAN-|sis1     |   ~~~~~
   |   |         |                     ~~~~~
   |   |_____sis0|----LAN---------------LAN
   |                   |               ~~~~~
   |                   |           ~~~~~
   |                   |___VLAN0 - pfsync
   |                   |           ~~~~~
   |                   |           
   |                   |           ~~~~~
   |                   |___VLAN1 - WLAN
   |    Backup                     ~~~~~
   |
   |   ___________   ~~~~~
   |   |     sis2|----DMZ
---WAN-|sis1     |   ~~~~~
       |         |                     ~~~~~
       |_____sis0|----LAN---------------LAN
                      |                ~~~~~
                      |           ~~~~~
                      |___VLAN0 - pfsync
                      |           ~~~~~
                      |           
                      |          ~~~~~
                      |___VLAN1 - WLAN
                                 ~~~~~

I configured CARP-VIPs for the DMZ, LAN and WLAN-vlan.

Now I have the same phenomenon as described before:
the boxes keep changing Master/Slave on DMZ and LAN, the backup box being Master most of the time.

On the vlan however, both insist on being master. tcpdump on LAN shows the same strangeness in changing advskew.

I have * * * * * rules for all non-WAN interfaces.

Edit: Here's the ifconfig output of the Master


ifconfig
sis0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu 1500
        options=8<VLAN_MTU>
        inet6 fe80::20d:b9ff:fe02:7a8c%sis0 prefixlen 64 scopeid 0x1
        inet 10.1.1.1 netmask 0xffff0000 broadcast 10.1.255.255
        ether 00:0d:b9:02:7a:8c
        media: Ethernet autoselect (100baseTX <full-duplex>)
        status: active
sis1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        options=8<VLAN_MTU>
        inet6 fe80::20d:b9ff:fe02:7a8d%sis1 prefixlen 64 scopeid 0x2
        ether 00:0d:b9:02:7a:8d
        media: Ethernet autoselect (100baseTX <full-duplex>)
        status: active
sis2: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu 1500
        options=8<VLAN_MTU>
        inet 10.5.1.1 netmask 0xffff0000 broadcast 10.5.255.255
        inet6 fe80::20d:b9ff:fe02:7a8e%sis2 prefixlen 64 scopeid 0x3
        ether 00:0d:b9:02:7a:8e
        media: Ethernet autoselect (100baseTX <full-duplex>)
        status: active
pfsync0: flags=41<UP,RUNNING> mtu 1348
        pfsync: syncdev: vlan0 maxupd: 128
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> mtu 16384
        inet 127.0.0.1 netmask 0xff000000
        inet6 ::1 prefixlen 128
        inet6 fe80::1%lo0 prefixlen 64 scopeid 0x5
pflog0: flags=100<PROMISC> mtu 33208
vlan0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        inet 192.168.254.1 netmask 0xffffff00 broadcast 192.168.254.255
        inet6 fe80::20d:b9ff:fe02:7a8c%vlan0 prefixlen 64 scopeid 0x7
        ether 00:0d:b9:02:7a:8c
        media: Ethernet autoselect (100baseTX <full-duplex>)
        status: active
        vlan: 30 parent interface: sis0
vlan1: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu 1500
        inet 10.4.1.1 netmask 0xffff0000 broadcast 10.4.255.255
        inet6 fe80::20d:b9ff:fe02:7a8c%vlan1 prefixlen 64 scopeid 0x8
        ether 00:0d:b9:02:7a:8c
        media: Ethernet autoselect (100baseTX <full-duplex>)
        status: active
        vlan: 4 parent interface: sis0
ng0: flags=88d1<UP,POINTOPOINT,RUNNING,NOARP,SIMPLEX,MULTICAST> mtu 1492
        inet6 fe80::20d:b9ff:fe02:7a8c%ng0 prefixlen 64 scopeid 0x9
        inet 80.136.201.83 --> 217.0.116.148 netmask 0xffffffff
carp0: flags=49<UP,LOOPBACK,RUNNING> mtu 1500
        inet 10.1.1.10 netmask 0xffff0000
        carp: BACKUP vhid 1 advbase 1 advskew 200
carp1: flags=49<UP,LOOPBACK,RUNNING> mtu 1500
        inet 10.4.1.10 netmask 0xffff0000
        carp: BACKUP vhid 4 advbase 1 advskew 200
carp2: flags=49<UP,LOOPBACK,RUNNING> mtu 1500
        inet 10.5.1.10 netmask 0xffff0000
        carp: MASTER vhid 5 advbase 1 advskew 200

ifconfig on Backup


sis0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu 1500
        options=8<VLAN_MTU>
        inet6 fe80::20d:b9ff:fe02:8094%sis0 prefixlen 64 scopeid 0x1
        inet 10.1.1.5 netmask 0xffff0000 broadcast 10.1.255.255
        ether 00:0d:b9:02:80:94
        media: Ethernet autoselect (100baseTX <full-duplex>)
        status: active
sis1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        options=8<VLAN_MTU>
        inet6 fe80::20d:b9ff:fe02:8095%sis1 prefixlen 64 scopeid 0x2
        ether 00:0d:b9:02:80:95
        media: Ethernet autoselect (100baseTX <full-duplex>)
        status: active
sis2: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu 1500
        options=8<VLAN_MTU>
        inet 10.5.1.5 netmask 0xffff0000 broadcast 10.5.255.255
        inet6 fe80::20d:b9ff:fe02:8096%sis2 prefixlen 64 scopeid 0x3
        ether 00:0d:b9:02:80:96
        media: Ethernet autoselect (100baseTX <full-duplex>)
        status: active
pfsync0: flags=41<UP,RUNNING> mtu 1348
        pfsync: syncdev: vlan0 maxupd: 128
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> mtu 16384
        inet 127.0.0.1 netmask 0xff000000
        inet6 ::1 prefixlen 128
        inet6 fe80::1%lo0 prefixlen 64 scopeid 0x5
pflog0: flags=100<PROMISC> mtu 33208
vlan0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        inet 192.168.254.2 netmask 0xffffff00 broadcast 192.168.254.255
        inet6 fe80::20d:b9ff:fe02:8094%vlan0 prefixlen 64 scopeid 0x7
        ether 00:0d:b9:02:80:94
        media: Ethernet autoselect (100baseTX <full-duplex>)
        status: active
        vlan: 30 parent interface: sis0
vlan1: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu 1500
        inet 10.4.1.5 netmask 0xffff0000 broadcast 10.4.255.255
        inet6 fe80::20d:b9ff:fe02:8094%vlan1 prefixlen 64 scopeid 0x8
        ether 00:0d:b9:02:80:94
        media: Ethernet autoselect (100baseTX <full-duplex>)
        status: active
        vlan: 4 parent interface: sis0
ng0: flags=8890<POINTOPOINT,NOARP,SIMPLEX,MULTICAST> mtu 1500
carp0: flags=49<UP,LOOPBACK,RUNNING> mtu 1500
        inet 10.1.1.10 netmask 0xffff0000
        carp: MASTER vhid 1 advbase 1 advskew 200
carp1: flags=49<UP,LOOPBACK,RUNNING> mtu 1500
        inet 10.4.1.10 netmask 0xffff0000
        carp: MASTER vhid 4 advbase 1 advskew 200
carp2: flags=49<UP,LOOPBACK,RUNNING> mtu 1500
        inet 10.5.1.10 netmask 0xffff0000
        carp: MASTER vhid 5 advbase 1 advskew 200
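
A quick way to read the two outputs side by side is to pull out only the carp lines and compare the state per VHID. The sketch below is mine, not a pfSense tool. It assumes the "carp: STATE vhid N advbase N advskew N" line format shown above and uses abbreviated copies of the two listings.

```python
import re

CARP_LINE = re.compile(
    r"carp:\s+(\w+)\s+vhid\s+(\d+)\s+advbase\s+(\d+)\s+advskew\s+(\d+)"
)


def carp_states(ifconfig_text: str) -> dict:
    """Map vhid -> (state, advskew) for every carp line in ifconfig output."""
    result = {}
    for state, vhid, _advbase, advskew in CARP_LINE.findall(ifconfig_text):
        result[int(vhid)] = (state, int(advskew))
    return result


def dual_masters(node_a: str, node_b: str) -> list:
    """Return the VHIDs that are MASTER on both nodes at once."""
    a, b = carp_states(node_a), carp_states(node_b)
    return sorted(
        vhid for vhid in a.keys() & b.keys()
        if a[vhid][0] == "MASTER" and b[vhid][0] == "MASTER"
    )


master_box = """
carp: BACKUP vhid 1 advbase 1 advskew 200
carp: BACKUP vhid 4 advbase 1 advskew 200
carp: MASTER vhid 5 advbase 1 advskew 200
"""
backup_box = """
carp: MASTER vhid 1 advbase 1 advskew 200
carp: MASTER vhid 4 advbase 1 advskew 200
carp: MASTER vhid 5 advbase 1 advskew 200
"""

print(dual_masters(master_box, backup_box))
```

On these listings it reports VHID 5 (the DMZ VIP) as MASTER on both boxes, and every VIP on both boxes sits at advskew 200.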

I can ping the DMZ interface from Master to Backup, but not vice versa.

tcpdump on LAN:

23:32:04.572009 IP Backup > vrrp.mcast.net: VRRPv2, Advertisement, vrid 1, prio 20, authtype none, intvl 1s, length 36
23:32:05.698596 IP Backup > vrrp.mcast.net: VRRPv2, Advertisement, vrid 1, prio 20, authtype none, intvl 1s, length 36
23:32:06.824884 IP Backup > vrrp.mcast.net: VRRPv2, Advertisement, vrid 1, prio 20, authtype none, intvl 1s, length 36
23:32:10.613710 IP master > vrrp.mcast.net: VRRPv2, Advertisement, vrid 1, prio 240, authtype none, intvl 1s, length 36
23:32:12.354547 IP master > vrrp.mcast.net: VRRPv2, Advertisement, vrid 1, prio 240, authtype none, intvl 1s, length 36
23:32:14.300326 IP master > vrrp.mcast.net: VRRPv2, Advertisement, vrid 1, prio 240, authtype none, intvl 1s, length 36
….
...
23:35:17.600611 IP master > vrrp.mcast.net: VRRPv2, Advertisement, vrid 1, prio 240, authtype none, intvl 1s, length 36
23:35:19.546316 IP master > vrrp.mcast.net: VRRPv2, Advertisement, vrid 1, prio 240, authtype none, intvl 1s, length 36
23:35:21.492071 IP master > vrrp.mcast.net: VRRPv2, Advertisement, vrid 1, prio 240, authtype none, intvl 1s, length 36
23:35:21.492303 IP Backup > vrrp.mcast.net: VRRPv2, Advertisement, vrid 1, prio 200, authtype none, intvl 1s, length 36
23:35:23.335285 IP Backup > vrrp.mcast.net: VRRPv2, Advertisement, vrid 1, prio 200, authtype none, intvl 1s, length 36
23:35:25.076075 IP Backup > vrrp.mcast.net: VRRPv2, Advertisement, vrid 1, prio 200, authtype none, intvl 1s, length 36
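
Since tcpdump decodes CARP as VRRPv2, the "prio" it prints should correspond to the CARP advskew (an assumption based on the two header layouts sharing that byte). Here is a small sketch, not part of any tool, that groups the advertised values by sender for lines in the format above. The capture string is a shortened copy of the dump.

```python
import re

ADVERT = re.compile(
    r"^(\S+) IP (\S+) > (\S+): VRRPv2, Advertisement, vrid (\d+), prio (\d+),"
)


def skews_by_sender(capture: str) -> dict:
    """Return {sender: [distinct prio values, in order of first appearance]}."""
    seen = {}
    for line in capture.splitlines():
        match = ADVERT.match(line.strip())
        if not match:
            continue  # skip elisions and unrelated lines
        _ts, sender, _dst, _vrid, prio = match.groups()
        values = seen.setdefault(sender, [])
        if int(prio) not in values:
            values.append(int(prio))
    return seen


capture = """\
23:32:04.572009 IP Backup > vrrp.mcast.net: VRRPv2, Advertisement, vrid 1, prio 20, authtype none, intvl 1s, length 36
23:32:10.613710 IP master > vrrp.mcast.net: VRRPv2, Advertisement, vrid 1, prio 240, authtype none, intvl 1s, length 36
...
23:35:21.492303 IP Backup > vrrp.mcast.net: VRRPv2, Advertisement, vrid 1, prio 200, authtype none, intvl 1s, length 36
"""

print(skews_by_sender(capture))
```

For this excerpt it shows Backup advertising 20 and later 200, and master advertising 240, so at the start of the capture the backup box was announcing the lower skew.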

Setup is currently BETA4