CARP Problems



  • Hello all!

    I have repeated entries in the system log about CARP that might be the cause of some problems I'm having with failover.

    The issue is that when I configure two boxes with CARP, they keep changing between the master and backup states, and sometimes settle with both nodes being master.

    In the system log, I have these entries:

    kernel: carp_input: received len 20 < sizeof(struct carp_header)
        last message repeated 351 times

    I think that this is caused by the VRRP announcements from a pair of Alteons on one of the networks:

    14:51:40.895688 IP 172.22.3.5 > 224.0.0.18: VRRPv2, Advertisement, vrid 3, prio 130, authtype none, intvl 1s, length 20
    14:51:40.895691 IP 172.22.3.5 > 224.0.0.18: VRRPv2, Advertisement, vrid 2, prio 114, authtype none, intvl 1s, length 20
    14:51:40.895694 IP 172.22.3.5 > 224.0.0.18: VRRPv2, Advertisement, vrid 4, prio 122, authtype none, intvl 1s, length 20
    14:51:41.896614 IP 172.22.3.2 > 224.0.0.18: VRRPv2, Advertisement, vrid 253, prio 200, authtype none, intvl 1s, length 36

    IP 172.22.3.2 is the firewall. What I also find strange is that the authtype is set to none, when I've configured a password.

    This behaviour happens in Beta-1, snapshot 2-19-06, snapshot 2-20-06 and now in Beta-2.

    Thank you in advance for the help.



  • CARP and VRRP use the same IP protocol number. tcpdump decodes the traffic as VRRP because VRRP is the IETF-registered owner of that protocol number. However, the field layouts differ, although they are similar up to the VRID field. So, if you are running VRRP and CARP on the same physical segment, you absolutely must use different VRIDs/VHIDs, or else they will stomp on each other. Also, some equipment tends to be unhappy with VRRP and CARP sitting on the same segment, but that's typically due to piss-poor VRRP implementations on that equipment.

    For the record, the two do coexist quite happily. I run many firewalls with CARP on the same physical segments that VRRP (and HSRP, which is a different protocol, but I thought it worth mentioning) runs on.

    –Bill
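
    As a rough illustration of the overlap described above, the sketch below reads the first six bytes of a protocol-112 payload under both interpretations. The field offsets are an assumption based on the VRRPv2 header in RFC 3768 and on struct carp_header in FreeBSD's ip_carp.h; the function names and sample packets are made up for this example. Under that assumption, the byte VRRP uses for the auth type is padding in CARP, which would account for tcpdump printing "authtype none" even when a CARP password is set, and a 20-byte VRRP advertisement is shorter than the 36-byte CARP header, which would account for the kernel message.

```python
import struct

CARP_HEADER_LEN = 36  # assumed sizeof(struct carp_header): 16 bytes of fields + 20-byte HMAC


def decode_as_vrrp(payload: bytes) -> dict:
    """Read the leading bytes the way a VRRPv2 decoder such as tcpdump would."""
    ver_type, vrid, prio, _count, authtype, intvl = struct.unpack("!6B", payload[:6])
    return {"version": ver_type >> 4, "type": ver_type & 0x0F,
            "vrid": vrid, "prio": prio, "authtype": authtype, "intvl": intvl}


def decode_as_carp(payload: bytes) -> dict:
    """Read the same bytes as CARP, refusing packets shorter than the CARP header."""
    if len(payload) < CARP_HEADER_LEN:
        raise ValueError(f"received len {len(payload)} < sizeof(struct carp_header)")
    ver_type, vhid, advskew, authlen, _pad1, advbase = struct.unpack("!6B", payload[:6])
    return {"version": ver_type >> 4, "type": ver_type & 0x0F,
            "vhid": vhid, "advskew": advskew, "authlen": authlen, "advbase": advbase}


# Made-up CARP advertisement: version 2, type 1, vhid 253, advskew 200, advbase 1.
carp_packet = bytes([0x21, 253, 200, 7, 0, 1]) + bytes(30)
# Made-up 20-byte VRRPv2 advertisement: vrid 3, priority 130, one address.
vrrp_packet = bytes([0x21, 3, 130, 1, 0, 1]) + bytes(14)

print(decode_as_vrrp(carp_packet))   # vhid shows up as vrid, advskew as prio
print(decode_as_carp(carp_packet))
try:
    decode_as_carp(vrrp_packet)
except ValueError as err:
    print("rejected:", err)
```

    Read this way, "vrid 253, prio 200, length 36" in the capture above would be the firewall's own CARP advertisement with vhid 253 and advskew 200.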



  • Just to confirm what Bill said: same here. We have quite a few networks with CARP and VRRP with no issues.



  • OK… so those log entries are only informational. Is there a way to disable them?

    The problem I'm having with CARP is reproducible in two VMware installs I've made. I create one master VIP on node 1. This is synchronised to node 2 with an advskew of 101. Then, within about 20 seconds, the advskew of node 1 goes to 200. If I disable CARP on node 2, the carp interfaces are destroyed. When I re-enable CARP, the interfaces are created but not brought up, so node 2 stays in the INIT state.

    If I go to a shell and do an "ifconfig carp0 up", sometimes both nodes stay as master. After some time, the carp interface on node 2 goes down, to the INIT state again.

    I can't find out why this is happening. I found another forum post about this problem, which was solved by removing some FC Intel card. I don't have those, and the same happens in virtual machines.
    Looking at the CARP documentation on the OpenBSD site, there's one option that I don't have in pfSense: "carpdev". Could that be an issue?



  • Hi yall.

    I've spent a couple of hours wondering why CARP isn't working on two boxes.
    After creating the VIPs, their advskew values are different, the interfaces don't come up, etc.

    Then I checked "/etc/inc/interfaces.inc" (using 1.0BETA2, and I think it applies to other versions too).
    I think this is the reason (lines 408 and 409):
    fwrite($fd, "/sbin/ifconfig carp" . $carp_instances_counter . " " . $vip['subnet'] . "/" . $vip['subnet_bits'] . " broadcast " . $broadcast_address . " vhid " . $vip['vhid'] . "{$carpdev} advskew 200 " . $password . "\n");
    mwexec("/sbin/ifconfig carp" . $carp_instances_counter . " " . $vip['subnet'] . "/" . $vip['subnet_bits'] . " broadcast " . $broadcast_address . " vhid " . $vip['vhid'] . "{$carpdev} advskew 200 " . $password);

    The advskew is hard-coded to 200, no matter what is in the configuration (/conf/config.xml).

    So, you can edit /etc/inc/interfaces.inc, go to lines 408 and 409, and change them to this:

    fwrite($fd, "/sbin/ifconfig carp" . $carp_instances_counter . " " . $vip['subnet'] . "/" . $vip['subnet_bits'] . " broadcast " . $broadcast_address . " vhid " . $vip['vhid'] . "{$carpdev} advskew " . $vip['advskew'] . " " . $password . "\n");
    mwexec("/sbin/ifconfig carp" . $carp_instances_counter . " " . $vip['subnet'] . "/" . $vip['subnet_bits'] . " broadcast " . $broadcast_address . " vhid " . $vip['vhid'] . "{$carpdev} advskew " . $vip['advskew'] . " " . $password);

    It works. The correct advskew values are now shown in /tmp/carp.sh and, of course, in the "ifconfig" output.
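
    To make the idea of the change easier to see outside the PHP, here is a small Python sketch of the same string assembly. It is not pfSense code: the function name, the dictionary keys and the sample addresses are assumptions for illustration, and the password and carpdev fragments are passed in as ready-made strings, as the PHP variables appear to be.

```python
def build_carp_command(index: int, vip: dict, password_arg: str = "", carpdev_arg: str = "") -> str:
    """Assemble an ifconfig line for carp<index> using the VIP's own advskew.

    The hard-coded variant would always emit "advskew 200" here, so the node
    configured as master would advertise the same skew as the backup.
    """
    parts = [
        f"/sbin/ifconfig carp{index}",
        f"{vip['subnet']}/{vip['subnet_bits']}",
        f"broadcast {vip['broadcast']}",
        f"vhid {vip['vhid']}{carpdev_arg}",
        f"advskew {vip['advskew']}",  # taken from the configuration, not fixed
    ]
    if password_arg:
        parts.append(password_arg)
    return " ".join(parts)


# Hypothetical VIP entries for a master (advskew 0) and a backup (advskew 100).
master_vip = {"subnet": "192.168.15.254", "subnet_bits": 24,
              "broadcast": "192.168.15.255", "vhid": 1, "advskew": 0}
backup_vip = dict(master_vip, advskew=100)

print(build_carp_command(0, master_vip))
print(build_carp_command(0, backup_vip))
```

    With the skew taken from the configuration, the two nodes advertise different values, which is what lets the lower-skew node win the election.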

    Another point is that on the slave unit, even though the command "ifconfig carpX up" is in the script, the interfaces come up for a while and then go down. As my knowledge of ifconfig with carp isn't that good, I don't know what it could be. I tried adding a 'sleep 1' after the lines I mentioned above, but it didn't work.



  • Good news for CARP problems!

    After sending my previous message, I looked in interfaces.inc for the second routine called from "/etc/rc.interfaces_carp_configure".

    Well, at line 466 in interfaces.inc, there's a call to mwexec("ifconfig carpx up") and after it a mwexec() to configure the interface. What I did was move the "ifconfig up" after that mwexec(), together with a "sleep 1".

    By making this change, I realized why the carp interfaces went UP/DOWN so fast and never came up again, staying in INIT status.
    /etc/rc.interfaces_carp_configure actually calls the interface configuration twice. So the interfaces come up, go down, come up again, and stay there.

    I didn't change the "/etc/rc.inter…." file because I don't know the entire bootup sequence. BUT it works now. Graceful takeover for the VIP!
    :)

    If someone knows how, please submit this to the author if possible.

    Best Regards.





  • I don't know if I'm missing something in the configuration, but I cannot ping the VIP, so nothing is routed through it.

    An ifconfig on carp0 shows:

    ifconfig carp0

    carp0: flags=49<UP,LOOPBACK,RUNNING> mtu 1500
            inet 192.168.15.254 netmask 0xffffff00
            carp: BACKUP vhid 1 advbase 1 advskew 0

    At the workstation, "arp -a" shows:
      192.168.15.254        00-00-5e-00-01-01    dynamic
    I see it's the virtual MAC, so someone is replying.

    But carp0 is bound to the loopback?!?!

    #netstat -rn
    192.168.15.254    192.168.15.254    UH          0        0  carp0

    I think this is why this message appears in the system log:
    kernel: arp_rtrequest: bad gateway 192.168.15.254 (!AF_LINK)

    The carpdev parameter is not available in ifconfig.

    Any suggestions?
    Thank you,

    Rafael Vitto Ruthes



  • Did you add firewall rules permitting ICMP to the same interface with the IP that a VIP would bind to?



  • @rafael_r:

    I don't know if I'm missing something in the configuration, but I cannot ping the VIP, so nothing is routed through it.

    An ifconfig on carp0 shows:

    ifconfig carp0

    carp0: flags=49<UP,LOOPBACK,RUNNING> mtu 1500
            inet 192.168.15.254 netmask 0xffffff00
            carp: BACKUP vhid 1 advbase 1 advskew 0

    It's in backup. It only advertises on the network when it's MASTER. Check this box's peer.

    –Bill



  • Sorry, at the moment I copied the output the interface was in the backup state, but I'm definitely testing with it UP and MASTER.

    Sullrich, yes, there's the default rule permit * * * * * …. and for testing purposes, I added another permit icmp * * * * *

    The strange thing is that tcpdump doesn't show the ping request packets.
    The ifconfig and arp commands don't show a virtual MAC address for the interface, although the MAC 00-00-5e-00-01-01 is learnt from the virtual interface...

    Could some hidden rule be blocking requests to the virtual MAC (a hidden layer-2 pf rule)?



  • Could be the block private ip option in WAN.



  • Well… my problem is solved. Now the carp interfaces behave how they should.

    Thank you all



  • Hi yall,

    By now everything is almost perfect.
    Pinging the VIP is possible (for some reason the NIC wasn't replying when the VIP should….).

    But a problem with preemption is happening...
    The two systems see each other and elect a master and a backup, but after some time they swap! The master becomes the standby and vice versa.

    When I previously posted a possible correction for the advskew, it looked like the problem was solved.

    I don't know what to do...
    Thanks for your attention.



  • We need to see the ifconfig output.

    Please show us so we know what we are talking about, otherwise we are pissing in the dark and nobody wants to piss on themselves.



  • well… I have just upgraded to RELENG_1_SNAPSHOT_04-03-2006.

    I started having the same problems with CARP again, with the backup becoming master. I've checked the file /etc/inc/interfaces.inc, and on lines 409 and 410 the advskew is hard-coded to 200. This is an error that was fixed in past versions.



  • It stays at 200 for 60-90 seconds on bootup then switches back.



  • It stays with advskew 200 all the time.

    @rafael_r:

    Hi yall.

    I've spent a couple of hours wondering why CARP isn't working on two boxes.
    After creating the VIPs, their advskew values are different, the interfaces don't come up, etc.

    Then I checked "/etc/inc/interfaces.inc" (using 1.0BETA2, and I think it applies to other versions too).
    I think this is the reason (lines 408 and 409):
    fwrite($fd, "/sbin/ifconfig carp" . $carp_instances_counter . " " . $vip['subnet'] . "/" . $vip['subnet_bits'] . " broadcast " . $broadcast_address . " vhid " . $vip['vhid'] . "{$carpdev} advskew 200 " . $password . "\n");
    mwexec("/sbin/ifconfig carp" . $carp_instances_counter . " " . $vip['subnet'] . "/" . $vip['subnet_bits'] . " broadcast " . $broadcast_address . " vhid " . $vip['vhid'] . "{$carpdev} advskew 200 " . $password);

    The advskew is hard-coded to 200, no matter what is in the configuration (/conf/config.xml).

    So, you can edit /etc/inc/interfaces.inc, go to lines 408 and 409, and change them to this:

    fwrite($fd, "/sbin/ifconfig carp" . $carp_instances_counter . " " . $vip['subnet'] . "/" . $vip['subnet_bits'] . " broadcast " . $broadcast_address . " vhid " . $vip['vhid'] . "{$carpdev} advskew " . $vip['advskew'] . " " . $password . "\n");
    mwexec("/sbin/ifconfig carp" . $carp_instances_counter . " " . $vip['subnet'] . "/" . $vip['subnet_bits'] . " broadcast " . $broadcast_address . " vhid " . $vip['vhid'] . "{$carpdev} advskew " . $vip['advskew'] . " " . $password);

    In the previous version I had made this change myself. I thought that it was already in CVS, but only the sleep issue was changed.



  • Then you have a configuration issue.  Check these issues:

    • Make sure you have a static address on each of the pfsync interfaces, in the same subnet
    • Try pinging the other end of pfsync to ensure connectivity (if this doesn't work, then stop here and double-check everything)
    • Make sure each CARP IP uses the same VHID on every member of the cluster
    • Make sure each CARP pair has the same password


  • @sullrich:

    I have checked all of the above, but…

            Master
           ___________   ~~~~~
           |     sis2|----DMZ
    ---WAN-|sis1     |   ~~~~~
       |   |         |                     ~~~~~
       |   |_____sis0|----LAN---------------LAN
       |                   |               ~~~~~
       |                   |           ~~~~~
       |                   |___VLAN0 - pfsync
       |                   |           ~~~~~
       |                   |           
       |                   |           ~~~~~
       |                   |___VLAN1 - WLAN
       |    Backup                     ~~~~~
       |
       |   ___________   ~~~~~
       |   |     sis2|----DMZ
    ---WAN-|sis1     |   ~~~~~
           |         |                     ~~~~~
           |_____sis0|----LAN---------------LAN
                          |                ~~~~~
                          |           ~~~~~
                          |___VLAN0 - pfsync
                          |           ~~~~~
                          |           
                          |          ~~~~~
                          |___VLAN1 - WLAN
                                     ~~~~~
    

    I configured CARP-VIPs for the DMZ, LAN and WLAN-vlan.

    Now I have the same phenomenon as described before:
    the boxes keep changing Master/Slave on DMZ and LAN, the backup box being Master most of the time.

    On the vlan however, both insist on being master. tcpdump on LAN shows the same strangeness in changing advskew.

    I have * * * * * rules for all non-WAN interfaces.

    Edit: Here's the ifconfig output of the Master

    
    ifconfig 
    sis0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu 1500
            options=8<VLAN_MTU>
            inet6 fe80::20d:b9ff:fe02:7a8c%sis0 prefixlen 64 scopeid 0x1
            inet 10.1.1.1 netmask 0xffff0000 broadcast 10.1.255.255
            ether 00:0d:b9:02:7a:8c
            media: Ethernet autoselect (100baseTX <full-duplex>)
            status: active
    sis1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
            options=8<VLAN_MTU>
            inet6 fe80::20d:b9ff:fe02:7a8d%sis1 prefixlen 64 scopeid 0x2
            ether 00:0d:b9:02:7a:8d
            media: Ethernet autoselect (100baseTX <full-duplex>)
            status: active
    sis2: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu 1500
            options=8<VLAN_MTU>
            inet 10.5.1.1 netmask 0xffff0000 broadcast 10.5.255.255
            inet6 fe80::20d:b9ff:fe02:7a8e%sis2 prefixlen 64 scopeid 0x3
            ether 00:0d:b9:02:7a:8e
            media: Ethernet autoselect (100baseTX <full-duplex>)
            status: active
    pfsync0: flags=41<UP,RUNNING> mtu 1348
            pfsync: syncdev: vlan0 maxupd: 128
    lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> mtu 16384
            inet 127.0.0.1 netmask 0xff000000
            inet6 ::1 prefixlen 128
            inet6 fe80::1%lo0 prefixlen 64 scopeid 0x5
    pflog0: flags=100<PROMISC> mtu 33208
    vlan0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
            inet 192.168.254.1 netmask 0xffffff00 broadcast 192.168.254.255
            inet6 fe80::20d:b9ff:fe02:7a8c%vlan0 prefixlen 64 scopeid 0x7
            ether 00:0d:b9:02:7a:8c
            media: Ethernet autoselect (100baseTX <full-duplex>)
            status: active
            vlan: 30 parent interface: sis0
    vlan1: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu 1500
            inet 10.4.1.1 netmask 0xffff0000 broadcast 10.4.255.255
            inet6 fe80::20d:b9ff:fe02:7a8c%vlan1 prefixlen 64 scopeid 0x8
            ether 00:0d:b9:02:7a:8c
            media: Ethernet autoselect (100baseTX <full-duplex>)
            status: active
            vlan: 4 parent interface: sis0
    ng0: flags=88d1<UP,POINTOPOINT,RUNNING,NOARP,SIMPLEX,MULTICAST> mtu 1492
            inet6 fe80::20d:b9ff:fe02:7a8c%ng0 prefixlen 64 scopeid 0x9
            inet 80.136.201.83 --> 217.0.116.148 netmask 0xffffffff
    carp0: flags=49<UP,LOOPBACK,RUNNING> mtu 1500
            inet 10.1.1.10 netmask 0xffff0000
            carp: BACKUP vhid 1 advbase 1 advskew 200
    carp1: flags=49<UP,LOOPBACK,RUNNING> mtu 1500
            inet 10.4.1.10 netmask 0xffff0000
            carp: BACKUP vhid 4 advbase 1 advskew 200
    carp2: flags=49<UP,LOOPBACK,RUNNING> mtu 1500
            inet 10.5.1.10 netmask 0xffff0000
            carp: MASTER vhid 5 advbase 1 advskew 200
    

    ifconfig on Backup

    
    sis0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu 1500
            options=8<VLAN_MTU>
            inet6 fe80::20d:b9ff:fe02:8094%sis0 prefixlen 64 scopeid 0x1
            inet 10.1.1.5 netmask 0xffff0000 broadcast 10.1.255.255
            ether 00:0d:b9:02:80:94
            media: Ethernet autoselect (100baseTX <full-duplex>)
            status: active
    sis1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
            options=8<VLAN_MTU>
            inet6 fe80::20d:b9ff:fe02:8095%sis1 prefixlen 64 scopeid 0x2
            ether 00:0d:b9:02:80:95
            media: Ethernet autoselect (100baseTX <full-duplex>)
            status: active
    sis2: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu 1500
            options=8<VLAN_MTU>
            inet 10.5.1.5 netmask 0xffff0000 broadcast 10.5.255.255
            inet6 fe80::20d:b9ff:fe02:8096%sis2 prefixlen 64 scopeid 0x3
            ether 00:0d:b9:02:80:96
            media: Ethernet autoselect (100baseTX <full-duplex>)
            status: active
    pfsync0: flags=41<UP,RUNNING> mtu 1348
            pfsync: syncdev: vlan0 maxupd: 128
    lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> mtu 16384
            inet 127.0.0.1 netmask 0xff000000
            inet6 ::1 prefixlen 128
            inet6 fe80::1%lo0 prefixlen 64 scopeid 0x5
    pflog0: flags=100<PROMISC> mtu 33208
    vlan0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
            inet 192.168.254.2 netmask 0xffffff00 broadcast 192.168.254.255
            inet6 fe80::20d:b9ff:fe02:8094%vlan0 prefixlen 64 scopeid 0x7
            ether 00:0d:b9:02:80:94
            media: Ethernet autoselect (100baseTX <full-duplex>)
            status: active
            vlan: 30 parent interface: sis0
    vlan1: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu 1500
            inet 10.4.1.5 netmask 0xffff0000 broadcast 10.4.255.255
            inet6 fe80::20d:b9ff:fe02:8094%vlan1 prefixlen 64 scopeid 0x8
            ether 00:0d:b9:02:80:94
            media: Ethernet autoselect (100baseTX <full-duplex>)
            status: active
            vlan: 4 parent interface: sis0
    ng0: flags=8890<POINTOPOINT,NOARP,SIMPLEX,MULTICAST> mtu 1500
    carp0: flags=49<UP,LOOPBACK,RUNNING> mtu 1500
            inet 10.1.1.10 netmask 0xffff0000
            carp: MASTER vhid 1 advbase 1 advskew 200
    carp1: flags=49<UP,LOOPBACK,RUNNING> mtu 1500
            inet 10.4.1.10 netmask 0xffff0000
            carp: MASTER vhid 4 advbase 1 advskew 200
    carp2: flags=49<UP,LOOPBACK,RUNNING> mtu 1500
            inet 10.5.1.10 netmask 0xffff0000
            carp: MASTER vhid 5 advbase 1 advskew 200
    

    I can ping the DMZ interface from Master to Backup, but not vice versa.

    tcpdump on LAN:

    23:32:04.572009 IP Backup > vrrp.mcast.net: VRRPv2, Advertisement, vrid 1, prio 20, authtype none, intvl 1s, length 36
    23:32:05.698596 IP Backup > vrrp.mcast.net: VRRPv2, Advertisement, vrid 1, prio 20, authtype none, intvl 1s, length 36
    23:32:06.824884 IP Backup > vrrp.mcast.net: VRRPv2, Advertisement, vrid 1, prio 20, authtype none, intvl 1s, length 36
    23:32:10.613710 IP master > vrrp.mcast.net: VRRPv2, Advertisement, vrid 1, prio 240, authtype none, intvl 1s, length 36
    23:32:12.354547 IP master > vrrp.mcast.net: VRRPv2, Advertisement, vrid 1, prio 240, authtype none, intvl 1s, length 36
    23:32:14.300326 IP master > vrrp.mcast.net: VRRPv2, Advertisement, vrid 1, prio 240, authtype none, intvl 1s, length 36
    ….
    ...
    23:35:17.600611 IP master > vrrp.mcast.net: VRRPv2, Advertisement, vrid 1, prio 240, authtype none, intvl 1s, length 36
    23:35:19.546316 IP master > vrrp.mcast.net: VRRPv2, Advertisement, vrid 1, prio 240, authtype none, intvl 1s, length 36
    23:35:21.492071 IP master > vrrp.mcast.net: VRRPv2, Advertisement, vrid 1, prio 240, authtype none, intvl 1s, length 36
    23:35:21.492303 IP Backup > vrrp.mcast.net: VRRPv2, Advertisement, vrid 1, prio 200, authtype none, intvl 1s, length 36
    23:35:23.335285 IP Backup > vrrp.mcast.net: VRRPv2, Advertisement, vrid 1, prio 200, authtype none, intvl 1s, length 36
    23:35:25.076075 IP Backup > vrrp.mcast.net: VRRPv2, Advertisement, vrid 1, prio 200, authtype none, intvl 1s, length 36

    Setup is currently BETA4



  • I have the same problem (brand new install of beta4).
    After configuring CARP on each firewall, some of the interfaces on the master are in backup mode and some others in master mode, and the same happens on the slave firewall.

    If I do an ifconfig carp0 carp1 etc…, I can see that the advskew is set to 200 on all carp interfaces on both firewalls, even though I have set 0 on the master one. Backing up the configuration and inspecting the XML file shows the right configuration (0 for the master VIPs and 200 for the slave).

    Then, if I modify /tmp/carp.sh on the master by setting the advskew to 0, destroy all the carp interfaces and execute carp.sh, all is fine because the master is master!

    If I modify the code where the advskew is hard-coded on the master firewall, then all is fine too.



  • @Juve:

    It will have that advertising skew until the final CARP bring-up process (about 2 minutes after the firewall is completely booted up). You can view the progress on the console.

    As for interfaces being master or backup when they shouldn't be, this means that CARP is not communicating on those interfaces. It needs to be able to broadcast and talk to the other firewall on the interface in question.
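
    One way to spot where the two nodes disagree is to compare the "carp:" lines from ifconfig on both boxes. The Python sketch below is only an illustration: the function names are made up, and the regular expression assumes the line format seen in the ifconfig output posted earlier in this thread. It flags every VHID that does not have exactly one MASTER.

```python
import re

# Matches lines such as "carp: MASTER vhid 5 advbase 1 advskew 200".
CARP_LINE = re.compile(r"carp:\s+(MASTER|BACKUP|INIT)\s+vhid\s+(\d+)\s+advbase\s+\d+\s+advskew\s+\d+")


def carp_states(ifconfig_output: str) -> dict:
    """Map each VHID found in the ifconfig text to its reported state."""
    return {int(m.group(2)): m.group(1) for m in CARP_LINE.finditer(ifconfig_output)}


def find_conflicts(node_a_output: str, node_b_output: str) -> dict:
    """Return the VHIDs present on both nodes that do not have exactly one MASTER."""
    a, b = carp_states(node_a_output), carp_states(node_b_output)
    problems = {}
    for vhid in sorted(set(a) & set(b)):
        pair = (a[vhid], b[vhid])
        if pair.count("MASTER") != 1:
            problems[vhid] = pair
    return problems


# States copied from the two ifconfig listings posted above.
master_box = """
        carp: BACKUP vhid 1 advbase 1 advskew 200
        carp: BACKUP vhid 4 advbase 1 advskew 200
        carp: MASTER vhid 5 advbase 1 advskew 200
"""
backup_box = """
        carp: MASTER vhid 1 advbase 1 advskew 200
        carp: MASTER vhid 4 advbase 1 advskew 200
        carp: MASTER vhid 5 advbase 1 advskew 200
"""

print(find_conflicts(master_box, backup_box))
```

    A VHID that shows MASTER on both nodes suggests that neither box is receiving the other's advertisements on that interface.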



  • @sullrich:

    As for interfaces being master or backup when they shouldn't be, this means that CARP is not communicating on those interfaces. It needs to be able to broadcast and talk to the other firewall on the interface in question.

    How could I test that? I'm facing a similar problem: one of my four carp interfaces is "master-master" no matter what I do. A simple ping goes fine to and fro. Nothing seems to be blocked in the logs. I have already changed NICs and switches without success.



  • @iimre:

    If you have not seen the CARP tutorial on our site, then you need to follow it. It will guide you in setting up the primary box, which syncs the configuration to the secondaries. This is important because it ensures that the advskew and the VHID are correct across all cluster members. It also ensures that the passwords match per VHID. Place a crossover cable between the two WAN interfaces. Does the problem persist? If so, you have a mismatched configuration somewhere.



  • @sullrich:

    If you have not seen the CARP tutorial on our site then you need to follow it.

    I did exactly that.

    It will guide you in setting up the primary box, which syncs the configuration to the secondaries. This is important because it ensures that the advskew and the VHID are correct across all cluster members. It also ensures that the passwords match per VHID. Place a crossover cable between the two WAN interfaces.

    I have already tried this. Not only the WAN but all the interface pairs, one by one. I will make some more crossover cables tomorrow and will try connecting all the interface pairs (WAN, WAN2, DMZ and LAN) with crossover cables (the CARP synchronization interface is of course permanently connected with one).

    Does the problem persist?

    Yes :(

    If so, you have a mismatched configuration somewhere.

    Yes, probably, but I have tried building it up from scratch several times, with only what I guess is the minimal necessary configuration. So now I have no idea what the problem could be.
    Anyhow, it seems to function well on both WAN interfaces, from either LAN or DMZ, but I'm afraid that there is a hidden problem which could cause a collapse at the worst moment.



  • Post screenshots of each machine's virtual IP configuration so we can inspect it.



  • @sullrich:

    I attached them as you asked. I reduced the sizes as much as possible, hoping that they are still readable.
    Thank you for your help

    Imre



  • Each instance of the same IP needs to share the same VHID group… They are unique in your setup, which also tells me that you didn't follow the tutorial, as it would have synced the configuration to the backup node, ensuring this is all the way it should be.    >:(



  • @sullrich:

    Sorry, then I'm probably misunderstanding something :(
    xxx.xxx.xxx.165's VHID=1
    xxx.xxx.xxx.116's VHID=2
    10.0.254.4's VHID=3
    192.168.0.10's VHID=4
    The same kind of interface has the same VHID group number on both machines.
    I'm confused. Should all 4 have the same one?



  • @iimre:

    Each unique IP needs to have its own VHID. The VHID needs to match on each machine.

    If you are using the Sync option as the tutorial shows, this is all automatic.
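
    The two rules above can be checked mechanically. The Python sketch below is a hypothetical helper, not part of pfSense: it takes, for each node, a mapping from virtual IP to VHID (how that mapping is extracted from the configuration is left out) and reports any VHID reused on one node and any IP whose VHID differs between the nodes. The second sample mapping is invented to show a mismatch.

```python
def check_vhids(primary: dict, secondary: dict) -> list:
    """Return a list of human-readable problems; an empty list means both rules hold.

    primary and secondary map a virtual IP (str) to its VHID (int).
    """
    issues = []
    # Rule 1: each unique IP has its own VHID on a given node.
    for name, vips in (("primary", primary), ("secondary", secondary)):
        seen = {}
        for ip, vhid in vips.items():
            if vhid in seen:
                issues.append(f"{name}: VHID {vhid} is used by both {seen[vhid]} and {ip}")
            else:
                seen[vhid] = ip
    # Rule 2: the VHID for a given IP matches on each machine.
    for ip in sorted(set(primary) | set(secondary)):
        if primary.get(ip) != secondary.get(ip):
            issues.append(f"{ip}: VHID {primary.get(ip)} on primary but {secondary.get(ip)} on secondary")
    return issues


# Two of the VIPs listed in this thread, identical on both nodes.
node1 = {"10.0.254.4": 3, "192.168.0.10": 4}
node2 = {"10.0.254.4": 3, "192.168.0.10": 4}
print(check_vhids(node1, node2))

# An invented mismatch for comparison.
node2_bad = {"10.0.254.4": 3, "192.168.0.10": 5}
print(check_vhids(node1, node2_bad))
```

    An empty result only says the VHIDs line up; it does not rule out other causes such as mismatched passwords or blocked advertisements.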



  • @sullrich:

    Each unique IP needs to have its own VHID.

    It does.

    The VHID needs to match on each machine.

    They do.

    If you are using the Sync option as the tutorial shows, this is all automatic.

    I did, and I see them to be the same, but please let me know which one is not matching. It is probably my fault, but I really don't see it.



  • I just want to add something worth knowing before activating sync over XML-RPC. When you have a lot of rules in the filter, it is not practical (in terms of usability) to use rule sync over XML-RPC. I have tested it on a cluster which has between 700 and 800 rules… when you modify one thing, the sync starts and then the firewall goes to 100% CPU (php process) for many, many minutes, losing control of everything. This was tested on 2 IBM x336 machines with dual-core Intel Xeon 3.2 GHz CPUs, 2 GB of RAM and 80 GB SATA hard drives.

    What I do is manual sync using partial backups ;-) and it's fine I'm not adding rules every minute ;-)



  • @Juve:

    I don't really want to hijack this thread, but could you please start a new topic that explains the pain and frustration of managing such a large ruleset? We can begin to brainstorm how to improve this situation.



  • I really hope you don't think I'm complaining. The previous post was just a sort of "advice" for those who have not tried it yet.

    Regards.



  • @Juve:

    Not at all. I can just imagine that managing that many rules must be painful. I am looking for information on what you don't like, what is hard to do, etc., for future improvements…



  • Hi,

    Just for the record, my problem is solved. It was a rule mistake on DMZ: I had directed all traffic destined for anywhere other than LAN or DMZ to the load balancer (WAN1 + WAN2), but this way the traffic to 224.0.0.x also went out to the net.
    Thanks to all who tried to help me solve this problem.
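
    The mistake described here comes down to a "send everything non-local to the load balancer" test that forgot link-local multicast. The Python sketch below shows the kind of check such a rule would need; it is an illustration using the standard ipaddress module, not firewall syntax, and the function name and the prefix lengths of the local networks are assumptions.

```python
import ipaddress

# 224.0.0.0/24 is the link-local multicast block; CARP and VRRP advertise to 224.0.0.18.
LINK_LOCAL_MULTICAST = ipaddress.ip_network("224.0.0.0/24")


def should_stay_local(destination: str, local_networks: list) -> bool:
    """Return True if traffic to destination must not be handed to the load balancer."""
    addr = ipaddress.ip_address(destination)
    if addr in LINK_LOCAL_MULTICAST:
        return True  # CARP advertisements and similar traffic belong on the local segment
    return any(addr in ipaddress.ip_network(net) for net in local_networks)


# Assumed LAN and DMZ networks; the real prefix lengths are not given in the thread.
local_nets = ["192.168.0.0/24", "10.0.254.0/24"]

print(should_stay_local("224.0.0.18", local_nets))    # CARP multicast group
print(should_stay_local("192.168.0.10", local_nets))  # LAN address
print(should_stay_local("203.0.113.7", local_nets))   # an outside address
```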

