Carp problems in testing releases



  • Hi all.

    I did a fresh install of the BETA1 testing release from 5.2.2006, and strange problems with the CARP interfaces suddenly appeared. They are not fixed even in yesterday's snapshot (full upgrade):

    I have two hosts with 4 CARP addresses shared between them. This worked well with the official BETA1 release…

    Now, whenever I bring up both hosts, I can see the preempt suppression number rising on the primary host, and all CARP addresses on VLAN interfaces go to the INIT state, while the CARP interface on WAN goes to MASTER. On the second node all interfaces go to MASTER, so the WAN CARP IP is MASTER on both hosts...

    Any idea?

    /jan



  • Known bug.  Update to a snapshot release:  http://www.pfsense.com/~sullrich/1.0-BETA1-TESTING-SNAPSHOT-2-8-06/



  • @sullrich:

    Known bug.  Update to a snapshot release:  http://www.pfsense.com/~sullrich/1.0-BETA1-TESTING-SNAPSHOT-2-8-06/

    I did a full upgrade from 2-5-06 to 2-8-06, but the problem is still there. Do you recommend a full install?

    /jan



  • We always recommend a fresh full install from scratch if problems that should be fixed are still there after an upgrade.



  • @hoba:

    We always recommend a fresh full install from scratch if problems that should be fixed are still there after an upgrade.

    Did a fresh install and things got a little bit better.

    Now the CARP interfaces are no longer in INIT state, but in BACKUP (on the primary node).

    Let me explain what is going on here:

    Two nodes, pf1 and pf2 (pf1 is primary).

    One CARP IP on WAN (physical WAN) and 3 CARP IPs on VLAN interfaces.

    After the fresh install the same thing happened: the WAN CARP interface is MASTER on both nodes, and the other CARP interfaces are in BACKUP state on pf1 (primary) and in MASTER state on pf2.

    Soon after boot, advskew on all interfaces is 0 on pf1 and 100 on pf2 (which is correct). Then, after some time (20-30 seconds), advskew on pf1 goes to 200 while pf2 stays at 100. The CARP preempt suppression value on pf1 rises above 0, and if I disable and re-enable CARP on pf1, everything goes to INIT (except the WAN CARP interface, which goes to "secondary master").

    I tried deleting all CARP interfaces, rebooting the nodes and creating them from scratch… When I create the CARP interface on WAN, synchronisation works well and it appears on pf2 without any problem. With only one interface the setup works perfectly. When I add another CARP interface on any VLAN, strange things start to happen, as I described earlier.

    Any idea?

    Thnx, /jan
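
For readers following the advskew numbers in this post: per carp(4), a node sends advertisements every advbase + advskew/256 seconds, and the node that advertises most often is preferred as master. The sketch below is a simplified illustration of that rule only. It assumes advbase 1 on both nodes (the value shown in the ifconfig output later in this thread), the function names are made up for illustration, and real CARP elections also depend on the preempt setting and on which node currently holds MASTER.

```python
def advert_interval(advbase: int, advskew: int) -> float:
    """Seconds between CARP advertisements: advbase + advskew/256."""
    return advbase + advskew / 256.0


def expected_master(nodes: dict) -> str:
    """Return the node with the shortest advertisement interval."""
    return min(nodes, key=lambda name: advert_interval(*nodes[name]))


# (advbase, advskew) per node. advskew values are the ones reported in the
# post; advbase 1 is an assumption taken from ifconfig output in the thread.
after_boot = {"pf1": (1, 0), "pf2": (1, 100)}
later = {"pf1": (1, 200), "pf2": (1, 100)}

print(expected_master(after_boot))  # pf1
print(expected_master(later))       # pf2
```

Once pf1 advertises with advskew 200 it is slower than pf2 at 100, which is consistent with pf1 ending up as BACKUP.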



  • Ran into this today.  If the interfaces are not plugged in (linked up) then your IPs will come up in the INIT state.

    Do you have all the interfaces plugged in?



  • @sullrich:

    Ran into this today.  If the interfaces are not plugged in (linked up) then your IPs will come up in the INIT state.

    Do you have all the interfaces plugged in?

    Yup, all interfaces are plugged in… WAN is on copper 100BASE-TX, and all VLANs are on LAN, which is on gigabit fiber.

    All connected and running...

    /jan



  • Then the only thing I can think of is that you need to make sure that an IP on the same CARP subnet exists on a real interface.



  • @sullrich:

    Then the only thing I can think of is that you need to make sure that an IP on the same CARP subnet exists on a real interface.

    I have no IP on the physical LAN interface, only on the VLANs. If I run ifconfig -a in the shell, there is no IP on the physical LAN. I see no need for one, because I run VLANs there and it is a trunk port… Is anything wrong with that?

    /jan



  • Now I have BETA2 and things got a little bit better… but still, when I create a CARP interface on the primary, it syncs to the secondary, and after that advskew on the primary goes to 200 and it becomes BACKUP (wrong).

    How do I enable debugging and/or logging to a file, so I can see what the scripts are really doing in the background?

    Thnx, /jan



  • OK, gang… fixed.

    I just pulled out the Intel dual FC gigabit Ethernet card, installed an ordinary Intel 100BASE-TX card, properly re-assigned all interfaces, and we are back online, up and running without any trouble.

    It seems there were some errors on the card (or the GBIC, or the port on the switch) that raised the net.inet.carp.suppress_preempt value above 0.

    Thanks for all the help. I just thought you might want to hear what was wrong...

    Keep up the good work, /jan
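
A quick way to spot this condition is to look at the sysctl named in the post. The following sketch parses `name: value` lines in the format that `sysctl net.inet.carp` prints and reports whether preemption is currently suppressed. The sample text is hypothetical, not captured from the hosts in this thread, and the helper names are illustrative.

```python
def parse_sysctl(text: str) -> dict:
    """Parse 'name: value' lines into a dict of integer values."""
    values = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        value = value.strip()
        # Keep only lines that look like 'name: <integer>'.
        if sep and value.lstrip("-").isdigit():
            values[name.strip()] = int(value)
    return values


def preempt_suppressed(values: dict) -> bool:
    """True when net.inet.carp.suppress_preempt is above 0."""
    return values.get("net.inet.carp.suppress_preempt", 0) > 0


# Hypothetical sample output, for illustration only.
sample = """net.inet.carp.allow: 1
net.inet.carp.preempt: 1
net.inet.carp.suppress_preempt: 2
"""

print(preempt_suppressed(parse_sysctl(sample)))  # True
```

A value above 0 only says that something (for example a link or send error) asked CARP to hold back; it does not say which interface caused it.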



  • Well, I can't sleep until I get to the bottom of an issue once it arises…

    Here are some answers to my problem with CARP, VLAN and the em driver:

    http://docs.freebsd.org/cgi/getmsg.cgi?fetch=25292+0+/usr/local/www/db/text/2005/freebsd-net/20050424.freebsd-net+raw

    Do you think this could be included in the pfSense distro?

    Thnx, /jan



  • Have you asked someone to MFC those patches to RELENG_6?  I would rather not add any more custom patches to our current roster.



  • @sullrich:

    Have you asked someone to MFC those patches to RELENG_6?   I would rather not add any more custom patches to our current roster.

    I already opened a support ticket with Intel. They were not able to reproduce the problem, so they wanted me to send them a full TCP dump, but now I have sent them that link. We'll see what can be done.

    /jan



  • I have two nodes with a dual copper gigabit card, with VLANs and CARP on em0. I had a similar problem, but now they are working fine.

    From dmesg:

    em0: <Intel(R) PRO/1000 Network Connection Version - 3.2.18> port 0x2400-0x243f mem 0xfe060000-0xfe07ffff,0xfe080000-0xfe0bffff irq 24 at device 5.0 on pci2
    em0: Ethernet address: 00:04:23:c2:25:42
    em1: <Intel(R) PRO/1000 Network Connection Version - 3.2.18> port 0x2440-0x247f mem 0xfe100000-0xfe11ffff,0xfe0c0000-0xfe0fffff irq 25 at device 5.1 on pci2
    em1: Ethernet address: 00:04:23:c2:25:43

    See http://forum.pfsense.org/index.php?topic=752.0



  • @NAmorim:

    I have two nodes with a dual copper gigabit card, with VLANs and CARP on em0. I had a similar problem, but now they are working fine.

    Copper runs fine, FC is a problem…

    /jan



  • My machine occasionally produces a kernel error when I delete or create a CARP interface.
    I could not write down the error log, because the server hung and restarted by itself.
    After the restart, it checked the file system and found and fixed errors in some files.
    Then my pfSense configuration was broken with the error "could not find xml configuration", and I had to reinstall the machine.  :'( :'( :'(



  • @NAmorim:

    I have two nodes with a dual copper gigabit card, with VLANs and CARP on em0. I had a similar problem, but now they are working fine.

    From dmesg:

    em0: <Intel(R) PRO/1000 Network Connection Version - 3.2.18> port 0x2400-0x243f mem 0xfe060000-0xfe07ffff,0xfe080000-0xfe0bffff irq 24 at device 5.0 on pci2
    em0: Ethernet address: 00:04:23:c2:25:42
    em1: <Intel(R) PRO/1000 Network Connection Version - 3.2.18> port 0x2440-0x247f mem 0xfe100000-0xfe11ffff,0xfe0c0000-0xfe0fffff irq 25 at device 5.1 on pci2
    em1: Ethernet address: 00:04:23:c2:25:43

    See http://forum.pfsense.org/index.php?topic=752.0

    Hello… yes, the solution at that link solved quite a bit of the problem. I did a CVS update and the behaviour changed dramatically. Now the CARP interfaces on the FC em2 are no longer in INIT state, but in a weird state: advskew is normal (0 on master and 100 on slave), but the states are all MASTER on the slave node, while on the master node some of them are MASTER and some are BACKUP...

    It feels like release 0.96, where CARP on VLAN behaved exactly the same on copper interfaces :)

    snap from ifconfig on master node:
    carp0: flags=49<UP,LOOPBACK,RUNNING> mtu 1500
            inet 192.168.222.1 netmask 0xffffff00
            carp: BACKUP vhid 1 advbase 1 advskew 0
    carp1: flags=49<UP,LOOPBACK,RUNNING> mtu 1500
            inet 192.168.223.1 netmask 0xffffffff
            carp: MASTER vhid 2 advbase 1 advskew 0
    carp2: flags=49<UP,LOOPBACK,RUNNING> mtu 1500
            inet 192.168.224.1 netmask 0xffffff00
            carp: MASTER vhid 3 advbase 1 advskew 0
    carp3: flags=49<UP,LOOPBACK,RUNNING> mtu 1500
            inet 81.24.100.7 netmask 0xfffffff0
            carp: BACKUP vhid 4 advbase 1 advskew 0

    snap from ifconfig on slave node:
    carp0: flags=49<UP,LOOPBACK,RUNNING> mtu 1500
            inet 192.168.222.1 netmask 0xffffff00
            carp: MASTER vhid 1 advbase 1 advskew 100
    carp1: flags=49<UP,LOOPBACK,RUNNING> mtu 1500
            inet 192.168.223.1 netmask 0xffffffff
            carp: MASTER vhid 2 advbase 1 advskew 100
    carp2: flags=49<UP,LOOPBACK,RUNNING> mtu 1500
            inet 192.168.224.1 netmask 0xffffff00
            carp: MASTER vhid 3 advbase 1 advskew 100
    carp3: flags=49<UP,LOOPBACK,RUNNING> mtu 1500
            inet 81.24.100.7 netmask 0xfffffff0
            carp: MASTER vhid 4 advbase 1 advskew 100

    I also have a tcpdump from the em0 interface, taken when I enable CARP on the slave and the master goes from all MASTER to woohooo, if somebody is interested.
    http://haktar.select-tech.si/em2.dump.txt

    /jan
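
The mismatch between the two listings in this post can be picked out mechanically. A minimal sketch, assuming the `carp:` line format shown in the ifconfig output; the function names are illustrative, and only the `carp:` lines from the post are reproduced as input.

```python
import re

# Matches e.g. "carp: MASTER vhid 2 advbase 1 advskew 0"
CARP_LINE = re.compile(r"carp: (\w+) vhid (\d+) advbase (\d+) advskew (\d+)")


def carp_states(ifconfig_text: str) -> dict:
    """Map vhid -> state for every 'carp:' line in ifconfig output."""
    return {int(m.group(2)): m.group(1) for m in CARP_LINE.finditer(ifconfig_text)}


def both_master(primary: dict, secondary: dict) -> list:
    """vhids that report MASTER on both nodes at once."""
    return sorted(v for v, s in primary.items()
                  if s == "MASTER" and secondary.get(v) == "MASTER")


def primary_not_master(primary: dict) -> list:
    """vhids where the primary node is not MASTER."""
    return sorted(v for v, s in primary.items() if s != "MASTER")


MASTER_NODE = """
carp: BACKUP vhid 1 advbase 1 advskew 0
carp: MASTER vhid 2 advbase 1 advskew 0
carp: MASTER vhid 3 advbase 1 advskew 0
carp: BACKUP vhid 4 advbase 1 advskew 0
"""

SLAVE_NODE = """
carp: MASTER vhid 1 advbase 1 advskew 100
carp: MASTER vhid 2 advbase 1 advskew 100
carp: MASTER vhid 3 advbase 1 advskew 100
carp: MASTER vhid 4 advbase 1 advskew 100
"""

primary = carp_states(MASTER_NODE)
secondary = carp_states(SLAVE_NODE)
print(both_master(primary, secondary))  # [2, 3]
print(primary_not_master(primary))      # [1, 4]
```

For the output in this post, vhids 2 and 3 are MASTER on both nodes, and vhids 1 and 4 are BACKUP on the node that should be master.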



  • If the interfaces are in different states then pfsync is not communicating properly.

    A couple things to check:

    1. Use a dedicated sync interface and add allow all rules on it.  Set the sync interfaces into their own subnet 192.168.5.1 and 192.168.5.2 /24 so that they can communicate.
    2. Ping the other interface from each of the boxes to ensure connectivity (ping the 192.168.5.X ips)
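
The first check in this list (both sync addresses in the same subnet) can be verified offline before pinging. A minimal sketch using Python's standard `ipaddress` module; the function name is illustrative, and the addresses are the example ones from the post.

```python
import ipaddress


def same_subnet(addr_a: str, addr_b: str, prefix_len: int) -> bool:
    """True when both addresses fall in the same network for the given prefix."""
    net_a = ipaddress.ip_interface(f"{addr_a}/{prefix_len}").network
    net_b = ipaddress.ip_interface(f"{addr_b}/{prefix_len}").network
    return net_a == net_b


print(same_subnet("192.168.5.1", "192.168.5.2", 24))  # True
print(same_subnet("192.168.5.1", "192.168.6.2", 24))  # False
```

This only confirms the addressing; the ping in step 2 is still needed to confirm that the link and the firewall rules actually pass traffic.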



  • @sullrich:

    If the interfaces are in different states then pfsync is not communicating properly.

    A couple things to check:

    1. Use a dedicated sync interface and add allow all rules on it.  Set the sync interfaces into their own subnet 192.168.5.1 and 192.168.5.2 /24 so that they can communicate.
    2. Ping the other interface from each of the boxes to ensure connectivity (ping the 192.168.5.X ips)

    All checked. I use a dedicated sync interface, connected with a crossover cable. The 10.0.2.0/24 subnet (.2 and .3) is used on those interfaces.

    They can ping each other on those interfaces.

    I created the CARP addresses on the primary and those addresses were synced over to the slave host.

    The rules are also synced with no issues…

    Sullrich, if you like, I can give you access to the hosts in question. They are not production firewalls; their only purpose is to test those FC cards with pfSense...

    /jan



  • Yes, go ahead and email me the information.



  • @sullrich:

    Yes, go ahead and email me the information.

    You got mail… :)

    /jan

