Netgate Discussion Forum

    3rd interface not failing back…

    HA/CARP/VIPs
      sullrich

      Stickying thread.

        jakehathaway

        tcpdump -i xl0 -ttt -n proto CARP
        Here is the output of my tcpdump:
        709630 IP 172.16.20.152 > 224.0.0.18: VRRPv2, Advertisement, vrid 6, prio 200, authtype none, intvl 1s, length 36
        293069 IP 172.16.20.251 > 224.0.0.18: VRRPv2, Advertisement, vrid 5, prio 0, authtype none, intvl 1s, length 36
        1. 002309 IP 172.16.20.251 > 224.0.0.18: VRRPv2, Advertisement, vrid 5, prio 0, authtype none, intvl 1s, length 36
        487570 IP 172.16.20.152 > 224.0.0.18: VRRPv2, Advertisement, vrid 6, prio 200, authtype none, intvl 1s, length 36
        514636 IP 172.16.20.251 > 224.0.0.18: VRRPv2, Advertisement, vrid 5, prio 0, authtype none, intvl 1s, length 36
        1. 001317 IP 172.16.20.251 > 224.0.0.18: VRRPv2, Advertisement, vrid 5, prio 0, authtype none, intvl 1s, length 36
        267018 IP 172.16.20.152 > 224.0.0.18: VRRPv2, Advertisement, vrid 6, prio 200, authtype none, intvl 1s, length 36
        734179 IP 172.16.20.251 > 224.0.0.18: VRRPv2, Advertisement, vrid 5, prio 0, authtype none, intvl 1s, length 36
        1. 001057 IP 172.16.20.251 > 224.0.0.18: VRRPv2, Advertisement, vrid 5, prio 0, authtype none, intvl 1s, length 36
        047719 IP 172.16.20.152 > 224.0.0.18: VRRPv2, Advertisement, vrid 6, prio 200, authtype none, intvl 1s, length 36
        953636 IP 172.16.20.251 > 224.0.0.18: VRRPv2, Advertisement, vrid 5, prio 0, authtype none, intvl 1s, length 36
        829337 IP 172.16.20.152 > 224.0.0.18: VRRPv2, Advertisement, vrid 6, prio 200, authtype none, intvl 1s, length 36
        171683 IP 172.16.20.251 > 224.0.0.18: VRRPv2, Advertisement, vrid 5, prio 0, authtype none, intvl 1s, length 36
        1. 001111 IP 172.16.20.251 > 224.0.0.18: VRRPv2, Advertisement, vrid 5, prio 0, authtype none, intvl 1s, length 36
        610157 IP 172.16.20.152 > 224.0.0.18: VRRPv2, Advertisement, vrid 6, prio 200, authtype none, intvl 1s, length 36
        391038 IP 172.16.20.251 > 224.0.0.18: VRRPv2, Advertisement, vrid 5, prio 0, authtype none, intvl 1s, length 36
        1. 234670 IP 172.16.20.251 > 224.0.0.18: VRRPv2, Advertisement, vrid 5, prio 0, authtype none, intvl 1s, length 36
        157247 IP 172.16.20.152 > 224.0.0.18: VRRPv2, Advertisement, vrid 6, prio 200, authtype none, intvl 1s, length 36
        1. 039601 IP 172.16.20.251 > 224.0.0.18: VRRPv2, Advertisement, vrid 5, prio 0, authtype none, intvl 1s, length 36

        The 151 is the master machine; the 251 is the other pfSense firewall, on the far side of the QMOE link. You can see the vrid is different, so that shouldn't affect it.
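A capture like the one above can be summarized per sender rather than read by eye. The following is a minimal Python sketch, not part of the original post: it assumes the tcpdump line format shown above, and the function names are made up for illustration. As far as I know, tcpdump decodes CARP as VRRPv2 because both use IP protocol 112, so the `vrid` field should correspond to the CARP VHID and `prio` to the advskew; treat that mapping as an assumption to verify.

```python
import re
from collections import Counter

# Matches the advertisement lines as they appear in the capture above.
ADVERT = re.compile(
    r"IP (?P<src>\d+\.\d+\.\d+\.\d+) > 224\.0\.0\.18: VRRPv2, Advertisement, "
    r"vrid (?P<vrid>\d+), prio (?P<prio>\d+)"
)

def summarize_adverts(lines):
    """Count advertisements per (source IP, vrid, prio) tuple."""
    counts = Counter()
    for line in lines:
        m = ADVERT.search(line)
        if m:
            counts[(m.group("src"), int(m.group("vrid")), int(m.group("prio")))] += 1
    return counts

def vrid_conflicts(counts):
    """Return the vrids that are advertised by more than one source address."""
    sources = {}
    for (src, vrid, _prio) in counts:
        sources.setdefault(vrid, set()).add(src)
    return {vrid: srcs for vrid, srcs in sources.items() if len(srcs) > 1}

# Two lines copied from the capture above.
sample = [
    "709630 IP 172.16.20.152 > 224.0.0.18: VRRPv2, Advertisement, vrid 6, prio 200, authtype none, intvl 1s, length 36",
    "293069 IP 172.16.20.251 > 224.0.0.18: VRRPv2, Advertisement, vrid 5, prio 0, authtype none, intvl 1s, length 36",
]
counts = summarize_adverts(sample)
conflicts = vrid_conflicts(counts)
```

For the capture above, each vrid has a single sender, so `conflicts` comes back empty. Two senders on the same vrid would point either at a VHID clash with another device or at both nodes believing they are master.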

        1. Misconfiguration: password, VHID, or advskew problems. Check it again.

        Checked this; it is correct.

        2. Another device using VRRPv2 is using a VHID you are using. Check your network devices or change the VHID.

        It is obviously connected to the pfSense on the other side of the QMOE. I'm not sure whether vrid is the same as VHID, but I manually checked the config in the GUI on both sides of the QMOE, and the VHIDs are different.

        3. You don't see the master's packets on the slave node when doing the tcpdump (so the slave node has one or more interfaces in master mode). You have a communication error between the two machines. Check the switches and the cables. Or look at problem 4 ;-)

        I see the master's packets; see the tcpdump above.

        4. You have a NAT rule natting everything from a source network to a single IP address which IS NOT the interface address and which is in ANOTHER subnet. This should happen on the WAN interface most of the time.

        Still checking this, but I'm not sure what that would affect. Will post a follow-up in a bit.
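The condition in item 4 can be checked mechanically for each NAT rule. Below is a minimal Python sketch using the standard `ipaddress` module. The function name is hypothetical, and the /24 mask and the 10.0.0.5 target are assumptions for illustration only; the thread gives neither the netmask nor any NAT target.

```python
import ipaddress

def nat_target_is_suspect(interface_cidr, nat_target_ip):
    """True if the NAT target is not the interface address AND lies outside
    the interface's subnet, which is the case described in item 4."""
    iface = ipaddress.ip_interface(interface_cidr)
    target = ipaddress.ip_address(nat_target_ip)
    return target != iface.ip and target not in iface.network

# Assumed example values: a /24 on the interface, and a made-up NAT target.
suspect = nat_target_is_suspect("172.16.20.152/24", "10.0.0.5")
same_address = nat_target_is_suspect("172.16.20.152/24", "172.16.20.152")
```

A rule that translates to the interface address itself, or to another address inside the interface's subnet, does not match the pattern in item 4.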

        thx for the help with this.

          jakehathaway

          As for NAT to a single IP, we do not have such a rule on the network that is having trouble.

          As you can see in the image, the last rule is for the QMOE, and it goes to * (all).

          NAT.png

            Juve

            Can you give us a network diagram? As I understand it, you have 4 machines: 2 at d/c A and 2 at d/c B.

              jakehathaway

              Here is a simple drawing. Interface 4 (QMOE) on the pf2 box is the only one that doesn't fail back.

              net.jpg

                sullrich

                Are all of the NICs the same type?

                  jakehathaway

                  NIC types…
                  pf1 and pf2:
                  int 1 - Intel Pro 100 - WAN
                  int 2 - Intel Pro 100 - LAN
                  int 3 - Intel Pro 100 - pfsync
                  int 4 - 3com 3C905-TX - QMOE

                  pf251 and pf252:
                  int 1 - Intel e1000 - LAN
                  int 2 - Intel e1000 - WAN
                  int 3 - Broadcom Gbit - QMOE
                  int 4 - Broadcom Gbit - pfsync

                    Juve

                    Have you checked that neither the Foundry nor the HP equipment is filtering any type of traffic (like multicast)?

                      jakehathaway

                      Yep, multicast is working just fine. The Foundry side (pf251, pf252) is working fine. It is the HP side that has the failback problem, but we checked multicast there and it is fine. I can also see it in the tcpdump on pf2.

                        jakehathaway

                        Is there any other information you can give me? Anything else you might try? Please let me know.

                          sullrich

                          Check network equipment on HP side.  Something is being blocked (multicast).

                            jakehathaway

                            We are now on the same equipment as the other side: a Foundry Super X. This did not solve the issue.

                              sullrich

                              The equipment is either not forwarding or is blocking the CARP-specific traffic. Use tcpdump on each machine to see whether it is receiving the multicast traffic (CARP advertisements go to 224.0.0.18). I bet the switch is the culprit.
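One way to make that per-machine check systematic is to save a tcpdump capture from each box and compare the senders each one saw against the senders it should see. The Python sketch below assumes the tcpdump line format posted earlier in this thread. The function names, the host labels, and the expected (source IP, vrid) pairs are placeholders to fill in from your own setup.

```python
import re

# Source IP and vrid from tcpdump's VRRPv2-style decoding of CARP adverts.
ADVERT = re.compile(
    r"IP (\d+\.\d+\.\d+\.\d+) > 224\.0\.0\.18: VRRPv2, Advertisement, vrid (\d+),"
)

def seen_adverts(lines):
    """Return the set of (source IP, vrid) pairs found in one capture."""
    found = set()
    for line in lines:
        m = ADVERT.search(line)
        if m:
            found.add((m.group(1), int(m.group(2))))
    return found

def missing_adverts(captures, expected):
    """For each host, list the expected (source IP, vrid) pairs it never saw.

    captures: dict mapping a host label to the list of tcpdump lines taken there.
    expected: set of (source IP, vrid) pairs every host should receive.
    """
    report = {}
    for host, lines in captures.items():
        missing = expected - seen_adverts(lines)
        if missing:
            report[host] = sorted(missing)
    return report

# Placeholder data: "pf2" saw the advert, "pf1" saw nothing.
captures = {
    "pf2": ["709630 IP 172.16.20.152 > 224.0.0.18: VRRPv2, Advertisement, vrid 6, prio 200, authtype none, intvl 1s, length 36"],
    "pf1": [],
}
report = missing_adverts(captures, {("172.16.20.152", 6)})
```

An empty report means every box received every expected advertisement. Any entry names the box whose port or path is worth inspecting on the switch. Note that a box will not see its own advertisements in the expected set unless the capture includes outgoing packets, so choose the expected pairs per host accordingly.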

                                jakehathaway

                                As you can see I did this already and the machines are seeing the carp traffic without any issue.

                                  sullrich

                                  Well then, about the only thing I can think of is the NICs in the machine.

                                  BTW: I have major problems with Broadcom NICs + CARP at work. It is a driver issue of some sort.

                                    jakehathaway

                                    So I have completely bypassed routing on the pf box since it isn't working. It works until it gets into the following state (see attached pics).

                                    pf1.jpg
                                    pf2.jpg

                                      smilodon

                                      I've seen this one before… Sorry to say that I'm a noob and am just figuring out my own problems at:
                                      http://forum.pfsense.org/index.php/topic,10458.0.html

                                      In my configuration, it happened when CARP suddenly "worked" after I sorted out some bugs, and then it stopped working again. That was when the SYNC interfaces were on old 10Mb/s NICs: the LAN VIP became master on the backup, while WAN and WAN2 stayed master on the master box. Then, when I went to 100/10 NICs, the backup took all the VIPs as master. So it might be something different from your problem.

                                      One question: how would I bypass the "broadcast" thing if it really is the switch or the NICs not passing the broadcast packets?
