Dual WAN CARP/HA Config with ARP traffic issues



  • Greetings Everyone,

    I'm hitting an issue that has me truly stumped. We maintain two separate WANs for HA purposes, along with two firewalls that share several CARP IPs. Everything worked as expected until about a month ago.

    COX provides our secondary WAN2 connection, which we use for failover and for certain external tests. Recently something went wrong, and about once a week the secondary WAN connection just drops. Rebooting the cable modem restores it, but digging deeper has uncovered several oddities I can't make sense of.

    The main issue (I believe) is that COX's gateway responds, like an overeager student, to EVERY ARP request we send. For the physical WAN2 IP assigned to each firewall, running arping while watching the traffic with tcpdump shows two responses to every ARP request: one from the firewall and one from the COX gateway.

    60 bytes from FW_WAN2_MAC (FW2_IP): index=0 time=233.037 usec
    60 bytes from COX_GATEWAY_MAC (FW2_IP): index=1 time=10.212 msec
    60 bytes from FW_WAN2_MAC (FW2_IP): index=2 time=163.138 usec
    60 bytes from COX_GATEWAY_MAC (FW2_IP): index=3 time=11.235 msec
    60 bytes from FW_WAN2_MAC (FW2_IP): index=4 time=314.551 usec
    60 bytes from COX_GATEWAY_MAC (FW2_IP): index=5 time=375.887 msec
    60 bytes from FW_WAN2_MAC (FW2_IP): index=6 time=357.209 usec
    60 bytes from COX_GATEWAY_MAC (FW2_IP): index=7 time=11.947 msec
    60 bytes from FW_WAN2_MAC (FW2_IP): index=8 time=171.491 usec
    60 bytes from COX_GATEWAY_MAC (FW2_IP): index=9 time=63.950 msec
    60 bytes from FW_WAN2_MAC (FW2_IP): index=10 time=135.632 usec
    60 bytes from COX_GATEWAY_MAC (FW2_IP): index=11 time=16.134 msec
    60 bytes from FW_WAN2_MAC (FW2_IP): index=12 time=217.011 usec
    60 bytes from COX_GATEWAY_MAC (FW2_IP): index=13 time=10.775 msec
    60 bytes from FW_WAN2_MAC (FW2_IP): index=14 time=340.514 usec
    60 bytes from COX_GATEWAY_MAC (FW2_IP): index=15 time=8.841 msec
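
    A minimal Python sketch for spotting this pattern automatically, assuming arping output in the same line format as above. The function names are hypothetical, and the regular expression is only matched against the lines shown here, so other arping builds may need a different pattern.

```python
import re
from collections import Counter

# Matches reply lines in the shape shown above, e.g.
# "60 bytes from FW_WAN2_MAC (FW2_IP): index=0 time=233.037 usec"
REPLY_RE = re.compile(r"bytes from (\S+) \(([^)]+)\): index=(\d+)")


def count_responders(arping_output: str) -> dict:
    """Map each probed IP to a Counter of the MACs that answered for it."""
    responders = {}
    for line in arping_output.splitlines():
        match = REPLY_RE.search(line)
        if match is None:
            continue  # skip headers, blank lines, and summary lines
        mac, ip = match.group(1), match.group(2)
        responders.setdefault(ip, Counter())[mac] += 1
    return responders


def conflicting_ips(arping_output: str) -> list:
    """Return the probed IPs that were answered by more than one distinct MAC."""
    return sorted(
        ip for ip, macs in count_responders(arping_output).items() if len(macs) > 1
    )
```

    Feeding it the capture above would list FW2_IP as conflicting, since both FW_WAN2_MAC and COX_GATEWAY_MAC answered for it.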
    

    tcpdump shows that the gateway's ARP reply advertises the same MAC address as FW2_WAN2_MAC. Not the worst thing in the world.

    11:53:42.317192 ARP, Request who-has FW2_IP tell FW1_WAN2_IP, length 44
    11:53:42.317425 ARP, Reply FW2_IP is-at FW2_WAN2_MAC (oui Unknown), length 46
    11:53:42.327387 ARP, Reply FW2_IP is-at FW2_WAN2_MAC (oui Unknown), length 46
    11:53:43.318565 ARP, Request who-has FW2_IP tell FW1_WAN2_IP, length 44
    11:53:43.318727 ARP, Reply FW2_IP is-at FW2_WAN2_MAC (oui Unknown), length 46
    11:53:43.329796 ARP, Reply FW2_IP is-at FW2_WAN2_MAC (oui Unknown), length 46
    11:53:44.325528 ARP, Request who-has FW2_IP tell FW1_WAN2_IP, length 44
    11:53:44.325841 ARP, Reply FW2_IP is-at FW2_WAN2_MAC (oui Unknown), length 46
    11:53:44.701388 ARP, Reply FW2_IP is-at FW2_WAN2_MAC (oui Unknown), length 46
    11:53:45.327225 ARP, Request who-has FW2_IP tell FW1_WAN2_IP, length 44
    11:53:45.327581 ARP, Reply FW2_IP is-at FW2_WAN2_MAC (oui Unknown), length 46
    11:53:45.339168 ARP, Reply FW2_IP is-at FW2_WAN2_MAC (oui Unknown), length 46
    11:53:46.327606 ARP, Request who-has FW2_IP tell FW1_WAN2_IP, length 44
    
    
    

    That said, when I arping ANY of the CARP addresses, the COX gateway IMMEDIATELY answers every single request with the CARP VIP MAC address (00:00:5E:00:01:0xVHID). Is it normal for an ISP gateway to respond to EVERYTHING like that? The problem is that the configured VIPs suddenly become unresponsive and network services drop. We do NOT have this issue with our CARP/HA Configs.

    I welcome any insight at this point. I've lost a lot of sleep over this one.


  • LAYER 8 Netgate

    The only interface that should respond to ARP is the interface that holds that MAC address.

    If the COX device is also responding, it is broken.

    This should NOT be the end of the world if everything it does is perfect from a CARP perspective, but I suspect it is not.

    The ONLY frames that should ever be sourced from a CARP MAC address (like 00:00:5E:00:01:0xVHID) are the CARP advertisements themselves, sent by the current MASTER node. No other traffic should ever be sourced from that MAC address.
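
    For reference, the CARP MAC follows directly from the VHID: the last octet is the VHID in hex. A small Python sketch, with hypothetical function names and assuming the usual VHID range of 1 to 255:

```python
CARP_MAC_PREFIX = "00:00:5e:00:01:"


def carp_virtual_mac(vhid: int) -> str:
    """Build the CARP virtual MAC for a VHID (assumed valid range: 1-255)."""
    if not 1 <= vhid <= 255:
        raise ValueError(f"VHID out of range: {vhid}")
    return f"{CARP_MAC_PREFIX}{vhid:02x}"


def is_carp_mac(mac: str) -> bool:
    """True if the MAC (colon-separated hex) falls in the CARP virtual MAC range."""
    return mac.lower().startswith(CARP_MAC_PREFIX)
```

    For example, carp_virtual_mac(1) gives 00:00:5e:00:01:01, which is what you would expect to see for a VIP on VHID 1.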

    ARP responses to a WHO-HAS for the CARP VIP will be sourced from the interface MAC address and carry 00:00:5E:00:01:0xVHID in the IS-AT field of the ARP payload. What you posted does not provide enough information, because both the ARP payload and the source/destination MAC addresses of the frames themselves matter here.
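
    To compare the two, you need the Ethernet header as well as the ARP payload (tcpdump's -e flag prints the link-level header). As a rough Python sketch of what to look at, assuming a raw untagged Ethernet II frame carrying IPv4 ARP, with hypothetical function and key names:

```python
import ipaddress
import struct

CARP_MAC_PREFIX = "00:00:5e:00:01:"


def mac_str(raw: bytes) -> str:
    """Format 6 raw bytes as a colon-separated lowercase MAC."""
    return ":".join(f"{b:02x}" for b in raw)


def parse_arp_frame(frame: bytes) -> dict:
    """Split an untagged Ethernet II ARP frame into header and payload fields."""
    if len(frame) < 42:
        raise ValueError("too short for Ethernet header plus IPv4 ARP payload")
    eth_dst, eth_src, ethertype = struct.unpack("!6s6sH", frame[:14])
    if ethertype != 0x0806:
        raise ValueError("not an ARP frame")
    # ARP payload: htype, ptype, hlen, plen, oper, then sender/target MAC and IP
    _htype, _ptype, hlen, plen, oper, sha, spa, tha, tpa = struct.unpack(
        "!HHBBH6s4s6s4s", frame[14:42]
    )
    if (hlen, plen) != (6, 4):
        raise ValueError("not Ethernet/IPv4 ARP")
    info = {
        "eth_src": mac_str(eth_src),      # who actually sent the frame
        "eth_dst": mac_str(eth_dst),
        "oper": oper,                     # 1 = request, 2 = reply
        "sender_mac": mac_str(sha),       # the "is-at" value in a reply
        "sender_ip": str(ipaddress.IPv4Address(spa)),
        "target_mac": mac_str(tha),
        "target_ip": str(ipaddress.IPv4Address(tpa)),
    }
    info["frame_src_is_carp_mac"] = info["eth_src"].startswith(CARP_MAC_PREFIX)
    info["payload_is_carp_mac"] = info["sender_mac"].startswith(CARP_MAC_PREFIX)
    return info
```

    For a reply to a CARP VIP query, the description above implies payload_is_carp_mac should be true and frame_src_is_carp_mac false; eth_src then tells you which device really sent it.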

    All of this pretty much has to work perfectly. This would not be the first time an ISP device was not compatible with CARP/HA because of games it wants to play.

