Hacked up HA setup for home



  • I managed to set up a sort-of HA for my home without CARP and with a single dynamic IPv4 and IPv6 from Comcast. I mostly needed it because I mess around with my main server a bunch and my family was getting mad when I killed the internet.

    It is not instant failover; it takes around 15-30 seconds, but it does work.

    Hardware Setup -

    • A managed switch with proper VLANs set up for all of the below.
    • Main server with single connection to managed switch in trunk mode.
      • Mine is a beefy server with a 40g connection to the switch.
      • How you run pfsense doesn't really matter, but mine is running in a VM with SR-IOV virtual NICs.
    • Backup server with a single connection to managed switch in trunk mode.
      • Mine is a small NUC with a 1g connection to the switch.
      • Latest Proxmox with open vswitch.
      • pfsense in a VM with regular virtual NICs.
    • Cable modem or whatever connected to managed switch on vlan 99.

    I have 4 vlans, but you can use only 3 if needed (remove CAM).

    VLANs and their interface assignments and IP addresses -
    99 - WAN - dhcp or whatever
    66 - ROUTERSYNC - primary 10.66.0.1, secondary 10.66.0.2
    none - LAN - 10.0.0.1
    1003 - CAM - 10.whatever

    On each node, set up those VLANs and interfaces. Configure them exactly the same on both nodes, like you normally would, except for the IPs in ROUTERSYNC.

    For Comcast, the cable modem will only talk to a single MAC address, so make the MAC address the same on both nodes for the WAN. This will work fine; see why later. Do this only for the WAN; the others should all have different MACs. ARP will be our friend.

    The main node is set up like normal for your regular internet connection. Nothing really fancy there.
    All the fancy stuff is on the secondary.

    Virtual NICs are required on the secondary. You can use pfSense VLANs for all except ROUTERSYNC; that needs to be a separate virtual NIC.

    If you want IPv6 to work, set up RA on the secondary exactly like the primary. Do not use IPv6 in the ROUTERSYNC VLAN.

    One other thing you probably want to set up is haproxy or something similar on the primary, with a back-end pointing at the web interface on 10.66.0.2, so that you can access the secondary's web interface. The secondary will not be able to talk to your LAN by default because it will have an active, but disconnected, interface on your LAN network.

    In proxmox, here is my /etc/network/interfaces

    source /etc/network/interfaces.d/*
    
    auto lo
    iface lo inet loopback
    
    auto enp1s0
    iface enp1s0 inet manual
            ovs_type OVSPort
            ovs_bridge vmbr0
    
    auto mgmt
    iface mgmt inet static
            address 10.0.0.243/24
            gateway 10.0.0.1
            ovs_type OVSIntPort
            ovs_bridge vmbr0
    
    auto patch22_11
    iface patch22_11 inet manual
            ovs_bridge vmbr22
            ovs_type OVSPatchPort
            ovs_patch_peer patch11_22
    
    auto mgmt66
    iface mgmt66 inet static
            address 10.66.0.243/24
            ovs_type OVSIntPort
            ovs_bridge vmbr0
            ovs_options tag=66
    
    auto vmbr0
    iface vmbr0 inet manual
            ovs_type OVSBridge
            ovs_ports enp1s0 mgmt mgmt66
    
    auto vmbr22
    iface vmbr22 inet manual
            ovs_type OVSBridge
    
    iface vmbr11 inet manual
            ovs_type OVSBridge
    
    iface patch11_22 inet manual
            ovs_bridge vmbr11
            ovs_type OVSPatchPort
            ovs_patch_peer patch22_11
    
    iface patch11_0 inet manual
            ovs_bridge vmbr11
            ovs_type OVSPatchPort
            ovs_patch_peer patch0_11
    
    auto patch0_11
    iface patch0_11 inet manual
            ovs_bridge vmbr0
            ovs_type OVSPatchPort
            ovs_patch_peer patch11_0
    

    What you can see is basically 3 virtual switches connected with patch ports.

    • The main switch, vmbr0, is connected to the real NIC, enp1s0.
    • The virtual switch in the middle, vmbr11, and its patch ports do not auto start.
    • The final virtual switch, vmbr22, is thus not connected to the network unless we start vmbr11 and its patch ports.
    • Proxmox is connected to the network via vmbr0 and has IPs in both the LAN and ROUTERSYNC VLANs.
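
    To see which state the middle switch is in, something like the following could be run on the Proxmox host. This is only a rough sketch: it assumes that "ifdown vmbr11" removes the bridge from the Open vSwitch database (so the bridge existing means the secondary is patched through), and the OVS_VSCTL variable and function name are made up for illustration.

```shell
#!/bin/sh
# Sketch: report whether the secondary's pfSense VM is patched through to the
# physical network. Intended to run on the Proxmox host.
# Assumption: "ifdown vmbr11" deletes the bridge from the OVS database, so
# "bridge exists" means the path vmbr22 <-> vmbr11 <-> vmbr0 is in place.

# OVS_VSCTL is a hypothetical override so the command can be swapped out.
OVS_VSCTL="${OVS_VSCTL:-ovs-vsctl}"

secondary_link_state()
{
  # "ovs-vsctl br-exists" exits 0 when the bridge exists, non-zero otherwise.
  if "$OVS_VSCTL" br-exists vmbr11 2>/dev/null
  then
    echo "connected"
  else
    echo "isolated"
  fi
}
```

    Calling secondary_link_state prints "connected" or "isolated". With the config above, a freshly booted secondary should report "isolated", since vmbr11 has no "auto" line.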

    Here are the network definitions for my pfSense VM on the secondary.

    net0: virtio=2E:C8:1F:29:75:BB,bridge=vmbr22
    net1: virtio=8A:C6:78:5E:D0:1B,bridge=vmbr22,tag=99
    net2: virtio=B2:D9:23:92:4C:D3,bridge=vmbr0,tag=66
    net3: virtio=1A:B9:D4:AA:7C:CE,bridge=vmbr22,tag=1003
    

    The NICs for LAN, WAN, and CAM are connected to vmbr22, which is by default not connected to the network.
    The NIC for ROUTERSYNC is connected to vmbr0, which is connected to the network at all times.

    So now what we have is a secondary set up exactly like the primary, with its 'duplicate' interfaces disconnected from the main network.

    Once you have that all configured, you can configure High Availability Sync on the primary. Do not set up pfsync. Set up XMLRPC Sync to sync to the secondary at 10.66.0.2. Sync everything except NAT, static routes, and virtual IPs.

    Now set up the Filer package on both nodes and enable XMLRPC Sync for Filer.

    On the primary, add /usr/local/bin/pve_id_rsa with 600 permissions in Filer. The content should be the SSH private key for Proxmox. It is usually in /root/.ssh/id_rsa in the Proxmox file system. You may need to convert it to regular RSA private key format.

    Add /usr/local/bin/monitor_backup_router.stop with 666 permissions in filer with content '0'.
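
    The stop file acts as a kill switch: the monitor script only looks at its first character. Here is a minimal sketch of that check on its own. The function name and the STOP_FILE override are made up for illustration, and unlike the real script this version treats a missing or empty file as "disabled".

```shell
#!/bin/sh
# Sketch of the kill-switch check: failover is allowed only while the first
# character of the stop file is "0".
# STOP_FILE is a hypothetical override so the check can be tried on a copy.
STOP_FILE="${STOP_FILE:-/usr/local/bin/monitor_backup_router.stop}"

failover_enabled()
{
  # Read only the first byte; a missing or empty file yields an empty string,
  # which this sketch counts as "disabled".
  first=$(head -c 1 "$STOP_FILE" 2>/dev/null)
  [ "$first" = "0" ]
}
```

    Usage would be along the lines of: failover_enabled && echo "monitoring allowed". Changing the content to anything other than 0 in Filer is what turns the monitor off.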

    Add the following as /usr/local/bin/reload_interface.sh with permission 777 -

    #!/usr/local/bin/php-cgi -f
    <?php
    
    require_once("globals.inc");
    require_once("functions.inc");
    require_once("config.inc");
    require_once("util.inc");
    require_once("interfaces.inc");
    
    interface_configure($_GET["interface"]);
    
    
    ?>
    

    Now, add the final magical script, /usr/local/bin/monitor_backup_router.sh, to filer with 777 permissions -

    #!/bin/sh
    
    BOUNCES="vtnet1 vtnet0 vtnet3"
    # only need wan, it's the only dhcp
    INTERFACES="wan"
    
    up_all()
    {
      for i in $BOUNCES; do echo "ifconfig $i up" && ifconfig $i up; done
    }
    
    reload_all()
    {
      for i in $INTERFACES; do echo "/usr/local/bin/reload_interface.sh interface=$i" && /usr/local/bin/reload_interface.sh interface=$i; done
    }
    
    down_all()
    {
      for i in $BOUNCES; do echo "ifconfig $i down" && ifconfig $i down; done
    }
    
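    # Despite the name, this ISOLATES the secondary: taking vmbr11 down cuts the
    # patch path between vmbr22 (pfSense LAN/WAN/CAM) and vmbr0 (the real NIC).
    activate_fw()
    {
      echo "activate_fw"
      ssh -i /usr/local/bin/pve_id_rsa root@10.66.0.243 <<EOF
    ifup patch11_22
    ifup patch11_0
    ifdown vmbr11
    EOF
    
    }
    
    # Despite the name, this CONNECTS the secondary: bringing vmbr11 up joins
    # vmbr22 to vmbr0 so the duplicate interfaces can reach the network.
    deactivate_fw()
    {
      echo "deactivate_fw"
      ssh -i /usr/local/bin/pve_id_rsa root@10.66.0.243 <<EOF
    ifup patch11_22
    ifup patch11_0
    ifup vmbr11
    EOF
    
    }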
    
    DO_EXIT="0"
    
    check_connect()
    {
      if [ $DO_EXIT -ne "0" ]
      then
        echo "DO_EXIT is not 0, exiting"
        exit 0
      fi
    
      ping -c1 10.66.0.2 > /dev/null
      if [ $? -ne 0 ]
      then
        echo "10.66.0.2 not up, exiting"
        exit 0
      fi
    
    
      ping -c1 10.66.0.243 > /dev/null
      if [ $? -ne 0 ]
      then
        echo "10.66.0.243 not up, exiting"
        exit 0
      fi
    
    
      FCONT=$(head -c 1 /usr/local/bin/monitor_backup_router.stop)
      if [ $FCONT -ne "0" ]
      then
        echo "First char of /usr/local/bin/monitor_backup_router.stop not 0, exiting"
        exit 0
      fi
    
    }
    
    send_done_email()
    {
      NOWSE=`date`
      sleep 120
      echo "router UP, deactivated backup router at $NOWSE" | /usr/local/bin/mail.php -s"router UP, deactivated backup router"
    }
    
    sigint_catch()
    {
      DO_EXIT="1"
    }
    
    trap "sigint_catch" 2
    
    check_connect
    activate_fw
    
    
    while :
    do
    
      check_connect
    
      ping -c1 10.66.0.1 > /dev/null
      if [ $? -eq 0 ]
      then
        sleep 10
        continue
      fi
    
      echo "Ping 10.66.0.1 FAIL."
    
      echo "Pinging 10.66.0.1"
      ping -c1 10.66.0.1
      if [ $? -eq 0 ]
      then
        echo "Ping 10.66.0.1 OK."
        sleep 10
        continue
      fi
    
      echo "Ping 10.66.0.1 FAIL."
    
      echo "10.66.0.1 DOWN, activating backup router"
    
      down_all
      deactivate_fw
      up_all
      sleep 5
      reload_all
    
    
      NOWASE=`date`
      echo "router DOWN, activated backup router at $NOWASE" | /usr/local/bin/mail.php -s"router DOWN, activated backup router"
    
      while :
      do
    
        echo "Checking first char of /usr/local/bin/monitor_backup_router.stop"
    
        if [ $DO_EXIT -eq "0" ]
        then
          FCONT=$(head -c 1 /usr/local/bin/monitor_backup_router.stop)
          if [ $FCONT -eq "0" ]
          then
    
            echo "First char of /usr/local/bin/monitor_backup_router.stop is 0."
    
            echo "Pinging 10.66.0.1"
            ping -c1 10.66.0.1
            if [ $? -ne 0 ]
            then
              echo "Ping 10.66.0.1 failed, keeping backup router active."
              continue
            fi
    
            echo "Ping 10.66.0.1 OK."
    
            echo "10.66.0.1 UP, deactivating backup router"
    
          else
    
            echo "First char of /usr/local/bin/monitor_backup_router.stop not 0, deactivating backup router"
    
          fi
        else
            echo "DO_EXIT is not 0, deactivating backup router"
        fi
    
    
        down_all
        activate_fw
        up_all
        sleep 5
        reload_all
    
        echo "Done deactivating backup router"
    
        send_done_email &
    
        break
    
    
      done
    
    
    done
    

    And finally, only on the secondary, install the Cron package and add the following task -

    *	*	*	*	*	root	"/usr/bin/lockf -t 0 /tmp/monitor_backup_router.lock /usr/local/bin/monitor_backup_router.sh 2>&1 | logger -i -t monitor_backup_router"
    

    And we are done.

    The /usr/local/bin/monitor_backup_router.sh script will sit and monitor the primary node in the ROUTERSYNC VLAN.
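
    The script only declares the primary down after two pings in a row fail. That idea can be pulled out into a small helper, roughly like this. It is a sketch, not what the script literally does: the function name, the attempts parameter, and the PING_CMD override are all made up for illustration.

```shell
#!/bin/sh
# Sketch: treat the primary as down only after several consecutive failed pings.
# PING_CMD is a hypothetical override used to swap in a stub for testing.
PING_CMD="${PING_CMD:-ping}"

primary_is_down()
{
  host="$1"
  attempts="${2:-2}"   # the monitor script effectively uses 2
  n=0
  while [ "$n" -lt "$attempts" ]
  do
    if "$PING_CMD" -c1 "$host" > /dev/null 2>&1
    then
      return 1         # got a reply: primary is up
    fi
    n=$((n + 1))
  done
  return 0             # every attempt failed
}
```

    For example: primary_is_down 10.66.0.1 && echo "would fail over now". Raising the attempt count trades slower failover for fewer false alarms.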

    When ping fails it will

    • stop all the duplicated interfaces (WAN, LAN, CAM)
    • activate the "connection" switch, vmbr11, on proxmox
    • start all the interfaces
      • this will cause a bunch of ARPs to go out and 'take over' the IP addresses
    • and reload dhcp on the WAN.
      • since the MAC address is the same, the cable modem will talk to the secondary.

    monitor_backup_router.sh will now sit and wait for pings to the primary in the ROUTERSYNC VLAN to succeed.

    When the ping succeeds it will

    • stop all the duplicated interfaces (WAN, LAN, CAM)
    • deactivate the "connection" switch, vmbr11, on proxmox
    • start all the interfaces
      • we need this because it seems like with XMLRPC sync running, various services don't like having the interfaces down
    • and reload dhcp on the WAN.
      • because the secondary can't talk to anything, this can take a while, but it will eventually finish without an IP
      • not sure if this is required, but I did it anyway to get everything back to 'normal'
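
    The two lists above boil down to a small state machine. Here is a sketch of just the decision part, with no interfaces touched. The function name, the state names, and the argument layout are made up for illustration, and it leaves out the script's extra checks that 10.66.0.2 and 10.66.0.243 are reachable.

```shell
#!/bin/sh
# Sketch of the decision logic only; nothing is brought up or down here.
# Arguments (all hypothetical names):
#   $1 state:      "standby" or "active" (is the secondary currently routing?)
#   $2 primary_up: 1 if 10.66.0.1 answered a ping, 0 otherwise
#   $3 stop_flag:  first character of monitor_backup_router.stop
next_action()
{
  state="$1"; primary_up="$2"; stop_flag="$3"

  case "$state" in
    standby)
      # The stop flag is checked first; a non-0 flag ends the run.
      if [ "$stop_flag" != "0" ]; then echo "exit"
      elif [ "$primary_up" = "1" ]; then echo "wait"
      else echo "failover"
      fi
      ;;
    active)
      # Fail back when the primary answers again or the stop flag is set.
      if [ "$stop_flag" != "0" ] || [ "$primary_up" = "1" ]; then echo "failback"
      else echo "wait"
      fi
      ;;
    *)
      echo "unknown state: $state" >&2
      return 1
      ;;
  esac
}
```

    So next_action standby 0 0 prints "failover", and next_action active 1 0 prints "failback". Setting the stop flag while the secondary is active also forces a failback, which is handy when you want the primary back in charge by hand.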

    This actually works pretty well. It takes around 20 seconds or so to fail over, and less than 10 to go back to the primary.
    ARP seems to figure everything out fine.

