Netgate Discussion Forum

    Hacked up HA setup for home

    Category: HA/CARP/VIPs
    Posted by statop

      I managed to set up a sort-of HA configuration for my home without CARP, using a single dynamic IPv4 and IPv6 assignment from Comcast. I mostly needed it because I mess around with my main server a lot and my family was getting mad whenever I killed the internet.

      It is not instant failover (it takes around 15-30 seconds), but it does work.

      Hardware Setup -

      • A managed switch with the VLANs below properly set up.
      • Main server with a single connection to the managed switch in trunk mode.
        • Mine is a beefy server with a 40G connection to the switch.
        • How you run pfSense doesn't really matter, but mine runs in a VM with SR-IOV virtual NICs.
      • Backup server with a single connection to the managed switch in trunk mode.
        • Mine is a small NUC with a 1G connection to the switch.
        • Latest Proxmox with Open vSwitch.
        • pfSense in a VM with regular virtual NICs.
      • Cable modem (or whatever you have) connected to the managed switch on VLAN 99.

      I have 4 VLANs, but you can get by with 3 if needed (drop CAM).

      VLANs, their interface assignments, and IP addresses -
      • 99 - WAN - DHCP or whatever
      • 66 - ROUTERSYNC - primary 10.66.0.1, secondary 10.66.0.2
      • none (untagged) - LAN - 10.0.0.1
      • 1003 - CAM - 10.whatever

      On each node, set up those VLANs and interfaces. They need to be configured exactly the same on both nodes, as you normally would, except for the ROUTERSYNC IPs.

      For Comcast, the cable modem will only talk to a single MAC address, so give the WAN interface the same MAC address on both nodes. This works fine (see why later). Do this only for the WAN; all other interfaces should have different MACs. ARP will be our friend.
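      If you want to double-check this before going live, here is a minimal sketch of a verification helper. It is hypothetical (`check_macs` is not part of pfSense or Proxmox) and assumes you have copied each node's interface MACs by hand into a text file with one `<interface> <MAC>` pair per line:

      ```sh
      # Hypothetical helper: compare per-interface MACs of the two nodes.
      # Each input file holds "<interface> <MAC>" lines, e.g. "wan 8a:c6:78:5e:d0:1b".
      # Prints OK when the WAN MACs match and every other interface differs.

      norm_mac() {
        # lower-case so "8A:C6:..." and "8a:c6:..." compare equal
        printf '%s\n' "$1" | tr 'A-F' 'a-f'
      }

      check_macs() {
        primary=$1
        secondary=$2
        status=0
        while read -r ifname mac; do
          # look up the same interface name in the secondary's file
          other=$(awk -v i="$ifname" '$1 == i { print $2 }' "$secondary")
          a=$(norm_mac "$mac")
          b=$(norm_mac "$other")
          if [ "$ifname" = "wan" ]; then
            # the cable modem only talks to one MAC, so WAN must match
            [ "$a" = "$b" ] || { echo "wan MACs differ: $a vs $b"; status=1; }
          else
            # everything else must differ so ARP can tell the nodes apart
            [ "$a" != "$b" ] || { echo "$ifname shares MAC $a on both nodes"; status=1; }
          fi
        done < "$primary"
        [ "$status" -eq 0 ] && echo OK
        return "$status"
      }
      ```

      The comparison is case-insensitive because Proxmox writes MACs in upper case, while FreeBSD's ifconfig on pfSense prints them in lower case.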

      The main node is set up like normal for your regular internet connection. Nothing really fancy there.
      All the fancy stuff is on the secondary.

      Virtual NICs are required on the secondary. You can use pfSense VLANs for everything except ROUTERSYNC, which needs to be a separate virtual NIC.

      If you want IPv6 to work, set up RA on the secondary exactly like on the primary. Do not use IPv6 in the ROUTERSYNC VLAN.

      One other thing you probably want to set up is HAProxy (or something similar) on the primary, with the secondary's web interface on 10.66.0.2 as the backend, so that you can reach the secondary's web interface. By default the secondary cannot talk to your LAN, because it has an active, but disconnected, interface on your LAN network.

      In Proxmox, here is my /etc/network/interfaces:

      source /etc/network/interfaces.d/*
      
      auto lo
      iface lo inet loopback
      
      auto enp1s0
      iface enp1s0 inet manual
              ovs_type OVSPort
              ovs_bridge vmbr0
      
      auto mgmt
      iface mgmt inet static
              address 10.0.0.243/24
              gateway 10.0.0.1
              ovs_type OVSIntPort
              ovs_bridge vmbr0
      
      auto patch22_11
      iface patch22_11 inet manual
              ovs_bridge vmbr22
              ovs_type OVSPatchPort
              ovs_patch_peer patch11_22
      
      auto mgmt66
      iface mgmt66 inet static
              address 10.66.0.243/24
              ovs_type OVSIntPort
              ovs_bridge vmbr0
              ovs_options tag=66
      
      auto vmbr0
      iface vmbr0 inet manual
              ovs_type OVSBridge
              ovs_ports enp1s0 mgmt mgmt66
      
      auto vmbr22
      iface vmbr22 inet manual
              ovs_type OVSBridge
      
      iface vmbr11 inet manual
              ovs_type OVSBridge
      
      iface patch11_22 inet manual
              ovs_bridge vmbr11
              ovs_type OVSPatchPort
              ovs_patch_peer patch22_11
      
      iface patch11_0 inet manual
              ovs_bridge vmbr11
              ovs_type OVSPatchPort
              ovs_patch_peer patch0_11
      
      auto patch0_11
      iface patch0_11 inet manual
              ovs_bridge vmbr0
              ovs_type OVSPatchPort
              ovs_patch_peer patch11_0
      

      What you see here is basically three virtual switches connected with patch ports.

      • The main switch, vmbr0, is connected to the real NIC, enp1s0.
      • The virtual switch in the middle, vmbr11, and its patch ports do not auto-start (they have no "auto" line).
      • The final virtual switch, vmbr22, is therefore not connected to the network unless we bring up vmbr11 and its patch ports.
      • Proxmox itself is connected to the network via vmbr0 and has IPs in both the LAN and ROUTERSYNC VLANs (mgmt and mgmt66).
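      Put differently, the secondary is "plugged in" only while vmbr11 is up. A minimal sketch of the two states, mirroring the ifup/ifdown commands the monitor script sends to Proxmox over SSH (`vmbr11_cmds` is a hypothetical helper name, not something Proxmox provides):

      ```sh
      # Hypothetical helper: print the ifupdown commands that connect or isolate
      # the secondary's vmbr22 by bringing the middle bridge vmbr11 up or down.
      vmbr11_cmds() {
        case "$1" in
          connect) action=ifup ;;    # vmbr22 <-> vmbr11 <-> vmbr0: secondary is live
          isolate) action=ifdown ;;  # vmbr11 down: vmbr22 is cut off from the network
          *) echo "usage: vmbr11_cmds connect|isolate" >&2; return 1 ;;
        esac
        # the patch ports are brought up in both cases, as in the monitor script
        printf '%s\n' "ifup patch11_22" "ifup patch11_0" "$action vmbr11"
      }

      # Example (run on the secondary pfSense):
      # vmbr11_cmds connect | ssh -i /usr/local/bin/pve_id_rsa root@10.66.0.243 sh
      ```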

      Here are the network definitions for my pfSense VM on the secondary:

      net0: virtio=2E:C8:1F:29:75:BB,bridge=vmbr22
      net1: virtio=8A:C6:78:5E:D0:1B,bridge=vmbr22,tag=99
      net2: virtio=B2:D9:23:92:4C:D3,bridge=vmbr0,tag=66
      net3: virtio=1A:B9:D4:AA:7C:CE,bridge=vmbr22,tag=1003
      

      The NICs for LAN, WAN, and CAM are connected to vmbr22, which by default is not connected to the network.
      The NIC for ROUTERSYNC is connected to vmbr0, which is connected to the network at all times.
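      The vmbr22 NICs are exactly the interfaces the monitor script has to bounce. A minimal sketch that derives that list from the VM's Proxmox config, assuming (as the script's BOUNCES list suggests) that netN shows up in the pfSense guest as vtnetN; `isolated_vtnets` is a hypothetical helper:

      ```sh
      # Hypothetical helper: read Proxmox "netN: ..." lines on stdin and print the
      # guest device names of the NICs attached to vmbr22 (the isolated bridge).
      # ASSUMPTION: netN appears in the pfSense guest as vtnetN (virtio, same order).
      isolated_vtnets() {
        grep -E '^net[0-9]+:' \
          | grep -E 'bridge=vmbr22(,|$)' \
          | sed -E 's/^net([0-9]+):.*/vtnet\1/' \
          | tr '\n' ' ' \
          | sed 's/ $//'
      }

      # Example on Proxmox (<vmid> is your pfSense VM's ID):
      # isolated_vtnets < /etc/pve/qemu-server/<vmid>.conf
      ```

      The `(,|$)` anchor keeps a bridge such as vmbr220 from matching. The output order differs from the script's `vtnet1 vtnet0 vtnet3`, which does not matter for bouncing.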

      So what we now have is a secondary set up exactly like the primary, with its "duplicate" interfaces disconnected from the main network.

      Once you have all that configured, you can configure High Availability Sync on the primary. Do not set up pfsync. Set up XMLRPC Sync to sync to the secondary at 10.66.0.2, and sync everything except NAT, static routes, and virtual IPs.

      Now install the Filer package on both nodes and enable XMLRPC Sync for Filer.

      On the primary, add /usr/local/bin/pve_id_rsa with 600 permissions in Filer. The content should be the SSH private key of the secondary's Proxmox host, so the secondary can SSH into it as root@10.66.0.243. It is usually /root/.ssh/id_rsa on the Proxmox file system. You may need to convert it to the traditional RSA private key format.
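      If the key starts with `-----BEGIN OPENSSH PRIVATE KEY-----`, ssh-keygen can rewrite it in the older PEM format. A minimal sketch, assuming the key has no passphrase and that you convert a copy rather than Proxmox's own key file (`to_pem` is a hypothetical wrapper):

      ```sh
      # Hypothetical wrapper: copy an RSA private key and rewrite the copy in the
      # classic PEM ("BEGIN RSA PRIVATE KEY") format. Assumes no passphrase.
      to_pem() {
        src=$1
        dst=$2
        cp "$src" "$dst" || return 1
        chmod 600 "$dst"   # ssh refuses keys with looser permissions
        # -p rewrites the key file; -P/-N supply the old/new (empty) passphrases
        # so nothing prompts; -m PEM selects the PEM output format; -q is quiet.
        ssh-keygen -p -m PEM -P "" -N "" -q -f "$dst"
      }

      # Example on Proxmox:
      # to_pem /root/.ssh/id_rsa /tmp/pve_id_rsa
      # then paste the contents of /tmp/pve_id_rsa into Filer.
      ```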

      Add /usr/local/bin/monitor_backup_router.stop with 666 permissions in Filer, with the content '0'.
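      The stop file acts as a kill switch. Because Filer syncs it from the primary, changing its first character to anything other than 0 makes the secondary's monitor script stand down, handing the network back to the primary first if the backup is active. A minimal sketch of the check, mirroring the script (`monitor_enabled` is a hypothetical name):

      ```sh
      # Hypothetical helper mirroring the script's check: the monitor may run
      # only while the first character of the stop file is "0".
      monitor_enabled() {
        stop_file=${1:-/usr/local/bin/monitor_backup_router.stop}
        # a missing or empty file yields "", which counts as "stop"
        first=$(head -c 1 "$stop_file" 2>/dev/null)
        [ "$first" = "0" ]
      }

      # Example:
      # if monitor_enabled; then echo "monitor enabled"; else echo "monitor paused"; fi
      ```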

      Add the following as /usr/local/bin/reload_interface.sh with permissions 777 -

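      #!/usr/local/bin/php-cgi -f
      <?php
      /*
       * Re-runs pfSense's interface configuration for a single interface.
       * Usage: /usr/local/bin/reload_interface.sh interface=wan
       * (the interface=wan argument is read back as $_GET["interface"])
       */
      
      require_once("globals.inc");
      require_once("functions.inc");
      require_once("config.inc");
      require_once("util.inc");
      require_once("interfaces.inc");
      
      if (empty($_GET["interface"])) {
          echo "usage: reload_interface.sh interface=<name>\n";
          exit(1);
      }
      
      // Re-apply the interface config; for the DHCP WAN this reloads DHCP
      interface_configure($_GET["interface"]);
      
      ?>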
      

      Now add the final magic script, /usr/local/bin/monitor_backup_router.sh, to Filer with 777 permissions -

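      #!/bin/sh
      #
      # Runs on the SECONDARY (started by cron). Watches the primary at 10.66.0.1
      # over ROUTERSYNC and, if it stops answering, connects this node's
      # LAN/WAN/CAM NICs to the real network by bringing vmbr11 up on Proxmox.
      
      # LAN, WAN and CAM NICs (net0, net1, net3 in the VM config, assuming
      # netN = vtnetN); vtnet2 is ROUTERSYNC and is never bounced
      BOUNCES="vtnet1 vtnet0 vtnet3"
      # only need wan, it's the only dhcp
      INTERFACES="wan"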
      
      up_all()
      {
        for i in $BOUNCES; do echo "ifconfig $i up" && ifconfig $i up; done
      }
      
      reload_all()
      {
        for i in $INTERFACES; do echo "/usr/local/bin/reload_interface.sh interface=$i" && /usr/local/bin/reload_interface.sh interface=$i; done
      }
      
      down_all()
      {
        for i in $BOUNCES; do echo "ifconfig $i down" && ifconfig $i down; done
      }
      
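      # Isolate this node: take vmbr11 down on Proxmox so vmbr22 (LAN/WAN/CAM)
      # is cut off from vmbr0 and the physical NIC
      activate_fw()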
      {
        echo "activate_fw"
        ssh -i /usr/local/bin/pve_id_rsa root@10.66.0.243 <<EOF
      ifup patch11_22
      ifup patch11_0
      ifdown vmbr11
      EOF
      
      }
      
      # Connect this node: bring vmbr11 up so vmbr22 is patched through to vmbr0
      deactivate_fw()
      {
        echo "deactivate_fw"
        ssh -i /usr/local/bin/pve_id_rsa root@10.66.0.243 <<EOF
      ifup patch11_22
      ifup patch11_0
      ifup vmbr11
      EOF
      
      }
      
      DO_EXIT="0"
      
      check_connect()
      {
        if [ $DO_EXIT -ne "0" ]
        then
          echo "DO_EXIT is not 0, exiting"
          exit 0
        fi
      
        ping -c1 10.66.0.2 > /dev/null
        if [ $? -ne 0 ]
        then
          echo "10.66.0.2 not up, exiting"
          exit 0
        fi
      
      
        ping -c1 10.66.0.243 > /dev/null
        if [ $? -ne 0 ]
        then
          echo "10.66.0.243 not up, exiting"
          exit 0
        fi
      
      
        FCONT=$(head -c 1 /usr/local/bin/monitor_backup_router.stop)
        if [ "$FCONT" != "0" ]
        then
          echo "First char of /usr/local/bin/monitor_backup_router.stop not 0, exiting"
          exit 0
        fi
      
      }
      
      send_done_email()
      {
        NOWSE=`date`
        sleep 120
        echo "router UP, deactivated backup router at $NOWSE" | /usr/local/bin/mail.php -s"router UP, deactivated backup router"
      }
      
      sigint_catch()
      {
        DO_EXIT="1"
      }
      
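      # On Ctrl-C (SIGINT), set DO_EXIT: the script hands the network back to
      # the primary if the backup is active, then exits at the next check
      trap "sigint_catch" 2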
      
      check_connect
      activate_fw
      
      
      while :
      do
      
        check_connect
      
        ping -c1 10.66.0.1 > /dev/null
        if [ $? -eq 0 ]
        then
          sleep 10
          continue
        fi
      
        echo "Ping 10.66.0.1 FAIL."
      
        echo "Pinging 10.66.0.1"
        ping -c1 10.66.0.1
        if [ $? -eq 0 ]
        then
          echo "Ping 10.66.0.1 OK."
          sleep 10
          continue
        fi
      
        echo "Ping 10.66.0.1 FAIL."
      
        echo "10.66.0.1 DOWN, activating backup router"
      
        down_all
        deactivate_fw
        up_all
        sleep 5
        reload_all
      
      
        NOWASE=`date`
        echo "router DOWN, activated backup router at $NOWASE" | /usr/local/bin/mail.php -s"router DOWN, activated backup router"
      
        while :
        do
      
          echo "Checking first char of /usr/local/bin/monitor_backup_router.stop"
      
          if [ $DO_EXIT -eq "0" ]
          then
            FCONT=$(head -c 1 /usr/local/bin/monitor_backup_router.stop)
            if [ "$FCONT" = "0" ]
            then
      
              echo "First char of /usr/local/bin/monitor_backup_router.stop is 0."
      
              echo "Pinging 10.66.0.1"
              ping -c1 10.66.0.1
              if [ $? -ne 0 ]
              then
                echo "Ping 10.66.0.1 failed, keeping backup router active."
                # avoid a tight loop (and log flood) if ping fails immediately
                sleep 5
                continue
              fi
      
              echo "Ping 10.66.0.1 OK."
      
              echo "10.66.0.1 UP, deactivating backup router"
      
            else
      
              echo "First char of /usr/local/bin/monitor_backup_router.stop not 0, deactivating backup router"
      
            fi
          else
              echo "DO_EXIT is not 0, deactivating backup router"
          fi
      
      
          down_all
          activate_fw
          up_all
          sleep 5
          reload_all
      
          echo "Done deactivating backup router"
      
          send_done_email &
      
          break
      
      
        done
      
      
      done
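      The core decision in the outer loop is that the secondary only fails over after two pings in a row to 10.66.0.1 fail; while the primary answers, it re-checks every 10 seconds. A minimal sketch of just that decision, with the ping command made overridable so it can be exercised without a network (`primary_down` and `PING_CMD` are hypothetical names):

      ```sh
      # Hypothetical sketch of the failover decision in monitor_backup_router.sh:
      # declare the primary down only if two pings in a row fail.
      PRIMARY=10.66.0.1
      PING_CMD=${PING_CMD:-"ping -c1"}   # overridable, e.g. for testing

      primary_down() {
        # first, quiet attempt: any success means the primary is up
        $PING_CMD "$PRIMARY" > /dev/null 2>&1 && return 1
        # second attempt before giving up on the primary
        $PING_CMD "$PRIMARY" > /dev/null 2>&1 && return 1
        return 0
      }

      # Example loop body:
      # if primary_down; then echo "activating backup router"; else sleep 10; fi
      ```

      Requiring two failures keeps a single dropped ICMP packet from flipping the LAN, WAN, and CAM interfaces over to the secondary.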
      

      And finally, only on the secondary, install the Cron package and add the following task -

      *	*	*	*	*	root	"/usr/bin/lockf -t 0 /tmp/monitor_backup_router.lock /usr/local/bin/monitor_backup_router.sh 2>&1 | logger -i -t monitor_backup_router"
      

      And we are done.

      The /usr/local/bin/monitor_backup_router.sh script will sit and monitor the primary node over the ROUTERSYNC VLAN.

      When the ping fails, it will

      • stop all the duplicated interfaces (WAN, LAN, CAM)
      • activate the "connection" switch, vmbr11, on Proxmox
      • start all the interfaces
        • this causes a bunch of ARPs to go out and "take over" the IP addresses
      • and reload DHCP on the WAN.
        • since the MAC address is the same, the cable modem will talk to the secondary.

      monitor_backup_router.sh will then sit and wait for pings to the primary in the ROUTERSYNC VLAN to succeed.

      When the ping succeeds, it will

      • stop all the duplicated interfaces (WAN, LAN, CAM)
      • deactivate the "connection" switch, vmbr11, on Proxmox
      • start all the interfaces
        • this is needed because, with XMLRPC sync running, various services don't seem to like having the interfaces down
      • and reload DHCP on the WAN.
        • because the secondary can't talk to anything, this can take a while, but it will eventually finish without an IP
        • not sure if this is required, but I did it anyway to get everything back to "normal"

      This actually works pretty well. It takes around 20 seconds or so to fail over, and less than 10 seconds to go back to the primary.
      ARP seems to figure everything out fine.

      Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.