Hacked up HA setup for home
-
I managed to set up a sort-of HA for my home without CARP and with a single dynamic IPv4 and IPv6 from Comcast. I mostly needed it because I mess around with my main server a bunch and my family was getting mad when I killed the internet.
It is not instant fail-over, it takes around 15-30 secs, but it does work.
Hardware Setup -
- A managed switch with proper vlans set up for all of the below.
- Main server with a single connection to the managed switch in trunk mode.
  - Mine is a beefy server with a 40g connection to the switch.
  - How you run pfsense doesn't really matter, but mine is running in a VM with SR-IOV virtual NICs.
- Backup server with a single connection to the managed switch in trunk mode.
  - Mine is a small NUC with a 1g connection to the switch.
  - Latest Proxmox with Open vSwitch.
  - pfsense in a VM with regular virtual NICs.
- Cable modem or whatever connected to the managed switch on vlan 99.
I have 4 vlans, but you can use only 3 if needed (remove CAM).
VLANs and their interface assignments and IP addresses -
99 - WAN - dhcp or whatever
66 - ROUTERSYNC - primary 10.66.0.1, secondary 10.66.0.2
none - LAN - 10.0.0.1
1003 - CAM - 10.whatever
On each node, set up those vlans and interfaces. They need to be configured exactly the same on both nodes, like you normally would, except for the IPs in ROUTERSYNC.
For Comcast, the cable modem will only talk to a single MAC address, so make the MAC address the same on both nodes for the WAN. This will work fine, see why later. Do this only for the WAN, all the other interfaces should have different MACs. ARP will be our friend.
The main node is set up like normal for your regular internet connection. Nothing really fancy there.
All the fancy stuff is on the secondary.
Virtual NICs are required on the secondary. You can use pfsense vlans for all except ROUTERSYNC. That needs to be a separate virtual NIC.
If you want IPv6 to work, set up RA on the secondary exactly like the primary. Do not use IPv6 in the ROUTERSYNC vlan.
One other thing you probably want to set up is haproxy or something on the primary with a back-end of the web interface on 10.66.0.2 so that you can access the secondary web interface. The secondary will not be able to talk to your LAN by default because it will have an active, but disconnected, interface on your LAN network.
In proxmox, here is my /etc/network/interfaces

    source /etc/network/interfaces.d/*

    auto lo
    iface lo inet loopback

    auto enp1s0
    iface enp1s0 inet manual
        ovs_type OVSPort
        ovs_bridge vmbr0

    auto mgmt
    iface mgmt inet static
        address 10.0.0.243/24
        gateway 10.0.0.1
        ovs_type OVSIntPort
        ovs_bridge vmbr0

    auto patch22_11
    iface patch22_11 inet manual
        ovs_bridge vmbr22
        ovs_type OVSPatchPort
        ovs_patch_peer patch11_22

    auto mgmt66
    iface mgmt66 inet static
        address 10.66.0.243/24
        ovs_type OVSIntPort
        ovs_bridge vmbr0
        ovs_options tag=66

    auto vmbr0
    iface vmbr0 inet manual
        ovs_type OVSBridge
        ovs_ports enp1s0 mgmt mgmt66

    auto vmbr22
    iface vmbr22 inet manual
        ovs_type OVSBridge

    iface vmbr11 inet manual
        ovs_type OVSBridge

    iface patch11_22 inet manual
        ovs_bridge vmbr11
        ovs_type OVSPatchPort
        ovs_patch_peer patch22_11

    iface patch11_0 inet manual
        ovs_bridge vmbr11
        ovs_type OVSPatchPort
        ovs_patch_peer patch0_11

    auto patch0_11
    iface patch0_11 inet manual
        ovs_bridge vmbr0
        ovs_type OVSPatchPort
        ovs_patch_peer patch11_0

What you can see is basically 3 virtual switches connected with patch ports.
- The main switch, vmbr0, is connected to the real NIC, enp1s0.
- The virtual switch in the middle, vmbr11, and its patch ports do not auto start.
- The final virtual switch, vmbr22, is thus not connected to the network unless we start vmbr11 and its patch ports.
- Proxmox is connected to the network via vmbr0 and has IPs in both the LAN and ROUTERSYNC vlans.
Here are the network definitions for my pfsense VM on the secondary.

    net0: virtio=2E:C8:1F:29:75:BB,bridge=vmbr22
    net1: virtio=8A:C6:78:5E:D0:1B,bridge=vmbr22,tag=99
    net2: virtio=B2:D9:23:92:4C:D3,bridge=vmbr0,tag=66
    net3: virtio=1A:B9:D4:AA:7C:CE,bridge=vmbr22,tag=1003

The NICs for LAN, WAN, and CAM are connected to vmbr22, which is by default not connected to the network.
The NIC for ROUTERSYNC is connected to vmbr0, which is connected to the network at all times.
So now what we have is a secondary set up exactly like the primary, with its 'duplicate' interfaces disconnected from the main network.
Once you have that all configured, you can configure High Availability Sync on the primary. Do not set up pfsync. Set up XMLRPC Sync to sync to the secondary at 10.66.0.2. Sync everything except NAT, static routes, and virtual IPs.
Now set up the Filer package on both nodes and enable XMLRPC Sync for Filer.
On the primary, add /usr/local/bin/pve_id_rsa with 600 permissions in Filer. The content should be the ssh private key for proxmox. It is usually in /root/.ssh/id_rsa in the proxmox file system. You may need to convert it to regular RSA private key format.
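If the key does need converting, one way that should work with a reasonably recent OpenSSH is `ssh-keygen -p -m PEM`, which rewrites a private key file in place in the older PEM format. The sketch below works on a copy so the original key is left alone. `convert_key` is a made-up helper name, the destination path in the usage comment is a placeholder, and it assumes an RSA key with no passphrase.

```shell
#!/bin/sh
# Sketch: copy a private key and rewrite the copy in classic PEM format.
# convert_key is a hypothetical helper; both paths are caller-supplied.
convert_key() {
    src="$1"
    dst="$2"
    cp "$src" "$dst" || return 1
    chmod 600 "$dst" || return 1
    # -p rewrites the key file in place, -m PEM picks the output format,
    # -P and -N are the old and new passphrases (both assumed empty here).
    ssh-keygen -p -m PEM -P "" -N "" -f "$dst" >/dev/null
}

# Example, run on the proxmox host (the destination path is a placeholder):
#   convert_key /root/.ssh/id_rsa /tmp/pve_id_rsa
#   head -n 1 /tmp/pve_id_rsa
```

The content of the converted copy is what would then get pasted into Filer.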
Add /usr/local/bin/monitor_backup_router.stop with 666 permissions in filer with content '0'.
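The stop file works as a manual kill switch: the monitor script reads its first character and only keeps monitoring while it is `0`. Here is a minimal sketch of that check on its own. `monitor_enabled` is a hypothetical helper name (the real script inlines the check), and unlike the real script this version treats a missing or empty file as "disabled".

```shell
#!/bin/sh
# Succeeds (exit 0) only when the first character of the stop file is "0".
# monitor_enabled is a hypothetical helper, not part of the original script.
# Assumption: a missing or empty file counts as "disabled" here.
monitor_enabled() {
    [ "$(head -c 1 "$1" 2>/dev/null)" = "0" ]
}

STOP_FILE="${STOP_FILE:-/usr/local/bin/monitor_backup_router.stop}"
if monitor_enabled "$STOP_FILE"; then
    echo "monitoring enabled"
else
    echo "monitoring disabled"
fi
```

So to keep the secondary from taking over during planned work on the primary, change the file content to anything that does not start with `0`, and put it back to `0` when done.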
Add the following as /usr/local/bin/reload_interface.sh with permission 777 -

    #!/usr/local/bin/php-cgi -f
    <?php
    require_once("globals.inc");
    require_once("functions.inc");
    require_once("config.inc");
    require_once("util.inc");
    require_once("interfaces.inc");

    interface_configure($_GET["interface"]);
    ?>

Now, add the final magical script, /usr/local/bin/monitor_backup_router.sh, to Filer with 777 permissions -

    #!/bin/sh

    # Naming note: "activate_fw" / "deactivate_fw" refer to the PRIMARY firewall.
    # activate_fw   = primary is active, so vmbr11 goes down (secondary isolated).
    # deactivate_fw = primary is down, so vmbr11 comes up (secondary connected).

    BOUNCES="vtnet1 vtnet0 vtnet3"
    # only need wan, it's the only dhcp
    INTERFACES="wan"

    up_all() {
        for i in $BOUNCES; do
            echo "ifconfig $i up" && ifconfig $i up
        done
    }

    reload_all() {
        for i in $INTERFACES; do
            echo "/usr/local/bin/reload_interface.sh interface=$i" && /usr/local/bin/reload_interface.sh interface=$i
        done
    }

    down_all() {
        for i in $BOUNCES; do
            echo "ifconfig $i down" && ifconfig $i down
        done
    }

    activate_fw() {
        echo "activate_fw"
        ssh -i /usr/local/bin/pve_id_rsa root@10.66.0.243 <<EOF
    ifup patch11_22
    ifup patch11_0
    ifdown vmbr11
    EOF
    }

    deactivate_fw() {
        echo "deactivate_fw"
        ssh -i /usr/local/bin/pve_id_rsa root@10.66.0.243 <<EOF
    ifup patch11_22
    ifup patch11_0
    ifup vmbr11
    EOF
    }

    DO_EXIT="0"

    # Exit unless this node, proxmox, and the stop file all say "keep monitoring".
    check_connect() {
        if [ $DO_EXIT -ne "0" ]
        then
            echo "DO_EXIT is not 0, exiting"
            exit 0
        fi
        ping -c1 10.66.0.2 > /dev/null
        if [ $? -ne 0 ]
        then
            echo "10.66.0.2 not up, exiting"
            exit 0
        fi
        ping -c1 10.66.0.243 > /dev/null
        if [ $? -ne 0 ]
        then
            echo "10.66.0.243 not up, exiting"
            exit 0
        fi
        FCONT=$(head -c 1 /usr/local/bin/monitor_backup_router.stop)
        if [ $FCONT -ne "0" ]
        then
            echo "First char of /usr/local/bin/monitor_backup_router.stop not 0, exiting"
            exit 0
        fi
    }

    send_done_email() {
        NOWSE=`date`
        sleep 120
        echo "router UP, deactivated backup router at $NOWSE" | /usr/local/bin/mail.php -s"router UP, deactivated backup router"
    }

    sigint_catch() {
        DO_EXIT="1"
    }

    trap "sigint_catch" 2

    check_connect
    activate_fw

    while :
    do
        check_connect
        ping -c1 10.66.0.1 > /dev/null
        if [ $? -eq 0 ]
        then
            sleep 10
            continue
        fi
        echo "Ping 10.66.0.1 FAIL."
        # Ping a second time before declaring the primary down.
        echo "Pinging 10.66.0.1"
        ping -c1 10.66.0.1
        if [ $? -eq 0 ]
        then
            echo "Ping 10.66.0.1 OK."
            sleep 10
            continue
        fi
        echo "Ping 10.66.0.1 FAIL."

        echo "10.66.0.1 DOWN, activating backup router"
        down_all
        deactivate_fw
        up_all
        sleep 5
        reload_all
        NOWASE=`date`
        echo "router DOWN, activated backup router at $NOWASE" | /usr/local/bin/mail.php -s"router DOWN, activated backup router"

        # Backup is active now: wait for the primary to answer again.
        while :
        do
            echo "Checking first char of /usr/local/bin/monitor_backup_router.stop"
            if [ $DO_EXIT -eq "0" ]
            then
                FCONT=$(head -c 1 /usr/local/bin/monitor_backup_router.stop)
                if [ $FCONT -eq "0" ]
                then
                    echo "First char of /usr/local/bin/monitor_backup_router.stop is 0."
                    echo "Pinging 10.66.0.1"
                    ping -c1 10.66.0.1
                    if [ $? -ne 0 ]
                    then
                        echo "Ping 10.66.0.1 failed, keeping backup router active."
                        continue
                    fi
                    echo "Ping 10.66.0.1 OK."
                    echo "10.66.0.1 UP, deactivating backup router"
                else
                    echo "First char of /usr/local/bin/monitor_backup_router.stop not 0, deactivating backup router"
                fi
            else
                echo "DO_EXIT is not 0, deactivating backup router"
            fi
            down_all
            activate_fw
            up_all
            sleep 5
            reload_all
            echo "Done deactivating backup router"
            send_done_email &
            break
        done
    done

And finally, only on the secondary, install the cron package, and add the following task -

    * * * * * root "/usr/bin/lockf -t 0 /tmp/monitor_backup_router.lock /usr/local/bin/monitor_backup_router.sh 2>&1 | logger -i -t monitor_backup_router"

And we are done.
The /usr/local/bin/monitor_backup_router.sh script will sit and monitor the primary node in the ROUTERSYNC vlan.
When ping fails it will
- stop all the duplicated interfaces (WAN, LAN, CAM)
- activate the "connection" switch, vmbr11, on proxmox
- start all the interfaces
  - this will cause a bunch of ARPs to go out and 'take over' the IP addresses
- and reload dhcp on the WAN.
  - since the MAC address is the same, the cable modem will talk to the secondary.
monitor_backup_router.sh will now sit and wait for pings to the primary in the ROUTERSYNC vlan to succeed.
When the ping succeeds it will
- stop all the duplicated interfaces (WAN, LAN, CAM)
- deactivate the "connection" switch, vmbr11, on proxmox
- start all the interfaces
  - we need this because it seems like with xml sync running, various services don't like having the interfaces down
- and reload dhcp on the WAN.
  - because the secondary can't talk to anything, this can take a while, but it will eventually finish without an IP
  - not sure if this is required, but I did it anyway to get everything back to 'normal'
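Both transitions follow the same bounce pattern and differ only in whether vmbr11 is brought up or down on proxmox. The dry-run sketch below only prints the commands in order instead of running them, so it is safe to try anywhere. `switch_role` and its `takeover`/`release` arguments are made-up names for illustration, and the vtnet-to-interface mapping in the comment is my reading of the VM definition, so check it against your own assignments.

```shell
#!/bin/sh
# Dry run: print the command sequence for each transition, execute nothing.
# Assumed mapping on the secondary: vtnet1=WAN, vtnet0=LAN, vtnet3=CAM.
BOUNCES="vtnet1 vtnet0 vtnet3"

# switch_role is a hypothetical wrapper around the steps in the real script.
switch_role() {
    case "$1" in
        takeover) bridge_cmd="ifup vmbr11" ;;    # connect vmbr22 to vmbr0
        release)  bridge_cmd="ifdown vmbr11" ;;  # isolate vmbr22 again
        *) echo "usage: switch_role takeover|release" >&2; return 1 ;;
    esac
    for i in $BOUNCES; do echo "ifconfig $i down"; done
    # The real script always brings the vmbr11 patch ports up first.
    echo "ssh root@10.66.0.243 'ifup patch11_22; ifup patch11_0; $bridge_cmd'"
    for i in $BOUNCES; do echo "ifconfig $i up"; done
    echo "sleep 5"
    echo "/usr/local/bin/reload_interface.sh interface=wan"
}

switch_role takeover
```

Running it with `release` instead prints the fail-back sequence, which is identical apart from the `ifdown vmbr11`.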
This actually works pretty well. It takes around 20 seconds or so to fail over, and less than 10 to go back to the primary.
ARP seems to figure everything out fine.