• Hi all,

    We are currently setting up a test lab with two (soon to be three) hypervisors, and a public subnet delivered in a VLAN.

    pfSense has a /28 network on WAN, and we're using Virtual IPs and 1:1 NAT to provide access to the DMZ; this is working as expected.
    Then we have four local networks: LAN, DMZ, DATA and ADM.
    We will be using the ADM network for CARP SYNC, as this network has little traffic.

    Now I have two main questions, regarding WAN and DHCP:

    DHCP: pfSense is serving as the DHCP server on LAN. How should the other pfSense routers be configured: DHCP enabled, disabled, or relay?
    Should I create static mappings for the secondary pfSense routers, or just give them static IPv4 configurations?

    WAN: On the /28, one of the IPs in this network is my gateway to the outside, which theoretically is always online. Should the pfSense routers have different WAN addresses, or can the other pick up the same WAN address when one goes down?

    Thanks


  • You need to read the docs first:
    https://docs.netgate.com/pfsense/en/latest/book/highavailability/index.html
    Some comments:
    I would create a separate local network for the SYNC traffic.
    Make sure you configure your hypervisor correctly for CARP traffic.
    The secondary box should be statically addressed. I've never tried having two secondary nodes (tertiary?) and I'm not even sure this is supported. I don't see why you'd need it.
    You need a public IP for each node AND shared CARP IPs.
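
    To illustrate the shared-address part: with CARP, each node keeps its own unique address and the shared VIP floats between them. A minimal sketch in FreeBSD ifconfig terms (pfSense configures this through Firewall > Virtual IPs instead; the interface name, addresses, vhid and password here are placeholders):

```shell
# Node 1 (intended master): its own WAN address, plus the shared CARP VIP.
ifconfig em0 inet 203.0.113.101/28
ifconfig em0 vhid 10 advskew 0 pass examplepw alias 203.0.113.100/28

# Node 2 (backup): a different unique address, the SAME vhid and password,
# and a higher advskew so it only takes over when node 1 stops advertising.
ifconfig em0 inet 203.0.113.102/28
ifconfig em0 vhid 10 advskew 100 pass examplepw alias 203.0.113.100/28
```

    Whichever node wins the CARP election answers for the VIP; the per-node addresses never move.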


  • Hi @dotdash, thank you a lot for your quick reply.

    I have read the documentation :) but thanks for pointing it out; I still had these questions afterwards...
    Regarding your comments, I already suspected the need for public IPs and just wanted to confirm. About some other things you mentioned, I would like to better understand a few points:

    I would create a separate local network for the SYNC traffic.

    The ADM network has little to no traffic, and only small payloads. Why wouldn't it be suitable for use as the SYNC network? What conditions might interfere with its traffic?

    Make sure you configure your hypervisor correctly for CARP traffic.

    CARP traffic isn't routed through the hypervisor. All hypervisors have an interface (without IP configuration) picking up a VLAN, and that interface corresponds to pfSense's WAN interface. No hypervisors are visible on this network. From my understanding, the hypervisor does not require any configuration for CARP to work in such a scenario, but what configuration should I consider?

    I've never tried having two secondary nodes (tertiary?) and I'm not even sure this is supported. I don't see why you'd need it.

    When I mentioned nodes I meant hypervisor nodes, but that does lead to a tertiary pfSense node. We were thinking of setting up three pfSense routers (one per hypervisor) and testing failover by disabling two hypervisors at a time. We assumed this would be easy to set up and scale, as the documentation states:

    Though often erroneously called a “CARP Cluster”, two or more redundant pfSense firewalls are more aptly titled a “High Availability Cluster” (...)

    The most common High Availability cluster configuration includes only two nodes. It is possible to have more nodes in a cluster, but they do not provide a significant advantage.

    From this we assumed that adding more pfSense nodes and different topologies would be attainable. Are you suggesting this is not supported, hard to achieve, or simply not tested?

    Thank you!


  • You can use an existing network for sync traffic, but it's easy to create an isolated VLAN for it.
    ESXi, for example, requires some tweaking to the virtual switch for use with CARP; that's why I mentioned it. Not sure about others.
    I have never personally tried running a three-node cluster. If the docs say it's possible, then you should be fine. I would bet that it doesn't get a lot of testing, though.
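
    For the record, the ESXi tweak usually cited for CARP is the virtual switch security policy: promiscuous mode and forged transmits need to be accepted, or the backup node never sees the master's advertisements. A hedged sketch of the CLI form (the vSwitch name is a placeholder; verify the option names against your ESXi version, and the same toggles exist in the vSphere UI):

```shell
# Illustrative esxcli invocation for a standard vSwitch.
esxcli network vswitch standard policy security set \
    --vswitch-name=vSwitch0 \
    --allow-promiscuous=true \
    --allow-forged-transmits=true \
    --allow-mac-change=true
```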


  • Right. I see that a dedicated VLAN for sync traffic is the solution with regard to privacy and security, but considering the network that will be used is already secured, with few clients, all accounted for in their functions, it does seem adequate.
    However, we may consider different configurations in the future.

    What kind of tweaking, of which settings? It may be useful to know :) About the three-node cluster: as I mentioned, our approach to this setup is to test different scenarios. Although the docs say it's possible, I'm curious how it would work with a three- or four-node setup for features like the DHCP server, since there is only space for one failover IP. Unless they can be comma- or space-separated? Or would we configure pfSense1 with the failover IP of pfSense2, and pfSense2 with that of pfSense3?
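
    On the DHCP question specifically: pfSense's DHCP failover uses ISC dhcpd's failover protocol, and that protocol is defined between exactly two peers, which is why the GUI only has room for a single failover IP. An illustrative dhcpd.conf-style excerpt (the peer name, addresses and timers below are made up):

```text
# dhcpd.conf failover declaration (illustrative): note there is exactly one
# "peer address"; the protocol has no notion of a third partner.
failover peer "dhcp_lan" {
    primary;                    # the other node declares "secondary"
    address 172.16.1.1;         # this node's LAN IP
    peer address 172.16.1.2;    # its single failover partner
    mclt 600;                   # required on the primary
    split 128;                  # address-pool split between the two peers
}
```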



  • Re: 3 pfSense in HA... that got asked a few weeks ago (https://forum.netgate.com/topic/155682/ha-for-three-or-more-devices) with no responses. I see it in the docs, as you pointed out, but I don't know how it would be accomplished, since the HA settings only allow specifying one IP address to sync to. Maybe you could have the second node sync its config to the third, but I don't know how all three would sync states.

  • LAYER 8 Moderator

    HA with more than 2 devices isn't really that great or a good idea. The only implementation that works at all is a daisy-chain-like setup, in which you configure the secondary node like the primary node, but pointing at the third one: you would set up pfsync with a peer IP and XMLRPC sync from the primary to the secondary AND from the secondary to the third node. Also, the few HA services that support running in some sort of active-active('ish) mode aren't made for running on more than two nodes (DHCP).

    That's why it isn't such an amazing idea to set up: you would literally daisy-chain a configuration from node 1 -> 2 -> 3.
    As there is nothing really big besides 1-2 smaller services like DHCP, DNS or NTP that could actually run on every node without being stopped on a non-master node, and the FreeBSD pf/CARP implementation doesn't really have active/active setups in mind, adding a third node doesn't appeal from an availability or security standpoint. I'd say you'd be better off using a potential third node as a cold spare with pfSense installed and some sort of console/mgmt interface set up, so you can fire it up quickly and restore a config backup onto it, rather than installing it as a tertiary node in a daisy-chain ring.


  • Hi @JeGr, thank you a lot for your input. You made some very interesting observations, and in fact, as @dotdash and @teamits mentioned, a configuration with more than 2 nodes isn't friendly.

    I did make the setup with two nodes, but something is failing.
    I read about the issue @dotdash mentioned, but it seems very specific to VMware's ESXi virtualisation. I searched for the same issue relating XenServer, pfSense and CARP, and didn't find that solution applied to Xen.

    So right now I have the following config:

    Public subnet: 1.2.3.100/28
    
    pfSense CARP WAN VIP:     1.2.3.100/28
    pfSense1 WAN:             1.2.3.101/28
    pfSense2 WAN:             1.2.3.102/28
    
    pfSense CARP LAN VIP:     172.16.1.254/24
    pfSense1 LAN:             172.16.1.1/24
    pfSense2 LAN:             172.16.1.2/24
    
    pfSense CARP SYNC VIP:     172.16.254.254/24
    pfSense1 SYNC:             172.16.254.1/24
    pfSense2 SYNC:             172.16.254.2/24
    
    pfSense CARP IP's for 1:1 NAT:
    1.2.3.105
    1.2.3.106
    1.2.3.107
    1.2.3.108 (etc)
    
    • I have enabled High Availability Sync: both pfsync and XMLRPC sync.
      Sync appears to be working perfectly, except for one detail I noticed: it does sync the authentication servers, but on the second pfSense the selected authentication server was still the local database, and I had to change this manually.

    • We configured every interface accordingly, set the DHCP server to use the CARP LAN IP for DNS and gateway, and set the failover IP;

    • Changed to manual outbound NAT, and changed the rules to use the WAN CARP VIP instead of the interface address;

    • Added extra NAT rules to overcome the issue mentioned in the documentation: https://docs.netgate.com/pfsense/en/latest/highavailability/troubleshooting-vpn-connectivity-to-a-high-availability-secondary-node.html

    • We're using a site-to-site IPsec VPN, and changed the tunnel configuration to use the WAN CARP IP as its interface:
      The site-to-site VPN is working: I dial 1.2.3.100 and the connection is established, without encountering any issues on VPN traffic;

    • NAT 1:1 OK - I can access all servers using the Virtual IP configured for CARP.

    • Each CARP IP has its own VHID: on the /28 subnet the VHID matches the last octet, and on the private addresses the VHID matches the third octet. There are no overlapping VHIDs.

    • All CARP IPs appear as MASTER on the primary and BACKUP on the secondary.

    • Rules on the SYNC interface allow SYNC net to any.
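
    With a checklist like the one above in place, it can also help to verify things from the shell on both nodes. A hedged sketch (interface names are placeholders; em1 stands in for the SYNC interface):

```shell
# Expect a "carp: MASTER ..." line per vhid on the primary
# and "carp: BACKUP ..." on the secondary.
ifconfig | grep carp

# Confirm state sync is actually flowing: pfsync is IP protocol 240.
tcpdump -ni em1 ip proto 240 -c 5
```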

    Following an article I found, I also performed these additional configurations (after HA failed in the first tests):

    • System > Advanced > Firewall & NAT
      • Enable NAT Reflection for 1:1 NAT: checked
      • Enable automatic outbound NAT for Reflection

    Despite all sync seeming correct, when I halt the first system the secondary quickly changes from BACKUP to MASTER. However, the VPN stays down and traffic doesn't reach the servers.

    For example, this is a ping I was running to the public IP of a web server. When the primary is master I can access the site without issues and ping it, but when the secondary becomes master, nothing works:

    64 bytes from 1.2.3.105: icmp_seq=39 ttl=47 time=52.099 ms
    64 bytes from 1.2.3.105: icmp_seq=40 ttl=47 time=51.661 ms
    Request timeout for icmp_seq 41
    Request timeout for icmp_seq 42
    (... identical request timeouts continue for icmp_seq 43 through 138 ...)
    Request timeout for icmp_seq 139
    64 bytes from 1.2.3.105: icmp_seq=140 ttl=47 time=52.780 ms
    64 bytes from 1.2.3.105: icmp_seq=141 ttl=47 time=54.253 ms
    64 bytes from 1.2.3.105: icmp_seq=142 ttl=47 time=51.698 ms
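
    A symptom pattern like this (the VIP answering while the primary is master, dead while the secondary is master) is often narrowed down with a capture on the secondary's WAN while it shows MASTER: if the echo requests for the VIP never arrive, the problem sits in the virtual switch or the upstream router's ARP rather than in pfSense's rules. A hedged sketch (the interface name is a placeholder; xn0 stands in for a Xen WAN interface):

```shell
# Run on the secondary while it is MASTER.
# 1) Do inbound packets for the NAT VIP reach the WAN interface at all?
tcpdump -ni xn0 host 1.2.3.105

# 2) Is the secondary actually sending CARP advertisements (IP protocol 112)?
tcpdump -ni xn0 ip proto 112
```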
    

  • You don't need a VIP for the sync interface. Each router just needs the other one's IP set in the HA pfsync settings.


    Hi @teamits, hehe, well actually I do, because the SYNC network also has a few other clients behind it that require the VIP, just like LAN does.
    Basically, as I mentioned in the posts above, I chose an already existing network for SYNC that has two other clients besides the pfSense machines. This is a secured network, and these are administration machines with restricted access and little traffic.
    The documentation recommends a separate network, as I see it, for two reasons:

    • network availability and load
    • privacy and security (as passwords aren't really encrypted)

    Since the chosen network complies with these requirements (it is a very restricted network with very low traffic), it was used, and hence the interface used for sync has a CARP VIP.
    Anyway, all configurations (HA, interfaces, DHCP server, etc.) have the peer IP directly where it belongs, not the CARP VIP.
    I expect this interface to work like the other interfaces (LAN/DMZ/DATA, etc.).