Dropped inter-VLAN connections to backup CARP node and backup CARP node cannot reach internet through primary
-
I've recently installed a second core switch and a second pfSense router and have configured CARP. Things work just fine: when any of the LAN interfaces go down on the primary, the backup node takes the load. I've configured devd with some scripting to keep the WAN interfaces down on the backup node and bring them up on whichever node is the primary. Everything is working in that regard. Additionally, sync is working 100% as well.
Note: When I say "side" below, I'm referring to the pfSense router with the opposing CARP status (e.g. if the device on the LAN is configured with gateway as CARP primary pf01, trying to connect to the CARP backup pf02, or vice-versa).
Right now there are two issues:
- When doing inter-VLAN connections to the backup node (such as an SSH connection), the connection eventually times out. It doesn't matter if I set the LAN device's gateway to the primary or the backup CARP node, inter-VLAN connections to the opposite side eventually drop. Here's what I've narrowed down so far:
- This only "breaks" when the device is on the opposite side and on a different VLAN.
- This isn't due to a CARP failover.
- Core switches that are directly connected to both pfSense routers also experience this issue, so I don't think it's specifically a switching issue.
- Many others have had issues with this (1, 2, 3, likely others), but I'm also unable to connect to the internet from the backup node (e.g. to check for updates). I've read these posts and the documentation, but my setup is a bit different in that instead of using CARP on the WAN side, I'm juggling WAN IPs from DHCP and managing the interface states using devd. I've tried several different things, including NAT rules, but nothing seems to translate traffic from the backup node through the primary node.
Here's a network diagram for illustration:
Some more info about my setup:
- Two unmanaged switches are being used to connect both pfSense routers to both WANs, so each pfSense router is connected to each WAN.
- CARP is enabled and devd (/etc/devd/carp.conf) is being used to bring the WAN interfaces up and call dhclient on them (ifconfig igb0 up && ifconfig igb1 up && dhclient igb0 && dhclient igb1) when the node is the primary or bring the WAN interfaces down and remove the IP addresses from them (ifconfig igb0 down -alias && ifconfig igb1 down -alias) when the node is the backup. pf02 is MAC spoofing pf01 so they appear the same to the DHCP server. This is tested and working to juggle the WAN IPs on CARP events.
- MLAG is enabled on the Mikrotik switches, which allows LACP bonds across multiple switches. This is tested and working, and all combinations of cables being unplugged (except the last one) allow devices to remain connected to each other.
- All pfSense interface IPs are .2 or .3 except for one (MIK). For ease of reference, just ignore that one.
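For readers trying to picture the devd part, here is a rough sketch of what such a handler could look like. It is not the actual script from this setup: the script path, the `DRY_RUN` switch and the function names are made up for illustration, and the devd stanza in the comment is modeled on the example in the FreeBSD carp(4) man page. Only the interface names (igb0, igb1) and the ifconfig/dhclient commands come from the description above.

```shell
#!/bin/sh
# Hypothetical handler, e.g. saved as /usr/local/sbin/carp-wan.sh (assumed path).
# A devd rule along these lines (modeled on the carp(4) example) could call it:
#
#   notify 0 {
#       match "system" "CARP";
#       match "subsystem" "[0-9]+@[0-9a-z]+";
#       match "type" "(MASTER|BACKUP)";
#       action "/usr/local/sbin/carp-wan.sh $type";
#   };

WAN_IFS="${WAN_IFS:-igb0 igb1}"   # WAN interfaces named in the post
DRY_RUN="${DRY_RUN:-1}"           # 1 = only print commands, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Bring the WANs up and request leases on MASTER; take them down and
# drop their addresses on BACKUP. Any other state is ignored.
carp_wan() {
    case "$1" in
        MASTER)
            for ifn in $WAN_IFS; do run ifconfig "$ifn" up; done
            for ifn in $WAN_IFS; do run dhclient "$ifn"; done
            ;;
        BACKUP)
            for ifn in $WAN_IFS; do run ifconfig "$ifn" down -alias; done
            ;;
        *)
            echo "ignoring CARP state: $1" >&2
            return 1
            ;;
    esac
}

# Only act when devd (or a person) passed a state on the command line.
if [ "$#" -gt 0 ]; then
    carp_wan "$1"
fi
```

With the default `DRY_RUN=1`, running the script with `BACKUP` as its argument only prints the two `ifconfig ... down -alias` commands, which makes it easy to check the logic before letting devd execute anything.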
Sample test cases and results:
- Device on same VLAN, same side (IP 192.168.20.75, GW 192.168.20.2) to pf01 (192.168.20.2) connects and stays connected
- Device on same VLAN, diff side (IP 192.168.20.75, GW 192.168.20.2) to pf02 (192.168.20.3) connects and stays connected
- Device on diff VLAN, same side (IP 192.168.20.75, GW 192.168.20.2) to pf01 (192.168.30.2) connects and stays connected
- Device on diff VLAN, diff side (IP 192.168.20.75, GW 192.168.20.2) to pf02 (192.168.30.3) connects, but drops after a seemingly random amount of time
I know that configuring devices with the interface IP isn't standard and that I should point the gateway to the VIP, but for documentation and testing purposes it helps to be specific.
Hoping someone can provide some help with this. I'm even willing to post a small $20 bounty for this, as it's been driving me NUTS for a week.
Thank you,
NobleKangaroo
-
Quick update.
I've upgraded to a /29 block of static IPs from my primary ISP, assigned an IP to each of the pfSense servers, and created a VIP for the WAN interfaces. That's working fine: the VIP switches over to the backup CARP node as intended and, as a side effect, this "resolves" the issue of the backup CARP node being unable to connect to the internet. I put "resolves" in quotes because I would've preferred a zero-cost solution over setting up static IPs.
However, inter-VLAN connections between CARP nodes are still dropping. I'm still tinkering with this.
-
@NobleKangaroo said in Dropped inter-VLAN connections to backup CARP node and backup CARP node cannot reach internet through primary:
When doing inter-VLAN connections to the backup node (such as an SSH connection), the connection eventually times out. It doesn't matter if I set the LAN device's gateway to the primary or the backup CARP node, inter-VLAN connections to the opposite side eventually drop.
Since you should have configured an IP on each node in each VLAN, there is basically no need to access the secondary with an IP in another network segment.
If you do this anyway, the request packets have to pass through the primary, but the secondary sends its responses directly to the client. This results in asymmetric routing and dropped connections. If you still want to access it this way, you can masquerade the traffic as explained in the docs: Troubleshooting VPN Connectivity to a High Availability Secondary Node
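One way to check the asymmetric path described above before touching NAT is to capture the same flow on both nodes. The sketch below only prints the tcpdump commands to run. The two IP addresses are taken from the failing test case in the first post, while the interface names (`lagg0.20`, `lagg0.30`), the VLAN numbering and the `capture_cmd` helper are assumptions for illustration, so substitute the real interface names from each node.

```shell
#!/bin/sh
# Build (and print) a tcpdump command for one interface. Nothing is captured
# until the printed command is run as root on the node in question.
capture_cmd() {
    ifn="$1"; client="$2"; target="$3"; port="${4:-22}"
    echo "tcpdump -ni $ifn host $client and host $target and tcp port $port"
}

CLIENT=192.168.20.75   # LAN device from the failing test case
TARGET=192.168.30.3    # pf02's address in the other VLAN

# Placeholder interface names (assumed); replace with the real ones.
VLAN20_IF="${VLAN20_IF:-lagg0.20}"
VLAN30_IF="${VLAN30_IF:-lagg0.30}"

echo "# on pf01 (the client's gateway): requests only, no replies expected"
capture_cmd "$VLAN20_IF" "$CLIENT" "$TARGET"
echo "# on pf02: requests should arrive on the 192.168.30.x side"
capture_cmd "$VLAN30_IF" "$CLIENT" "$TARGET"
echo "# on pf02: replies should leave directly on the 192.168.20.x side"
capture_cmd "$VLAN20_IF" "$CLIENT" "$TARGET"
```

If pf01 only ever shows packets from the client towards pf02 while pf02 shows the replies leaving on the client's own VLAN, the primary never sees the return traffic, which matches the asymmetric routing explanation.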