STP and network

Derelict

Almost all CARP issues must be corrected in the switch. ISPs in particular do stupid stuff.

For instance, a co-worker is trying to get CARP working on a cable modem service with a static /29. When the secondary ARPs for the CARP VIP address, it (properly) gets the CARP MAC from the primary AND gets an ARP reply from the ISP device containing the primary's interface MAC address. Thus, broken HA/CARP. We can't see it, but I assume it is also proxy-ARPing upstream to the ISP gateway, which will break HA too.

That has to be corrected by the ISP, and they have been unable to understand the issue, much less fix it.

Just an example.
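
To make the ARP problem above concrete, here is a minimal Python sketch. It relies on the fact that CARP answers ARP for the VIP with a virtual MAC of the form 00:00:5e:00:01:<VHID>. The function names, the VHID of 1, and the second MAC in the sample are made up for illustration; they are not taken from the co-worker's capture.

```python
def carp_virtual_mac(vhid: int) -> str:
    """Return the virtual MAC CARP uses for a given VHID (1-255)."""
    if not 1 <= vhid <= 255:
        raise ValueError("VHID must be between 1 and 255")
    return f"00:00:5e:00:01:{vhid:02x}"


def unexpected_arp_replies(vhid: int, reply_macs: list[str]) -> list[str]:
    """Return every replying MAC that is not the CARP virtual MAC."""
    expected = carp_virtual_mac(vhid)
    return [mac for mac in reply_macs if mac.lower() != expected]


# Hypothetical capture: the VIP (VHID 1) was ARPed for and two replies came back.
# The second MAC stands in for an interface MAC relayed by some other device.
replies = ["00:00:5e:00:01:01", "02:00:00:aa:bb:cc"]
print(unexpected_arp_replies(1, replies))
```

Any MAC left in the result is a reply that should not exist. If the list is non-empty, something other than the CARP master is answering for the VIP.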

fireix

Is it correct that I don't need any routing? I mean, internally, LAN-side, I will have my LAN CARP VIP .1 (/24) as gw (4.4.4.1) on all computers.

Since pfSense has the WAN CARP VIP on the /29 from my ISP (8.8.8.1) and I have verified that pfSense has internet access, the LAN traffic should find 8.8.8.1 on the WAN side "magically" (as long as the rule allows the IP and port/direction).

Derelict

You need to:

Make sure your outbound NAT is set to use the CARP VIP

Make sure your inside clients are set to use the CARP VIP for services on the firewall such as default gateway, DNS services, etc.

fireix

I'm setting up the CARP VIP (on LAN-side) with the same GW it had/has before when my ISP/Catalyst was managing my GW. So I shouldn't need to change any of the clients.

I will continue to use the provider's DNS to avoid changing too much stuff (I assume I can still do that…). So this shouldn't need to change either.

But the outbound NAT sounds important: I should choose Outbound NAT, then choose the "LAN" interface, I assume. The LAN interface is the joined LAG of two ports. Then, in Address, the WAN VIP should be possible to select in the dropdown? And that's it?

During this setup, I want to allow all outgoing traffic from LAN, so I will let the rest be set to ANY.

Derelict

When you run HA you have to make sure outbound NAT states are created on the CARP VIP, not the interface address. Otherwise you will experience dropped connections on failover, because the WAN address on the primary node is different from the WAN address on the secondary node.

https://doc.pfsense.org/index.php/Configuring_pfSense_Hardware_Redundancy_(CARP)
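
A rough way to picture why the NAT address matters on failover is sketched below in Python. This is a toy model, not how pf actually stores states. 8.8.8.1 is the placeholder WAN VIP used in this thread; the two node addresses, the LAN clients and the remote host are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NatState:
    lan_src: str         # original source address on the LAN
    translated_src: str  # source address the remote host sees after NAT
    remote: str          # remote host the connection goes to


def states_surviving_failover(states, new_master_addresses):
    """Keep only states whose translated source still exists on the new master."""
    return [s for s in states if s.translated_src in new_master_addresses]


WAN_VIP = "8.8.8.1"        # placeholder WAN CARP VIP used in this thread
PRIMARY_WAN = "8.8.8.2"    # assumed interface address of the primary
SECONDARY_WAN = "8.8.8.3"  # assumed interface address of the secondary

states = [
    NatState("4.4.4.4", PRIMARY_WAN, "203.0.113.10"),  # NAT to the interface address
    NatState("4.4.4.5", WAN_VIP, "203.0.113.10"),      # NAT to the CARP VIP
]

# After failover the secondary holds its own address plus the CARP VIP.
survivors = states_surviving_failover(states, {SECONDARY_WAN, WAN_VIP})
print([s.lan_src for s in survivors])
```

Only the connection that was translated to the VIP is still reachable, because replies to the primary's interface address have nowhere to go once the primary is down.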

Derelict

LAN interface is different. You have to make sure that all of your LAN clients are given the LAN CARP address as their default gateway, DNS server (if applicable) etc.

Bottom line is you can't expect HA to just work. It does work fine but it requires additional configuration for things that are otherwise automatic. Such as outbound NAT, DHCP server attributes, etc.

fireix

OK, this is a bit complex for me, but I'll do this in two steps. I can't change to the new /29 without downtime (because my ISP needs to disconnect the current network), so I have to make sure it works without CARP first so that I don't have too many things that can go wrong at the same time.

First step is to get rid of the transparent bridge and introduce the transport network /29.

So I will configure one pfSense WAN with IP 8.8.8.2/29 (link/transit/transport-network). LAN-interface is configured on 4.4.4.2/24, and I add a VIP on LAN-side configured with 4.4.4.1/24 (just to become more familiar with VIPs).

Test that a server on the LAN with GW set to 4.4.4.1 works. (Maybe the auto-created one will work out of the box in this simple setup, before I introduce CARP?)

fireix

Could I do a kind of realistic test out of this before the actual going live?

Let's say I set up a pfSense in a closed environment, not connected to anything. I have one computer directly connected to the WAN port, with the computer having the IP 196.44.198.33 (/29 net) and no gateway setting. This will kind of simulate my ISP's transit network. I will then set the WAN interface on pfSense to be 196.44.198.34 (also /29 net), with that computer connected on WAN as GW.

On the LAN-side, I specify my current network, let's set it to be 4.4.4.2 (on /24) - I also create a VIP 4.4.4.1 that will serve as local gateway… I connect another computer, with IP 4.4.4.4 and specify 4.4.4.1 as default gateway. Now, I should only have to manage the outgoing NAT - Choose "WAN"-interface and choose the local VIP/GW under Address (and allow any on firewall) in order to ping 196.44.198.33. I understand that cluster requires a bit more, but baby steps are the way to go to understand this. Then I can test and basically do all the mistakes on my own ;) I'm very ready to test this, so please let me know as soon as possible if this could work!
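
Before cabling up a lab like this, the addressing can be sanity-checked with Python's standard ipaddress module. This is only a sketch using the addresses from the post; the helper name same_subnet is made up for illustration.

```python
import ipaddress


def same_subnet(interface_cidr: str, other_ip: str) -> bool:
    """True if other_ip is inside the network that interface_cidr belongs to."""
    network = ipaddress.ip_interface(interface_cidr).network
    return ipaddress.ip_address(other_ip) in network


# WAN side: pfSense at 196.44.198.34/29, the "fake ISP" computer at .33
print(ipaddress.ip_interface("196.44.198.34/29").network)  # 196.44.198.32/29
print(same_subnet("196.44.198.34/29", "196.44.198.33"))    # True

# LAN side: pfSense at 4.4.4.2/24, VIP 4.4.4.1, test computer 4.4.4.4
print(same_subnet("4.4.4.2/24", "4.4.4.1"))                # True
print(same_subnet("4.4.4.2/24", "4.4.4.4"))                # True
```

If any of these printed False, the gateway on that side would not be reachable directly and the test would fail before the firewall rules or NAT ever came into play.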

fireix

Ok, I made it!

I didn't have to set up any NAT at all and I can ping from a computer on WAN side - and from LAN to WAN :)

Now, I have "faked" my ISP by letting a computer have an IP on the transport network. But I shouldn't actually need to change anything? Just use my ISP's IP as gw on the pfSense WAN and I should be ready!

I don't see how DNS can be a problem either. I will continue to use my ISP's DNS servers, and they are outside my network. As long as the IP of their DNS is allowed out, I shouldn't need to reconfigure any client computers after this change :)

Even LAG worked out of the box (I had to use active-passive since I didn't have any test LACP switch available).

fireix

But… The big question.. How do I do LACP from pfSense to both switches so that I get the setup I want? I have the two switches that are stacked. But I want to have one cable from the pfSense LAG to SW1 and one cable to SW2. From what I understood in this thread, I should be able to configure a LACP across both SW1 and SW2 now. So far, I have only found a way to do LACP on each of them at a time. I can of course switch fast between the two switches, but I'm missing a way to choose that Port 47 on SW1 and Port 48 on SW2 should be in the same LACP.

"You would then put your 2 switches in a stack and setup a lacp lagg from pfsense to the switch stack with ports going to different switches in the stack."

So it is basically this I want to do.

Derelict

So stack them and do that. Your switches have to be truly stackable (or support something like multi-chassis trunking), not some fake manage-all-as-one-switch marketing term stack.

Brocade ICX-6430:

lag Management dynamic id 81
ports ethernet 1/1/14 ethernet 2/1/14
primary-port 1/1/14
deploy
port-name NAS_LAGG0 ethernet 1/1/14
port-name NAS_LAGG1 ethernet 2/1/14
!

Switch>sh lag id 81
Total number of LAGs: 2
Total number of deployed LAGs: 2
Total number of trunks created:2 (27 available)
LACP System Priority / ID: 1 / cc4e.24b3.68b8
LACP Long timeout: 90, default: 90
LACP Short timeout: 3, default: 3

=== LAG "Management" ID 81 (dynamic Deployed) ===
LAG Configuration:
Ports: e 1/1/14 e 2/1/14
Port Count: 2
Primary Port: 1/1/14
Trunk Type: hash-based
LACP Key: 20081
Deployment: HW Trunk ID 1
Port Link State Dupl Speed Trunk Tag Pvid Pri MAC Name
1/1/14 Up Forward Full 1G 81 No 81 0 cc4e.24b3.68c5 NAS_LAGG0
2/1/14 Up Forward Full 1G 81 No 81 0 cc4e.24b3.68c5 NAS_LAGG1

Port [Sys P] [Port P] [ Key ] [Act][Tio][Agg][Syn][Col][Dis][Def][Exp][Ope]
1/1/14 1 1 20081 Yes L Agg Syn Col Dis No No Ope
2/1/14 1 1 20081 Yes L Agg Syn Col Dis No No Ope

Partner Info and PDU Statistics
Port Partner Partner LACP LACP
System MAC Key Rx Count Tx Count
1/1/14 0cc4.7a47.7be2 203 2575780 2602883
2/1/14 0cc4.7a47.7be2 203 2575772 2602882

Switch>sh stack
T=905d23h3m21.8: alone: standalone, D: dynamic cfg, S: static
ID Type Role Mac Address Pri State Comment
1 S ICX6430-24 active cc4e.24b3.68b8 128 local Ready
2 S ICX6430-24 standby cc4e.24b3.6978 0 remote Ready

    active       standby
     +---+        +---+
 =2/3| 1 |2/1==2/3| 2 |2/1=
 |   +---+        +---+   |
 |                        |
 |------------------------|
Standby u2 - protocols ready, can failover
Current stack management MAC is cc4e.24b3.68b8

fireix

I have stacked them - I think. They have a 10 Gbit fiber cable between them.

I can also access all the switches with just one IP thanks to the stacking. In the interface, I can choose between sw1 and sw2 (which originally each had their own IP). But I can't edit ports across both (choose one port from switch 1 and one from switch 2). From what I have found online, this is a feature called Cross-Stack, and now I'm beginning to think I might have purchased 4 switches that do stacking, but not this cross-stacking. This can also just be a Cisco word for it…

This is the unit:
http://us.dlink.com/us/en/business-solutions/switching/smart-switches/smartpro/dgs-1510-52x-52-port-gigabit-smartpro-switch.html

I see that they have something called Physical stack (I have activated that). Each switch shows an LED number to indicate its position in the stack. In addition, I have also activated SIM, which I think has to do with the shared IP. Maybe I have activated both a virtual method and the real thing at the same time ;) I hope it is just that.

Ok, I see I have to dive into the documentation. Maybe I have overlooked a setting, or it can only be done in the CLI.

Derelict

What switches do you have?

fireix

DGS-1510-52X

[Attached image: stack.png]

fireix

At least there is a function called "Mirror" and I see I can choose ports on each of the stacked switches.. Not exactly what I'm looking for, but it shows there is some integration..

[Attached image: mirror.png]

fireix

I found the solution! Just had to create LACP with ID1 and only one member port. Then switch to next switch, create LACP and use ID1 on that as well :)

fireix

"Yes. And you need to adjust Outbound NAT so it NATs to the CARP VIP not to the interface addresses (for networks that might require NAT, that is)."

Since I didn't need to setup any NAT to get this working in non-carp mode apparently, I suspect I don't have to adjust anything in my scenario with carp either. Sounds like this would only complicate things. Remember that I use public static IPs on my LAN-side due to my type of webservers.

Derelict

The inside addresses just need to be routed to the CARP VIP and not to one of the interface addresses.

fireix

Should I define a LACP-lag on the switch for each server or isn't that needed at all?

And as far as I can tell, it works without LACP and I can remove one link and traffic still flows (but maybe the switches get confused? Even though it says switch independent). But if it can create weird situations, I wouldn't want to keep it that way.

From the Wiki for Centos, it seems like mode 4=802.3ad is the switch dependent mode on Linux: https://wiki.centos.org/TipsAndTricks/BondingInterfaces - but it will only have one active connection at a time.

But don't know what is best to choose in my case.

Derelict

@fireix:

"Should I define a LACP-lag on the switch for each server or isn't that needed at all?"

I would. What happens when an entire switch fails? Understand that LACP is layer 2 redundancy, not layer 3. For layer 3 redundancy you don't need LACP at all.

"And as far as I can tell, it works without LACP and I can remove one link and traffic still flows (but maybe the switches get confused? Even though it says switch independent). But if it can create weird situations, I wouldn't want to keep it that way."

When you remove what link?

"From the Wiki for Centos, it seems like mode 4=802.3ad is the switch dependent mode on Linux: https://wiki.centos.org/TipsAndTricks/BondingInterfaces - but it will only have one active connection at a time."

Each side of an LACP link generally has its own method of deciding which traffic it sends over which link, usually a hash over some combination of MAC address, IP address, and sometimes even port.
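
The idea of per-flow link selection can be sketched in a few lines of Python. This is only an illustration of the general technique, not the algorithm of any particular switch, bonding driver or pfSense lagg. The function name, the CRC32 hash and the sample addresses are all assumptions.

```python
import zlib


def pick_link(src_mac, dst_mac, src_ip, dst_ip, n_links, src_port=0, dst_port=0):
    """Map a flow to one member link by hashing its addresses (illustrative only)."""
    key = f"{src_mac}|{dst_mac}|{src_ip}|{dst_ip}|{src_port}|{dst_port}".encode()
    return zlib.crc32(key) % n_links


# Hypothetical flows from one LAN server to two different clients over a 2-port LAG
flow_a = ("02:00:00:00:00:01", "02:00:00:00:00:02", "4.4.4.4", "203.0.113.10", 2, 443, 50000)
flow_b = ("02:00:00:00:00:01", "02:00:00:00:00:02", "4.4.4.4", "203.0.113.20", 2, 443, 50001)

print(pick_link(*flow_a), pick_link(*flow_b))
```

Packets of the same flow always hash to the same link, so a single connection never gets more than one link's bandwidth. Different flows may spread across the links, which is where the aggregate gain comes from.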

"But don't know what is best to choose in my case."

Nor do I really.