Default route lost when primary is restored
-
I have a CARP setup using local addresses on the WAN interfaces and a single public IP as the WAN CARP VIP. All is well on startup and on failover, but when the primary gets restored after failover, it no longer has a default route.
-
On startup: primary and secondary have a default route to my specified default gateway; clients have internet and all is well.
-
On failover: secondary only has a default route to the gateway; clients have internet and all is well.
-
On restore: secondary only has a default route to the gateway; clients have no internets.
After restore, I can manually ssh in to the primary and
route add default {my gateway}
and everything is fine. I hear the devil whispering "cronjob" but that don't sit right with me.
I've seen similar posts with this issue. The solution seems to be to make sure you have an outbound NAT rule from "This Firewall" to the WAN CARP VIP. I have that, and the issue persists. Any ideas? Thanks!
-
-
Are your single address and gateway configured as static on the interface?
There is no longer a base OS requirement that a CARP VIP be in the same subnet as the interface.
That does not mean that your configuration is a supported one for HA.
If it is worth HA it is worth doing right.
Get more addresses from your ISP.
-
Yes, the single public IP and gateway are configured as static.
My provider gave me a /30 network for 2Gbps fiber service. I can certainly go back and ask for a /29, but if anybody has any ideas I'd like to understand why the single IP is breaking my gateway on CARP restoration. Thanks again.
-
The public IP address has to be the CARP VIP (not the interface address) or HA has no chance of working.
What are the interface addresses configured on the WAN interfaces on both nodes? What are the gateways?
What is the WAN CARP VIP?
HA works really well when done correctly. When people try to "game" it, not so much.
You would have to:
Assign a private addressing scheme to the WAN interfaces.
Coerce the system to accept a default gateway outside that subnet (The ISP side of the /30). That generally involves checking the "Use non-local gateway" checkbox in the advanced settings of the gateway.
Set the CARP VIP on WAN to your side of the public /30.
Ensure that ALL traffic egressing WAN intended for the internet is outbound NAT to the CARP VIP address. (Note that this is where the process usually breaks down because this is impossible on the node that is currently CARP BACKUP which means it can't resolve DNS or anything without extreme creativity)
The HA code is not designed to work around this (because it is not possible). If it were me I would forgo HA in favor of a cold spare and Auto-Config Backup, or get a /29 and do it right.
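For orientation, here is a rough sketch of what the "Use non-local gateway" step above amounts to at the FreeBSD routing level. It assumes the option is roughly equivalent to adding an on-link host route to the gateway and then a default route through it. That is an assumption, not a statement about pfSense internals. The function name `nonlocal_gw_cmds` and both arguments are made up for illustration, and the sketch only prints the commands rather than running them.

```shell
#!/bin/sh
# Sketch only: prints route(8) commands, does not execute them.
# Assumption: a non-local gateway needs an on-link host route first,
# because the kernel otherwise rejects a gateway outside the interface subnet.

nonlocal_gw_cmds() {
    gw="$1"       # gateway address outside the WAN subnet (placeholder)
    ifname="$2"   # WAN interface name (placeholder)
    # 1. Declare the gateway directly reachable via the WAN interface.
    printf 'route add -host %s -iface %s\n' "$gw" "$ifname"
    # 2. Add the default route through the now-reachable gateway.
    printf 'route add default %s\n' "$gw"
}

# Print the commands only when asked to, so the function can be sourced.
if [ "${1:-}" = "--print" ]; then
    nonlocal_gw_cmds "$2" "$3"
fi
```

If this assumption holds, a default route that depends on such a host route could disappear whenever the host route is torn down and not rebuilt, which would be one thing to check in the reports that follow.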
-
Yes, my public IP is my CARP VIP. Everything is working with CARP enabled, and also when CARP fails over. It's only after the primary node transitions back to master that it goes awry and only because of the default route issue. If I put a shell script on the primary to check for a default route every few seconds and add it if one is missing, then everything is functionally perfect. But, of course, that's pretty lame.
WAN1 IP: 10.251.0.10
WAN2 IP: 10.251.0.11
WAN CARP VIP: 216.12.33.X
My default gateway on both nodes is the same and set up with the non-local gateway option. The NAT address on my outbound NAT rules is set to the VIP and not the WAN address.
I understand your point regarding three public IPs, as well as the limitation of the nodes not being able to access the internet while in BACKUP status. I'm not trying to half-ass this – but everything I want to achieve with this HA setup appears to be working great, except for the issue of the primary node losing its default route when it goes back to master.
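The stopgap mentioned above (a script that checks for a default route every few seconds and re-adds it) could look roughly like the sketch below. It is a workaround, not a fix. The `GATEWAY` value is a placeholder because the real address is not given in this thread, and the function names, the 5-second interval and the `--run` switch are choices made for this example.

```shell
#!/bin/sh
# Stopgap sketch: re-add the default route if it has disappeared.
GATEWAY="${GATEWAY:-192.0.2.1}"   # placeholder, replace with your real gateway
INTERVAL="${INTERVAL:-5}"         # seconds between checks (assumed value)

# Reads `netstat -rn -f inet` output on stdin.
# Exit status 0 if any line has "default" in the first column, 1 otherwise.
has_default_route() {
    awk '$1 == "default" { found = 1 } END { exit !found }'
}

# One pass: inspect the IPv4 routing table and restore the route if missing.
check_once() {
    if netstat -rn -f inet | has_default_route; then
        return 0
    fi
    logger -t route-watchdog "default route missing, re-adding via $GATEWAY"
    route add default "$GATEWAY"
}

# Loop only when started with --run, so the functions can be sourced and tested.
if [ "${1:-}" = "--run" ]; then
    while :; do
        check_once
        sleep "$INTERVAL"
    done
fi
```

The log line written through `logger` at least leaves a timestamp for each loss of the route, which can be lined up against the CARP state changes in the system log.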
-
I've worked through this over the past several days; here's how I did it. My business cable modem supports NAT as well as public IPs, so I assigned a private IP to the WAN port on each node with a default GW, then created an additional GW (not default) pointing to the public IP address of the cable modem. No need to NAT the firewall to the VIP; I tried this at first and it never worked. Just change NAT to manual and change the destination to the VIP.
One word of info: my public range is a /29, and I initially created the outbound VIP as .219 and wanted to change it to .221. Changing the Firewall -> VIP from .219 to .221 wasn't completely successful. I actually had to change the individual NAT rules to the new VIP.
Hopefully this helps you out.
-
Hi,
I have the same issue here. The hardware and the pfSense release are the same on both boxes.
2.4.3-RELEASE (amd64) built on Mon Mar 26 18:02:04 CDT 2018 FreeBSD 11.1-RELEASE-p7
Both boxes use SuperMicro boards which have two onboard interfaces and an additional i350 4-port network card. HA is on dedicated interfaces, directly connected without a switch. All other interfaces are connected to a switch with an untagged VLAN for every interface.
WAN Master and Slave - Switch VLAN WAN - ISP
LAN Master and Slave - Switch VLAN LAN - Internal net
DMZ Master and Slave - Switch VLAN DMZ - DMZ
GUEST Master and Slave - Switch VLAN Guest - Guest network
OPT Master and Slave - Switch VLAN OPT - currently not used
Master
WAN Interface: Static IPv4 10.10.75.251/29
Gateway: x.x.x.17
Slave
WAN Interface: Static IPv4 10.10.75.252/29
Gateway: x.x.x.17
External IP
Currently there are 4 static external IPs configured as CARP VIP.
The "master" IP for outgoing traffic is x.x.x.20/29, with VHID group 20 on both nodes. The advertising frequency on master is Base = 1 and Skew = 0; on slave it is Base = 1 and Skew = 100.
The other IPs are for incoming traffic to some webservers and the mailrelay in DMZ.
NAT
On both machines there is an Outbound NAT rule: This Firewall, any source port, any destination, any destination port, with NAT Address x.x.x.20.
Additional Outbound NAT is configured for some machines, ports and the other CARP VIPs, e.g. outgoing mail uses the IP of the MX record and so on.
There is no problem if I switch from master to slave. But when switching back from slave to master, the default gateway on master is missing. If I set it in the console, or simply save the master's WAN interface or System / Routing / Gateways / Edit in the GUI without changing anything, the default gateway is set immediately.
If I can do something manually without any error, then it must be an error in automatic mode ;-)
I think there must be something wrong in the config or in the scripts, or simply a wrong state, which prevents the default gateway from being set. But where can I start to debug this?
Tom
-
WAN Interface: Static IPv4 10.10.75.251/29
Gateway: x.x.x.17
Having your gateway not included in the interface subnet is an odd configuration.
Or is the interface really a /24 and you can only use that /29 out of it?
-
Sorry, I didn't mention it!
The gateway is a public IP address, 62.x.x.17, and "Use non-local gateway" is set. Outbound NAT is also set.
I read all the threads here about this setup with versions > 2.2, and someone mentioned that the mask on the WAN interfaces should be the same as that of the public network. I changed it to /24 (master 10.10.75.251/24, slave 10.10.75.252/24) but there is no change. Master to slave runs perfectly with only a few lost packets; slave to master leaves the default gateway missing on master. If I add it manually with route add default 62.x.x.17, everything is up immediately.
I have done some debugging on console:
a) console on master
- enter persistent CARP maintenance mode on MASTER
- failover to slave, all connections established
- default gw lost on master (netstat -r)
- leave persistent CARP maintenance mode on MASTER
- all interfaces and services "green"
- only default gw lost
- route add default 62.x.x.17
- all is up
b) console on master
- ifconfig igb4 down (WAN interface)
- failover to slave, all connections established
- default gw present on master
- ifconfig igb4 up
- go back to master as active
- all interfaces and services "green"
- only default gw lost
- route add default 62.x.x.17
- all is up
c) console on master
- sysctl net.inet.carp.demotion=250
- failover to slave, all connections established
- default gw present on master
- sysctl net.inet.carp.demotion=-250
- go back to master as active
- all interfaces and services "green"
- default gw present on master!!!
- all is up
I tried c) several times and pf always switches perfectly between master and slave without losing any connection.
If I simulate a lost WAN interface with b), the default gw stays present during failover; it is lost only when the master takes over again.
If I set the master in maintenance mode as in a), the default gw is lost immediately.
What are the differences between these scenarios, so that only c) works correctly?
Tom
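One way to compare scenarios a), b) and c) is to log the CARP state, the demotion level and the default route once per second while repeating each test, so the moment the route disappears can be placed on a timeline. The sketch below assumes the FreeBSD 10+ `ifconfig` format for CARP lines (`carp: MASTER vhid 20 advbase 1 advskew 0`); the function names and the `--run` switch are invented for this example.

```shell
#!/bin/sh
# Diagnostic sketch: one log line per second with CARP state, demotion, default gw.

# Reads `ifconfig` output on stdin; prints "<state> vhid <n>" for each CARP line.
carp_states() {
    awk '$1 == "carp:" { print $2, "vhid", $4 }'
}

# Reads `netstat -rn -f inet` output on stdin; prints the default gateway or "none".
default_gw() {
    awk '$1 == "default" { gw = $2 } END { if (gw == "") gw = "none"; print gw }'
}

# Collect one timestamped snapshot from the live system.
snapshot() {
    printf '%s demotion=%s default=%s carp=[%s]\n' \
        "$(date '+%H:%M:%S')" \
        "$(sysctl -n net.inet.carp.demotion)" \
        "$(netstat -rn -f inet | default_gw)" \
        "$(ifconfig | carp_states | tr '\n' ';')"
}

# Loop only when started with --run, so the parsers can be sourced and tested.
if [ "${1:-}" = "--run" ]; then
    while :; do
        snapshot
        sleep 1
    done
fi
```

Running `route -n monitor` in a second console during the same test should additionally print each routing-table change as it happens, which may help narrow down what removes the default route in a) and b) but not in c).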