HA failover Issue with CARP switching all interfaces to backup when just one connection fails.
-
HA failover issue: when any one of our internet connections drops, all interfaces fail over to the backup and then switch back. Why switch at all when the internet connection has also failed on the backup? Any idea why this is happening? Was this a known issue with 22.01? All advanced gateway settings are still at their defaults for all gateways. Any thoughts would be appreciated.
Thanks!
Jim
Running 22.01-RELEASE (amd64) on XG 1538
Aug 22 11:19:00 sshguard 17300 Now monitoring attacks.
Aug 22 11:19:00 sshguard 52663 Exiting on signal.
Aug 22 11:14:22 kernel carp: demoted by -240 to 0 (pfsync bulk
Aug 22 11:08:00 sshguard 52663 Now monitoring attacks.
Aug 22 11:08:00 sshguard 14862 Exiting on signal.
Aug 22 11:07:03 check_reload_status 20671 Starting packages
Aug 22 11:07:03 check_reload_status 20671 Reloading filter
Aug 22 11:07:02 check_reload_status 20671 rc.newwanip starting ovpns8
Aug 22 11:07:02 kernel ovpns8: link state changed to UP
Aug 22 11:07:02 check_reload_status 20671 Reloading filter
Aug 22 11:07:01 php 43661 notify_monitor.php: Message sent to xxx@xxx.com OK
Aug 22 11:07:00 check_reload_status 20671 Starting packages
Aug 22 11:07:01 kernel ovpns8: link state changed to DOWN
Aug 22 11:06:59 check_reload_status 20671 Starting packages
Aug 22 11:06:59 check_reload_status 20671 Starting packages
Aug 22 11:06:59 check_reload_status 20671 Starting packages
Aug 22 11:06:59 check_reload_status 20671 Starting packages
Aug 22 11:06:59 check_reload_status 20671 rc.newwanip starting ovpns8
Aug 22 11:06:59 kernel ovpns8: link state changed to UP
Aug 22 11:06:58 check_reload_status 20671 rc.newwanip starting ovpns7
Aug 22 11:06:58 kernel ovpns7: link state changed to UP
Aug 22 11:06:58 check_reload_status 20671 rc.newwanip starting ovpnc5
Aug 22 11:06:58 kernel ovpnc5: link state changed to UP
Aug 22 11:06:58 check_reload_status 20671 rc.newwanip starting ovpnc4
Aug 22 11:06:58 check_reload_status 20671 rc.newwanip starting ovpns6
Aug 22 11:06:57 check_reload_status 20671 Reloading filter
Aug 22 11:06:58 kernel ovpns8: link state changed to DOWN
Aug 22 11:06:58 kernel ovpns6: link state changed to UP
Aug 22 11:06:58 kernel ovpnc4: link state changed to UP
Aug 22 11:06:55 check_reload_status 20671 Carp master event
Aug 22 11:06:55 kernel carp: 196@ix0: BACKUP -> MASTER (master timed out)
Aug 22 11:06:55 kernel carp: 116@ixl3: BACKUP -> MASTER (master timed out)
Aug 22 11:06:55 check_reload_status 20671 Carp master event
Aug 22 11:06:54 kernel carp: 247@ixl0: BACKUP -> MASTER (master timed out)
Aug 22 11:06:54 check_reload_status 20671 Carp master event
Aug 22 11:06:54 kernel ovpns7: link state changed to DOWN
Aug 22 11:06:54 kernel ovpnc5: link state changed to DOWN
Aug 22 11:06:54 check_reload_status 20671 Reloading filter
Aug 22 11:06:54 kernel ovpns6: link state changed to DOWN
Aug 22 11:06:54 check_reload_status 20671 Reloading filter
Aug 22 11:06:54 kernel ovpnc4: link state changed to DOWN
Aug 22 11:06:52 check_reload_status 20671 Carp backup event
Aug 22 11:06:52 kernel carp: 116@ixl3: MASTER -> BACKUP (more frequent advertisement received)
Aug 22 11:06:52 kernel carp: 196@ix0: MASTER -> BACKUP (more frequent advertisement received)
Aug 22 11:06:52 check_reload_status 20671 Carp backup event
Aug 22 11:06:51 check_reload_status 20671 Carp backup event
Aug 22 11:06:51 kernel carp: 247@ixl0: MASTER -> BACKUP (more frequent advertisement received)
Aug 22 11:06:51 kernel carp: 247@ixl0: BACKUP -> MASTER (master timed out)
Aug 22 11:06:51 check_reload_status 20671 Carp master event
Aug 22 11:06:47 check_reload_status 20671 Reloading filter
Aug 22 11:06:47 check_reload_status 20671 Restarting OpenVPN tunnels/interfaces
Aug 22 11:06:47 check_reload_status 20671 Restarting IPsec tunnels
Aug 22 11:06:47 check_reload_status 20671 updating dyndns WAN_ATT2_GW
Aug 22 11:06:47 rc.gateway_alarm 90277 >>> Gateway alarm: WAN_ATT2_GW (Addr:1.1.1.1 Alarm:0 RTT:13.720ms RTTsd:.347ms Loss:5%)
Aug 22 11:05:51 php 43661 notify_monitor.php: Message sent to xxxx.com OK
Aug 22 11:05:48 kernel carp: 247@ixl0: MASTER -> BACKUP (more frequent advertisement received)
Aug 22 11:05:48 check_reload_status 20671 Carp backup event
Aug 22 11:05:31 php 43661 notify_monitor.php: Message sent to xxx.com OK
Aug 22 11:05:28 check_reload_status 20671 Carp master event
Aug 22 11:05:28 kernel carp: 1@ix1: BACKUP -> MASTER (master timed out)
Aug 22 11:05:24 check_reload_status 20671 Carp backup event
Aug 22 11:05:24 kernel carp: 1@ix1: MASTER -> BACKUP (more frequent advertisement received)
Aug 22 11:05:23 check_reload_status 20671 Carp master event
Aug 22 11:05:24 kernel carp: 1@ix1: BACKUP -> MASTER (master timed out)
Aug 22 11:05:22 check_reload_status 20671 Starting packages
Aug 22 11:05:22 check_reload_status 20671 Reloading filter
Aug 22 11:05:21 check_reload_status 20671 rc.newwanip starting ovpns8
Aug 22 11:05:21 check_reload_status 20671 Reloading filter
Aug 22 11:05:21 kernel ovpns8: link state changed to UP
Aug 22 11:05:21 kernel arp: x.x.x.x moved from 00:00:5e:00:01:f7 to 00:e0:ed:e3:5f:ac on ixl0
Aug 22 11:05:21 kernel carp: 247@ixl0: BACKUP -> MASTER (master timed out)
Aug 22 11:05:21 check_reload_status 20671 Carp master event
Aug 22 11:05:21 php 43661 notify_monitor.php: Message sent to xxx@xxx.com OK
Aug 22 11:05:20 check_reload_status 20671 Reloading filter
Aug 22 11:05:20 check_reload_status 20671 Carp backup event
Aug 22 11:05:20 kernel carp: 1@ix1: MASTER -> BACKUP (more frequent advertisement received)
Aug 22 11:05:19 check_reload_status 20671 Starting packages
Aug 22 11:05:20 kernel ovpns8: link state changed to DOWN
Aug 22 11:05:19 check_reload_status 20671 Starting packages
Aug 22 11:05:18 check_reload_status 20671 rc.newwanip starting ovpns8
Aug 22 11:05:18 kernel ovpns8: link state changed to UP
Aug 22 11:05:18 kernel ovpns8: link state changed to DOWN
Aug 22 11:05:18 check_reload_status 20671 Starting packages
Aug 22 11:05:18 check_reload_status 20671 rc.newwanip starting ovpns7
Aug 22 11:05:18 check_reload_status 20671 Carp backup event
Aug 22 11:05:18 check_reload_status 20671 Carp master event
Aug 22 11:05:16 check_reload_status 20671 Starting packages
Aug 22 11:05:16 check_reload_status 20671 Starting packages
Aug 22 11:05:18 kernel carp: 247@ixl0: MASTER -> BACKUP (more frequent advertisement received)
Aug 22 11:05:18 kernel carp: 247@ixl0: BACKUP -> MASTER (master timed out)
Aug 22 11:05:18 kernel ovpns7: link state changed to UP
Aug 22 11:05:16 check_reload_status 20671 rc.newwanip starting ovpnc5
Aug 22 11:05:16 kernel ovpnc5: link state changed to UP
Aug 22 11:05:15 kernel ovpns7: link state changed to DOWN
Aug 22 11:05:15 check_reload_status 20671 Starting packages
Aug 22 11:05:15 kernel ovpnc5: link state changed to DOWN
Aug 22 11:05:15 check_reload_status 20671 rc.newwanip starting ovpns6
Aug 22 11:05:15 kernel ovpns6: link state changed to UP
Aug 22 11:05:15 check_reload_status 20671 Starting packages
Aug 22 11:05:15 check_reload_status 20671 rc.newwanip starting ovpnc4
Aug 22 11:05:15 kernel ovpnc4: link state changed to UP
Aug 22 11:05:15 kernel ovpns6: link state changed to DOWN
Aug 22 11:05:15 check_reload_status 20671 Starting packages
Aug 22 11:05:15 kernel ovpnc4: link state changed to DOWN
Aug 22 11:05:15 check_reload_status 20671 Starting packages
Aug 22 11:05:15 check_reload_status 20671 Reloading filter
Aug 22 11:05:15 kernel carp: demoted by 240 to 240 (pfsync bulk start)
Aug 22 11:05:15 check_reload_status 20671 Carp backup event
Aug 22 11:05:15 kernel carp: 247@ixl0: INIT -> BACKUP (initialization complete)
Aug 22 11:05:15 kernel carp: 247@ixl0: BACKUP -> INIT (hardware interface up)
Aug 22 11:05:15 check_reload_status 20671 Carp backup event
Aug 22 11:05:15 php-fpm 70897 /rc.carpmaster: HA cluster member "(x.x.x.x@ix1): (LAN)" has resumed CARP state "MASTER" for vhid 1
Aug 22 11:05:14 check_reload_status 20671 rc.newwanip starting ovpns7
Aug 22 11:05:14 kernel ovpns7: link state changed to UP
Aug 22 11:05:14 check_reload_status 20671 rc.newwanip starting ovpnc5
Aug 22 11:05:14 kernel ovpnc5: link state changed to UP
Aug 22 11:05:14 kernel ovpns7: link state changed to DOWN
Aug 22 11:05:14 kernel ovpnc5: link state changed to DOWN
Aug 22 11:05:14 check_reload_status 20671 rc.newwanip starting ovpns6
Aug 22 11:05:14 kernel ovpns6: link state changed to UP
Aug 22 11:05:14 check_reload_status 20671 rc.newwanip starting ovpnc4
Aug 22 11:05:14 kernel ovpnc4: link state changed to UP
Aug 22 11:05:14 check_reload_status 20671 Carp master event
Aug 22 11:05:14 check_reload_status 20671 Carp master event
Aug 22 11:05:14 kernel carp: 116@ixl3: BACKUP -> MASTER (preempting a slower master)
Aug 22 11:05:14 kernel carp: 196@ix0: BACKUP -> MASTER (preempting a slower master)
Aug 22 11:05:14 kernel carp: 1@ix1: BACKUP -> MASTER (preempting a slower master)
Aug 22 11:05:14 check_reload_status 20671 Carp master event
Aug 22 11:05:14 check_reload_status 20671 rc.newwanip starting ixl0
Aug 22 11:05:13 check_reload_status 20671 Reloading filter
Aug 22 11:05:13 kernel ovpns6: link state changed to DOWN
Aug 22 11:05:13 check_reload_status 20671 Reloading filter
Aug 22 11:05:13 kernel ovpnc4: link state changed to DOWN
Aug 22 11:05:13 php-fpm 70897 /rc.carpbackup: HA cluster member "(x.x.x.x@ix1): (LAN)" has resumed CARP state "BACKUP" for vhid 1
Aug 22 11:05:13 check_reload_status 20671 Linkup starting ixl0
Aug 22 11:05:13 kernel ixl0: link state changed to UP
Aug 22 11:05:13 kernel carp: demoted by -240 to 0 (interface up)
Aug 22 11:05:13 kernel carp: 247@ixl0: INIT -> BACKUP (initialization complete)
Aug 22 11:05:13 kernel ixl0: Link is up, 10 Gbps Full Duplex, Requested FEC: None, Negotiated FEC: None, Autoneg: False, Flow Control: None
Aug 22 11:05:13 check_reload_status 20671 Carp backup event
Aug 22 11:05:12 check_reload_status 20671 Carp backup event
Aug 22 11:05:12 check_reload_status 20671 Carp backup event
Aug 22 11:05:12 kernel carp: 196@ix0: MASTER -> BACKUP (more frequent advertisement received)
Aug 22 11:05:12 kernel carp: 116@ixl3: MASTER -> BACKUP (more frequent advertisement received)
Aug 22 11:05:12 kernel carp: 1@ix1: MASTER -> BACKUP (more frequent advertisement received)
Aug 22 11:05:12 check_reload_status 20671 Carp backup event
Aug 22 11:04:29 check_reload_status 20671 Reloading filter
Aug 22 11:04:29 check_reload_status 20671 Restarting OpenVPN tunnels/interfaces
Aug 22 11:04:29 check_reload_status 20671 Restarting IPsec tunnels
Aug 22 11:04:29 check_reload_status 20671 updating dyndns WAN_ATT2_GW
Aug 22 11:04:29 rc.gateway_alarm 99002 >>> Gateway alarm: WAN_ATT2_GW (Addr:1.1.1.1 Alarm:1 RTT:13.698ms RTTsd:.808ms Loss:22%)
Aug 22 11:04:19 php-fpm 70897 /rc.start_packages: Skipping STARTing packages process because previous/another instance is already running
Aug 22 11:04:19 php 43661 notify_monitor.php: Message sent to xxx@xxx.com OK
Aug 22 11:04:19 check_reload_status 20671 Starting packages
Aug 22 11:04:19 check_reload_status 20671 Starting packages
Aug 22 11:04:18 check_reload_status 20671 Starting packages
Aug 22 11:04:18 check_reload_status 20671 Starting packages
Aug 22 11:04:18 php-fpm 70897 /rc.newwanip: Netgate pfSense Plus package system has detected an IP change or dynamic WAN reconnection - -> x.x.x.x - Restarting packages.
Aug 22 11:04:18 check_reload_status 20671 Reloading filter
Aug 22 11:04:18 php-fpm 70897 /rc.newwanip: rc.newwanip called with empty interface.
Aug 22 11:04:18 php-fpm 70897 /rc.newwanip: rc.newwanip: on (IP address: x.x.x.x) (interface: []) (real interface: ovpns6).
Aug 22 11:04:18 php-fpm 70897 /rc.newwanip: rc.newwanip: Info: starting on ovpns6.
Aug 22 11:04:18 check_reload_status 20671 Starting packages
Aug 22 11:04:18 check_reload_status 20671 rc.newwanip starting ovpns8
Aug 22 11:04:18 kernel ovpns8: link state changed to UP
Aug 22 11:04:18 check_reload_status 20671 rc.newwanip starting ovpns7
Aug 22 11:04:18 kernel ovpns7: link state changed to UP
Aug 22 11:04:17 check_reload_status 20671 rc.newwanip starting ovpnc5
Aug 22 11:04:17 kernel ovpnc5: link state changed to UP
Aug 22 11:04:17 check_reload_status 20671 rc.newwanip starting ovpns6
Aug 22 11:04:17 check_reload_status 20671 Reloading filter
Aug 22 11:04:17 kernel ovpns6: link state changed to UP
Aug 22 11:04:17 check_reload_status 20671 rc.newwanip starting ovpnc4
Aug 22 11:04:17 check_reload_status 20671 Reloading filter
Aug 22 11:04:17 kernel ovpnc4: link state changed to UP
Aug 22 11:04:16 check_reload_status 20671 Carp master event
Aug 22 11:04:16 check_reload_status 20671 Carp master event
Aug 22 11:04:16 check_reload_status 20671 Carp master event
Aug 22 11:04:16 kernel carp: 116@ixl3: BACKUP -> MASTER (master timed out)
Aug 22 11:04:16 kernel carp: 1@ix1: BACKUP -> MASTER (master timed out)
Aug 22 11:04:16 kernel carp: 196@ix0: BACKUP -> MASTER (master timed out)
Aug 22 11:04:13 check_reload_status 20671 Carp backup event
Aug 22 11:04:13 check_reload_status 20671 Carp backup event
Aug 22 11:04:13 kernel carp: 196@ix0: MASTER -> BACKUP (more frequent advertisement received)
Aug 22 11:04:13 kernel carp: 1@ix1: MASTER -> BACKUP (more frequent advertisement received)
Aug 22 11:04:13 kernel carp: 116@ixl3: MASTER -> BACKUP (more frequent advertisement received)
Aug 22 11:04:13 check_reload_status 20671 Carp backup event
Aug 22 11:04:13 check_reload_status 20671 Carp master event
Aug 22 11:04:13 check_reload_status 20671 Carp master event
Aug 22 11:04:13 kernel carp: 116@ixl3: BACKUP -> MASTER (master timed out)
Aug 22 11:04:13 kernel carp: 1@ix1: BACKUP -> MASTER (master timed out)
Aug 22 11:04:13 kernel carp: 196@ix0: BACKUP -> MASTER (master timed out)
Aug 22 11:04:13 check_reload_status 20671 Carp master event
Aug 22 11:04:13 kernel ovpns8: link state changed to DOWN
Aug 22 11:04:09 kernel ovpns7: link state changed to DOWN
Aug 22 11:04:09 kernel ovpnc5: link state changed to DOWN
Aug 22 11:04:09 php 43661 notify_monitor.php: Message sent to xxx@xxx.com OK
Aug 22 11:04:09 kernel ovpns6: link state changed to DOWN
Aug 22 11:04:09 check_reload_status 20671 Reloading filter
Aug 22 11:04:08 kernel ovpnc4: link state changed to DOWN
Aug 22 11:04:08 check_reload_status 20671 Reloading filter
Aug 22 11:04:07 check_reload_status 20671 Carp backup event
Aug 22 11:04:07 check_reload_status 20671 Carp backup event
Aug 22 11:04:07 check_reload_status 20671 Carp backup event
Aug 22 11:04:07 check_reload_status 20671 Linkup starting ixl0
Aug 22 11:04:07 kernel carp: 196@ix0: MASTER -> BACKUP (more frequent advertisement received)
Aug 22 11:04:07 kernel carp: 1@ix1: MASTER -> BACKUP (more frequent advertisement received)
Aug 22 11:04:07 kernel carp: 116@ixl3: MASTER -> BACKUP (more frequent advertisement received)
Aug 22 11:04:07 kernel ixl0: link state changed to DOWN
Aug 22 11:04:07 kernel carp: demoted by 240 to 240 (interface down)
Aug 22 11:04:07 kernel carp: 247@ixl0: MASTER -> INIT (hardware interface down)
Aug 22 11:04:07 check_reload_status 20671 Carp backup event
-
This sounds like a configuration issue. CARP should not be activating just because the internet gateway itself went down.
Can you give some more info about your setup?
CARP has to communicate between the firewalls on both the SYNC interface and the interfaces that have CARP VIPs on them. On the WAN side, this means each firewall needs its own actual IP on the WAN in addition to the CARP VIP, and both firewalls need to be connected to a switch so that each one knows its connection is still up.
What does your CARP and HA configuration look like?
-
By default, CARP will do preemption on link loss, meaning it cuts over all VIPs at once.
For most people this is what they want, because a link loss usually means an interface failed, not just a loss of WAN connectivity.
In the majority of cases, a WAN failure doesn't involve losing link. If you're testing that manually, a better test is to unplug one level higher up (e.g. upstream fiber link, coax cable, uplink port, etc.) on the CPE, not the NIC itself.
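To see the "all VIPs at once" behavior in a log like the one posted above, a rough Python sketch such as the following can group the kernel CARP state changes by timestamp. The regular expression and the function name `carp_transitions` are made up for illustration, and the sample lines are copied from the log above (reordered oldest first). It is only a reading aid, not a pfSense tool.

```python
import re
from collections import defaultdict

# Matches kernel lines such as:
#   Aug 22 11:04:07 kernel carp: 196@ix0: MASTER -> BACKUP (more frequent advertisement received)
# Lines like "carp: demoted by 240 to 240 (...)" have no vhid@interface and are skipped.
CARP_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+[\d:]+)\s+kernel\s+carp:\s+"
    r"(?P<vhid>\d+)@(?P<ifname>\S+):\s+"
    r"(?P<old>\w+)\s+->\s+(?P<new>\w+)\s+\((?P<reason>.*)\)\s*$"
)


def carp_transitions(lines):
    """Group CARP state changes by timestamp so simultaneous cut-overs stand out."""
    by_time = defaultdict(list)
    for line in lines:
        m = CARP_RE.match(line.strip())
        if m:
            vip = f"{m['vhid']}@{m['ifname']}"
            by_time[m["ts"]].append((vip, m["old"], m["new"], m["reason"]))
    return dict(by_time)


# Sample lines taken from the log in the first post, oldest first.
SAMPLE = """\
Aug 22 11:04:07 kernel carp: 247@ixl0: MASTER -> INIT (hardware interface down)
Aug 22 11:04:07 kernel carp: demoted by 240 to 240 (interface down)
Aug 22 11:04:07 kernel carp: 116@ixl3: MASTER -> BACKUP (more frequent advertisement received)
Aug 22 11:04:07 kernel carp: 1@ix1: MASTER -> BACKUP (more frequent advertisement received)
Aug 22 11:04:07 kernel carp: 196@ix0: MASTER -> BACKUP (more frequent advertisement received)
""".splitlines()

for ts, events in carp_transitions(SAMPLE).items():
    for vip, old, new, reason in events:
        print(f"{ts}  {vip:9s} {old} -> {new}  ({reason})")
```

In the sample, the link loss on ixl0 and the cut-over of the other three VIPs all land in the same second, which is the preemption behavior described above.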
You could set
net.inet.carp.preempt=0
as a tunable so it doesn't behave that way, but then if something like the primary LAN NIC failed you'd have to cut it over manually; it wouldn't be seamless.
-
@jimp Hey Jim, Thank you for your response. Is there a tunable that would prevent preemptive fail-over if the interface on the secondary is also down?
I watched the High Availability on pfSense 2.4 Hangout video and it looks like things are working as described. The preemptive fail-over makes sense, but not if the interface on the secondary has also failed. In our case, switching and then switching back causes a 2- or 3-minute outage for our users. It would be better to just switch gateways. FYI, we have about 150 active users using Zoom phone, MS Teams, TeamViewer, VPN and general internet access.
Thanks again.
James Ronald
-
Not explicitly, no, but if it's just one interface it may still be OK.
Preemption and so on based on interface failure is primarily controlled by things like
net.inet.carp.ifdown_demotion_factor
but the value is the same on both nodes, so in theory both should end up demoted by the same amount for an interface failure, and the primary node may still win an election for master status naturally. I haven't tried that, though. If that doesn't work as it is, then it likely isn't feasible.
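The reasoning above can be put into a small Python model. This is a simplified sketch, not pfSense's or FreeBSD's actual code. It assumes, following carp(4), that the demotion counter is added to each node's configured advskew and that 240 is the maximum advertised skew, and it assumes a backup only preempts a master that advertises strictly slower. The `CarpNode` class, the `backup_preempts` function, and the advbase/advskew values (1, 0, 100) are illustrative assumptions, not taken from this thread.

```python
from dataclasses import dataclass

MAX_ADVSKEW = 240  # assumed cap; carp(4) calls 240 the maximum advskew value


@dataclass
class CarpNode:
    name: str
    advbase: int       # base advertisement interval, in seconds
    advskew: int       # configured skew, in 1/256 s
    demotion: int = 0  # current value of net.inet.carp.demotion

    def advertised_skew(self) -> int:
        # Demotion is added to the configured skew, then clamped to 0..240.
        return max(0, min(self.advskew + self.demotion, MAX_ADVSKEW))

    def interval(self) -> float:
        # Seconds between advertisements; a shorter interval means higher priority.
        return self.advbase + self.advertised_skew() / 256


def backup_preempts(master: CarpNode, backup: CarpNode, preempt: bool = True) -> bool:
    """True if the backup would take over from a slower-advertising master."""
    return preempt and backup.interval() < master.interval()


IFDOWN = 240  # default ifdown_demotion_factor; matches "demoted by 240" in the log

# Case 1: only the primary lost an interface.
primary = CarpNode("primary", advbase=1, advskew=0, demotion=IFDOWN)
secondary = CarpNode("secondary", advbase=1, advskew=100)
print(backup_preempts(primary, secondary))                 # True

# Case 2: both nodes lost an interface and are demoted by the same amount.
secondary_down = CarpNode("secondary", advbase=1, advskew=100, demotion=IFDOWN)
print(backup_preempts(primary, secondary_down))            # False

# Case 3: net.inet.carp.preempt=0 with only the primary demoted.
print(backup_preempts(primary, secondary, preempt=False))  # False
```

In this model the two nodes tie in case 2, so the secondary has no reason to preempt. Whether a real pair behaves that way is exactly the part that hasn't been tried.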
-
@jimp Thank you!