Can't enter CARP Maintenance Mode
-
Hi
I've found a small problem with the master server in a cluster running version 2.3.2, and 2.3.2-1 after the update. When I press "Enter persistent CARP maintenance mode" on the master server, the master role does not move to the secondary server. The button changes to "Leave Persistent CARP Maintenance Mode", but all interfaces stay in MASTER state on the master server (and BACKUP on the backup server). In the general log (on both servers) I don't see any events about the roles being transferred.
But the role does move between the servers successfully when I use "Temporarily Disable CARP", or when the master server goes down or reboots. CARP traffic does pass between the servers; I checked it with tcpdump.
Is it a cluster problem, or a webGUI problem? How can I check it?
-
That can happen if your secondary has demoted itself for some reason (e.g. interface with a CARP VIP enabled is unplugged).
Putting the primary into maintenance mode only tells it to use a higher skew. If the secondary isn't taking over, it's because it's transmitting even slower for some reason or another.
Make sure the status is actually "master" on all VIPs on the primary and "backup" on all VIPs on the secondary. If any are in "init" state for example, then it won't work properly as the system showing a VIP in init or blank state would (correctly) believe it has a problem.
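To put numbers on the skew explanation above: in FreeBSD's carp(4), which pfSense is built on, a node sends advertisements every advbase + advskew/256 seconds, and the node that advertises most often holds MASTER. The sketch below only does that arithmetic; `advertisement_interval` is a hypothetical helper, and the skews are the defaults (0 on the primary, 100 on the secondary) plus 254, the value maintenance mode uses according to jimp.

```python
from fractions import Fraction

# Assumption: interval = advbase + advskew/256 seconds (FreeBSD carp(4)).
# A shorter interval means more frequent advertisements, which wins MASTER.
def advertisement_interval(advbase: int, advskew: int) -> Fraction:
    """Return the advertisement interval in seconds (hypothetical helper)."""
    if not 0 <= advskew <= 254:
        raise ValueError("advskew must be between 0 and 254")
    return advbase + Fraction(advskew, 256)

for label, skew in [("primary", 0), ("secondary", 100), ("primary in maintenance", 254)]:
    interval = advertisement_interval(1, skew)
    print(f"{label:24} advskew={skew:3}  interval={float(interval):.4f}s")
```

With advbase 1, the primary advertises every 1.0 s, the secondary every ~1.39 s, and a primary in maintenance mode every ~1.99 s, so the secondary should normally take over.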
-
@jimp:That can happen if your secondary has demoted itself for some reason (e.g. interface with a CARP VIP enabled is unplugged).
The secondary server takes master status and works normally when the master is shut down or when I use "Temporarily Disable CARP". So all interfaces are plugged in, configured, and working.
@jimp:Putting the primary into maintenance mode only tells it to use a higher skew. If the secondary isn't taking over, it's because it's transmitting even slower for some reason or another.
How can I check this skew level, before and after "Enter maintenance mode"?
I don't see any events in the logs. On a "normal" cluster I see messages like these (in the general system log) for all interfaces:
Oct 28 10:41:18 kernel carp: VHID 1@em1: MASTER -> BACKUP (more frequent advertisement received)
Oct 28 10:41:20 php-fpm 11504 /rc.carpbackup: HA cluster member "(192.168.112.1@em1): (LAN)" has resumed CARP state "BACKUP" for vhid 1
on the master, and like this:
Oct 28 10:41:19 php-fpm 60770 /rc.carpmaster: HA cluster member "(192.168.112.1@em1): (LAN)" has resumed CARP state "MASTER" for vhid 1
on the secondary.
On the problem cluster there are no such messages at all.
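To compare the two clusters without reading the logs by eye, a small filter can pull out the kernel CARP transition lines. This is a sketch that assumes the lines look like the kernel sample quoted above ("VHID <n>@<iface>: <OLD> -> <NEW> (<reason>)"); `carp_transitions` and `Transition` are hypothetical names, and the input is log text you have already saved.

```python
import re
from typing import NamedTuple

# Pattern modeled on the kernel line quoted in this thread, e.g.
# "carp: VHID 1@em1: MASTER -> BACKUP (more frequent advertisement received)"
CARP_TRANSITION = re.compile(
    r"carp: VHID (?P<vhid>\d+)@(?P<iface>\S+): "
    r"(?P<old>[A-Z]+) -> (?P<new>[A-Z]+)(?: \((?P<reason>[^)]*)\))?"
)

class Transition(NamedTuple):
    vhid: int
    iface: str
    old: str
    new: str
    reason: str

def carp_transitions(log_text: str) -> list[Transition]:
    """Return every kernel CARP state change found in a block of log text."""
    found = []
    for line in log_text.splitlines():
        m = CARP_TRANSITION.search(line)
        if m:
            found.append(Transition(int(m["vhid"]), m["iface"], m["old"], m["new"], m["reason"] or ""))
    return found

sample = """\
Oct 28 10:41:18 kernel carp: VHID 1@em1: MASTER -> BACKUP (more frequent advertisement received)
Oct 28 10:41:20 php-fpm 11504 /rc.carpbackup: HA cluster member "(192.168.112.1@em1): (LAN)" has resumed CARP state "BACKUP" for vhid 1
"""
for t in carp_transitions(sample):
    print(t)
```

Only the kernel line matches; the rc.carpbackup notice is ignored. An empty result after entering maintenance mode is what the problem cluster shows.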
@jimp:Make sure the status is actually "master" on all VIPs on the primary and "backup" on all VIPs on the secondary. If any are in "init" state for example, then it won't work properly as the system showing a VIP in init or blank state would (correctly) believe it has a problem.
Yes, all VIPs are "master" on the primary and "backup" on the secondary.
Is there a console command to enter maintenance mode? That would help separate a webGUI problem from a configuration problem.
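One way to see the live skew from a shell, assuming the FreeBSD `ifconfig` output that pfSense is built on, is the per-VHID `carp:` line, which has the form `carp: MASTER vhid 1 advbase 1 advskew 0`. The sketch below parses such lines from saved `ifconfig` output so the values can be compared before and after entering maintenance mode. `carp_settings` is a hypothetical helper, and the sample text is illustrative, not captured from either cluster.

```python
import re

# Assumed FreeBSD ifconfig format for CARP, one line per VHID:
#   carp: MASTER vhid 1 advbase 1 advskew 0
CARP_LINE = re.compile(
    r"carp: (?P<state>\w+) vhid (?P<vhid>\d+) "
    r"advbase (?P<advbase>\d+) advskew (?P<advskew>\d+)"
)

def carp_settings(ifconfig_text: str) -> dict[int, dict[str, object]]:
    """Map each VHID to its state, advbase and advskew (hypothetical helper)."""
    result = {}
    for m in CARP_LINE.finditer(ifconfig_text):
        result[int(m["vhid"])] = {
            "state": m["state"],
            "advbase": int(m["advbase"]),
            "advskew": int(m["advskew"]),
        }
    return result

# Illustrative output only.
sample = "\tcarp: MASTER vhid 1 advbase 1 advskew 200\n\tcarp: MASTER vhid 2 advbase 1 advskew 200\n"
print(carp_settings(sample))
```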
-
In addition:
The "normal" cluster was freshly installed from scratch. The "Skew" parameter on all CARP IPs is 0 on the master server and 100 on the backup server.
The "problem" cluster has been running for a long time, with many upgrades from previous versions. The "Skew" parameter on its CARP IPs is 200 on the master server and 254 on the backup. I never changed this parameter manually.
"Base" is 1 on all IPs and all servers in both clusters.
What value is added to the skew when I try to enter maintenance mode?
-
You must have changed the skew at some point, or restored a secondary config to a primary twice. Either way it's wrong and needs to be fixed. The primary skew should be 0, and the secondary should be 100.
Set all of your VIP skews on the primary to 0 or 1 and it will fix itself.
Maintenance mode sets the skew to 254, which in this case only ties it with your secondary so it doesn't work as intended.
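The tie can be checked with the same interval arithmetic. This sketch assumes interval = advbase + advskew/256 and that a backup only takes over when it advertises strictly more often than the current master; treating a tie as "no takeover" is an assumption that matches what the problem cluster showed. The skews are the ones reported in this thread, and `secondary_takes_over` is a hypothetical helper.

```python
from fractions import Fraction

# Per this thread, maintenance mode sets the primary's advskew to 254.
MAINTENANCE_SKEW = 254

def interval(advbase: int, advskew: int) -> Fraction:
    """Advertisement interval in seconds, assuming advbase + advskew/256."""
    return advbase + Fraction(advskew, 256)

def secondary_takes_over(secondary_skew: int, advbase: int = 1) -> bool:
    """Does the secondary win once the primary is in maintenance mode?

    Assumption: the backup takes over only if it advertises strictly more
    often than the master; on a tie the current master keeps the role.
    """
    primary = interval(advbase, MAINTENANCE_SKEW)
    secondary = interval(advbase, secondary_skew)
    return secondary < primary

# Secondary skews reported in this thread (advbase is 1 everywhere).
clusters = {"normal cluster": 100, "problem cluster": 254}
for name, skew in clusters.items():
    print(f"{name}: secondary skew {skew} -> takes over: {secondary_takes_over(skew)}")
```

The normal cluster's secondary (100) wins, while the problem cluster's secondary (254) only ties with the primary, so nothing moves.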
-
I manually changed all skews to 0 on the master. On the secondary server the skews changed to 100 automatically, but the secondary server stopped showing CARP status. After rebooting the secondary, it shows the status normally, and the master role now migrates from the master to the backup and back again.
Thank you!