Introduction
Did you ever wonder just what the Tungsten Manager is thinking when it does an automatic failover or a manual switch in a cluster?
What factors are taken into account by the Manager when it picks a replica to fail over to?
This blog post will detail the steps the Manager takes to perform a switch or failover.
We will cover both the process and some possible reasons why that process might not complete, along with best practices and ways to monitor the cluster for each situation.
Roles for Nodes and Clusters
When we say “role” in the context of a cluster datasource, we are talking about the view of a database node from the Manager’s perspective.
These roles apply to the node datasource at the local (physical) cluster level, and to the composite datasource at the composite cluster level.
Possible roles are:
- Primary
- a database node which is writable, or
- a composite cluster which is active (contains a writable primary).
- Relay
- a read-only database node which pulls data from a remote cluster and shares it with downstream replicas in the same cluster.
- Replica
- a read-only database node which pulls data from a local-cluster primary node, or from a local-cluster relay node for passive composite clusters;
- a composite cluster which is passive (contains a relay but NO writable primary).
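To make the distinction concrete, here is a minimal sketch in Python (purely illustrative; these names are not Tungsten APIs and this is not the Manager's implementation) modeling the three roles and which of them accepts writes, at either the node or the composite level:

# Illustrative model of the roles described above -- not a Tungsten API.
from enum import Enum

class Role(Enum):
    PRIMARY = "primary"    # writable database node, or an active composite cluster
    RELAY = "relay"        # read-only node pulling data from a remote cluster
    REPLICA = "replica"    # read-only node, or a passive composite cluster

def accepts_writes(role: Role) -> bool:
    """Only the Primary role is writable; Relay and Replica are read-only."""
    return role is Role.PRIMARY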
Moving the Primary Role to Another Node or Cluster
One of the great powers of the Tungsten Cluster is that the roles for both cluster nodes and composite cluster datasources can be moved to another node or cluster, either at will via the cctrl> switch command, or by having an automatic failover invoked by the Tungsten Manager layer.
Please note that while failovers are normally automatic and triggered by the Tungsten Manager, a failover can also be invoked manually via the cctrl command if ever needed.
Switch Versus Failover
There are key differences between the manual switch and automatic failover operations:
Switch
- Switch attempts to perform the operation as gracefully as possible, so there will be a delay as all of the steps are followed to ensure zero data loss.
- When the switch sub-command is invoked within cctrl, the Manager will cleanly close connections and ensure replication is caught up before moving the Primary role to another node.
- Switch recovers the original Primary to be a Replica.
- See https://docs.continuent.com/tungsten-clustering-7.0/operations-primaryswitch-manual.html.
Failover
- Failover is immediate, and could result in data loss, even though we do everything we can to get all events moved to the new Primary.
- Failover leaves the original Primary in a SHUNNED state.
- Connections are closed immediately.
- Use the cctrl> recover command to make the failed Primary into a Replica once it is healthy.
- See https://docs.continuent.com/tungsten-clustering-7.0/operations-primaryswitch-automatic.html and https://docs.continuent.com/tungsten-clustering-7.0/manager-failover-behavior.html.
For even more details, please visit: https://docs.continuent.com/tungsten-clustering-7.0/operations-primaryswitch.html.
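To summarize the contrast, here is a small, purely conceptual sketch in Python; the Node class and the helper functions are invented for readability and this is not the Manager's actual logic, it simply contrasts the two flows described above:

# Conceptual sketch only -- invented names, not the Manager's implementation.
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    role: str = "replica"      # "primary" or "replica"
    state: str = "ONLINE"      # e.g. "ONLINE" or "SHUNNED"

def manual_switch(old_primary: Node, target: Node) -> None:
    """Graceful: each step waits, so there is a delay but zero data loss."""
    # 1. Cleanly close client connections on the old primary.
    # 2. Wait until the target has fully caught up on replication.
    target.role = "primary"         # 3. Promote the target.
    old_primary.role = "replica"    # 4. Recover the old primary as a Replica.

def automatic_failover(old_primary: Node, target: Node) -> None:
    """Immediate: availability comes first, so unapplied events may be lost."""
    # 1. Client connections are closed immediately.
    target.role = "primary"         # 2. Promote the most up-to-date replica.
    old_primary.state = "SHUNNED"   # 3. Old primary stays SHUNNED until 'recover' is run.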
Which Target Node to Use?
Picking a target replica node from a pool of candidate database replicas involves several checks and decisions.
For switch commands in both physical and composite services, the user may pass in the name of the physical or composite replica that is to be the target of the switch.
If no target is passed in, or if the operation is an automatic failover, the Manager uses its own logic to identify the most up-to-date replica, which then becomes the target of the switch or failover.
Here are the rules, in order, for picking a new primary database node from the available replicas (a sketch of the full selection flow follows this list):
- Skip any replica that is either not online or that is a standby replica.
- Skip any replica that has its status set to ARCHIVE.
- Skip any replica that does not have an online manager.
- Skip any replica that does not have a replicator in either online or synchronizing state.
- Now we have a target datasource prospect...
- By comparing the last applied sequence number of the current prospect with that of each previously seen prospect, we end up with the replica that has the highest applied sequence number. We also track the prospect that has the highest stored sequence number.
- If two prospects are tied on the highest applied or stored sequence number, we compare their datasource precedence and choose the datasource with the lower precedence number (i.e. a precedence of 1 outranks a precedence of 2). If the precedence is also tied, we keep the Replica already chosen and discard the Replica currently being evaluated.
- After all of the Replicas have been evaluated, we will either have a single winner, or one Replica with the highest applied sequence number and a different Replica with the highest stored sequence number (i.e. the one that has received the most THL records from the Primary prior to the operation). In the latter case, which is particularly important during a failover, we choose the Replica with the most stored THL records.
- At this point, the chosen target Replica is returned to the switch or failover command so that the operation can proceed.
After looping over all available Replicas, the selected target Replica's applied latency is checked against the configured threshold; if the applied latency is too far behind, that Replica is not used. The tpm option --property=policy.slave.promotion.latency.threshold=900 controls this check, with 900 seconds as the default value.
If no viable Replica is found (or if there is no available Replica to begin with), there will be no switch or failover at this point.
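Putting the rules above together, here is a minimal illustrative sketch of the selection flow in Python; the Prospect fields and the pick_target function are invented for readability and are not the Manager's actual implementation:

# Illustrative sketch of the selection rules above -- not Tungsten code.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Prospect:
    name: str
    is_online: bool
    is_standby: bool
    is_archive: bool
    manager_online: bool
    replicator_ok: bool        # replicator online or synchronizing
    applied_seqno: int         # highest THL sequence number applied to the database
    stored_seqno: int          # highest THL sequence number stored on disk
    precedence: int            # lower number = preferred
    applied_latency: float     # seconds behind the Primary

def better(candidate: Prospect, current: Optional[Prospect], seqno) -> bool:
    """Prefer the higher sequence number; break ties with the lower precedence
    number; on a full tie, keep the prospect already chosen."""
    if current is None:
        return True
    if seqno(candidate) != seqno(current):
        return seqno(candidate) > seqno(current)
    return candidate.precedence < current.precedence

def pick_target(replicas, latency_threshold: float = 900.0) -> Optional[Prospect]:
    best_applied = None    # prospect with the highest applied seqno
    best_stored = None     # prospect with the highest stored seqno
    for r in replicas:
        # Exclusion rules: offline, standby, archive, manager down, replicator down.
        if not r.is_online or r.is_standby or r.is_archive:
            continue
        if not r.manager_online or not r.replicator_ok:
            continue
        if better(r, best_applied, lambda p: p.applied_seqno):
            best_applied = r
        if better(r, best_stored, lambda p: p.stored_seqno):
            best_stored = r
    if best_applied is not best_stored:
        # One replica applied the most events, but another has stored more THL;
        # prefer the one holding the most THL, since it has the most events to recover.
        winner = best_stored
    else:
        winner = best_applied    # single winner
    if winner is None:
        return None              # no viable replica: no switch or failover happens
    if winner.applied_latency > latency_threshold:
        return None              # beyond policy.slave.promotion.latency.threshold (default 900s)
    return winner

In a real cluster these inputs come from the Manager's view of each datasource, such as the applied and stored sequence numbers and the applied latency reported for each replicator.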
For more details on automatic failover versus manual switch, please visit: https://docs.continuent.com/tungsten-clustering-7.0/manager-failover-internals-manual-switch-versus-automatic-failover.html.
Switch and Failover Steps for Local Clusters
In the upcoming Part 2 of this post, we will examine the steps needed to do a local failover.
For more details on switch and failover steps for local clusters, please visit:
- https://docs.continuent.com/tungsten-clustering-7.0/operations-primaryswitch.html
- https://docs.continuent.com/tungsten-clustering-7.0/manager-failover-internals-switch-failover-steps-local-clusters.html
Switch and Failover Steps for Composite Services
In the upcoming Part 3 of this post, we will examine the steps needed to do a composite site-level failover.
For more details on switch and failover steps for composite services, please visit:
- https://docs.continuent.com/tungsten-clustering-7.0/operations-composite.html
- https://docs.continuent.com/tungsten-clustering-7.0/manager-failover-internals-switch-failover-steps-composite-clusters.html
Best Practices for Proper Cluster Failovers
What are the best practices for ensuring the cluster always behaves as expected? Are there any reasons for a cluster NOT to fail over? If so, what are they?
Here are three common reasons why a cluster might not fail over properly:
- Policy Not Automatic
- BEST PRACTICE: Ensure the cluster policy is automatic unless you specifically need it to be otherwise.
- SOLUTION: Use the check_tungsten_policy command to verify the policy status.
- Complete Network Partition
- If the nodes are unable to communicate cluster-wide, then all nodes will go into a FailSafe-Shun mode to protect the data from a split-brain situation.
- BEST PRACTICE: Ensure that all nodes are able to see each other via the required network ports.
- SOLUTION: Verify that all required ports are open between all nodes, local and remote: https://docs.continuent.com/tungsten-clustering-7.0/prerequisite-host.html#prerequisite-host-networkports.
- SOLUTION: Use the check_tungsten_online command to check the DataSource State on each node.
- No Available Replica
- See “Which Target Node To Use?” above for the replica exclusion rules.
- BEST PRACTICE: Ensure there is at least one ONLINE node that is not in STANDBY or ARCHIVE mode.
- SOLUTION: Use the check_tungsten_online command to check the DataSource State on each node.
- BEST PRACTICE: Ensure that the Manager is running on all nodes.
- SOLUTION: Use the check_tungsten_services command to verify that the Tungsten processes are running on each node.
- BEST PRACTICE: Ensure all Replicators are either ONLINE or GOING-ONLINE:SYNCHRONIZING.
- SOLUTION: Use the check_tungsten_online command to verify that the Replicator (and Manager) is ONLINE on each node.
- BEST PRACTICE: Ensure the replication applied latency is under the threshold, default 900 seconds.
- SOLUTION: Use the check_tungsten_latency command to check the latency on each node.
Command-Line Monitoring Tools
Below are examples of all the health-check tools listed above:
check_tungsten_services
shell> check_tungsten_services -c -r
CRITICAL: Connector, Manager, Replicator are not running
shell> startall
Starting Replicator normally
Starting Tungsten Replicator Service...
Waiting for Tungsten Replicator Service.......
running: PID:14628
Starting Tungsten Manager Service...
Waiting for Tungsten Manager Service..........
running: PID:15143
Starting Tungsten Connector Service...
Waiting for Tungsten Connector Service.......
running: PID:15513
shell> check_tungsten_services -c -r
OK: All services (Connector, Manager, Replicator) are running
check_tungsten_policy
shell> check_tungsten_policy
CRITICAL: Manager is not running
shell> manager start
shell> check_tungsten_policy
CRITICAL: Policy is MAINTENANCE
shell> cctrl
cctrl> set policy automatic
cctrl> exit
shell> check_tungsten_policy
OK: Policy is AUTOMATIC
check_tungsten_latency
shell> check_tungsten_latency -w 100 -c 200
CRITICAL: Manager is not running
shell> manager start
shell> check_tungsten_latency -w 100 -c 200
CRITICAL: db8=65107.901s, db9 is missing latency information
shell> cctrl
cctrl> cluster heartbeat
cctrl> exit
shell> check_tungsten_latency -w 100 -c 200
WARNING: db9 is missing latency information
shell> cctrl
cctrl> set policy automatic
cctrl> exit
shell> check_tungsten_latency -w 100 -c 200
OK: All replicas are running normally (max_latency=4.511)
check_tungsten_online
shell> check_tungsten_online
CRITICAL: Manager is not running
shell> manager start
shell> check_tungsten_online
CRITICAL: Replicator is not running
shell> replicator start
shell> check_tungsten_online
CRITICAL: db9 REPLICATION SERVICE north is not ONLINE
shell> trepctl online
shell> check_tungsten_online
OK: All services on db9 are online
Wrap-Up
This blog post discussed the steps the Manager takes to perform a switch or failover, the best practices for ensuring proper failover, and possible solutions for monitoring the cluster health to ensure proper operation.
Smooth sailing!