CLUSTER FAILOVER
CLUSTER FAILOVER [FORCE | TAKEOVER]
- Available since: Redis Open Source 3.0.0
- Time complexity: O(1)
- ACL categories: @admin, @slow, @dangerous
- Compatibility: Redis Software and Redis Cloud compatibility
This command, which can only be sent to a Redis Cluster replica node, forces the replica to start a manual failover of its primary instance.
Use a manual failover when you want to replace the current primary with one of its replicas, even though the primary has not failed. Send the command to the replica that you want to promote. Redis Cluster performs the failover safely and avoids data loss by making sure the replica has processed the primary’s full replication stream before it takes over.
A manual failover works as follows:
1. The replica tells the primary to stop processing client requests.
2. The primary sends its current replication offset to the replica.
3. The replica waits until it reaches the same replication offset, which confirms that it has processed all data from the primary.
4. The replica starts the failover, gets a new configuration epoch from a majority of primary nodes, and broadcasts the new configuration.
5. The old primary receives the configuration update, unblocks its clients, and starts returning redirection messages so clients connect to the new primary.
This process moves clients from the old primary to the new primary only after the new primary has processed the full replication stream.
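The sequence above can be modelled with a small Python sketch. This is a toy simulation and not Redis code: the `Node` class, the `backlog` list of pending byte counts, and the `votes` and `primaries` parameters are all hypothetical names introduced for illustration, and real nodes rely on timeouts and cluster bus messages that are omitted here.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy stand-in for a cluster node; not a Redis data structure."""
    name: str
    role: str                 # "primary" or "replica"
    offset: int = 0           # replication offset processed so far
    config_epoch: int = 0
    paused: bool = False      # True while client requests are blocked
    backlog: list = field(default_factory=list)  # sizes of writes not yet applied


def manual_failover(primary: Node, replica: Node, votes: int, primaries: int) -> bool:
    """Walk through the five steps; return True if the replica was promoted."""
    # Step 1: the replica asks the primary to stop processing client requests.
    primary.paused = True
    # Step 2: the primary reports its current replication offset.
    target_offset = primary.offset
    # Step 3: the replica applies the outstanding stream until the offsets match.
    while replica.offset < target_offset and replica.backlog:
        replica.offset += replica.backlog.pop(0)
    if replica.offset != target_offset:
        primary.paused = False  # simplification: give up, primary resumes service
        return False
    # Step 4: a majority of primaries must grant the new configuration epoch.
    if votes <= primaries // 2:
        primary.paused = False
        return False
    # Assumption of this sketch: the granted epoch is the next unused integer.
    replica.config_epoch = max(primary.config_epoch, replica.config_epoch) + 1
    replica.role = "primary"
    # Step 5: the old primary accepts the new configuration and unblocks its
    # clients, which are then redirected to the new primary.
    primary.role = "replica"
    primary.paused = False
    return True
```

In this model, the promotion happens only after the replica's offset equals the offset reported by the paused primary, which is the property that prevents data loss.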
Optional arguments
FORCE | TAKEOVER
FORCE starts the failover without coordinating with the primary, for use when the primary is unreachable. TAKEOVER additionally bypasses the cluster consensus needed to authorize the failover.
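Because the syntax accepts at most one of the two options, a client-side helper can validate the argument before sending anything. The following is a minimal sketch; `failover_command` is a hypothetical helper name, not part of any Redis client library.

```python
from typing import List, Optional

VALID_OPTIONS = {"FORCE", "TAKEOVER"}


def failover_command(option: Optional[str] = None) -> List[str]:
    """Return the argument list for CLUSTER FAILOVER with at most one option."""
    args = ["CLUSTER", "FAILOVER"]
    if option is None:
        return args  # plain, coordinated manual failover
    normalized = option.upper()
    if normalized not in VALID_OPTIONS:
        raise ValueError(f"unsupported option: {option!r}")
    return args + [normalized]
```

The resulting list can be passed to whatever generic "send raw command" call your client offers, on a connection to the replica you want to promote.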
Details
Manual failover when the primary is down
The command behavior can be modified by two options: FORCE and TAKEOVER.
If the FORCE option is given, the replica skips the handshake with the primary, which may be unreachable, and starts the failover directly from step 4 above, where it requests a new configuration epoch. This is useful when you want to start a manual failover while the primary is no longer reachable.
However, FORCE still requires that a majority of primaries be available to authorize the failover and generate a new configuration epoch for the replica that is going to become primary.
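The majority requirement can be expressed as a one-line check. This sketch assumes you already know how many primaries the cluster has and how many of them are currently reachable; the function name and parameters are illustrative only.

```python
def can_authorize_failover(reachable_primaries: int, total_primaries: int) -> bool:
    """True if strictly more than half of all primaries are available to vote."""
    if total_primaries <= 0 or not 0 <= reachable_primaries <= total_primaries:
        raise ValueError("invalid primary counts")
    return reachable_primaries > total_primaries // 2
```

For example, in a cluster of three primaries where the one being replaced is down, two primaries remain and `can_authorize_failover(2, 3)` holds, so FORCE can proceed. If two of the three are down, it does not.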
Manual failover without cluster consensus
In some situations FORCE is not enough, and you want a replica to fail over without any agreement with the rest of the cluster. A real-world use case is mass-promoting replicas in a different data center to primaries in order to perform a data center switch while all the primaries are down or partitioned away.
The TAKEOVER option implies everything FORCE implies, but also does not use any cluster authorization in order to fail over. A replica receiving CLUSTER FAILOVER TAKEOVER will instead:
- Generate a new configEpoch unilaterally, just taking the current greatest epoch available and incrementing it if its local configuration epoch is not already the greatest.
- Assign itself all the hash slots of its primary, and propagate the new configuration to every reachable node as soon as possible, and eventually to every other node.
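The epoch rule in the first bullet can be sketched as follows. This is an interpretation of the sentence above, not the server's actual code; `known_epochs` is a hypothetical input holding the configuration epochs of the other nodes the replica currently knows about.

```python
from typing import Iterable


def takeover_config_epoch(local_epoch: int, known_epochs: Iterable[int]) -> int:
    """Pick the configEpoch a replica would claim during TAKEOVER (sketch)."""
    greatest = max([local_epoch, *known_epochs])
    if local_epoch == greatest:
        # The local epoch is already the greatest one this node knows of: keep it.
        return local_epoch
    # Otherwise take the greatest known epoch and increment it.
    return greatest + 1
```

Only locally known epochs enter the calculation, and no other node is consulted before the value is chosen.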
Note that TAKEOVER violates the last-failover-wins principle of Redis Cluster, since the configuration epoch generated by the replica departs from the normal generation of configuration epochs in several ways:
- There is no guarantee that it is actually the highest configuration epoch, since, for example, the TAKEOVER option can be used within a minority partition, and no message exchange is performed to generate the new configuration epoch.
- If the generated configuration epoch happens to collide with that of another instance, one of the two colliding epochs will eventually be moved away by the configuration epoch collision resolution algorithm.
Because of this, the TAKEOVER option should be used with care.
Implementation details and notes
- CLUSTER FAILOVER, unless the TAKEOVER option is specified, does not execute a failover synchronously. It only schedules a manual failover, bypassing the failure detection stage.
- An OK reply is no guarantee that the failover will succeed.
- A replica can only be promoted to a primary if it is known as a replica by a majority of the primaries in the cluster. If the replica is a new node that has just been added to the cluster (for example after upgrading it), it may not yet be known to all the primaries in the cluster. To check that the primaries are aware of a new replica, you can send CLUSTER NODES or CLUSTER REPLICAS to each of the primary nodes and check that it appears as a replica, before sending CLUSTER FAILOVER to the replica.
- To check that the failover has actually happened, you can use ROLE, INFO REPLICATION (which indicates "role:master" after a successful failover), or CLUSTER NODES to verify that the state of the cluster has changed sometime after the command was sent.
- To check if the failover has failed, check the replica's log for "Manual failover timed out", which is logged if the replica has given up after a few seconds.
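The checks before and after the failover can be automated by parsing the text these commands return. The sketch below works on raw CLUSTER NODES and INFO REPLICATION output that you have already fetched with your client. The helper names are hypothetical, and the sample node IDs and addresses are made up for illustration. Note that CLUSTER NODES reports replicas with the `slave` flag.

```python
from typing import Dict, List, Optional


def parse_cluster_nodes(text: str) -> List[Dict[str, object]]:
    """Parse the leading fields of each CLUSTER NODES line."""
    nodes = []
    for line in text.strip().splitlines():
        fields = line.split()
        if len(fields) < 8:
            continue  # skip blank or malformed lines
        nodes.append({
            "id": fields[0],
            "addr": fields[1],
            "flags": set(fields[2].split(",")),
            "primary_id": None if fields[3] == "-" else fields[3],
        })
    return nodes


def is_known_replica(nodes_text: str, replica_id: str, primary_id: str) -> bool:
    """True if this output lists replica_id as a replica of primary_id."""
    for node in parse_cluster_nodes(nodes_text):
        if node["id"] == replica_id:
            return "slave" in node["flags"] and node["primary_id"] == primary_id
    return False


def role_from_info(info_text: str) -> Optional[str]:
    """Extract the role field from INFO REPLICATION output, if present."""
    for line in info_text.splitlines():
        if line.startswith("role:"):
            return line.split(":", 1)[1].strip()
    return None


# Made-up sample output, as seen from one primary before the failover.
SAMPLE_NODES = """\
aaa1 127.0.0.1:7000@17000 myself,master - 0 0 1 connected 0-5460
bbb2 127.0.0.1:7003@17003 slave aaa1 0 1700000000000 1 connected
"""
```

Before the failover, you would run `is_known_replica` against the output from each primary and proceed only if a majority return true. Afterwards, `role_from_info` on the promoted node's INFO REPLICATION output should return `master`.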
Redis Software and Redis Cloud compatibility
| Redis Software | Redis Cloud | Notes |
|---|---|---|
| ❌ Standard | ❌ Standard | |
Return information
OK if the command was accepted and a manual failover is going to be attempted. An error if the operation cannot be executed, for example if the client is connected to a node that is already a primary.
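A caller that reads the reply at the protocol level can separate the two outcomes as shown below. This is a minimal sketch that assumes the raw RESP bytes are available, where simple strings start with `+` and errors with `-`. Most client libraries do this decoding for you and raise their own exception types. The function name is hypothetical.

```python
def failover_accepted(raw_reply: bytes) -> bool:
    """Interpret a raw RESP reply to CLUSTER FAILOVER.

    Returns True for OK, raises RuntimeError for an error reply.
    """
    if raw_reply.startswith(b"+OK"):
        # Accepted: a failover will be attempted, but success is not guaranteed.
        return True
    if raw_reply.startswith(b"-"):
        message = raw_reply[1:].rstrip(b"\r\n").decode("utf-8", "replace")
        raise RuntimeError(f"CLUSTER FAILOVER rejected: {message}")
    raise ValueError(f"unexpected reply: {raw_reply!r}")
```

A `True` result only means the command was accepted, so the role of the node should still be verified afterwards.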