Raft: The Distributed Systems Algorithm

Consensus algorithms are at the core of many distributed systems. They answer one question: how does a group of servers agree on the same sequence of state changes, even when some of them fail?

Raft is a widely used consensus algorithm (Kubernetes, for example, relies on it via etcd). It is equivalent to Paxos in fault tolerance and performance, but it was designed to be easier to understand; Paxos is often seen as the more complex approach.

Here’s a simplification of the algorithm:

Overall design: The servers elect one leader, which is responsible for managing replication and ensuring that all followers end up with the same log of commands. If the leader fails, the remaining servers elect a new leader.

Some of the steps (simplified)

Initialization:

All nodes start as followers

Elect a Leader

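If a follower does not receive a heartbeat message from the leader within its election timeout (randomized per node, so that nodes rarely time out at the same moment), it becomes a candidate for leadership.
The candidate increments its term number, votes for itself, and asks all other nodes for votes. Each node grants at most one vote per term.
A candidate becomes the leader if it gets votes from a majority of the nodes in the cluster. If no candidate reaches a majority, a new election starts in a higher term.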
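
The election steps above can be sketched in a few lines of Python. This is a minimal, single-process illustration, not how etcd or any real implementation is structured: the names Node, handle_vote_request, run_election and pick_election_timeout_ms are made up for this example, the 150-300 ms range is only an illustrative choice, and the sketch leaves out networking, timers, and the rule that a voter only supports candidates whose log is at least as up to date as its own. It does show terms (a counter that increases with every election) and the one-vote-per-term rule.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


def pick_election_timeout_ms(rng: random.Random) -> int:
    # Randomized so that two followers rarely become candidates together.
    # The 150-300 ms range is an illustrative assumption.
    return rng.randint(150, 300)


@dataclass
class Node:
    node_id: int
    current_term: int = 0
    voted_for: Optional[int] = None
    role: str = "follower"  # every node starts as a follower

    def handle_vote_request(self, term: int, candidate_id: int) -> bool:
        # Seeing a newer term makes this node a follower with a fresh vote.
        if term > self.current_term:
            self.current_term = term
            self.voted_for = None
            self.role = "follower"
        # Requests from an older term are rejected.
        if term < self.current_term:
            return False
        # Grant at most one vote per term (repeating the same vote is fine).
        if self.voted_for is None or self.voted_for == candidate_id:
            self.voted_for = candidate_id
            return True
        return False


def run_election(candidate: Node, peers: List[Node]) -> bool:
    # Called when the candidate's election timeout fires without a heartbeat.
    candidate.current_term += 1
    candidate.role = "candidate"
    candidate.voted_for = candidate.node_id
    votes = 1  # the candidate's own vote

    for peer in peers:
        if peer.handle_vote_request(candidate.current_term, candidate.node_id):
            votes += 1

    cluster_size = len(peers) + 1
    if votes > cluster_size // 2:  # strict majority of the whole cluster
        candidate.role = "leader"
        return True
    return False
```

For example, with nodes = [Node(i) for i in range(5)], calling run_election(nodes[0], nodes[1:]) makes node 0 the leader for term 1. If node 1 later times out and runs its own election, it wins term 2 and node 0 steps back down to follower when it sees the higher term.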

Replicate Logs

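The leader accepts commands from clients and appends them to its log.
It sends the new log entries to all followers.
Once a majority of the nodes (counting the leader itself) have stored an entry, the entry is committed: the leader applies it to its own state machine and returns the result to the client. Followers learn about the commit from the leader's later messages and apply the entry as well.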
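
The replication flow can be sketched the same way. Again, this is a simplified, in-memory illustration under stated assumptions: Entry, Follower, Leader, propose and the reachable flag are hypothetical names for this example, the followers are plain objects rather than remote servers, and the follower's consistency check only looks at log length (real Raft also compares the term of the preceding entry). All entries here belong to the leader's own term, which sidesteps the extra rules Raft has for entries left over from earlier terms.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Entry:
    term: int
    command: str


@dataclass
class Follower:
    log: List[Entry] = field(default_factory=list)
    reachable: bool = True  # hypothetical stand-in for network failures

    def append_entries(self, prev_index: int, entries: List[Entry]) -> bool:
        # Refuse if unreachable, or if entries before prev_index are missing.
        if not self.reachable or prev_index > len(self.log):
            return False
        # Keep the first prev_index entries and take the rest from the leader.
        self.log = self.log[:prev_index] + list(entries)
        return True


@dataclass
class Leader:
    term: int
    followers: List[Follower]
    log: List[Entry] = field(default_factory=list)
    next_index: List[int] = field(default_factory=list)
    commit_index: int = 0  # number of committed entries
    applied: List[str] = field(default_factory=list)  # toy state machine

    def __post_init__(self) -> None:
        # Per follower: index of the next log entry to send.
        self.next_index = [len(self.log)] * len(self.followers)

    def propose(self, command: str) -> bool:
        # 1. Append the client's command to the leader's own log.
        self.log.append(Entry(self.term, command))

        # 2. Send each follower every entry it has not acknowledged yet.
        acks = 1  # the leader's own copy counts toward the majority
        for i, follower in enumerate(self.followers):
            start = self.next_index[i]
            if follower.append_entries(start, self.log[start:]):
                self.next_index[i] = len(self.log)
                acks += 1

        # 3. Without a majority the entry stays uncommitted.
        #    (A real leader would keep retrying in the background.)
        if acks <= (len(self.followers) + 1) // 2:
            return False

        # 4. Commit, apply to the state machine, and report success.
        for entry in self.log[self.commit_index:]:
            self.applied.append(entry.command)
        self.commit_index = len(self.log)
        return True
```

In a three-node cluster built as Leader(term=1, followers=[Follower(), Follower()]), propose still succeeds with one follower unreachable, because the leader plus the remaining follower form a majority. With both followers unreachable, propose returns False and the entry stays in the leader's log without being applied until a majority is reachable again.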

Each step has many more nuances (for example, log consistency checks, restrictions on which candidates can win an election, and cluster membership changes), but this is a very high-level description of the algorithm. The Raft paper is the best place for more information, and the etcd raft implementation is a good starting point if you're more comfortable reading code.

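Some systems that use Raft or a Raft-based protocol:

CockroachDB
ClickHouse (in ClickHouse Keeper)
MongoDB (its replica set protocol is based on Raft rather than being a direct implementation)
etcd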