Ongaro and Ousterhout (2014). In Search of an Understandable Consensus Algorithm [Raft]

Fri Feb 05 2021

tags: draft programming computer science self study notes public 6.824 MIT distributed systems

Introduction

This is the famous Raft (extended) paper, and here are my notes on it.

I read this paper as part of my self-study of 6.824 with NUS SoC folks. It may or may not be part of my goal to finish a Stanford CS degree in a year.

Overview of the paper

Raft is an algorithm for managing a replicated log. In the context of Raft a replicated log is a list of proposed changes to a state machine. Raft guarantees consistency (it's CP) and is fault-tolerant: the system remains available and consistent (linearisable?) ^[1] as long as a majority of nodes stay up.

It was designed with ease of understanding in mind: the authors claim that Paxos is difficult to understand and propose Raft as a more understandable alternative.

Safety properties of Raft:

Election safety: at most one leader can be elected in a given term. Leader append-only: a leader never overwrites or deletes entries in its log; it only appends new entries. Log matching: if two logs contain an entry with the same index and term, then the logs are identical in all entries up through the given index. (That is, if two logs agree on an entry in position i, they will also agree on all entries from 1 through i-1). Leader completeness: if a log entry is committed in a given term, that entry will be present in the logs of the leaders for all higher-numbered terms. State machine safety:if a server has applied a log entry at a given index to its state machine, no other server will ever apply a different log entry for the same index.

Interesting things about the paper

Questions I had about the paper

Leader election and the election timeout

A server remains in follower state as long as it receives valid RPCs from a leader or candidate. Leaders send periodic heartbeats (AppendixEntries RPCs with no log entries) to all followers in order to maintain their authority. If a follower receives no communication over a period of time called the election timeout, then it assumes there is no viable leader and begins an election to choose a new leader.

Why is it important that nodes disregard RequestVote RPCs before the minimum election timeout? What would happen if they didn't?

Find an example whereby nodes didn't disregard RequestVote RPCs before the minimum election timeout, and show that it is bad.

What is "copy-on-write"? Why is it important?

What is "linearisable semantics"? Why is it important?

Why does linearisable semantics imply no stale reads?

How do we prevent leaders from returning stale reads?

Leaders need to receive a heartbeat from a majority to make sure it's still the leader before returning any read data to the client.

Conclusion

I should think more about how to map these consistency guarantees to the actual safety properties of Raft. ↩︎