Self-studying MIT's 6.824 Distributed Systems

Mon Feb 01 2021

tags: programming computer science self study notes public 6.824 MIT distributed systems

Introduction

I'm auditing the NUS DYOM Distributed Systems course. Thanks to Wei Neng for introducing me to it:

I would do 6.824 if I were you

There's a class in NUS running now

Wei Neng told me to ask Joel, who then told me to consult Bernard, who managed to get me on board! Very pleased.

I was surprised at how progressive NUS is: students can design their own module (they call it DYOM) and take it for credit. This module is based on the MIT course 6.824: Distributed Systems. The NUS DYOM course differs from the MIT course primarily in scope. The original 6.824 is a graduate-level course, while this one is for undergraduates; the original is taught by a professor, while this one is student-run; and MIT students usually do two modules a semester, while NUS students do five. So I think the plan is to do fewer labs (they will stop at Lab 2C), read fewer papers, and have fewer lectures. However, I plan to finish all of the labs (up to Lab 3B, plus the final project) and read some additional papers on the side.

Structure of the course

From the MIT 6.824 course page:

6.824 is a core 12-unit graduate subject with lectures, readings, programming labs, an optional project, a mid-term exam, and a final exam. It will present abstractions and implementation techniques for engineering distributed systems. Major topics include fault tolerance, replication, and consistency. Much of the class consists of studying and discussing case studies of distributed systems.

We meet every Monday in person at SoC to go through a paper or proof. Each week a different person leads the discussion. We do the labs (1, 2A--2D, 3A, 3B) on our own time.

Papers covered in the course

The NUS students voted to cover a subset of the papers covered in the 2018 offering. Some of these papers are no longer being covered in the latest (2021) offering.

MapReduce (2004)
GFS (2003)
Fault-Tolerant Virtual Machines (2010)
Extended Raft paper (2014)
ZooKeeper (2010)
CRAQ (2009)
Amazon Aurora (2017) (replaced with Pacifica (2008) in the 2021 offering)
Frangipani (1997)
Saltzer and Kaashoek (2009). Principles of Computer System Design: An Introduction (used in MIT's 6.033 course). Chapter 9, sections 9.1.5, 9.1.6, 9.5.2, 9.5.3, 9.6.3
Spanner (2012)
FaRM (2015)
Spark (2012)
Memcached at Facebook (2013)
COPS (2011)
Bitcoin (2008)
Blockstack (2017)
Analogic FS experience paper
Spinnaker (2011) (this was taken out in the 2021 offering, but we're reading it regardless)
Amazon Dynamo (2007) (ditto)
Shapiro et al. (2011). CRDTs (not covered in the MIT course, but interesting and relevant)

I will be writing notes on each paper I read to serve as a reference for my future self, to keep me accountable, and to serve as a proof-of-work.

I also write notes on key distributed systems concepts: the CAP impossibility theorem and the FLP impossibility theorem.

Lab assignments in the course

Lab 1 implements MapReduce. I've successfully implemented it (it took me about a day and a half to read the paper and write the code: 7--8 hours total?). My notes on Lab 1.
Lab 2 (A, B, C, D) implements the [Raft consensus algorithm](https://files.lieu.gg/docs/Ongaro and Ousterhout (2014). In Search of an Understandable Consensus Algorithm (Extended Version).pdf) with log compaction. My notes on Lab 2.
Lab 3 (A, B) builds a fault-tolerant key-value store using the Raft consensus mechanism.
Lab 4 is either building a sharded key-value service, or creating a final project of your own. I think this is super cool, and I hope that someone in the class will be keen to do a final project with me.
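
To give a flavour of what Lab 1 is about, here is a minimal single-process sketch of the MapReduce idea applied to word counting. It is not the lab's actual code or interface: the names `KeyValue`, `mapF`, `reduceF`, and `runSequential` are my own for illustration, and the real lab distributes map and reduce tasks across worker processes, which this sketch skips entirely.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"unicode"
)

// KeyValue is one intermediate pair emitted by the map phase.
// (Hypothetical type for this sketch.)
type KeyValue struct {
	Key   string
	Value string
}

// mapF splits a document into words and emits (word, "1") per occurrence.
func mapF(contents string) []KeyValue {
	words := strings.FieldsFunc(contents, func(r rune) bool {
		return !unicode.IsLetter(r)
	})
	kvs := make([]KeyValue, 0, len(words))
	for _, w := range words {
		kvs = append(kvs, KeyValue{Key: strings.ToLower(w), Value: "1"})
	}
	return kvs
}

// reduceF receives every value emitted for one key and returns their count.
func reduceF(key string, values []string) string {
	return fmt.Sprint(len(values))
}

// runSequential runs map over every input, groups the intermediate pairs
// by key, then reduces each group. Everything happens in one process.
func runSequential(inputs []string) map[string]string {
	grouped := make(map[string][]string)
	for _, doc := range inputs {
		for _, kv := range mapF(doc) {
			grouped[kv.Key] = append(grouped[kv.Key], kv.Value)
		}
	}
	out := make(map[string]string, len(grouped))
	for k, vs := range grouped {
		out[k] = reduceF(k, vs)
	}
	return out
}

func main() {
	counts := runSequential([]string{"the quick fox", "The lazy dog and the fox"})

	// Sort keys so the printed order is deterministic.
	keys := make([]string, 0, len(counts))
	for k := range counts {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%s %s\n", k, counts[k])
	}
}
```

The grouping step in `runSequential` stands in for the shuffle between the map and reduce phases; in a distributed setting that is where intermediate data would be partitioned by key and handed to reduce workers.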

Other things I expect to learn in the course

Programming in Go
The Go memory model
Concurrency, threads, locking, etc.
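
As a small taste of the locking part, here is a sketch of the pattern I expect to use a lot: a shared map protected by a `sync.Mutex`, updated from many goroutines, with a `sync.WaitGroup` to wait for them all. The names `SafeCounter` and `incrementConcurrently` are made up for this example and are not from the labs.

```go
package main

import (
	"fmt"
	"sync"
)

// SafeCounter guards a map with a mutex so that many goroutines can
// update it without a data race. (Hypothetical type for this sketch.)
type SafeCounter struct {
	mu     sync.Mutex
	counts map[string]int
}

// NewSafeCounter returns a counter with an initialised map.
func NewSafeCounter() *SafeCounter {
	return &SafeCounter{counts: make(map[string]int)}
}

// Inc adds one to the count for key while holding the lock.
func (c *SafeCounter) Inc(key string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.counts[key]++
}

// Get reads the count for key while holding the lock.
func (c *SafeCounter) Get(key string) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.counts[key]
}

// incrementConcurrently starts n goroutines that each call Inc once,
// then blocks until all of them have finished.
func incrementConcurrently(c *SafeCounter, key string, n int) {
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.Inc(key)
		}()
	}
	wg.Wait()
}

func main() {
	c := NewSafeCounter()
	incrementConcurrently(c, "votes", 1000)
	fmt.Println(c.Get("votes")) // prints 1000
}
```

Without the mutex, the concurrent `c.counts[key]++` calls would be a data race on the map; running the program with `go run -race` is a handy way to check for that kind of bug.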