Dean and Ghemawat (2004). MapReduce: Simplified Data Processing on Large Clusters
Fri Feb 05 2021
This paper introduces MapReduce.
Overview of the paper
Interesting things about the paper
MapReduce gets map jobs
For some reason, the paper has Map workers send the locations of the intermediate files they generate back to the master, but does not do the same for Reduce workers. I get it: there is not really a need for Reduce workers to report back, since Reduce output files are saved on a shared file system, while Map intermediate files are saved on each Map worker's local storage. The drawback is that the master then has to do extra work, repeatedly checking for the presence of output files. I compared my implementation to
I think this design choice may be meant to avoid unnecessary network I/O.
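To make the asymmetry concrete, here is a minimal sketch of the master-side bookkeeping, assuming R reduce partitions. Map workers report one intermediate file location per partition, and the master forwards those locations to Reduce workers. Reduce workers report nothing, so the master checks the shared file system for a final output file. The `Master` class, its method names, the `worker-a:/tmp/mr-0-0` location strings and the `mr-out-<r>` file naming are all hypothetical illustrations, not the paper's interface.

```python
import os
import tempfile
from collections import defaultdict
from typing import List


class Master:
    """Hypothetical master bookkeeping for R reduce partitions."""

    def __init__(self, n_reduce: int, output_dir: str):
        self.n_reduce = n_reduce
        self.output_dir = output_dir  # assumed shared file system directory
        # reduce partition -> intermediate file locations reported by Map workers
        self.intermediate = defaultdict(list)

    def report_map_done(self, map_id: int, locations: List[str]) -> None:
        # A Map worker sends one location per reduce partition (R in total).
        assert len(locations) == self.n_reduce
        for r, loc in enumerate(locations):
            self.intermediate[r].append(loc)

    def reduce_inputs(self, reduce_id: int) -> List[str]:
        # Locations the master hands to the Reduce worker for this partition.
        return list(self.intermediate[reduce_id])

    def reduce_done(self, reduce_id: int) -> bool:
        # Reduce workers send nothing back, so the master has to look for the
        # final output file on the shared file system (hypothetical naming).
        path = os.path.join(self.output_dir, f"mr-out-{reduce_id}")
        return os.path.exists(path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        m = Master(n_reduce=2, output_dir=d)
        m.report_map_done(0, ["worker-a:/tmp/mr-0-0", "worker-a:/tmp/mr-0-1"])
        m.report_map_done(1, ["worker-b:/tmp/mr-1-0", "worker-b:/tmp/mr-1-1"])
        print(m.reduce_inputs(0))  # ['worker-a:/tmp/mr-0-0', 'worker-b:/tmp/mr-1-0']
        print(m.reduce_done(0))    # False
        open(os.path.join(d, "mr-out-0"), "w").close()
        print(m.reduce_done(0))    # True
```

In a real system `reduce_done` would be called repeatedly (polling), which is the extra work on the master described above; the trade-off is that Reduce workers never send file names over the network.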
How MapReduce avoids inconsistency
The problem is
Suppose you have a m
The paper prevents the master from observing partially written files by having workers write to temporary output files and then atomically rename them to their final names.
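The temporary-file-plus-rename technique can be sketched in a few lines of Python. This assumes a POSIX file system and that the temporary file lives in the same directory (and therefore on the same file system) as the final file, since `rename` is only atomic within one file system. The function name `atomic_write` and the `mr-out-0` file name are hypothetical.

```python
import os
import tempfile


def atomic_write(final_path: str, data: str) -> None:
    """Readers see either no file / the old file, or the complete new file.

    Assumes final_path and the temp file are on the same POSIX file system.
    """
    directory = os.path.dirname(os.path.abspath(final_path))
    # Create the temporary file next to the final file.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # push the bytes to disk before renaming
        # os.replace overwrites any existing file; on POSIX it is one rename(2).
        os.replace(tmp_path, final_path)
    except BaseException:
        # Remove the partial temp file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        out = os.path.join(d, "mr-out-0")
        atomic_write(out, "apple 3\nbanana 1\n")
        # A duplicate execution of the same task just replaces the whole file.
        atomic_write(out, "apple 3\nbanana 1\n")
        with open(out) as f:
            print(f.read() == "apple 3\nbanana 1\n")  # True
        print(sorted(os.listdir(d)))  # ['mr-out-0']
```

Because the rename replaces the file wholesale, a reader (such as a master checking for output files) never sees a half-written file, and if two executions of the same reduce task both finish, the final file holds the output of exactly one of them.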