You have just executed a MapReduce job. Where is intermediate data written to after being emitted from the Mapper’s map method?

Posted by: Pdfprep | Category: Apache Hadoop Developer

A . Intermediate data is streamed across the network from the Mapper to the Reducer and is never written to disk.
B . Into in-memory buffers on the TaskTracker node running the Mapper that spill over and are written into HDFS.
C . Into in-memory buffers that spill over to the local file system of the TaskTracker node running the Mapper.
D . Into in-memory buffers that spill over to the local file system (outside HDFS) of the TaskTracker node running the Reducer.
E . Into in-memory buffers on the TaskTracker node running the Reducer that spill over and are written into HDFS.

Answer: C

Explanation:

The mapper output (intermediate data) is first collected in an in-memory buffer on the node running the map task. When the buffer fills, its contents spill to the local file system (NOT HDFS) of that node. The spill location is typically a temporary directory that the Hadoop administrator can set up in the configuration. The intermediate data is cleaned up after the Hadoop job completes.
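
The buffer-then-spill behaviour described above can be illustrated with a small toy model. This is only a sketch in plain Python and is not Hadoop's actual implementation: the names `SpillingBuffer`, `run_map_task`, and `max_records` are hypothetical, the buffer is bounded by record count rather than by size in memory, and a temporary directory stands in for the node's local scratch directory.

```python
import os
import tempfile


class SpillingBuffer:
    """Toy model of a map-side output buffer (NOT Hadoop's actual code)."""

    def __init__(self, local_dir, max_records=3):
        self.local_dir = local_dir      # stands in for the node's local scratch dir
        self.max_records = max_records  # hypothetical spill threshold (record count)
        self.records = []               # in-memory buffer of (key, value) pairs
        self.spill_files = []           # paths of spill files on the local disk

    def emit(self, key, value):
        """Buffer one pair; spill to local disk once the buffer is full."""
        self.records.append((key, value))
        if len(self.records) >= self.max_records:
            self.spill()

    def spill(self):
        """Sort the buffered pairs by key and write them to a local file."""
        if not self.records:
            return
        name = f"spill{len(self.spill_files)}.out"
        path = os.path.join(self.local_dir, name)
        with open(path, "w", encoding="utf-8") as f:
            for key, value in sorted(self.records):
                f.write(f"{key}\t{value}\n")
        self.spill_files.append(path)
        self.records.clear()


def run_map_task(lines, local_dir):
    """Word-count style map task that emits (word, 1) for every word."""
    buf = SpillingBuffer(local_dir)
    for line in lines:
        for word in line.split():
            buf.emit(word, 1)
    buf.spill()  # flush whatever is still in memory when the task ends
    return buf.spill_files


if __name__ == "__main__":
    # The temporary directory is removed on exit, loosely mirroring the
    # cleanup of intermediate data after the job completes.
    with tempfile.TemporaryDirectory() as scratch:
        for path in run_map_task(["a b a", "c b a d"], scratch):
            print(path)
```

Nothing in this sketch touches a distributed file system: every spill file lands in the local directory handed to the map task, which is the point option C makes.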

Reference: 24 Interview Questions & Answers for Hadoop MapReduce developers, Where is the Mapper Output (intermediate key-value data) stored?
