Which process describes the lifecycle of a Mapper?

A. The JobTracker calls the TaskTracker’s configure() method, then its map() method, and finally its close() method.
B. The TaskTracker spawns a new Mapper to process all records in a single input split.
C. The TaskTracker spawns a new Mapper to process each key-value pair.
D. The JobTracker spawns a new Mapper to process all records in a single file.

Answer: B

Explanation:

For each map task that runs, the TaskTracker creates a new instance of your Mapper, and that single instance processes every record in the task’s input split.

Note:

* The Mapper is responsible for processing Key/Value pairs obtained from the InputFormat. The mapper may perform a number of extraction and transformation functions on each Key/Value pair before ultimately outputting zero, one, or many Key/Value pairs of the same or a different type (see the sketch after this list).

* With the new Hadoop API, mappers extend the org.apache.hadoop.mapreduce.Mapper class. This class defines an ‘Identity’ map function by default: every input Key/Value pair obtained from the InputFormat is written out unchanged.
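As a hedged illustration of overriding that default (the class and field names here are hypothetical, not from the cited reference), the mapper below tokenizes each input line and emits one pair per token, with output types that differ from the input types:

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical sketch: each (LongWritable offset, Text line) input pair
// produces zero or more (Text word, IntWritable 1) output pairs.
public class TokenCountMapper
    extends Mapper<LongWritable, Text, Text, IntWritable> {

  private static final IntWritable ONE = new IntWritable(1);
  private final Text word = new Text();

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    for (String token : value.toString().split("\\s+")) {
      if (!token.isEmpty()) {
        word.set(token);
        context.write(word, ONE); // emit one pair per token
      }
    }
  }
}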

Examining the run() method, we can see the lifecycle of the mapper:

/**
 * Expert users can override this method for more complete control over the
 * execution of the Mapper.
 * @param context
 * @throws IOException
 */
public void run(Context context) throws IOException, InterruptedException {
  setup(context);
  while (context.nextKeyValue()) {
    map(context.getCurrentKey(), context.getCurrentValue(), context);
  }
  cleanup(context);
}
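As the Javadoc above suggests, expert users can override run() itself for more control over execution. A minimal, hypothetical sketch (the class name and the skip-every-other-record logic are illustrative only, not from the source):

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical: keeps the standard setup/map/cleanup sequence, but only
// passes every other record from the input split to map().
public class EveryOtherRecordMapper
    extends Mapper<LongWritable, Text, LongWritable, Text> {

  @Override
  public void run(Context context) throws IOException, InterruptedException {
    setup(context);
    boolean take = true;
    while (context.nextKeyValue()) {
      if (take) {
        map(context.getCurrentKey(), context.getCurrentValue(), context);
      }
      take = !take;
    }
    cleanup(context);
  }
}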

setup(Context) – Perform any setup for the mapper. The default implementation is a no-op method.

map(Key, Value, Context) – Perform the map operation on the given Key/Value pair. The default implementation calls Context.write(Key, Value), i.e. the identity map.

cleanup(Context) – Perform any cleanup for the mapper. The default implementation is a no-op method.
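Putting all three lifecycle methods together, here is a hedged sketch of the common “in-mapper combining” pattern (the class name is illustrative): setup() allocates a per-task map, map() aggregates counts in memory, and cleanup() emits the totals once the split is exhausted:

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class InMapperCombiningMapper
    extends Mapper<LongWritable, Text, Text, IntWritable> {

  private Map<String, Integer> counts;

  @Override
  protected void setup(Context context) {
    counts = new HashMap<>(); // runs once, before any map() call
  }

  @Override
  protected void map(LongWritable key, Text value, Context context) {
    for (String token : value.toString().split("\\s+")) {
      if (!token.isEmpty()) {
        counts.merge(token, 1, Integer::sum); // aggregate in memory
      }
    }
  }

  @Override
  protected void cleanup(Context context)
      throws IOException, InterruptedException {
    // runs once, after the last map() call; emit the per-split totals
    for (Map.Entry<String, Integer> e : counts.entrySet()) {
      context.write(new Text(e.getKey()), new IntWritable(e.getValue()));
    }
  }
}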

Reference: Hadoop/MapReduce/Mapper
