HCAHD Apache Hadoop Developer Online Free Test

Pdfprep

4 years ago

With our Apache Hadoop Developer practice test, you don’t need to look for other online testing engines, which are often obsolete. Most people who look for Apache Hadoop Developer prep questions online come to us for their certification prep requirements. Our top-ranked Apache Hadoop Developer prep questions are usually found on the internet using a variety of search terms.

1. Which one of the following statements is FALSE regarding the communication between DataNodes and a federation of NameNodes in Hadoop 2.0?

Each DataNode receives commands from one designated master NameNode.

DataNodes send periodic heartbeats to all the NameNodes.

Each DataNode registers with all the NameNodes.

DataNodes send periodic block reports to all the NameNodes.

2. Your client application submits a MapReduce job to your Hadoop cluster. Identify the Hadoop daemon on which the Hadoop framework will look for an available slot to schedule a MapReduce operation.

TaskTracker

NameNode

DataNode

JobTracker

Secondary NameNode

3. Which process describes the lifecycle of a Mapper?

The JobTracker calls the TaskTracker’s configure() method, then its map() method and finally its close() method.

The TaskTracker spawns a new Mapper to process all records in a single input split.

The TaskTracker spawns a new Mapper to process each key-value pair.

The JobTracker spawns a new Mapper to process all records in a single file.

4. Which two of the following are true about this trivial Pig program? (Choose two)

The contents of myfile appear on stdout

Pig assumes the contents of myfile are comma delimited

ABC has a schema associated with it

myfile is read from the user's home directory in HDFS

5. Examine the following Pig commands:

Which one of the following statements is true?

The SAMPLE command generates an "unexpected symbol" error

Each MapReduce task will terminate after executing for 0.2 minutes

The reducers will only output the first 20% of the data passed from the mappers

A random sample of approximately 20% of the data will be output

6. Which two of the following statements are true about Pig's approach toward data? Choose 2 answers

Accepts only data that has a key/value pair structure

Accepts data whether it has metadata or not

Accepts only data that is defined by metadata tables stored in a database

Accepts tab-delimited text data only

Accepts any data: structured or unstructured

7. You want to perform analysis on a large collection of images. You want to store this data in HDFS and process it with MapReduce but you also want to give your data analysts and data scientists the ability to process the data directly from HDFS with an interpreted high-level programming language like Python.

Which format should you use to store this data in HDFS?

SequenceFiles

Avro

JSON

HTML

XML

CSV

8. What does the following command do?

register '/piggybank/pig-files.jar';

Invokes the user-defined functions contained in the jar file

Assigns a name to a user-defined function or streaming command

Transforms Pig user-defined functions into a format that Hive can accept

Specifies the location of the JAR file containing the user-defined functions

9. Review the following 'data' file and Pig code.

Which one of the following statements is true?

The output of the DUMP D command is (M,{(M,62,95102),(M,38,95111)})

The output of the DUMP D command is (M,{(38,95111),(62,95102)})

The code executes successfully but there is no output because the D relation is empty

The code does not execute successfully because D is not a valid relation

10. Given the following Hive command:

INSERT OVERWRITE TABLE mytable SELECT * FROM myothertable;

Which one of the following statements is true?

The contents of myothertable are appended to mytable

Any existing data in mytable will be overwritten

A new table named mytable is created, and the contents of myothertable are copied into mytable

The statement is not a valid Hive command

11. Which best describes how TextInputFormat processes input files and line breaks?

Input file splits may cross line breaks. A line that crosses file splits is read by the RecordReader of the split that contains the beginning of the broken line.

Input file splits may cross line breaks. A line that crosses file splits is read by the RecordReaders of both splits containing the broken line.

The input file is split exactly at the line breaks, so each RecordReader will read a series of complete lines.

Input file splits may cross line breaks. A line that crosses file splits is ignored.

Input file splits may cross line breaks. A line that crosses file splits is read by the RecordReader of the split that contains the end of the broken line.
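The line-handling behavior this question asks about can be sketched without Hadoop. The following is a minimal simulation, not Hadoop's actual code: `read_split` is a hypothetical helper that mimics the commonly documented strategy of a line-oriented record reader, where a reader that does not start at byte 0 discards its first (possibly partial) line, and every reader keeps reading past its split end to finish a line it has started.

```python
def read_split(data: bytes, start: int, length: int) -> list:
    """Return the lines one simulated reader emits for the byte range [start, start+length)."""
    end = start + length
    pos = start
    if start != 0:
        # Not the first split: skip up to and including the first newline,
        # because the reader of the previous split is responsible for that line.
        newline = data.find(b"\n", start)
        pos = len(data) if newline == -1 else newline + 1
    lines = []
    # Read every line that starts at or before the split end, even if the
    # line itself finishes inside the next split.
    while pos <= end and pos < len(data):
        newline = data.find(b"\n", pos)
        stop = len(data) if newline == -1 else newline
        lines.append(data[pos:stop])
        pos = stop + 1
    return lines


# Hypothetical 26-byte file cut into 10-byte splits; "bravo" straddles byte 10.
data = b"alpha\nbravo\ncharlie\ndelta\n"
for split_start in (0, 10, 20):
    print(split_start, read_split(data, split_start, 10))
```

Under these assumptions every line is emitted exactly once, by the reader whose split contains the first byte of that line.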

12. You have written a Mapper which invokes the following five calls to the OutputCollector.collect method:

output.collect(new Text("Apple"), new Text("Red"));

output.collect(new Text("Banana"), new Text("Yellow"));

output.collect(new Text("Apple"), new Text("Yellow"));

output.collect(new Text("Cherry"), new Text("Red"));

output.collect(new Text("Apple"), new Text("Green"));

How many times will the Reducer’s reduce method be invoked?
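One way to reason about this question is to group the emitted pairs by key, since the framework calls reduce once per distinct key. The sketch below is a plain-Python simulation, assuming a single job with the default grouping; `group_by_key` is a hypothetical helper, not a Hadoop API.

```python
from collections import defaultdict


def group_by_key(pairs):
    """Group (key, value) pairs the way the shuffle presents them to reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    # Keys arrive at the reducer in sorted order; the order of values within
    # a key is kept here only for readability and is not guaranteed by Hadoop.
    return dict(sorted(grouped.items()))


# The five pairs emitted by the Mapper in the question.
pairs = [
    ("Apple", "Red"),
    ("Banana", "Yellow"),
    ("Apple", "Yellow"),
    ("Cherry", "Red"),
    ("Apple", "Green"),
]
grouped = group_by_key(pairs)
reduce_calls = len(grouped)  # one reduce invocation per distinct key
print(reduce_calls, grouped)
```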

13. To process input key-value pairs, your mapper needs to load a 512 MB data file into memory.

What is the best way to accomplish this?

Serialize the data file, insert it in the JobConf object, and read the data into memory in the configure method of the mapper.

Place the data file in the DistributedCache and read the data into memory in the map method of the mapper.

Place the data file in the DataCache and read the data into memory in the configure method of the mapper.

Place the data file in the DistributedCache and read the data into memory in the configure method of the mapper.

14. Which one of the following statements is true about a Hive-managed table?

Records can only be added to the table using the Hive INSERT command.

When the table is dropped, the underlying folder in HDFS is deleted.

Hive dynamically defines the schema of the table based on the FROM clause of a SELECT query.

Hive dynamically defines the schema of the table based on the format of the underlying data.

15. Consider the following two relations, A and B.

What is the output of the following Pig commands?

X = GROUP A BY S1;

DUMP X;

Option A

Option B

Option C

Option D

16. What is a SequenceFile?

A SequenceFile contains a binary encoding of an arbitrary number of homogeneous writable objects.

A SequenceFile contains a binary encoding of an arbitrary number of heterogeneous writable objects.

A SequenceFile contains a binary encoding of an arbitrary number of WritableComparable objects, in sorted order.

A SequenceFile contains a binary encoding of an arbitrary number of key-value pairs. Each key must be the same type. Each value must be the same type.

17. You use the hadoop fs -put command to write a 300 MB file using an HDFS block size of 64 MB.

Just after this command has finished writing 200 MB of this file, what would another user see when trying to access this file?

They would see Hadoop throw a ConcurrentFileAccessException when they try to access this file.

They would see the current state of the file, up to the last bit written by the command.

They would see the current state of the file through the last completed block.

They would see no content until the whole file is written and closed.

18. Identify the utility that allows you to create and run MapReduce jobs with any executable or script as the mapper and/or the reducer.

Oozie

Sqoop

Flume

Hadoop Streaming

mapred

19. You have just executed a MapReduce job. Where is intermediate data written to after being emitted from the Mapper’s map method?

Intermediate data is streamed across the network from the Mapper to the Reducer and is never written to disk.

Into in-memory buffers on the TaskTracker node running the Mapper that spill over and are written into HDFS.

Into in-memory buffers that spill over to the local file system of the TaskTracker node running the Mapper.

Into in-memory buffers that spill over to the local file system (outside HDFS) of the TaskTracker node running the Reducer.

Into in-memory buffers on the TaskTracker node running the Reducer that spill over and are written into HDFS.

20. A combiner reduces:

The number of values across different keys in the iterator supplied to a single reduce method call.

The amount of intermediate data that must be transferred between the mapper and reducer.

The number of input files a mapper must process.

The number of output files a reducer must produce.

21. How are keys and values presented and passed to the reducers during a standard sort and shuffle phase of MapReduce?

Keys are presented to reducer in sorted order; values for a given key are not sorted.

Keys are presented to reducer in sorted order; values for a given key are sorted in ascending order.

Keys are presented to a reducer in random order; values for a given key are not sorted.

Keys are presented to a reducer in random order; values for a given key are sorted in ascending order.

22. All keys used for intermediate output from mappers must:

Implement a splittable compression algorithm.

Be a subclass of FileInputFormat.

Implement WritableComparable.

Override isSplitable.

Implement a comparator for speedy sorting.

23. Which one of the following statements describes a Pig bag, tuple, and map, respectively?

Unordered collection of maps, ordered collection of tuples, ordered set of key/value pairs

Unordered collection of tuples, ordered set of fields, set of key/value pairs

Ordered set of fields, ordered collection of tuples, ordered collection of maps

Ordered collection of maps, ordered collection of bags, and unordered set of key/value pairs

24. You want to run Hadoop jobs on your development workstation for testing before you submit them to your production cluster.

Which mode of operation in Hadoop allows you to most closely simulate a production cluster while using a single machine?

Run all the nodes in your production cluster as virtual machines on your development workstation.

Run the hadoop command with the -jt local and the -fs file:/// options.

Run the DataNode, TaskTracker, NameNode and JobTracker daemons on a single machine.

Run simldooop, the Apache open-source software for simulating Hadoop clusters.

25. Which HDFS command uploads a local file X into an existing HDFS directory Y?

hadoop scp X Y

hadoop fs -localPut X Y

hadoop fs -put X Y

hadoop fs -get X Y

26. In Hadoop 2.0, which TWO of the following processes work together to provide automatic failover of the NameNode? Choose 2 answers

ZKFailoverController

ZooKeeper

QuorumManager

JournalNode

27. To use a Java user-defined function (UDF) with Pig, what must you do?

Define an alias to shorten the function name

Pass arguments to the constructor of the UDF's implementation class

Put the JAR file into the user's home folder in HDFS

28. When is the earliest point at which the reduce method of a given Reducer can be called?

As soon as at least one mapper has finished processing its input split.

As soon as a mapper has emitted at least one record.

Not until all mappers have finished processing all records.

It depends on the InputFormat used for the job.

29. Which one of the following statements describes the relationship between the ResourceManager and the ApplicationMaster?

The ApplicationMaster requests resources from the ResourceManager

The ApplicationMaster starts a single instance of the ResourceManager

The ResourceManager monitors and restarts any failed Containers of the ApplicationMaster

The ApplicationMaster starts an instance of the ResourceManager within each Container

30. Which HDFS command copies an HDFS file named foo to the local filesystem as localFoo?

hadoop fs -get foo localFoo

hadoop -cp foo localFoo

hadoop fs -ls foo

hadoop fs -put foo localFoo

31. You need to perform statistical analysis in your MapReduce job and would like to call methods in the Apache Commons Math library, which is distributed as a 1.3 megabyte Java archive (JAR) file.

Which is the best way to make this library available to your MapReduce job at runtime?

Have your system administrator copy the JAR to all nodes in the cluster and set its location in the HADOOP_CLASSPATH environment variable before you submit your job.

Have your system administrator place the JAR file on a Web server accessible to all cluster nodes and then set the HTTP_JAR_URL environment variable to its location.

When submitting the job on the command line, specify the -libjars option followed by the JAR file path.

Package your code and the Apache Commons Math library into a zip file named JobJar.zip.

32. In a MapReduce job with 500 map tasks, how many map task attempts will there be?

It depends on the number of reduces in the job.

Between 500 and 1000.

At most 500.

At least 500.

Exactly 500.

33. You want to count the number of occurrences for each unique word in the supplied input data. You’ve decided to implement this by having your mapper tokenize each word and emit a literal value 1, and then have your reducer increment a counter for each literal 1 it receives. After successfully implementing this, it occurs to you that you could optimize this by specifying a combiner.

Will you be able to reuse your existing Reducer as your combiner in this case, and why or why not?

Yes, because the sum operation is both associative and commutative and the input and output types to the reduce method match.

No, because the sum operation in the reducer is incompatible with the operation of a Combiner.

No, because the Reducer and Combiner are separate interfaces.

No, because the Combiner is incompatible with a mapper which doesn’t use the same data type for both the key and value.

Yes, because Java is a polymorphic object-oriented language and thus reducer code can be reused as a combiner.
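The role of a combiner in the word count described above can be illustrated with a small simulation. This is a sketch only, assuming the reducer sums the values it receives (a reducer that literally counts records instead of summing them would give wrong totals when reused as a combiner). The function names `map_words`, `sum_reduce`, `shuffle` and `run_job` are hypothetical and not part of Hadoop.

```python
from collections import defaultdict


def map_words(text):
    """Mapper: emit (word, 1) for every token in the input split."""
    return [(word, 1) for word in text.split()]


def sum_reduce(key, values):
    """Reducer (also usable as combiner): sum the values for one key."""
    return key, sum(values)


def shuffle(pairs):
    """Group values by key, as the framework does before calling reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def run_job(splits, use_combiner):
    """Return (final counts, number of intermediate records transferred)."""
    transferred = []
    for text in splits:
        output = map_words(text)
        if use_combiner:
            # The combiner runs the reduce logic on one mapper's local output.
            output = [sum_reduce(k, vs) for k, vs in shuffle(output).items()]
        transferred.extend(output)
    counts = dict(sum_reduce(k, vs) for k, vs in shuffle(transferred).items())
    return counts, len(transferred)


splits = ["a b a", "b a c a"]  # two hypothetical input splits
print(run_job(splits, use_combiner=False))
print(run_job(splits, use_combiner=True))
```

Because addition is associative and commutative, summing partial sums gives the same totals as summing the original 1s, while fewer intermediate records cross the network.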

34. What data does a Reducer reduce method process?

All the data in a single input file.

All data produced by a single mapper.

All data for a given key, regardless of which mapper(s) produced it.

All data for a given value, regardless of which mapper(s) produced it.

35. Given a directory of files with the following structure: line number, tab character, string:

Example:

1	abialkjfjkaoasdfjksdlkjhqweroij

2	kadfjhuwqounahagtnbvaswslmnbfgy

3	kjfteiomndscxeqalkzhtopedkfsikj

You want to send each line as one record to your Mapper.

Which InputFormat should you use to complete the line: conf.setInputFormat(____.class);

SequenceFileAsTextInputFormat

SequenceFileInputFormat

KeyValueFileInputFormat

BDBInputFormat

36. Examine the following Hive statements:

Assuming the statements above execute successfully, which one of the following statements is true?

Each reducer generates a file sorted by age

The SORT BY command causes only one reducer to be used

The output of each reducer is only the age column

The output is guaranteed to be a single file with all the data sorted by age

37. When can a reduce class also serve as a combiner without affecting the output of a MapReduce program?

When the types of the reduce operation’s input key and input value match the types of the reducer’s output key and output value and when the reduce operation is both commutative and associative.

When the signature of the reduce method matches the signature of the combine method.

Always. Code can be reused in Java since it is a polymorphic object-oriented programming language.

Always. The point of a combiner is to serve as a mini-reducer directly after the map phase to increase performance.

Never. Combiners and reducers must be implemented separately because they serve different purposes.

38. What does the following WebHDFS command do?

curl -i -L "http://host:port/webhdfs/v1/foo/bar?op=OPEN"

Make a directory /foo/bar

Read a file /foo/bar

List a directory /foo

Delete a directory /foo/bar

39. You need to run the same job many times with minor variations. Rather than hardcoding all job configuration options in your driver code, you’ve decided to have your Driver subclass org.apache.hadoop.conf.Configured and implement the org.apache.hadoop.util.Tool interface.

Identify which invocation correctly passes mapred.job.name with a value of Example to Hadoop.

hadoop "mapred.job.name=Example" MyDriver input output

hadoop MyDriver mapred.job.name=Example input output

hadoop MyDriver -D mapred.job.name=Example input output

hadoop setproperty mapred.job.name=Example MyDriver input output

hadoop setproperty ("mapred.job.name=Example") MyDriver input output

40. Determine which best describes when the reduce method is first called in a MapReduce job?

Reducers start copying intermediate key-value pairs from each Mapper as soon as it has completed. The programmer can configure in the job what percentage of the intermediate data should arrive before the reduce method begins.

Reducers start copying intermediate key-value pairs from each Mapper as soon as it has completed. The reduce method is called only after all intermediate data has been copied and sorted.

Reduce methods and map methods all start at the beginning of a job, in order to provide optimal performance for map-only or reduce-only jobs.

Reducers start copying intermediate key-value pairs from each Mapper as soon as it has completed. The reduce method is called as soon as the intermediate key-value pairs start to arrive.

41. You have a directory named jobdata in HDFS that contains four files: _first.txt, second.txt, .third.txt and #data.txt.

How many files will be processed by the FileInputFormat.setInputPaths() command when it's given a Path object representing this directory?

Four, all files will be processed

Three, the pound sign is an invalid character for HDFS file names

Two, file names with a leading period or underscore are ignored

None, the directory cannot be named jobdata

One, no special characters can prefix the name of an input file
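The file-name filtering this question refers to can be sketched as a simple predicate. This is an illustration, assuming the default hidden-file rule that FileInputFormat applies when it lists an input directory (names beginning with an underscore or a period are skipped); `is_visible_input` is a hypothetical name, not a Hadoop method.

```python
def is_visible_input(name: str) -> bool:
    """Mimic the default hidden-file rule: skip names starting with '_' or '.'."""
    return not (name.startswith("_") or name.startswith("."))


# The four files in the jobdata directory from the question.
files = ["_first.txt", "second.txt", ".third.txt", "#data.txt"]
visible = [name for name in files if is_visible_input(name)]
print(visible)
```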

42. In a large MapReduce job with m mappers and n reducers, how many distinct copy operations will there be in the sort/shuffle phase?

m × n (i.e., m multiplied by n)

m+n (i.e., m plus n)

m^n (i.e., m to the power of n)
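To see how the number of copy operations grows, the shuffle can be modeled as every reducer fetching its partition from every mapper. The sketch below rests on that assumption (each mapper produces output for each reducer); `shuffle_copies` is a hypothetical helper used only for counting.

```python
from itertools import product


def shuffle_copies(m: int, n: int) -> list:
    """List the (mapper, reducer) pairs, one per partition fetched in the shuffle."""
    return list(product(range(m), range(n)))


copies = shuffle_copies(4, 3)  # hypothetical job: 4 mappers, 3 reducers
print(len(copies))
```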

43. Which Hadoop component is responsible for managing the distributed file system metadata?

NameNode

Metanode

DataNode

NameSpaceManager

44. Review the following data and Pig code.

M,38,95111

F,29,95060

F,45,95192

M,62,95102

F,56,95102

A = LOAD 'data' USING PigStorage(',') AS (gender:chararray, age:int, zip:chararray);

B = FOREACH A GENERATE age;

Which one of the following commands would save the results of B to a folder in HDFS named myoutput?

STORE A INTO 'myoutput' USING PigStorage(',');

DUMP B using PigStorage('myoutput');

STORE B INTO 'myoutput';

DUMP B INTO 'myoutput';
