You are given 10,000,000 user profile pages of an online dating site in XML files, and they are stored in HDFS. You are assigned to divide the users into groups based on the content of their profiles. You have been instructed to try K-means clustering on this data. How should you proceed?

Posted by: Pdfprep Category: E20-007

A. Run MapReduce to transform the data, and find relevant key value pairs.
B. Divide the data into sets of 1,000 user profiles, and run K-means clustering in RHadoop iteratively.
C. Run a Naive Bayes classification as a pre-processing step in HDFS.
D. Partition the data by XML file size, and run K-means clustering in each partition.

Answer: A
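Why A: K-means works on numeric feature vectors, and raw XML profile pages are not in that form. The first step is therefore a MapReduce pass over the files in HDFS that parses each profile and emits key-value pairs from which per-user feature vectors can be built. Options B and D split the data by batch size or file size, which says nothing about profile content, and C is a supervised classifier that needs labels this task does not provide. The sketch that follows simulates that pipeline in-process with plain Python. It is illustrative only: the XML layout (`profile`, `id`, `interests`) is hypothetical, bag-of-words term counts are an assumed feature choice, and `map_profile`/`reduce_counts` stand in for a real Hadoop mapper and reducer. At 10,000,000 profiles, the clustering step itself would also need a distributed K-means implementation rather than this single-process loop.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical profile layout; real dating-site XML will differ.
PROFILES = [
    '<profile id="u1"><interests>hiking camping hiking</interests></profile>',
    '<profile id="u2"><interests>camping hiking</interests></profile>',
    '<profile id="u3"><interests>movies cooking movies</interests></profile>',
    '<profile id="u4"><interests>cooking movies</interests></profile>',
]


def map_profile(xml_text):
    """Map step (hypothetical): parse one profile, emit ((user_id, term), 1)."""
    root = ET.fromstring(xml_text)
    user_id = root.get("id")
    for term in root.findtext("interests", default="").lower().split():
        yield (user_id, term), 1


def reduce_counts(pairs):
    """Reduce step (hypothetical): sum the values emitted for each key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return totals


def to_vectors(totals):
    """Turn (user_id, term) -> count into one term-count vector per user."""
    vocab = sorted({term for _, term in totals})
    users = sorted({user for user, _ in totals})
    vectors = {u: [float(totals.get((u, t), 0)) for t in vocab] for u in users}
    return vocab, vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100):
    """Plain Lloyd's K-means; naive init takes the first k points as centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assign each point to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Usage: map every profile, reduce the pairs, vectorize, then cluster.
pairs = (kv for xml_text in PROFILES for kv in map_profile(xml_text))
vocab, vectors = to_vectors(reduce_counts(pairs))
users = sorted(vectors)
labels, _ = kmeans([vectors[u] for u in users], k=2)
print(dict(zip(users, labels)))  # {'u1': 0, 'u2': 0, 'u3': 1, 'u4': 1}
```

The transform is what makes K-means applicable at all: once each user is a numeric vector, the two outdoor-interest profiles and the two indoor-interest profiles fall into separate clusters. The first-k initialization is chosen only to keep the toy run deterministic; practical K-means implementations use better seeding such as k-means++.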
