What should you do?

Posted by: Pdfprep Category: DP-100

Your team is building a data engineering and data science development environment.

The environment must support the following requirements:

– support Python and Scala

– compose data storage, movement, and processing services into automated data pipelines

– the same tool should be used for the orchestration of both data engineering and data science

– support workload isolation and interactive workloads

– enable scaling across a cluster of machines

You need to create the environment.

What should you do?
A. Build the environment in Apache Hive for HDInsight and use Azure Data Factory for orchestration.
B. Build the environment in Azure Databricks and use Azure Data Factory for orchestration.
C. Build the environment in Apache Spark for HDInsight and use Azure Container Instances for orchestration.
D. Build the environment in Azure Databricks and use Azure Container Instances for orchestration.

Answer: B

Explanation:

In Azure Databricks, we can create two different types of clusters:

– Standard: the default cluster type, which can be used with Python, R, Scala and SQL

– High Concurrency: intended for sharing among multiple users running interactive workloads

Azure Databricks is fully integrated with Azure Data Factory.
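To make the integration concrete, here is a minimal sketch of what a Data Factory pipeline that chains two Databricks notebooks might look like, built as a plain Python dictionary and printed as JSON. The activity type `DatabricksNotebook` and the `notebookPath` / `baseParameters` properties follow the Data Factory pipeline JSON format as we understand it. The pipeline name, activity names, notebook paths, parameter, and the linked service name `AzureDatabricksLS` are all hypothetical, and the helper `notebook_activity` is ours, not part of any SDK.

```python
import json


def notebook_activity(name, notebook_path, linked_service,
                      parameters=None, depends_on=None):
    """Build a dict shaped like an ADF 'DatabricksNotebook' activity.

    Hypothetical helper for illustration only; it does not call Azure.
    """
    return {
        "name": name,
        "type": "DatabricksNotebook",
        "linkedServiceName": {
            "referenceName": linked_service,  # assumed linked service name
            "type": "LinkedServiceReference",
        },
        "typeProperties": {
            "notebookPath": notebook_path,
            "baseParameters": parameters or {},
        },
        # Run only after every listed upstream activity has succeeded.
        "dependsOn": [
            {"activity": dep, "dependencyConditions": ["Succeeded"]}
            for dep in (depends_on or [])
        ],
    }


# One pipeline covers both the data engineering step and the data science step.
pipeline = {
    "name": "PrepareAndTrain",  # hypothetical pipeline name
    "properties": {
        "activities": [
            notebook_activity(
                "PrepareData", "/Shared/prepare_data", "AzureDatabricksLS",
                parameters={"run_date": "2020-01-01"},
            ),
            notebook_activity(
                "TrainModel", "/Shared/train_model", "AzureDatabricksLS",
                depends_on=["PrepareData"],
            ),
        ]
    },
}

print(json.dumps(pipeline, indent=2))
```

The point of the sketch is that the same orchestration tool schedules the data preparation notebook and the model training notebook, which is the requirement that rules out Azure Container Instances as the orchestrator.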

Incorrect Answers:

D: Azure Container Instances is good for development or testing, but is not suitable for production workloads.

References: https://docs.microsoft.com/en-us/azure/architecture/data-guide/technology-choices/data-science-and-machine-learning
