Question

A team needs to run distributed Apache Spark jobs to preprocess petabyte-scale datasets stored in Amazon S3 before ML training. They need the ability to choose specific instance types and use Spot Instances. Which AWS service best fits?

Accepted Answer

B. Amazon EMR. Amazon EMR runs managed Hadoop and Spark clusters and gives you control over instance types, cluster configuration, and the use of Spot Instances. That combination makes it a good fit for large-scale distributed preprocessing of data stored in Amazon S3.
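
To make the instance-type and Spot control concrete, here is a minimal sketch of a cluster request for boto3's EMR `run_job_flow` call. The helper names (`build_emr_request`, `launch_cluster`), the bucket paths, the instance types and counts, the release label, and the default role names are all assumptions chosen for illustration, not values from the question. The sketch keeps the primary and core nodes On-Demand and puts only the task nodes on Spot, which is one common arrangement, not the only valid one.

```python
def build_emr_request(name, log_uri, script_uri,
                      task_type="r5.4xlarge", task_count=20):
    """Build keyword arguments for boto3's EMR run_job_flow call.

    All concrete values (instance types, counts, release label, roles)
    are illustrative assumptions; adjust them for a real workload.
    """
    return {
        "Name": name,
        "ReleaseLabel": "emr-6.15.0",  # assumed release label
        "Applications": [{"Name": "Spark"}],
        "LogUri": log_uri,
        "Instances": {
            "InstanceGroups": [
                {   # primary node: On-Demand so the cluster stays up
                    "Name": "Primary",
                    "InstanceRole": "MASTER",
                    "Market": "ON_DEMAND",
                    "InstanceType": "m5.xlarge",
                    "InstanceCount": 1,
                },
                {   # core nodes: On-Demand, since they hold HDFS data
                    "Name": "Core",
                    "InstanceRole": "CORE",
                    "Market": "ON_DEMAND",
                    "InstanceType": "r5.4xlarge",
                    "InstanceCount": 4,
                },
                {   # task nodes: Spot capacity with a chosen instance type
                    "Name": "Task",
                    "InstanceRole": "TASK",
                    "Market": "SPOT",
                    "InstanceType": task_type,
                    "InstanceCount": task_count,
                },
            ],
            # terminate the cluster once the Spark step finishes
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "Steps": [
            {
                "Name": "Preprocess",
                "ActionOnFailure": "TERMINATE_CLUSTER",
                "HadoopJarStep": {
                    "Jar": "command-runner.jar",
                    "Args": ["spark-submit", "--deploy-mode", "cluster",
                             script_uri],
                },
            }
        ],
        "JobFlowRole": "EMR_EC2_DefaultRole",  # assumed default roles
        "ServiceRole": "EMR_DefaultRole",
    }


def launch_cluster(request, region="us-east-1"):
    """Submit the request to EMR; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder works without AWS access
    emr = boto3.client("emr", region_name=region)
    return emr.run_job_flow(**request)["JobFlowId"]
```

Calling `build_emr_request("preprocess", "s3://example-bucket/logs/", "s3://example-bucket/jobs/preprocess.py")` returns a plain dictionary that can be inspected or passed to `launch_cluster`. The bucket and script paths here are hypothetical.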
