
Data Preparation for Machine Learning (ML) Questions

Practice questions for the Data Preparation for Machine Learning (ML) topic of the AWS Certified Machine Learning Engineer - Associate exam, with 56 questions covering this domain.

56 questions: 16 easy, 28 medium, 12 hard
Q1
easy

Business analysts need a visual tool to clean and normalize raw data without writing code before it is used in ML workflows. Which AWS service best fi...

Q2
hard

A company wants an append-only historical store of feature values in Amazon S3 for training and batch inference, while keeping a record of all past va...

Q3
hard

A healthcare company needs human labeling for a new medical image dataset with custom task UIs and wants multiple annotations consolidated into a fina...
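Consolidating several annotations into one final label can be illustrated with a plain majority vote. This is a minimal sketch of a generic technique, not the consolidation logic AWS uses. The function name `consolidate_majority` is hypothetical, and ties go to the label that was seen first.

```python
from collections import Counter

def consolidate_majority(annotations):
    """Hypothetical helper: plain majority vote over worker labels.

    Returns (label, agreement), where agreement is the fraction of
    annotators who chose the winning label. Counter.most_common orders
    equal counts by first occurrence, so ties go to the earliest label.
    """
    if not annotations:
        raise ValueError("need at least one annotation")
    label, votes = Counter(annotations).most_common(1)[0]
    return label, votes / len(annotations)

# Three annotators label the same image.
label, agreement = consolidate_majority(["tumor", "normal", "tumor"])
```

The agreement fraction is useful for flagging low-consensus items that may need another review pass.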

Q4
medium

An ML engineer wants to prototype transformations visually and then export the same preparation logic into code for a custom workflow. Which Data Wran...

Q5
easy

A team wants a SageMaker low-code tool that can import data from sources such as Amazon S3 and Amazon Athena, apply transformations, analyze the data,...

Q6
hard

A data preparation team wants to interactively test a recipe on sample rows in a visual workspace and then store the prepared dataset in Amazon S3 aft...

Q7
easy

A team needs a managed labeling service where they can choose built-in task types or create their own custom labeling workflow for training data. Whic...

Q8
medium

A team wants a managed, serverless way to analyze a dataset and have AWS create data quality rule recommendations to help them get started quickly. Wh...

Q9
medium

A team wants to reduce training-serving skew by ingesting and serving features consistently across experimentation and inference. Which SageMaker capa...

Q10
medium

A recommendation system needs low-latency access to the newest features for online predictions and a full historical record of those features for mode...
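The split between "newest values for online predictions" and "full history for training" can be modeled with a toy store. Every write is appended to a history log, and a separate map keeps only the newest version per record. This is a hypothetical in-memory illustration of the idea, not the SageMaker Feature Store API. The class and method names are invented for the example.

```python
from dataclasses import dataclass, field

@dataclass
class ToyFeatureStore:
    """Hypothetical illustration only; not the SageMaker Feature Store API."""
    online: dict = field(default_factory=dict)   # record_id -> (event_time, features), newest only
    offline: list = field(default_factory=list)  # append-only log of every write

    def write(self, record_id, event_time, features):
        # History side: append every version and never overwrite.
        self.offline.append((record_id, event_time, dict(features)))
        # Latest side: replace only if this write is at least as new.
        current = self.online.get(record_id)
        if current is None or event_time >= current[0]:
            self.online[record_id] = (event_time, dict(features))

    def latest(self, record_id):
        # Single dict lookup, the access pattern online inference needs.
        return self.online[record_id][1]

    def history(self, record_id):
        # Every past version, the data training and backfills need.
        return [(t, f) for rid, t, f in self.offline if rid == record_id]

store = ToyFeatureStore()
store.write("user-1", 1, {"clicks": 3})
store.write("user-1", 2, {"clicks": 5})
store.write("user-1", 0, {"clicks": 1})  # late-arriving older event
```

The late, older event lands in the history but does not replace the newest value.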

Q11
easy

A fraud detection application needs the latest feature values with low millisecond latency during online inference. Which SageMaker Feature Store comp...

Q12
medium

A data scientist needs to measure class imbalance in a dataset before training by using the difference in proportions of labels (DPL) metric. Which Sa...
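DPL compares the share of positive labels between two facets (groups) of the data: DPL = q_a - q_d. Here q_a is the proportion of positive labels in facet a, and q_d is the proportion in facet d. The result lies in [-1, 1], and values near 0 suggest similar label balance across the two facets. The sketch below computes this definition directly. The function and parameter names are hypothetical, and this is not a managed service's implementation.

```python
def dpl(labels, facets, positive_label, facet_a, facet_d):
    """Hypothetical helper: difference in proportions of labels, q_a - q_d."""
    def positive_rate(facet):
        group = [y for y, f in zip(labels, facets) if f == facet]
        if not group:
            raise ValueError(f"no rows for facet {facet!r}")
        return sum(1 for y in group if y == positive_label) / len(group)
    return positive_rate(facet_a) - positive_rate(facet_d)

labels = [1, 1, 0, 1, 1, 0, 0, 0]
facets = ["a", "a", "a", "a", "d", "d", "d", "d"]
# q_a = 3/4 and q_d = 1/4, so DPL = 0.5
value = dpl(labels, facets, positive_label=1, facet_a="a", facet_d="d")
```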

Q13
medium

A labeling project is becoming expensive because every item is labeled manually. The team wants a service that supports automated data labeling and an...

Q14
medium

A governance team wants non-coders to evaluate data quality on AWS Glue Data Catalog tables by defining rules and monitoring results. Which service sh...

Q15
medium

Which AWS service provides a managed environment to run Apache Hudi/Iceberg/Delta Lake transactional tables on Amazon S3 for ML feature pipelines that...

Q16
medium

An ML engineer wants Data Wrangler to handle imbalanced training data by oversampling the minority class. Which transform should they use?
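Random oversampling duplicates randomly chosen minority-class rows until each class matches the size of the largest class. The sketch below shows that generic technique in plain Python. It is not Data Wrangler's implementation, and `random_oversample` is a hypothetical name. SMOTE is a common alternative that synthesizes new minority rows instead of copying existing ones.

```python
import random

def random_oversample(rows, labels, seed=0):
    """Hypothetical helper: duplicate random rows of each smaller class
    until every class has as many rows as the largest class."""
    rng = random.Random(seed)  # seeded for reproducible resampling
    by_class = {}
    for row, y in zip(rows, labels):
        by_class.setdefault(y, []).append(row)
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for y, members in by_class.items():
        # Sample with replacement from this class's own rows only.
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(y)
    return out_rows, out_labels

rows = [[i] for i in range(6)]
labels = [0, 0, 0, 0, 1, 1]
balanced_rows, balanced_labels = random_oversample(rows, labels)
```

Oversample only the training split. Duplicated rows leaking into validation data would inflate evaluation scores.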

Q17
hard

An ML pipeline must scan terabytes of training data and convert from CSV to Apache Parquet in columnar form to speed Athena and SageMaker Training rea...

Q18
easy

Which Amazon SageMaker feature provides a managed Jupyter-based IDE for ML development with notebooks, training, tuning, and deployment in a unified U...

Q19
medium

A team needs to convert categorical features into numeric vectors via one-hot encoding before training a SageMaker XGBoost model. Which AWS-managed fe...
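One-hot encoding maps each category to a vector with a single 1 in that category's position and 0 everywhere else. The minimal sketch below uses a sorted category list so that column order is deterministic. The function name `one_hot` is hypothetical, and a managed tool would apply the same idea at scale.

```python
def one_hot(values):
    """Hypothetical helper: return (categories, one-hot vectors)."""
    categories = sorted(set(values))                     # fixed column order
    index = {c: i for i, c in enumerate(categories)}
    vectors = []
    for v in values:
        vec = [0] * len(categories)
        vec[index[v]] = 1                                # mark this row's category
        vectors.append(vec)
    return categories, vectors

cats, vecs = one_hot(["red", "green", "red", "blue"])
# cats == ["blue", "green", "red"]
# vecs == [[0, 0, 1], [0, 1, 0], [0, 0, 1], [1, 0, 0]]
```

The category list learned on training data should be reused at inference time, so the vector columns line up with what the model was trained on.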

Q20
medium

Which AWS service provides serverless ETL with Spark-based jobs and a managed Data Catalog used by Athena, EMR, and Redshift Spectrum to run schema-aw...
