
ML Workflows Questions

Practice questions for the ML Workflows topic of the Databricks Certified Machine Learning Associate exam: 38 questions covering this domain.

38 questions: 9 easy, 19 medium, 10 hard
Q1
hard

A data scientist uses `hp.loguniform("learning_rate", np.log(1e-5), np.log(1e-1))` in a Hyperopt search space. What is the practical implication of us...
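
Hyperopt documents `hp.loguniform(label, low, high)` as returning a value drawn from `exp(uniform(low, high))`, so the logarithm of the sampled value is uniformly distributed. The following is a minimal stdlib-only sketch of that distribution, not Hyperopt itself. The helper `sample_loguniform` is hypothetical. It shows that each decade between 1e-5 and 1e-1 receives roughly the same share of samples. A plain uniform draw over the same numeric range, by contrast, puts most samples in the top decade.

```python
import math
import random

def sample_loguniform(low_log, high_log, rng):
    """Hypothetical helper mirroring hp.loguniform's documented exp(uniform(low, high))."""
    return math.exp(rng.uniform(low_log, high_log))

rng = random.Random(0)  # fixed seed for reproducibility
low, high = math.log(1e-5), math.log(1e-1)
log_samples = [sample_loguniform(low, high, rng) for _ in range(10_000)]

# Count samples per decade: [1e-5, 1e-4), [1e-4, 1e-3), [1e-3, 1e-2), [1e-2, 1e-1]
decade_counts = [0, 0, 0, 0]
for s in log_samples:
    idx = math.floor(math.log10(s)) + 5
    decade_counts[max(0, min(idx, 3))] += 1  # clamp floating-point edge cases at the bounds

# For contrast: a plain uniform draw over the same numeric range
uniform_samples = [rng.uniform(1e-5, 1e-1) for _ in range(10_000)]
frac_top_decade = sum(s >= 1e-2 for s in uniform_samples) / len(uniform_samples)

print(decade_counts)    # roughly equal counts in every decade
print(frac_top_decade)  # about 90% of uniform samples land in [1e-2, 1e-1]
```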

Q2
easy

What does the `cross_val_score()` function from scikit-learn return when used for model evaluation?
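
The following small sketch uses scikit-learn's bundled iris dataset. The dataset and estimator are arbitrary choices for illustration. It shows the shape of what `cross_val_score()` produces with `cv=5`:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# One score per fold; for a classifier the default scorer is accuracy
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)

print(type(scores).__name__, scores.shape)  # ndarray (5,)
print(f"mean={scores.mean():.3f} std={scores.std():.3f}")
```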

Q3
medium

A data scientist uses Hyperopt with SparkTrials for distributed hyperparameter tuning. Which code pattern correctly implements distributed tuning?

Q4
medium

A data scientist computes the following confusion matrix on the test set: True Positives=80, False Positives=20, False Negatives=30, True Negatives=12...
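
The True Negatives count is truncated in this listing. The sketch below therefore uses only the three counts shown in full and computes only the metrics that do not depend on TN:

```python
# Counts taken from the question; True Negatives is truncated in the listing,
# so accuracy and specificity (which need TN) are not computed here.
tp, fp, fn = 80, 20, 30

precision = tp / (tp + fp)                          # 80 / 100
recall = tp / (tp + fn)                             # 80 / 110
f1 = 2 * precision * recall / (precision + recall)  # equals 2*TP / (2*TP + FP + FN)

print(round(precision, 3), round(recall, 3), round(f1, 3))  # 0.8 0.727 0.762
```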

Q5
medium

A data scientist is evaluating a regression model and finds a high R² value but also a high RMSE. What is the most likely explanation?
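
R² is scale-free because it compares the residual error with the variance of the target. RMSE, on the other hand, is expressed in the target's own units. The following stdlib-only sketch uses made-up numbers. It shows that rescaling the same data leaves R² unchanged but multiplies RMSE by the scaling factor:

```python
import math

def r2_score(y_true, y_pred):
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean) ** 2 for t in y_true)               # total sum of squares
    return 1 - ss_res / ss_tot

def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# Hypothetical targets and predictions in small units (e.g. thousands of dollars)
y_true = [10.0, 20.0, 30.0, 40.0, 50.0]
y_pred = [12.0, 18.0, 31.0, 39.0, 52.0]

# The same data expressed in larger units: every value multiplied by 1000
y_true_big = [v * 1000 for v in y_true]
y_pred_big = [v * 1000 for v in y_pred]

r2_small, rmse_small = r2_score(y_true, y_pred), rmse(y_true, y_pred)
r2_big, rmse_big = r2_score(y_true_big, y_pred_big), rmse(y_true_big, y_pred_big)

print(round(r2_small, 3), round(rmse_small, 3))  # 0.986 1.673
print(round(r2_big, 3), round(rmse_big, 1))      # 0.986 1673.3
```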

Q6
medium

When creating a training dataset using the Databricks Feature Store, a data scientist wants to ensure that features reflect only the values available ...
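
The idea behind this requirement is a point-in-time (as-of) lookup: for each labeled event, use the most recent feature value recorded at or before the event's timestamp, never a later one. The sketch below is plain Python with hypothetical data and a hypothetical `as_of_lookup` helper. It illustrates the concept only and is not the Databricks Feature Store API.

```python
from bisect import bisect_right

# Hypothetical feature history for one entity: (timestamp, value), sorted by time
feature_history = [(1, 100.0), (5, 150.0), (9, 90.0)]
# Hypothetical labeled events: (timestamp, label)
label_events = [(0, 0), (4, 1), (5, 0), (12, 1)]

def as_of_lookup(history, ts):
    """Hypothetical helper: latest feature value with timestamp <= ts, or None."""
    times = [t for t, _ in history]
    idx = bisect_right(times, ts)  # number of feature rows at or before ts
    return history[idx - 1][1] if idx > 0 else None

# Each training row only sees feature values that existed at its own timestamp
training_rows = [(ts, as_of_lookup(feature_history, ts), label) for ts, label in label_events]
print(training_rows)  # [(0, None, 0), (4, 100.0, 1), (5, 150.0, 0), (12, 90.0, 1)]
```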

Q7
medium

A data scientist needs to search all runs in a specific experiment and filter only those with a validation accuracy greater than 0.90, then retrieve t...

Q8
hard

A data scientist runs 50 Hyperopt trials in an MLflow experiment. They want to programmatically select the run with the highest `val_f1` and retrieve ...

Q9
easy

In the MLflow tracking UI, what is the easiest way to identify which training run produced the model with the lowest validation RMSE across an experim...

Q10
medium

A data scientist needs to programmatically retrieve a completed MLflow run's metric value using the run ID. Which code correctly does this?

Q11
easy

In scikit-learn, which method on a fitted `StandardScaler` applies the learned mean and standard deviation to new data without refitting?
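
The following minimal sketch uses toy, made-up arrays. It contrasts learning the statistics with `fit()` and applying them with `transform()`. The scaler computes the population standard deviation, so the learned scale here is `sqrt(2/3)`:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X_train = np.array([[1.0], [2.0], [3.0]])
X_new = np.array([[4.0]])

# fit() learns the per-feature mean and standard deviation from the training data
scaler = StandardScaler().fit(X_train)

# transform() reuses those learned statistics; it does not refit on X_new
X_new_scaled = scaler.transform(X_new)

print(scaler.mean_, scaler.scale_)  # mean 2.0, scale sqrt(2/3) (population std)
print(X_new_scaled)                 # (4 - 2) / sqrt(2/3), about 2.449
```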

Q12
medium

A data scientist evaluates a multiclass classifier on 3 classes and wants a single performance metric that weights each class by the number of support...
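
The following sketch uses hypothetical labels with imbalanced class support. It shows how scikit-learn's `f1_score` averages per-class F1 with `average="weighted"` (weighted by support) versus `average="macro"` (unweighted):

```python
import numpy as np
from sklearn.metrics import f1_score

# Hypothetical 3-class labels with imbalanced support (6, 3, 1)
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 2]
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 0, 2]

per_class = f1_score(y_true, y_pred, average=None)       # one F1 score per class
support = np.bincount(y_true)                            # number of true samples per class
weighted = f1_score(y_true, y_pred, average="weighted")  # support-weighted mean of per-class F1
macro = f1_score(y_true, y_pred, average="macro")        # plain mean of per-class F1

print(per_class, support)
print(round(weighted, 3), round(macro, 3))  # 0.8 0.833
```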

Q13
hard

A data scientist is debugging a Hyperopt run where all trials return the same loss value regardless of the hyperparameter configuration. What is the m...

Q14
easy

In scikit-learn, which class is used to chain preprocessing steps and a final estimator into a single object that can be trained and used for predicti...
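
The following minimal sketch chains a scaler and a classifier. The iris data and the step names `"scaler"` and `"clf"` are arbitrary choices for illustration:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Each step is a (name, transformer/estimator) pair; the last step is the model
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

pipe.fit(X, y)               # fits the scaler, transforms X, then fits the classifier
preds = pipe.predict(X[:5])  # applies the fitted scaler before predicting

print(list(pipe.named_steps))  # ['scaler', 'clf']
```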

Q15
medium

A data scientist compares two scikit-learn models using 5-fold cross-validation. Model A has mean_accuracy=0.88, std=0.01. Model B has mean_accuracy=0...

Q16
medium

A data scientist uses Hyperopt with the TPE algorithm to search over the following search space. Which statement correctly describes the behavior of `...

Q17
easy

A data scientist uses `train_test_split()` from scikit-learn to split data into training and test sets. To ensure the same split is produced every tim...
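
The following sketch uses toy data. It calls `train_test_split()` twice with the same `random_state` and checks that both calls produce the same partition:

```python
from sklearn.model_selection import train_test_split

X = [[i] for i in range(20)]
y = [i % 2 for i in range(20)]

# The same random_state gives the same shuffle, and therefore the same split
X_train_a, X_test_a, y_train_a, y_test_a = train_test_split(X, y, test_size=0.25, random_state=42)
X_train_b, X_test_b, y_train_b, y_test_b = train_test_split(X, y, test_size=0.25, random_state=42)

print(X_test_a == X_test_b)  # True
```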

Q18
hard

A data scientist builds a preprocessing and model pipeline in scikit-learn and wants to ensure it is logged correctly to MLflow so that preprocessing ...

Q19
medium

A data scientist wants to evaluate a scikit-learn model with stratified k-fold cross-validation (k=5) to ensure class proportions are preserved in eac...
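
The following sketch uses hypothetical random features and an 80/20 label split. It shows that `StratifiedKFold(n_splits=5)` keeps the class ratio in every test fold, and it passes the splitter to `cross_val_score` through `cv`:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))      # hypothetical features
y = np.array([0] * 80 + [1] * 20)  # imbalanced labels: 80% / 20%

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Each test fold keeps the 80/20 ratio: 16 negatives and 4 positives
fold_counts = [
    tuple(int(c) for c in np.bincount(y[test_idx], minlength=2))
    for _, test_idx in skf.split(X, y)
]
print(fold_counts)

# Passing the splitter object as cv makes cross_val_score use these stratified folds
scores = cross_val_score(LogisticRegression(), X, y, cv=skf)
print(scores.shape)  # (5,)
```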

Q20
hard

A data scientist implements a Hyperopt objective function for XGBoost and uses `SparkTrials(parallelism=8)`. Their cluster has 16 worker cores. After ...
