Model Development Questions

Practice questions for the Model Development topic of the Databricks Certified Machine Learning Associate exam: 62 questions covering this domain.

62 questions: 15 easy, 28 medium, 19 hard
Q1
hard

A data scientist builds a feature preprocessing pipeline using scikit-learn and wants to log it as a single MLflow model so that preprocessing is appl...

Q2
hard

A data scientist wants to add a custom metric — the geometric mean of precision and recall — to an `mlflow.evaluate()` call. How should they accomplis...
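
Whatever the integration mechanism, the metric itself is simply the square root of precision times recall. Below is a minimal pure-Python sketch of that computation, using hypothetical helper names and labels. In MLflow, a function like this would be wrapped as a custom metric (for example with `mlflow.models.make_metric`) before being passed to `mlflow.evaluate()`; the exact callback signature and parameter name vary across MLflow versions.

```python
import math


def precision_recall(y_true, y_pred, positive=1):
    """Return (precision, recall) for the given positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def geometric_mean_pr(y_true, y_pred):
    """Geometric mean of precision and recall: sqrt(precision * recall)."""
    precision, recall = precision_recall(y_true, y_pred)
    return math.sqrt(precision * recall)


# Hypothetical labels: 2 true positives, 1 false positive, 2 false negatives
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0]
print(round(geometric_mean_pr(y_true, y_pred), 4))  # 0.5774
```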

Q3
easy

What is the primary purpose of the test set in a train/validation/test split?
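
A test set is useful only if it plays no part in fitting or tuning, so that its score estimates performance on unseen data. Below is a minimal sketch of a three-way split in plain Python. The `train_val_test_split` helper and the 60/20/20 proportions are illustrative assumptions; scikit-learn users would typically call `train_test_split` twice instead.

```python
import random


def train_val_test_split(rows, val_frac=0.2, test_frac=0.2, seed=42):
    """Shuffle once, then carve out disjoint test, validation, and training sets."""
    rows = list(rows)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(rows)
    n = len(rows)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    test = rows[:n_test]
    val = rows[n_test:n_test + n_val]
    train = rows[n_test + n_val:]
    return train, val, test


train, val, test = train_val_test_split(range(100))
# Fit on train, tune hyperparameters on val, and score on test only once at the end
print(len(train), len(val), len(test))  # 60 20 20
```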

Q4
medium

Which argument in `mlflow.evaluate()` specifies the type of ML task being evaluated?

Q5
easy

A data scientist trains a binary classifier on a dataset where 95% of samples are class 0 and 5% are class 1. The model achieves 95% accuracy. What is...
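
On a 95/5 class split, 95% accuracy is exactly what a model that ignores the minority class would score. The minimal sketch below uses a hypothetical constant predictor that always outputs class 0 (a baseline, not the model in the question) and shows that it reaches the same accuracy while catching none of the positives.

```python
# Hypothetical labels: 95 negatives and 5 positives, matching the 95%/5% split
y_true = [0] * 95 + [1] * 5

# Trivial baseline "model" that always predicts the majority class
y_pred = [0] * len(y_true)

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
true_positives = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
recall = true_positives / sum(y_true)  # fraction of actual positives detected

print(f"accuracy={accuracy:.2f}, recall={recall:.2f}")  # accuracy=0.95, recall=0.00
```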

Q6
hard

A data scientist discovers that their training dataset has 15% missing values in a numeric feature. Which statement correctly describes when mean vers...
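
The choice between mean and median imputation depends largely on the feature's distribution. The mean is pulled toward outliers and skew, while the median is robust to them. Below is a minimal sketch with a hypothetical right-skewed feature, where `None` marks a missing value and `impute` is an illustrative helper.

```python
import statistics

# Hypothetical numeric feature: one missing value (None) and one large outlier
feature = [3.0, 4.0, 4.0, 5.0, 6.0, None, 250.0]
observed = [x for x in feature if x is not None]

# Compute fill values from observed (ideally training-split) data only
mean_fill = statistics.mean(observed)
median_fill = statistics.median(observed)


def impute(values, fill):
    """Replace missing entries (None) with a constant fill value."""
    return [fill if x is None else x for x in values]


print(round(mean_fill, 2), median_fill)  # 45.33 4.5
print(impute(feature, median_fill))
```

Here the single outlier drags the mean far from the typical value, while the median stays representative.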

Q7
medium

A data scientist logs the following training code. What gets registered in the Workspace Model Registry?

Q8
hard

A data scientist trains a gradient boosting model. Training loss is 0.05 and validation loss is 0.42. After increasing estimators from 100 to 500, tra...
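
A large gap between low training loss and high validation loss is the classic sign of overfitting, and adding more boosting rounds often widens it further. One common remedy is early stopping: monitor validation loss after each round and keep the round where it bottomed out. Below is a minimal sketch over a hypothetical loss curve. The `best_iteration` helper and its `patience` parameter are illustrative; libraries such as XGBoost and LightGBM provide early stopping as a built-in option.

```python
def best_iteration(val_losses, patience=3):
    """Return (round, loss) for the lowest validation loss, stopping after
    `patience` consecutive rounds without improvement (early stopping)."""
    best_loss = float("inf")
    best_round = 0
    rounds_without_improvement = 0
    for i, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_round = loss, i
            rounds_without_improvement = 0
        else:
            rounds_without_improvement += 1
            if rounds_without_improvement >= patience:
                break  # validation loss has stopped improving
    return best_round, best_loss


# Hypothetical per-round validation losses: improving, then rising as the model overfits
val_losses = [0.60, 0.50, 0.44, 0.41, 0.40, 0.42, 0.45, 0.49, 0.55]
print(best_iteration(val_losses))  # (5, 0.4)
```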

Q9
hard

A data scientist trains a Random Forest classifier on a highly imbalanced dataset (90% class 0, 10% class 1) and achieves AUC-ROC of 0.62. They want t...

Q10
hard

A data scientist wants to log a custom Python model class that is not natively supported by MLflow flavors. Which MLflow logging method should they us...

Q11
medium

A data scientist wants to perform batch inference on a large Spark DataFrame using an MLflow model. Which function creates a Spark UDF from the regist...

Q12
easy

Which function loads a scikit-learn model from an MLflow artifact store using the native scikit-learn flavor?

Q13
medium

A data scientist applies one-hot encoding to a categorical feature with 100 unique values. What is the primary concern?
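
One-hot encoding creates one binary column per distinct category, so a 100-level feature becomes 100 columns (99 if one level is dropped) that are almost entirely zeros. Below is a minimal pure-Python sketch with a hypothetical store-ID feature, showing the resulting width and sparsity. The `one_hot` helper is illustrative.

```python
def one_hot(values):
    """Map each value to a binary vector with one column per distinct category."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1  # exactly one active column per row
        rows.append(row)
    return categories, rows


# Hypothetical categorical feature with 100 distinct values, e.g. store IDs
values = [f"store_{i % 100}" for i in range(1000)]
categories, encoded = one_hot(values)

n_columns = len(categories)
density = sum(map(sum, encoded)) / (len(encoded) * n_columns)  # fraction of non-zero cells
print(n_columns, density)  # 100 0.01
```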

Q14
medium

A data science team evaluates two models for fraud detection. Model A has precision=0.92 and recall=0.60. Model B has precision=0.65 and recall=0.95. ...
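
Which model is preferable depends on the relative cost of missed fraud (false negatives) versus false alarms (false positives). Below is a minimal sketch that scores both models with F1, which weights precision and recall equally. As an assumed recall-weighted alternative, it also computes F2. The `f_beta` helper is illustrative.

```python
def f_beta(precision, recall, beta=1.0):
    """F-beta score; beta > 1 weights recall more heavily than precision."""
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# (precision, recall) pairs taken from the question
models = {"A": (0.92, 0.60), "B": (0.65, 0.95)}
for name, (p, r) in models.items():
    print(name, round(f_beta(p, r), 3), round(f_beta(p, r, beta=2.0), 3))
# A 0.726 0.645
# B 0.772 0.87
```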

Q15
medium

Which function correctly loads a registered MLflow model for batch scoring using the `models:/` URI scheme in a flavor-agnostic way?

Q16
hard

A data scientist wants to log a custom `PythonModel` that uses a pre-trained tokenizer alongside a scikit-learn model. The tokenizer is saved as a pic...

Q17
hard

A data scientist trains a gradient boosting classifier with 1,000 estimators. Training AUC is 0.99 and test AUC is 0.72. They reduce estimators to 200...

Q18
easy

What does RMSE (Root Mean Squared Error) measure in regression model evaluation?
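
RMSE is the square root of the average squared difference between predicted and actual values, and it is expressed in the same units as the target. Because errors are squared before averaging, large errors are penalized more heavily than under mean absolute error. Below is a minimal sketch with hypothetical values.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error: square root of the mean of squared residuals."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and the same length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical targets and predictions; residuals are 0.5, 0.0, -1.5, -1.0
y_true = [3.0, 5.0, 2.5, 7.0]
y_pred = [2.5, 5.0, 4.0, 8.0]
print(round(rmse(y_true, y_pred), 4))  # 0.9354
```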

Q19
easy

Which scikit-learn class applies different transformations to different subsets of columns in a single fitting step?

Q20
medium

A data scientist creates a model using `mlflow.pyfunc.PythonModel` and wants to bundle a custom lookup table (a Python dict) that the `predict()` meth...
