Model Development Questions
Practice questions for Model Development topic in Databricks Certified Machine Learning Associate. 62 questions covering this domain.
A data scientist builds a feature preprocessing pipeline using scikit-learn and wants to log it as a single MLflow model so that preprocessing is appl...
A data scientist wants to add a custom metric — the geometric mean of precision and recall — to an `mlflow.evaluate()` call. How should they accomplis...
What is the primary purpose of the test set in a train/validation/test split?
Which argument in `mlflow.evaluate()` specifies the type of ML task being evaluated?
A data scientist trains a binary classifier on a dataset where 95% of samples are class 0 and 5% are class 1. The model achieves 95% accuracy. What is...
A data scientist discovers that their training dataset has 15% missing values in a numeric feature. Which statement correctly describes when mean vers...
A data scientist logs the following training code. What gets registered in the Workspace Model Registry?
A data scientist trains a gradient boosting model. Training loss is 0.05 and validation loss is 0.42. After increasing estimators from 100 to 500, tra...
A data scientist trains a Random Forest classifier on a highly imbalanced dataset (90% class 0, 10% class 1) and achieves AUC-ROC of 0.62. They want t...
A data scientist wants to log a custom Python model class that is not natively supported by MLflow flavors. Which MLflow logging method should they us...
A data scientist wants to perform batch inference on a large Spark DataFrame using an MLflow model. Which function creates a Spark UDF from the regist...
Which function loads a scikit-learn model from an MLflow artifact store using the native scikit-learn flavor?
A data scientist applies one-hot encoding to a categorical feature with 100 unique values. What is the primary concern?
A data science team evaluates two models for fraud detection. Model A has precision=0.92 and recall=0.60. Model B has precision=0.65 and recall=0.95. ...
Which function correctly loads a registered MLflow model for batch scoring using the `models:/` URI scheme in a flavor-agnostic way?
A data scientist wants to log a custom `PythonModel` that uses a pre-trained tokenizer alongside a scikit-learn model. The tokenizer is saved as a pic...
A data scientist trains a gradient boosting classifier with 1,000 estimators. Training AUC is 0.99 and test AUC is 0.72. They reduce estimators to 200...
What does RMSE (Root Mean Squared Error) measure in regression model evaluation?
Which scikit-learn class applies different transformations to different subsets of columns in a single fitting step?
A data scientist creates a model using `mlflow.pyfunc.PythonModel` and wants to bundle a custom lookup table (a Python dict) that the `predict()` meth...
Sign in to see all 62 questions
Create a free account to browse all questions — completely free during our launch phase.