Model Development Questions

Practice questions for the Model Development topic of the Databricks Certified Machine Learning Professional exam. There are 88 questions covering this domain.

88 questions: 25 easy, 41 medium, 22 hard

Q1

easy

What PySpark data type should be used to store dense embedding vectors in a Databricks Feature Store feature table?

Q2

easy

A data scientist wants to deploy a custom Python class as a model using MLflow pyfunc. Which method must the class implement?

Q3

medium

A machine learning engineer needs to include a custom helper module `preprocessing_utils.py` when logging a scikit-learn model to MLflow for serving. ...

Q4

easy

What is required when logging a model to be registered in Unity Catalog using `mlflow.sklearn.log_model()`?

Q5

easy

After `CrossValidator` identifies the best hyperparameter combination, what does it do with the entire training dataset?
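The refit behavior can be sketched in plain Python. This is a toy stand-in, not SparkML code: the shrinkage regressor and the helpers `fit`, `mse`, `k_folds`, and `cross_validate` are hypothetical. What it mirrors is the documented `CrossValidator` flow: average the held-out metric for each parameter setting across folds, pick the best setting, then fit once more on the entire dataset.

```python
# Toy sketch of k-fold cross-validation followed by a final refit.
# The "model" is a hypothetical shrunken slope-through-origin regressor:
# y ~ w * x with w = sum(x*y) / (sum(x*x) + lam).


def fit(rows, lam):
    """Fit the hypothetical shrinkage regressor and return its slope w."""
    sxy = sum(x * y for x, y in rows)
    sxx = sum(x * x for x, _ in rows)
    return sxy / (sxx + lam)


def mse(rows, w):
    """Mean squared error of y ~ w * x on the given rows."""
    return sum((y - w * x) ** 2 for x, y in rows) / len(rows)


def k_folds(rows, k):
    """Split rows into k contiguous folds (no shuffling, for determinism)."""
    size = len(rows) // k
    return [rows[i * size:(i + 1) * size] if i < k - 1 else rows[i * size:]
            for i in range(k)]


def cross_validate(rows, lam_grid, k=3):
    """Pick the lam with the lowest average held-out MSE, then refit on ALL rows."""
    folds = k_folds(rows, k)
    avg_metrics = []
    for lam in lam_grid:
        scores = []
        for i, held_out in enumerate(folds):
            # Train on every fold except the held-out one.
            train = [r for j, f in enumerate(folds) if j != i for r in f]
            scores.append(mse(held_out, fit(train, lam)))
        avg_metrics.append(sum(scores) / k)
    best_lam = lam_grid[avg_metrics.index(min(avg_metrics))]
    # Final step: retrain once on the entire dataset with the winning setting.
    best_model = fit(rows, best_lam)
    return best_lam, best_model, avg_metrics


data = [(x, 2.0 * x + (0.5 if x % 2 else -0.5)) for x in range(1, 13)]
best_lam, best_w, metrics = cross_validate(data, [0.0, 10.0, 100.0])
```

In SparkML, the model produced by that final refit is exposed as `bestModel` on the returned `CrossValidatorModel`.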

Q6

medium

A data scientist adds parallelism to a `CrossValidator` by setting `parallelism=4`. What is the effect of this setting?
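As a rough analogy (an assumption, not SparkML internals), `parallelism` behaves like a thread pool that keeps up to that many model fits in flight at once. The sketch below uses a hypothetical `evaluate` function as a stand-in for training and scoring one parameter combination. Concurrency changes throughput, not the per-combination metrics or which combination wins.

```python
from concurrent.futures import ThreadPoolExecutor


def evaluate(params):
    """Hypothetical stand-in for training + scoring one param combination."""
    depth, min_inst = params["maxDepth"], params["minInstancesPerNode"]
    return -abs(depth - 5) - 0.1 * min_inst  # toy score, higher is better


param_maps = [{"maxDepth": d, "minInstancesPerNode": m}
              for d in (3, 5, 7) for m in (1, 5)]

# Sequential evaluation (analogous to parallelism=1).
sequential = [evaluate(p) for p in param_maps]

# Up to 4 evaluations running at once (analogous to parallelism=4).
# executor.map preserves input order, so results line up with param_maps.
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel = list(pool.map(evaluate, param_maps))
```

In Spark, the parallel fits still share the same cluster, so a higher value only helps when there is spare capacity to run several training jobs at once.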

Q7

medium

A data scientist uses `mlflow.evaluate()` to evaluate a trained classifier. Which `model_type` argument value should they use?

Q8

easy

Which SparkML class is used to construct a grid of hyperparameter combinations for use with `CrossValidator` or `TrainValidationSplit`?
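In SparkML, `ParamGridBuilder().addGrid(param, values)` (chained once per parameter) followed by `.build()` returns a list of param maps, one per element of the Cartesian product of the value lists. The hypothetical `build_grid` helper below reproduces that expansion with plain dictionaries so the combination count is easy to check.

```python
from itertools import product


def build_grid(param_values):
    """Expand {param_name: [values]} into a list of param maps (Cartesian
    product), mirroring the shape of ParamGridBuilder(...).build() output."""
    names = list(param_values)
    return [dict(zip(names, combo))
            for combo in product(*(param_values[n] for n in names))]


grid = build_grid({"regParam": [0.01, 0.1], "elasticNetParam": [0.0, 0.5, 1.0]})
# 2 values x 3 values = 6 param maps
```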

Q9

medium

What is the purpose of on-demand feature computation in Databricks Feature Store?

Q10

hard

A SparkML pipeline has stages: `Tokenizer`, `HashingTF`, and `LogisticRegression`. An engineer wraps this pipeline in a `CrossValidator` with a `Param...
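In that pipeline, `Tokenizer` and `HashingTF` are Transformers, so they learn nothing from the data; only `LogisticRegression` is fit. A rough plain-Python approximation of what the first two stages compute is sketched below. Assumptions: `tokenize` and `hashing_tf` are hypothetical helpers, and `zlib.crc32` substitutes for the MurmurHash3 function SparkML's `HashingTF` actually uses, so the indices will not match Spark's.

```python
import zlib
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace, roughly like SparkML's Tokenizer."""
    return text.lower().split()


def hashing_tf(tokens, num_features=16):
    """Map each term to index hash(term) % num_features and count occurrences.
    zlib.crc32 is a stand-in; SparkML's HashingTF uses MurmurHash3."""
    counts = Counter(zlib.crc32(t.encode("utf-8")) % num_features for t in tokens)
    return dict(sorted(counts.items()))  # sparse {index: term_count}


tokens = tokenize("Spark ML spark pipelines")
vector = hashing_tf(tokens)
```

Because no state is learned, these two stages produce the same output regardless of which fold they run on; the hashing trick trades possible index collisions for a fixed-size vector without a vocabulary.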

Q11

hard

An ML engineer notices that a SparkML `CrossValidator` returns a `CrossValidatorModel`. They need to inspect which parameter combination produced the ...
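On a fitted `CrossValidatorModel`, `avgMetrics[i]` is the mean cross-validated metric for `getEstimatorParamMaps()[i]`, and whether a higher value is better comes from the evaluator's `isLargerBetter()`. The hypothetical helper below shows that selection logic on plain lists and dicts; the metric values are made up for illustration.

```python
def best_param_map(avg_metrics, param_maps, larger_is_better=True):
    """Return (param_map, metric) for the best cross-validated combination.

    avg_metrics[i] is the mean metric across folds for param_maps[i], the same
    alignment CrossValidatorModel.avgMetrics has with getEstimatorParamMaps().
    """
    if len(avg_metrics) != len(param_maps):
        raise ValueError("avg_metrics and param_maps must have the same length")
    pick = max if larger_is_better else min
    best_idx = pick(range(len(avg_metrics)), key=avg_metrics.__getitem__)
    return param_maps[best_idx], avg_metrics[best_idx]


maps = [{"maxDepth": 3}, {"maxDepth": 5}, {"maxDepth": 7}]
auc = [0.81, 0.86, 0.84]  # made-up illustrative values; larger AUC is better
best, score = best_param_map(auc, maps)
```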

Q12

hard

A data scientist trains a model using Databricks Feature Store and registers it to Unity Catalog. At inference time using Mosaic AI Model Serving, the...

Q13

medium

A SparkML `CrossValidator` is configured with 3 folds and a `ParamGridBuilder` that specifies 4 values for `maxDepth` and 3 values for `minInstancesPe...
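The arithmetic behind this configuration: a k-fold `CrossValidator` fits the estimator once per (parameter combination, fold) pair, then once more on the full dataset with the best combination. The hypothetical `count_fits` helper encodes that rule.

```python
def count_fits(num_folds, *grid_sizes):
    """Number of estimator fits a k-fold CrossValidator performs:
    one per (param combination, fold) pair, plus one final refit on all data."""
    combos = 1
    for size in grid_sizes:
        combos *= size  # grid size is the product of the per-param value counts
    return combos * num_folds + 1


total = count_fits(3, 4, 3)  # 3 folds; 4 maxDepth values x 3 minInstancesPerNode values
```

Worked through: 4 x 3 = 12 combinations, 12 x 3 folds = 36 fits, plus 1 refit = 37.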

Q14

easy

In SparkML, what is the primary difference between a Transformer and an Estimator?
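A conceptual sketch in plain Python (not the SparkML classes): an Estimator's `fit()` learns from data and returns a Transformer, and a Transformer's `transform()` maps data to data without learning anything. The hypothetical `MeanImputer` / `MeanImputerModel` pair loosely mirrors SparkML's `Imputer` and `ImputerModel`.

```python
class MeanImputerModel:
    """Transformer-like: holds learned state and only implements transform()."""

    def __init__(self, fill_value):
        self.fill_value = fill_value

    def transform(self, values):
        # Replace missing entries with the value learned during fit().
        return [self.fill_value if v is None else v for v in values]


class MeanImputer:
    """Estimator-like: fit() learns from data and returns a Transformer."""

    def fit(self, values):
        present = [v for v in values if v is not None]
        return MeanImputerModel(sum(present) / len(present))


model = MeanImputer().fit([1.0, None, 3.0])  # Estimator -> Transformer
filled = model.transform([None, 5.0])        # Transformer: data -> data
```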

Q15

medium

When training a model using Databricks Feature Store with `FeatureEngineeringClient.create_training_set()`, what is the primary benefit over loading f...

Q16

hard

An ML engineer logs a custom model that wraps a proprietary Java-based scoring library. The custom class loads the Java library from a JAR file during...

Q17

medium

A data scientist creates a training dataset using Databricks Feature Store `point_in_time_join` to predict customer churn. What does this join ensure?
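The as-of semantics of a point-in-time join can be sketched in plain Python. Each label row receives the most recent feature value recorded at or before its own timestamp, never a later one, which keeps future information out of the training data. The function name and row layout below are hypothetical, not the Feature Store API.

```python
from bisect import bisect_right


def point_in_time_join(labels, feature_rows):
    """For each label row, attach the latest feature value whose timestamp is
    <= the label timestamp for the same key (None if no such value exists).

    labels: list of (customer_id, ts); feature_rows: list of (customer_id, ts, value).
    """
    by_key = {}
    for key, ts, value in sorted(feature_rows, key=lambda r: (r[0], r[1])):
        times, values = by_key.setdefault(key, ([], []))
        times.append(ts)
        values.append(value)
    joined = []
    for key, ts in labels:
        times, values = by_key.get(key, ([], []))
        i = bisect_right(times, ts)  # count of feature rows at or before ts
        joined.append((key, ts, values[i - 1] if i else None))
    return joined


features = [("c1", 1, 10.0), ("c1", 5, 20.0), ("c2", 3, 7.0)]
labels = [("c1", 4), ("c1", 6), ("c2", 2)]
rows = point_in_time_join(labels, features)
```

The label for `c2` at time 2 gets `None` because its only feature value was recorded later, at time 3; a naive latest-value join would have leaked it.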

Q18

medium

An ML engineer needs to perform distributed batch inference on a 50-million-row Spark DataFrame using an MLflow model registered in Unity Catalog. Whi...

Q19

medium

A data scientist builds a SparkML pipeline that includes a `StringIndexer`, `VectorAssembler`, and `RandomForestClassifier`. They pass this pipeline a...
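Because `StringIndexer` is an Estimator, it learns its label-to-index mapping during `fit()`. When the whole pipeline is passed to `CrossValidator`, the pipeline is fit on each training fold, so that mapping is relearned per fold. The sketch below reproduces the documented default ordering (`stringOrderType='frequencyDesc'`, ties broken alphabetically) with hypothetical plain-Python helpers.

```python
from collections import Counter


def fit_string_indexer(labels):
    """Learn a label -> index mapping ordered by descending frequency,
    with ties broken alphabetically (the default 'frequencyDesc' ordering)."""
    counts = Counter(labels)
    ordered = sorted(counts, key=lambda s: (-counts[s], s))
    return {label: float(i) for i, label in enumerate(ordered)}


def transform_string_indexer(mapping, labels):
    """Apply a fitted mapping; unseen labels raise KeyError,
    similar in spirit to the default handleInvalid='error'."""
    return [mapping[label] for label in labels]


mapping = fit_string_indexer(["b", "a", "c", "a", "b", "a"])
indices = transform_string_indexer(mapping, ["c", "a"])
```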

Q20

hard

A data science team has built a SparkML pipeline with `StringIndexer`, `OneHotEncoder`, `VectorAssembler`, and `GBTClassifier`. They want to log the e...
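For the `OneHotEncoder` stage, SparkML's default `dropLast=True` emits vectors with one fewer slot than the number of categories, and the last category maps to the all-zeros vector (avoiding columns that always sum to one). The hypothetical `one_hot` helper sketches that convention with dense lists; Spark itself produces sparse vectors.

```python
def one_hot(index, num_categories, drop_last=True):
    """Dense one-hot vector for a category index, following SparkML's
    OneHotEncoder convention: with drop_last=True (its default) the vector has
    num_categories - 1 slots and the last category maps to all zeros."""
    size = num_categories - 1 if drop_last else num_categories
    vec = [0.0] * size
    if index < size:
        vec[int(index)] = 1.0
    return vec
```

For example, with 5 categories, `one_hot(2.0, 5)` gives `[0.0, 0.0, 1.0, 0.0]`, while `one_hot(4.0, 5)` gives four zeros.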
