Model Development Questions
Practice questions for the Model Development topic in the Databricks Certified Machine Learning Professional exam. 88 questions cover this domain.
What PySpark data type should be used to store dense embedding vectors in a Databricks Feature Store feature table?
A data scientist wants to deploy a custom Python class as a model using MLflow pyfunc. Which method must the class implement?
A machine learning engineer needs to include a custom helper module `preprocessing_utils.py` when logging a scikit-learn model to MLflow for serving. ...
What is required when logging a model to be registered in Unity Catalog using `mlflow.sklearn.log_model()`?
After `CrossValidator` identifies the best hyperparameter combination, what does it do with the entire training dataset?
A data scientist adds parallelism to a `CrossValidator` by setting `parallelism=4`. What is the effect of this setting?
A data scientist uses `mlflow.evaluate()` to evaluate a trained classifier. Which `model_type` argument value should they use?
Which SparkML class is used to construct a grid of hyperparameter combinations for use with `CrossValidator` or `TrainValidationSplit`?
What is the purpose of on-demand feature computation in Databricks Feature Store?
A SparkML pipeline has stages: `Tokenizer`, `HashingTF`, and `LogisticRegression`. An engineer wraps this pipeline in a `CrossValidator` with a `Param...
An ML engineer notices that a SparkML `CrossValidator` returns a `CrossValidatorModel`. They need to inspect which parameter combination produced the ...
A data scientist trains a model using Databricks Feature Store and registers it to Unity Catalog. At inference time using Mosaic AI Model Serving, the...
A SparkML `CrossValidator` is configured with 3 folds and a `ParamGridBuilder` that specifies 4 values for `maxDepth` and 3 values for `minInstancesPe...
In SparkML, what is the primary difference between a Transformer and an Estimator?
When training a model using Databricks Feature Store with `FeatureEngineeringClient.create_training_set()`, what is the primary benefit over loading f...
An ML engineer logs a custom model that wraps a proprietary Java-based scoring library. The custom class loads the Java library from a JAR file during...
A data scientist creates a training dataset using Databricks Feature Store `point_in_time_join` to predict customer churn. What does this join ensure?
An ML engineer needs to perform distributed batch inference on a 50-million-row Spark DataFrame using an MLflow model registered in Unity Catalog. Whi...
A data scientist builds a SparkML pipeline that includes a `StringIndexer`, `VectorAssembler`, and `RandomForestClassifier`. They pass this pipeline a...
A data science team has built a SparkML pipeline with `StringIndexer`, `OneHotEncoder`, `VectorAssembler`, and `GBTClassifier`. They want to log the e...
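Several of the questions above turn on how many models a `CrossValidator` actually trains. The following is a minimal pure-Python sketch of that arithmetic, not Spark code. It assumes the standard k-fold behavior: every parameter combination is fitted once per fold, and the best combination is then refit once on the full training set. The function name `count_cross_validation_fits`, the key `second_param` (standing in for the parameter whose name is truncated in the question), and all grid values are hypothetical placeholders. Only the counts (3 folds, 4 values, 3 values) come from the question text.

```python
from itertools import product


def count_cross_validation_fits(param_grid, num_folds, refit_best=True):
    """Count parameter combinations and model fits for k-fold CV over a grid."""
    # Each combination of one value per parameter is one candidate.
    combinations = list(product(*param_grid.values()))
    # Every candidate is trained once per fold.
    fits = len(combinations) * num_folds
    # The winning candidate is refit once on the entire training set.
    if refit_best:
        fits += 1
    return len(combinations), fits


# Hypothetical values: the question only states how many values each
# parameter has (4 and 3), not what they are.
grid = {
    "maxDepth": [2, 4, 6, 8],
    "second_param": [1, 5, 10],  # placeholder for the truncated parameter name
}

num_combinations, total_fits = count_cross_validation_fits(grid, num_folds=3)
print(num_combinations, total_fits)  # prints: 12 37
```

Under these assumptions the grid holds 4 × 3 = 12 combinations, giving 12 × 3 = 36 fold-level fits plus one final refit.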