Serving and scaling models Questions

Practice questions for the Serving and scaling models topic of the Google Professional Machine Learning Engineer exam. 40 questions cover this domain.

40 questions: 8 easy, 21 medium, 11 hard
Q1
hard

A company built a model in BigQuery ML and wants to manage it through Vertex AI without exporting the model artifacts first. Which approach is support...

Q2
hard

A custom model must support online predictions with low latency. Which architectural condition must be satisfied before the first request can be serve...

Q3
medium

After an AutoML model finishes training, what happens with respect to Model Registry?

Q4
medium

A mobile application needs low-latency responses to user actions. Which Vertex AI inference option is the best fit?

Q5
easy

Which statement about Vertex AI online inference is correct?

Q6
medium

A team trained a custom model and stored the artifacts in Cloud Storage. They now want to serve predictions through Vertex AI. What should they do fir...

Q7
easy

What is the supported inference pattern for AutoML forecasting models on Vertex AI?

Q8
medium

A data platform team has millions of records to score overnight and does not need immediate responses. They also do not want to deploy the model to an...

Q9
medium

A release process refers to the production model by alias rather than by explicit version number so deployments can move forward without changing down...

Q10
hard

A retailer trained a demand forecast with AutoML forecasting and now wants real-time predictions in a checkout application. What is the best official ...

Q11
easy

Which Vertex AI Endpoint feature lets multiple models share the same compute resources behind one endpoint?

Q12
medium

A team wants to release a new model gradually by sending 10% of traffic to it and 90% to the current model on the same endpoint. Which Vertex AI capab...

Q13
medium

A team needs predictions to use specific GPU accelerators for serving a large neural network with low latency. Where is the accelerator type configure...

Q14
medium

Which Vertex AI inference feature provides scalable, fully managed nearest-neighbor lookup over learned vector embeddings?

Q15
easy

Which Vertex AI inference resource type is intended for asynchronous predictions over large datasets that are not latency-sensitive?

Q16
hard

A team wants to ship a private endpoint where prediction traffic stays inside the VPC and is not exposed to the public internet. Which Vertex AI capab...

Q17
medium

A model serving endpoint experiences variable request volume and the team wants Vertex AI to add or remove replicas automatically. Which configuration...

Q18
hard

A practitioner needs to support thousands of distinct fine-tuned variants of a base LLM cost-effectively without deploying a separate full model for e...

Q19
hard

A team needs to deploy a custom serving container with non-Python dependencies and custom request preprocessing for online inference. Which Vertex AI ...

Q20
medium

A model deployed to a Vertex AI Endpoint must scale down to zero when traffic stops to control cost. Which Vertex AI option supports this for online i...
