Serving and scaling models Questions
Practice questions for the Serving and scaling models topic of the Google Professional Machine Learning Engineer certification. 40 questions cover this domain.
A company built a model in BigQuery ML and wants to manage it through Vertex AI without exporting the model artifacts first. Which approach is support...
A custom model must support online predictions with low latency. Which architectural condition must be satisfied before the first request can be serve...
After an AutoML model finishes training, what happens with respect to Model Registry?
A mobile application needs low-latency responses to user actions. Which Vertex AI inference option is the best fit?
Which statement about Vertex AI online inference is correct?
A team trained a custom model and stored the artifacts in Cloud Storage. They now want to serve predictions through Vertex AI. What should they do fir...
What is the supported inference pattern for AutoML forecasting models on Vertex AI?
A data platform team has millions of records to score overnight and does not need immediate responses. They also do not want to deploy the model to an...
A release process refers to the production model by alias rather than by explicit version number so deployments can move forward without changing down...
A retailer trained a demand forecast with AutoML forecasting and now wants real-time predictions in a checkout application. What is the best official ...
Which Vertex AI Endpoint feature lets multiple models share the same compute resources behind one endpoint?
A team wants to release a new model gradually by sending 10% of traffic to it and 90% to the current model on the same endpoint. Which Vertex AI capab...
A team needs predictions to use specific GPU accelerators for serving a large neural network with low latency. Where is the accelerator type configure...
Which Vertex AI inference feature provides scalable, fully managed nearest-neighbor lookup over learned vector embeddings?
Which Vertex AI inference resource type is intended for asynchronous predictions over large datasets that are not latency-sensitive?
A team wants to ship a private endpoint where prediction traffic stays inside the VPC and is not exposed to the public internet. Which Vertex AI capab...
A model serving endpoint experiences variable request volume and the team wants Vertex AI to add or remove replicas automatically. Which configuration...
A practitioner needs to support thousands of distinct fine-tuned variants of a base LLM cost-effectively without deploying a separate full model for e...
A team needs to deploy a custom serving container with non-Python dependencies and custom request preprocessing for online inference. Which Vertex AI ...
A model deployed to a Vertex AI Endpoint must scale down to zero when traffic stops to control cost. Which Vertex AI option supports this for online i...