An ML team wants to deploy a PyTorch transformer model for real-time inference. The model is 8 GB in size and requires GPU acceleration. During peak hours, the team expects up to 4 concurrent requests. Which compute type configuration is most appropriate?
CPU_LARGE: 16 GB per concurrency is sufficient for an 8 GB model without GPU.
GPU_SMALL (1xT4, 16 GB per concurrency) with provisioned concurrency set to 4.
GPU_MEDIUM (1xA10G, 24 GB per concurrency) with provisioned concurrency set to 4.
MULTIGPU_MEDIUM (4xA10G, 96 GB per concurrency): the model requires multiple GPUs.
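The choice comes down to two constraints stated in the question: the model needs a GPU, and its 8 GB of weights must fit in the memory each concurrency unit gets. A minimal Python sketch of that elimination is shown here, using the memory figures quoted in the answer choices. The `headroom` factor and the `feasible_options` helper are hypothetical; `headroom` stands in for activations, CUDA context, and other runtime overhead, and real sizing depends on precision, batch size, and sequence length. Treat the output as a rough filter, not a Databricks sizing rule. The sketch compares against the total quoted memory, so the 96 GB of MULTIGPU_MEDIUM is counted in aggregate even though a model that is not sharded could only use one 24 GB A10G.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ComputeOption:
    name: str
    gpus: int          # GPUs per concurrency unit (0 = CPU only)
    memory_gb: float   # memory per concurrency, as quoted in the answer choice


# Figures copied from the answer choices; not an authoritative spec table.
OPTIONS = [
    ComputeOption("CPU_LARGE", gpus=0, memory_gb=16),
    ComputeOption("GPU_SMALL", gpus=1, memory_gb=16),
    ComputeOption("GPU_MEDIUM", gpus=1, memory_gb=24),
    ComputeOption("MULTIGPU_MEDIUM", gpus=4, memory_gb=96),
]


def feasible_options(model_gb: float, needs_gpu: bool, headroom: float = 1.5) -> List[ComputeOption]:
    """Return options whose quoted memory covers model_gb * headroom.

    headroom is a hypothetical safety factor (an assumption, not a
    Databricks rule) for memory used beyond the raw weights.
    """
    required_gb = model_gb * headroom
    fits = [
        o for o in OPTIONS
        if o.memory_gb >= required_gb and (o.gpus > 0 or not needs_gpu)
    ]
    # Prefer fewer GPUs first, then less memory, as a rough proxy for cost.
    return sorted(fits, key=lambda o: (o.gpus, o.memory_gb))


for headroom in (1.5, 2.5):
    names = [o.name for o in feasible_options(8, needs_gpu=True, headroom=headroom)]
    print(f"headroom {headroom}x -> {names}")
# headroom 1.5x -> ['GPU_SMALL', 'GPU_MEDIUM', 'MULTIGPU_MEDIUM']
# headroom 2.5x -> ['GPU_MEDIUM', 'MULTIGPU_MEDIUM']
```

Because the choices quote memory per concurrency, the four concurrent requests change how many units are provisioned, not whether the model fits in each unit. Ranking by GPU count first reflects that extra GPUs would sit idle for an 8 GB model that is not sharded; how the ranking shifts with `headroom` shows why the memory margin you assume matters.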
Educational content: CertQnA practice questions are written against official exam objectives and cover the same domains tested on the real exam. All content is original and independent; these are not actual exam questions, and CertQnA is not affiliated with any certification vendor.