MLP · Model Deployment · Difficulty: hard · Question 2 of 24

An ML team wants to deploy a PyTorch transformer model for real-time inference. The model's weights are 8 GB, and the model requires GPU acceleration. During peak hours, the team expects up to 4 concurrent requests. Which compute type configuration is most appropriate?

A. CPU_LARGE: 16 GB per concurrency is sufficient for an 8 GB model without a GPU.
B. GPU_SMALL (1x T4, 16 GB per concurrency) with provisioned concurrency set to 4.
C. GPU_MEDIUM (1x A10G, 24 GB per concurrency) with provisioned concurrency set to 4.
D. MULTIGPU_MEDIUM (4x A10G, 96 GB per concurrency): the model requires multiple GPUs.
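One way to reason through the options is to screen them on the two hard constraints in the question: the model needs a GPU, and each concurrency slot must hold the 8 GB of weights plus some working memory. The sketch below encodes only the figures stated in the options. The `headroom` multiplier (extra room for activations and framework overhead) is a hypothetical parameter, not something the question specifies, and the names `ComputeOption` and `screen` are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComputeOption:
    label: str
    name: str
    has_gpu: bool
    gpu_count: int
    mem_per_concurrency_gb: int


# Figures copied from the answer options; nothing else is assumed about them.
OPTIONS = [
    ComputeOption("A", "CPU_LARGE", False, 0, 16),
    ComputeOption("B", "GPU_SMALL", True, 1, 16),
    ComputeOption("C", "GPU_MEDIUM", True, 1, 24),
    ComputeOption("D", "MULTIGPU_MEDIUM", True, 4, 96),
]


def screen(options, model_gb, needs_gpu, headroom):
    """Keep options that meet the GPU requirement and fit model_gb * headroom.

    headroom is a hypothetical safety multiplier on the model size; the
    right value depends on sequence length, batch size, and runtime overhead.
    """
    required_gb = model_gb * headroom
    return [
        o
        for o in options
        if (o.has_gpu or not needs_gpu) and o.mem_per_concurrency_gb >= required_gb
    ]


# Memory is quoted per concurrency, so the 4 concurrent requests are covered by
# provisioned concurrency rather than by adding memory to a single slot.
for headroom in (1.5, 2.5):
    fits = screen(OPTIONS, model_gb=8, needs_gpu=True, headroom=headroom)
    print(headroom, [o.label for o in fits])
```

Option A is eliminated by the GPU requirement regardless of memory. Every GPU option clears an 8 GB model on a single GPU, so D's premise that multiple GPUs are required does not follow from the model size alone; whether B or C survives depends on how much headroom you budget per slot.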

Educational Content: CertQnA practice questions are written against official exam objectives, covering the same domains tested on the real exam. All content is original and independent; these are not actual exam questions, and CertQnA is not affiliated with any certification vendor.
