Skip to content
MLP
Model Deployment
medium
Question 5 of 24

An ML team deploys a real-time model serving endpoint and enables the scale-to-zero feature. A product manager is concerned about latency for the first request after a quiet period. What behavior should the team communicate?

AScale-to-zero maintains warm containers in standby, so the first request has standard latency.
BThe first request after scaling to zero triggers a cold start; scaling up from zero typically takes 10-20 seconds but can sometimes take minutes, with no SLA on scale-from-zero latency.
CScale-to-zero is only available for GPU endpoints, not CPU endpoints.
DAfter scaling to zero, the endpoint becomes permanently unavailable until manually restarted.

Educational Content — CertQnA practice questions are written against official exam objectives, covering the same domains tested on the real exam. All content is original and independent — not actual exam questions, not affiliated with any certification vendor. Learn more about our content policy

Discussion

Be the first to share your understanding of this concept

⚠️ Discussion is for concept clarification only. Do not share or request actual exam questions or answers.

Sign in to join the discussion