Operational Efficiency and Optimization for GenAI Applications Questions
Practice questions for the Operational Efficiency and Optimization for GenAI Applications topic of the AWS Certified Generative AI Developer - Professional exam. 24 questions cover this domain.
A customer-facing application is sensitive to latency spikes during busy periods and wants better output token latency for eligible models without red...
A FinOps team wants to monitor how many input and output tokens an application is consuming in Amazon Bedrock. Which AWS service is used for those met...
A team fine-tuned several prototype models and later noticed recurring monthly charges even when those models were not actively serving users. Which c...
A marketing team can tolerate delayed processing for a large generation workload and wants a service tier built for throughput-flexible, batch-style us...
A retailer expects steady, high-volume Bedrock usage and wants committed capacity with more predictable cost planning. Which option is the best fit?
A company has a large offline content generation job and wants a lower-cost option than on-demand inference for supported Bedrock models. Which Bedroc...
A team has bursty Bedrock usage but a contractual obligation for guaranteed peak capacity. Which combination is MOST appropriate?
Which Bedrock pricing/usage metric is most directly affected by long system prompts repeated on every request?
Which AWS service is the primary place to view Bedrock token-usage metrics (input/output tokens) and configure alarms on usage spikes?
An application's per-request latency is dominated by long retrieved context. Which optimization MOST directly reduces output time-to-first-token witho...
A company wants to reduce per-call cost for a chat assistant whose first ~3,000 tokens of prompt context are nearly identical across users (long syste...
Which Amazon Bedrock service tier is designed for committed, predictable inference capacity at a discounted hourly cost for steady-state workloads?
A financial services firm runs a Bedrock-based analyst assistant with highly variable load: very high during earnings season and very low otherwise. T...
A company has a RAG application with a 4,000-token system prompt repeated on every call. Which Bedrock optimization specifically addresses the cost of...
Which Amazon Bedrock inference parameter reduces the probability of the model repeating the same phrases by penalizing tokens that have already appear...
A team discovers that 60% of API calls to Bedrock are throttled during business hours. Which two actions best address sustained throughput needs?
Which Amazon Bedrock feature enables a developer to view a breakdown of per-model invocation counts, latency percentiles, and error rates in a managed...
A startup uses Bedrock on-demand for a new application. They want the lowest possible cost per request for an asynchronous data enrichment pipeline th...
A team wants to allocate Bedrock API costs to different business units by tagging all inference calls with a cost center tag. Which AWS mechanism enab...
A company's Bedrock application costs are growing faster than usage. Cost Explorer shows 80% of spend is on input tokens. The team uses a static 2,000...