Operational Efficiency and Optimization for GenAI Applications Questions

Practice questions for the Operational Efficiency and Optimization for GenAI Applications topic of the AWS Certified Generative AI Developer - Professional exam. 24 questions cover this domain.

24 questions: 7 easy, 13 medium, 4 hard

Q1

hard

A customer-facing application is sensitive to latency spikes during busy periods and wants better output token latency for eligible models without red...

Q2

easy

A FinOps team wants to monitor how many input and output tokens an application is consuming in Amazon Bedrock. Which AWS service is used for those met...

Q3

medium

A team fine-tuned several prototype models and later noticed recurring monthly charges even when those models were not actively serving users. Which c...

Q4

medium

A marketing team can tolerate delayed processing for a large generation workload and wants a service tier built for throughput-flexible, batch-style us...

Q5

medium

A retailer expects steady, high-volume Bedrock usage and wants committed capacity with more predictable cost planning. Which option is the best fit?

Q6

easy

A company has a large offline content generation job and wants a lower-cost option than on-demand inference for supported Bedrock models. Which Bedroc...

Q7

medium

A team has bursty Bedrock usage but a contractual obligation to guarantee peak capacity. Which combination is MOST appropriate?

Q8

easy

Which Bedrock pricing/usage metric is most directly affected by long system prompts repeated on every request?

Q9

medium

Which AWS service is the primary place to view Bedrock token-usage metrics (input/output tokens) and configure alarms on usage spikes?

Q10

medium

An application's per-request latency is dominated by long retrieved context. Which optimization MOST directly reduces output time-to-first-token witho...

Q11

hard

A company wants to reduce per-call cost for a chat assistant whose first ~3,000 tokens of prompt context are nearly identical across users (long syste...

Q12

easy

Which Amazon Bedrock service tier is designed for committed, predictable inference capacity at a discounted hourly cost for steady-state workloads?

Q13

hard

A financial services firm runs a Bedrock-based analyst assistant with highly variable load: very high during earnings season and very low otherwise. T...

Q14

medium

A company has a RAG application with a 4,000-token system prompt repeated on every call. Which Bedrock optimization specifically addresses the cost of...
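To get a feel for the scale in this scenario, the arithmetic can be sketched as below. This is only an illustration: the call volume and the per-token price are hypothetical placeholders chosen for the example, not values from the question or from any AWS price list, and the function names are made up here.

```python
def repeated_prompt_tokens(system_prompt_tokens: int, calls: int) -> int:
    """Input tokens spent re-sending the same system prompt on every call."""
    return system_prompt_tokens * calls


def repeated_prompt_cost(system_prompt_tokens: int, calls: int,
                         price_per_1k_input_tokens: float) -> float:
    """Cost of those repeated tokens at a given price per 1,000 input tokens."""
    tokens = repeated_prompt_tokens(system_prompt_tokens, calls)
    return tokens / 1000 * price_per_1k_input_tokens


if __name__ == "__main__":
    SYSTEM_PROMPT_TOKENS = 4_000   # from the scenario above
    CALLS_PER_DAY = 10_000         # assumption for illustration only
    PRICE_PER_1K = 0.003           # hypothetical USD price, not a real quote

    tokens = repeated_prompt_tokens(SYSTEM_PROMPT_TOKENS, CALLS_PER_DAY)
    cost = repeated_prompt_cost(SYSTEM_PROMPT_TOKENS, CALLS_PER_DAY, PRICE_PER_1K)
    print(f"{tokens:,} repeated input tokens per day, about ${cost:,.2f} per day")
```

The repeated portion grows linearly with call volume, which is why a fixed system prompt can dominate input-token spend even when user messages are short.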

Q15

easy

Which Amazon Bedrock inference parameter reduces the probability of the model repeating the same phrases by penalizing tokens that have already appear...

Q16

medium

A team discovers that 60% of API calls to Bedrock are throttled during business hours. Which two actions best address sustained throughput needs?

Q17

easy

Which Amazon Bedrock feature enables a developer to view a breakdown of per-model invocation counts, latency percentiles, and error rates in a managed...

Q18

medium

A startup uses Bedrock on-demand for a new application. They want the lowest possible cost per request for an asynchronous data enrichment pipeline th...

Q19

medium

A team wants to allocate Bedrock API costs to different business units by tagging all inference calls with a cost center tag. Which AWS mechanism enab...

Q20

hard

A company's Bedrock application costs are growing faster than usage. Cost Explorer shows 80% of spend is on input tokens. The team uses a static 2,000...
