Data Ingestion and Transformation Questions
Practice questions for the Data Ingestion and Transformation topic of the AWS Certified Data Engineer - Associate exam. 66 questions cover this domain.
A pipeline must invoke a target every night at midnight, and future requirements might include one-time invocations and retry controls. Which AWS serv...
A scheduled AWS Glue ETL job reads data from Amazon S3 every hour. The team wants each run to process only new data since the last successful run. Whi...
A company needs to orchestrate a multi-step data pipeline with branching, retries, and service integrations. Which AWS service is designed for this us...
A team has multiple applications reading the same Amazon Kinesis Data Stream. One reader needs dedicated throughput and low latency without affecting ...
A company needs a visual, debuggable workflow that orchestrates Lambda functions and AWS Glue jobs for a data pipeline with retries and branching. Whi...
An AWS Glue crawler has both custom classifiers and built-in classifiers available. How does it decide which schema to use?
A platform team wants scheduled invocations with cron expressions, rate expressions, one-time execution support, retry limits, and retention for faile...
A team disables job bookmarks for an AWS Glue job and reruns the job on the same source data. What behavior should they expect?
A downstream processor receives Amazon S3 event notifications for object creation. The team notices rare duplicate notifications. What is the best app...
A Lambda function writes to a database that cannot handle unbounded parallel writes. The team needs an upper limit on the function's scale and wants t...
A Kinesis-backed Lambda pipeline has a high IteratorAge during peak traffic. The team wants more concurrent processing per shard but must preserve ord...
A data engineer wants a managed component that connects to a data store, infers schema, and writes table metadata into the AWS Glue Data Catalog. What...
A Lambda function that reads from Kinesis is being invoked with very small batches. The team wants Lambda to wait briefly to collect more records befo...
A Kinesis consumer is using a standard iterator and is competing with other applications for shard throughput. The team must eliminate that contention...
A bucket must publish notifications when new objects arrive. Which information must the Amazon S3 notification configuration specify?
A Lambda function processing one Kinesis shard occasionally fails on a batch. The team wants to reduce the blast radius of those failures on overall t...
An AWS Lambda function processes records from an Amazon Kinesis Data Stream. The team is worried that some records might be delivered more than once. ...
Which Glue construct extends Spark DataFrames with schema-aware transforms (ApplyMapping, ResolveChoice, DropFields, Relationalize) for ETL on semi-st...
An EMR job reads thousands of small files from Amazon S3, causing slow Spark performance. Which optimization aligns with AWS guidance?
Which AWS Glue feature provides a visual, drag-and-drop interface to author ETL jobs that generate Apache Spark code?