Data Ingestion and Transformation Questions

Practice questions for the Data Ingestion and Transformation domain of the AWS Certified Data Engineer - Associate exam. 66 questions cover this domain.

66 questions: 18 easy, 31 medium, 17 hard

easy

A pipeline must invoke a target every night at midnight, and future requirements might include one-time invocations and retry controls. Which AWS serv...

medium

A scheduled AWS Glue ETL job reads data from Amazon S3 every hour. The team wants each run to process only new data since the last successful run. Whi...
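The idea behind job bookmarks is that a job persists "what was already processed" after each successful run, so the next run reads only new input. The toy model below illustrates that idea and what happens when the bookmark is turned off. It is not the AWS Glue API: the class name, the `bookmarks_enabled` flag and the key-based tracking are hypothetical simplifications (real Glue bookmarks track state differently per source type, and a Glue script persists its bookmark state when it calls `job.commit()`).

```python
from dataclasses import dataclass, field


# Toy model of bookmark-style incremental reads. NOT the AWS Glue API:
# names here are hypothetical and only illustrate persisted run state.
@dataclass
class ToyBookmarkedJob:
    bookmarks_enabled: bool = True
    # State the job keeps between successful runs (hypothetical).
    processed_keys: set[str] = field(default_factory=set)

    def run(self, source_keys: list[str]) -> list[str]:
        if self.bookmarks_enabled:
            # Only pick up objects that no earlier successful run processed.
            batch = [k for k in source_keys if k not in self.processed_keys]
        else:
            # No bookmark means no memory of earlier runs:
            # the whole source is read and processed again.
            batch = list(source_keys)
        # ... transform and write `batch` here ...
        # Commit state only after the run's work has succeeded.
        if self.bookmarks_enabled:
            self.processed_keys.update(batch)
        return batch
```

With bookmarks on, a second run over `["a.json", "b.json", "c.json"]` after a first run over `["a.json", "b.json"]` returns only `["c.json"]`; with `bookmarks_enabled=False`, every run returns the full input.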

easy

A company needs to orchestrate a multi-step data pipeline with branching, retries, and service integrations. Which AWS service is designed for this us...

easy

A team has multiple applications reading the same Amazon Kinesis data stream. One reader needs dedicated throughput and low latency without affecting ...

hard

A company needs a visual, debuggable workflow that orchestrates Lambda functions and AWS Glue jobs for a data pipeline with retries and branching. Whi...
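For reference, a Step Functions workflow of this shape is defined in Amazon States Language (ASL). The sketch below builds a small ASL definition as a Python dict: a Lambda task with retries, a Choice state for branching, and a Glue job run through the `.sync` integration so the workflow waits for the job to finish. The function name `validate-input`, the job name `nightly-etl` and the `ready` field are hypothetical placeholders; deploying the definition is out of scope here.

```python
import json

# Hypothetical resource names: "validate-input" (Lambda) and "nightly-etl"
# (Glue job) are placeholders, not resources from the question.
definition = {
    "Comment": "Validate input with Lambda, then run a Glue job with retries",
    "StartAt": "Validate",
    "States": {
        "Validate": {
            "Type": "Task",
            "Resource": "arn:aws:states:::lambda:invoke",
            "Parameters": {"FunctionName": "validate-input", "Payload.$": "$"},
            # Retry transient task failures with exponential backoff.
            "Retry": [{
                "ErrorEquals": ["States.TaskFailed"],
                "IntervalSeconds": 2,
                "MaxAttempts": 3,
                "BackoffRate": 2.0,
            }],
            "Next": "IsReady",
        },
        "IsReady": {
            "Type": "Choice",
            # The lambda:invoke integration returns the function result under "Payload";
            # "ready" is a hypothetical field the function is assumed to return.
            "Choices": [
                {"Variable": "$.Payload.ready", "BooleanEquals": True, "Next": "RunGlueJob"}
            ],
            "Default": "Skip",
        },
        "RunGlueJob": {
            "Type": "Task",
            # ".sync" makes the state wait until the Glue job run completes.
            "Resource": "arn:aws:states:::glue:startJobRun.sync",
            "Parameters": {"JobName": "nightly-etl"},
            "Retry": [{"ErrorEquals": ["States.ALL"], "MaxAttempts": 2}],
            "End": True,
        },
        "Skip": {"Type": "Succeed"},
    },
}

# JSON text that would be supplied as the state machine definition.
asl_json = json.dumps(definition, indent=2)
```

Each state's `Retry` and the `Choice` branches are visible in the Step Functions console graph, which is what makes this kind of workflow visual and debuggable.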

medium

An AWS Glue crawler has both custom classifiers and built-in classifiers available. How does it decide which schema to use?
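The selection rule described in the AWS Glue documentation can be sketched as follows: custom classifiers run first, in the order configured on the crawler, then built-in classifiers; a classifier that returns certainty 1.0 wins immediately; otherwise the highest certainty wins; and if nothing scores above 0.0, the classification is `UNKNOWN`. This is a model of that rule, not Glue code. The function name is hypothetical, and keeping the first classifier on a certainty tie is an assumption.

```python
from typing import Callable

# A classifier inspects a data sample and returns (certainty, classification).
Classifier = Callable[[bytes], tuple[float, str]]


def pick_classification(sample: bytes,
                        custom: list[Classifier],
                        built_in: list[Classifier]) -> str:
    best_certainty, best_label = 0.0, "UNKNOWN"
    # Custom classifiers are tried first, in configured order, then built-ins.
    for classify in [*custom, *built_in]:
        certainty, label = classify(sample)
        if certainty == 1.0:
            return label  # fully certain: use this classifier's schema
        if certainty > best_certainty:  # ties keep the earlier classifier (assumption)
            best_certainty, best_label = certainty, label
    # Highest certainty seen, or UNKNOWN if every classifier returned 0.0.
    return best_label
```

For example, a custom classifier that is only 0.6 certain loses to a built-in classifier that returns 1.0, but wins if no other classifier scores higher.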

medium

A platform team wants scheduled invocations with cron expressions, rate expressions, one-time execution support, retry limits, and retention for faile...

medium

A team disables job bookmarks for an AWS Glue job and reruns the job on the same source data. What behavior should they expect?

hard

A downstream processor receives Amazon S3 event notifications for object creation. The team notices rare duplicate notifications. What is the best app...
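Because S3 event notifications can occasionally be delivered more than once, consumers are usually written to be idempotent. The sketch below skips records it has already handled, assuming a duplicate delivery carries the same bucket name, object key and `sequencer` value as the original record. The `seen` set is a hypothetical stand-in for a durable store (for example a table written with a conditional insert); an in-memory set only deduplicates within one process.

```python
# Minimal idempotent handler for S3 event notification payloads.
# `seen` stands in for a durable idempotency store (assumption).
def handle_s3_event(event: dict, seen: set[tuple[str, str, str]]) -> list[str]:
    """Process each object-created record at most once; return the keys processed."""
    processed = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        dedup_key = (
            s3["bucket"]["name"],
            s3["object"]["key"],
            s3["object"]["sequencer"],
        )
        if dedup_key in seen:
            continue  # duplicate notification: skip the side effect
        processed.append(s3["object"]["key"])  # real work would go here
        # Record the key only after the work above has succeeded.
        seen.add(dedup_key)
    return processed
```

Marking a record as seen only after its work succeeds means a crash mid-way leads to a retry rather than a lost record; concurrent consumers would need an atomic check-and-set in the shared store.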

Q10

medium

A Lambda function writes to a database that cannot handle unbounded parallel writes. The team needs an upper limit on the function's scale and wants t...

Q11

hard

A Kinesis-backed Lambda pipeline has a high IteratorAge during peak traffic. The team wants more concurrent processing per shard but must preserve ord...
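Lambda's `ParallelizationFactor` (1 to 10) lets several batches from one shard be processed concurrently while order is still kept per partition key. The toy model below shows why that works: records from a single shard are routed into ordered lanes by a hash of the partition key, so records with the same key always land in the same lane in arrival order. How Lambda assigns keys internally is not documented here; the CRC32 routing and the function name are illustrative assumptions.

```python
import zlib
from collections import defaultdict


def split_shard(records: list[tuple[str, str]],
                parallelization_factor: int) -> dict[int, list[tuple[str, str]]]:
    """Toy model: route (partition_key, payload) records from ONE shard into
    ordered lanes. Same key -> same lane, so per-key order is preserved while
    different keys can be processed concurrently."""
    if not 1 <= parallelization_factor <= 10:
        raise ValueError("parallelization factor must be between 1 and 10")
    lanes: dict[int, list[tuple[str, str]]] = defaultdict(list)
    for key, payload in records:
        # Deterministic hash (Python's built-in str hash is randomized per process).
        lane = zlib.crc32(key.encode()) % parallelization_factor
        lanes[lane].append((key, payload))
    return dict(lanes)
```

Raising the factor adds lanes (more concurrency per shard) without breaking per-key ordering, whereas adding shards changes the stream itself.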

Q12

easy

A data engineer wants a managed component that connects to a data store, infers schema, and writes table metadata into the AWS Glue Data Catalog. What...

Q13

medium

A Lambda function that reads from Kinesis is being invoked with very small batches. The team wants Lambda to wait briefly to collect more records befo...
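On a Kinesis event source mapping, `BatchSize` and `MaximumBatchingWindowInSeconds` together decide when Lambda invokes the function: it gathers records until the batch is full or the window expires (a payload size limit also applies). The simulation below is a simplified model of that trade-off and ignores the payload limit; the function name and the "window measured from the batch's first record" rule are assumptions for illustration.

```python
def group_into_batches(arrivals: list[float], batch_size: int,
                       window_s: float) -> list[list[float]]:
    """Toy model of a batching window: close a batch when it holds
    `batch_size` records or when `window_s` seconds have passed since
    the batch's first record arrived."""
    batches: list[list[float]] = []
    current: list[float] = []
    for t in sorted(arrivals):
        # Window expired before this record arrived: flush what we have.
        if current and t - current[0] > window_s:
            batches.append(current)
            current = []
        current.append(t)
        # Batch is full: flush immediately.
        if len(current) == batch_size:
            batches.append(current)
            current = []
    if current:
        batches.append(current)  # final partial batch flushes when its window ends
    return batches
```

With arrivals at `[0, 0.1, 0.2, 5.0]` seconds, a 1-second window yields two invocations (`[[0, 0.1, 0.2], [5.0]]`), while a zero window yields four single-record invocations.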

Q14

hard

A Kinesis consumer is using a standard iterator and is competing with other applications for shard throughput. The team must eliminate that contention...

Q15

medium

A bucket must publish notifications when new objects arrive. Which information must the Amazon S3 notification configuration specify?

Q16

hard

A Lambda function processing one Kinesis shard occasionally fails on a batch. The team wants to reduce the blast radius of those failures on overall t...
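The `BisectBatchOnFunctionError` setting on an event source mapping splits a failing batch in two and retries each half, which narrows a failure down to the offending records instead of retrying the whole batch. The recursive sketch below models that narrowing step only; the function names are hypothetical, and real Lambda retries also interact with retry-attempt and record-age limits that are not modeled here.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def process_with_bisect(batch: list[T],
                        handler: Callable[[list[T]], None],
                        failed: list[T]) -> None:
    """Toy model of bisect-on-error: if a batch fails, split it in half and
    retry each half, so a single bad record ends up isolated in `failed`."""
    if not batch:
        return
    try:
        handler(batch)
    except Exception:
        if len(batch) == 1:
            failed.append(batch[0])  # isolated record, e.g. for an on-failure destination
            return
        mid = len(batch) // 2
        process_with_bisect(batch[:mid], handler, failed)
        process_with_bisect(batch[mid:], handler, failed)
```

Because records in a failed batch are sent to the handler again when the halves are retried, the handler itself should still be idempotent.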

Q17

medium

An AWS Lambda function processes records from an Amazon Kinesis data stream. The team is worried that some records might be delivered more than once. ...
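Kinesis consumers can see the same logical record more than once: a consumer retry replays a record with its original sequence number, while a producer retry writes a new record with a new sequence number. The sketch below therefore deduplicates on an application-level ID embedded in each payload. The `event_id` field and the in-memory `seen_ids` set are hypothetical; in practice the set would be a durable store shared by all invocations.

```python
import base64
import json


def handle_kinesis_event(event: dict, seen_ids: set[str]) -> list[dict]:
    """Process each logical record at most once.

    Assumes producers embed a unique `event_id` (hypothetical field) in every
    JSON payload. Deduplicating on it, rather than on the Kinesis sequence
    number, also catches producer-side retries.
    """
    applied = []
    for record in event.get("Records", []):
        # Lambda delivers Kinesis record data base64-encoded.
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        event_id = payload["event_id"]
        if event_id in seen_ids:
            continue  # replayed or re-sent record: skip the side effect
        applied.append(payload)  # real work (e.g. a database upsert) goes here
        seen_ids.add(event_id)
    return applied
```

Making the write itself idempotent (for example an upsert keyed on `event_id`) achieves the same effect without a separate lookup.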

Q18

easy

Which AWS Glue construct extends Spark DataFrames with schema-aware transforms (ApplyMapping, ResolveChoice, DropFields, Relationalize) for ETL on semi-st...

Q19

hard

An EMR job reads thousands of small files from Amazon S3, causing slow Spark performance. Which optimization aligns with AWS guidance?
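The usual remedy for many small input files is to compact them into fewer, larger files before Spark reads them; on Amazon EMR, S3DistCp's `--groupBy` and `--targetSize` options are one way to do that. The sketch below models only the planning step (greedily grouping files up to a target size), not S3DistCp itself; the function name and the 128 MiB target in the example are illustrative assumptions.

```python
def plan_compaction(file_sizes: dict[str, int], target_bytes: int) -> list[list[str]]:
    """Greedily group small files (name -> size in bytes) into batches of at
    most roughly `target_bytes`, so each batch can be rewritten as one file."""
    groups: list[list[str]] = []
    current: list[str] = []
    current_size = 0
    for name, size in sorted(file_sizes.items()):
        # Adding this file would overflow the target: start a new group.
        if current and current_size + size > target_bytes:
            groups.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        groups.append(current)
    return groups
```

For example, 1,000 files of 1 MiB each with a 128 MiB target collapse into 8 output files instead of 1,000 input splits.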

Q20

easy

Which AWS Glue feature provides a visual, drag-and-drop interface to author ETL jobs that generate Apache Spark code?
