
Data Ingestion & Acquisition Questions

Practice questions for the Data Ingestion & Acquisition domain of the Databricks Certified Data Engineer Professional exam: 14 questions covering this domain.

14 questions: 4 easy, 6 medium, 4 hard
Q1
hard

A data engineer ingests JSON files using Auto Loader with a fixed target schema. Incoming files occasionally contain unexpected extra fields. The engi...

Q2
easy

What is the primary purpose of the `cloudFiles` source format in Databricks Auto Loader?

Q3
medium

A data engineer configures Auto Loader with `cloudFiles.schemaEvolutionMode` set to `addNewColumns`. A new field `discount_pct` appears in incoming JS...

Q4
medium

A data engineer needs to perform a one-time batch load of Parquet files from an S3 bucket into an existing Delta table. Previously loaded files must n...

Q5
hard

A data engineer needs to ingest data from both S3 and Azure Data Lake Storage Gen2 paths into a single Delta table using a unified pipeline. The S3 pa...

Q6
easy

In Auto Loader, what is the difference between directory listing mode and file notification mode for file discovery?

Q7
medium

A data engineer uses Auto Loader to ingest files from an Azure Data Lake Storage Gen2 path. The source team periodically adds files in large batches —...

Q8
medium

A team switches an Auto Loader job from directory listing mode to file notification mode and asks whether that change will preserve source-file arriva...

Q9
hard

A pipeline uses Auto Loader together with `AUTO CDC` to process deletes that may arrive well after related upserts. Which target-table setting is docu...

Q10
medium

A standalone streaming table is created with `read_files(..., includeExistingFiles => false)`. What data is ingested?

Q11
easy

How does Auto Loader persist the state it uses to avoid reprocessing the same files?

Q12
easy

A production Auto Loader stream must resume after failures without reprocessing files it already handled. Where does Auto Loader persist the file-trac...

Q13
medium

A standalone streaming table uses `read_files` with `includeExistingFiles => false` against a cloud-storage path. Which files are ingested?

Q14
hard

A bronze ingestion pipeline uses Auto Loader together with Lakeflow `AUTO CDC`, and delete records can arrive well after corresponding upserts because...
