Prepare and process data Questions

Practice questions for the Prepare and process data topic of the Microsoft Certified: Azure Databricks Data Engineer Associate certification. 69 questions cover this domain.

69 questions: 16 easy, 36 medium, 17 hard
Q1
medium

A team wants to update or delete target rows that have no matching source row, but only for a recent time window so the whole table is not rewritten. ...

Q2
easy

How does Auto Loader track which files have already been ingested?

Q3
medium

A MERGE operation fails because two rows in the source match the same target row and both attempt to update it. What should the engineer do first?

Q4
easy

Which change data feed metadata column identifies whether a row is an insert, update preimage, update postimage, or delete?

Q5
easy

A pipeline repeatedly runs COPY INTO against the same source folder. What happens to files that were already loaded successfully?

Q6
hard

A streaming pipeline ingests log records that can contain duplicates. The team wants continuous deduplication before downstream consumers read the fin...

Q7
medium

A team uses Auto Loader inside Lakeflow Spark Declarative Pipelines for production ingestion. Which operational detail is handled automatically by the...

Q8
hard

Why does Databricks strongly recommend keeping VACUUM retention at least seven days unless you are certain no long-running operations exceed the short...

Q9
medium

Files can arrive out of order when Auto Loader is used directly with Structured Streaming. Which pattern aligns with Databricks guidance for upsert pi...

Q10
easy

Which Structured Streaming source does Auto Loader provide for incremental file ingestion from cloud storage?

Q11
medium

A compliance team needs a permanent audit trail of all row-level changes from a Delta table. What should they do?

Q12
medium

A CDF pipeline checkpoint is corrupted, but the downstream table has already processed source changes through version 75. To restart from version 76, ...

Q13
hard

A materialized view should refresh incrementally from Delta source tables. What must be enabled on those source tables?

Q14
medium

A new Structured Streaming job starts reading a Delta table with readChangeFeed set to true and no starting version specified. What does the first run...

Q15
medium

Which SQL function should be used to read the change data feed for a specific version range in a batch query?

Q16
easy

What is the key difference between a standard view and a materialized view in Databricks SQL?

Q17
hard

A job orchestrates several materialized view refreshes and wants each refresh command to return immediately so multiple operations can start in parall...

Q18
medium

A pipeline must ingest JSON files from an ADLS Gen2 path with evolving schema, exactly-once semantics, and incremental discovery. Which Auto Loader op...

Q19
medium

Which Delta capability tracks row-level inserts/updates/deletes for downstream incremental consumers?

Q20
hard

A team uses Lakeflow Spark Declarative Pipelines (formerly DLT) and wants to track late-arriving SCD2 history on a target table. Which APPLY CHANGES I...
