Prepare and process data Questions
Practice questions for the Prepare and process data topic of the Microsoft Certified: Azure Databricks Data Engineer Associate certification. The set contains 69 questions covering this domain.
A team wants to update or delete target rows that have no matching source row, but only for a recent time window so the whole table is not rewritten. ...
How does Auto Loader track which files have already been ingested?
A MERGE operation fails because two rows in the source match the same target row and both attempt to update it. What should the engineer do first?
Which change data feed metadata column identifies whether a row is an insert, update preimage, update postimage, or delete?
A pipeline repeatedly runs COPY INTO against the same source folder. What happens to files that were already loaded successfully?
A streaming pipeline ingests log records that can contain duplicates. The team wants continuous deduplication before downstream consumers read the fin...
A team uses Auto Loader inside Lakeflow Spark Declarative Pipelines for production ingestion. Which operational detail is handled automatically by the...
Why does Databricks strongly recommend keeping VACUUM retention at least seven days unless you are certain no long-running operations exceed the short...
Files can arrive out of order when Auto Loader is used directly with Structured Streaming. Which pattern aligns with Databricks guidance for upsert pi...
Which Structured Streaming source does Auto Loader provide for incremental file ingestion from cloud storage?
A compliance team needs a permanent audit trail of all row-level changes from a Delta table. What should they do?
A CDF pipeline checkpoint is corrupted, but the downstream table has already processed source changes through version 75. To restart from version 76, ...
A materialized view should refresh incrementally from Delta source tables. What must be enabled on those source tables?
A new Structured Streaming job starts reading a Delta table with readChangeFeed set to true and no starting version specified. What does the first run...
Which SQL function should be used to read change data feed for a specific version range in a batch query?
What is the key difference between a standard view and a materialized view in Databricks SQL?
A job orchestrates several materialized view refreshes and wants each refresh command to return immediately so multiple operations can start in parall...
A pipeline must ingest JSON files from an ADLS Gen2 path with evolving schema, exactly-once semantics, and incremental discovery. Which Auto Loader op...
Which Delta capability tracks row-level inserts/updates/deletes for downstream incremental consumers?
A team uses Lakeflow Spark Declarative Pipelines (formerly DLT) and wants to track late-arriving SCD2 history on a target table. Which APPLY CHANGES I...
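For the MERGE question above, the failure occurs because two source rows match the same target row. Delta Lake's guidance is to preprocess the source so that it cannot produce multiple matches, for example by keeping only the latest row per key.

The sketch below is a plain-Python toy model under stated assumptions. It is not Spark or Delta Lake code.
- The row shape (`id`, `ts`, `value`) and the helpers `dedupe_latest` and `upsert` are hypothetical.
- The toy `upsert` is stricter than Delta: it rejects any duplicate key in the batch. Delta's check concerns source rows that match the same existing target row.
- In Spark, the deduplication step is commonly written with a window function such as `row_number()`, partitioned by key and ordered by timestamp descending, before running MERGE.

```python
from typing import Dict, List

# Hypothetical row shape for illustration: a key "id", an ordering column "ts",
# and a payload "value".
Row = Dict[str, object]


def dedupe_latest(source: List[Row], key: str = "id", order_by: str = "ts") -> List[Row]:
    """Keep only the row with the greatest `order_by` value for each key."""
    latest: Dict[object, Row] = {}
    for row in source:
        k = row[key]
        if k not in latest or row[order_by] > latest[k][order_by]:
            latest[k] = row
    return list(latest.values())


def upsert(target: Dict[object, Row], source: List[Row], key: str = "id") -> None:
    """Toy MERGE: update rows whose key exists in `target`, insert the rest.

    Raises ValueError when a key appears more than once in the source batch,
    loosely mirroring Delta's error when several source rows try to modify
    the same target row (this toy is stricter than Delta).
    """
    seen = set()
    for row in source:
        k = row[key]
        if k in seen:
            raise ValueError(f"multiple source rows for key {k!r}")
        seen.add(k)
        target[k] = dict(row)  # update if the key exists, insert otherwise


source = [
    {"id": 1, "ts": 1, "value": "a"},
    {"id": 1, "ts": 3, "value": "c"},  # later version of key 1
    {"id": 2, "ts": 2, "value": "b"},
]
target = {1: {"id": 1, "ts": 0, "value": "old"}}

upsert(target, dedupe_latest(source))  # succeeds: one source row per key
```

Calling `upsert` on the raw `source` list raises `ValueError`, while the deduplicated batch updates key 1 to `"c"` and inserts key 2. Choosing the latest row by `ts` is one possible policy; which row to keep depends on the pipeline's ordering semantics.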