Data Transformation, Cleansing, and Quality Questions

Practice questions for the Data Transformation, Cleansing, and Quality topic in the Databricks Certified Data Engineer Professional exam. 20 questions cover this domain.

20 questions: 4 easy, 11 medium, 5 hard
Q1
medium

Which of the following is a valid constraint expression for a Lakeflow SDP pipeline expectation?

Q2
easy

In a Lakeflow Spark Declarative Pipeline, which decorator retains invalid records in the target dataset while logging the violation count as a data qu...

Q3
medium

A data engineer enables Delta Change Data Feed on a table and reads the feed using `spark.read.format("delta").option("readChangeFeed", "true")`. Whic...

Q4
medium

A Lakeflow SDP pipeline defines two independent parallel flows: `customer_flow` with an `@dp.expect_or_fail` expectation and `product_flow` without an...

Q5
hard

A data engineer appends data to an existing Delta table using the following code. The DataFrame `df_new` has an extra column `discount_pct` that does ...

Q6
hard

A data engineer is implementing a CDC pipeline for a `customers` table using Delta Change Data Feed and `MERGE INTO`. The source CDC feed provides row...

Q7
easy

In a Lakeflow Spark Declarative Pipeline, what is the key behavioral difference between a streaming table and a materialized view?

Q8
hard

A data engineer runs a pipeline that performs a `MERGE INTO` on a large Delta table to apply daily CDC updates. Over time, the table accumulates many ...

Q9
medium

A Lakeflow SDP pipeline has a dataset `silver_orders` with the expectation `@dp.expect_or_drop("valid_amount", "amount > 0")`. After a pipeline run, t...

Q10
medium

A data engineer defines a Lakeflow SDP pipeline expectation: `@dp.expect_all_or_fail({"positive_amount": "amount > 0", "valid`...

Q11
medium

A developer adds two expectations named `valid_amount` to the same dataset definition in a pipeline. What is the documented issue?

Q12
medium

A team wants to add data quality expectations to an `AUTO CDC FROM SNAPSHOT` flow. What should they expect?

Q13
easy

A Lakeflow Spark Declarative Pipeline uses `@dp.expect("valid_price", "price >= 0")`. What happens to rows that fail the condition?

Q14
hard

A pipeline has two parallel flows. One flow uses `@dp.expect_or_fail` and hits a violating record. Which outcome matches the documentation?

Q15
medium

For which pipeline object types does Databricks document expectation metrics support?

Q16
hard

A pipeline has two parallel flows. One flow uses `@dp.expect_or_fail` and encounters a violating record. Which outcome matches Databricks documentatio...

Q17
medium

A pipeline author accidentally gives two expectations on the same dataset the name `valid_amount`. What is the problem?

Q18
medium

A team wants to add expectations to an `AUTO CDC FROM SNAPSHOT` flow. What should they expect?

Q19
medium

A data engineer proposes this expectation clause in a Lakeflow pipeline: `status IN (SELECT valid_status FROM ref.status_lookup)`. Why is this not val...

Q20
easy

A Lakeflow Spark Declarative Pipeline defines `@dp.expect("valid_price", "price >= 0")` on a dataset. What happens to rows that fail the condition?