Data Transformation, Cleansing, and Quality Questions
Practice questions for the Data Transformation, Cleansing, and Quality topic of the Databricks Certified Data Engineer Professional exam. 20 questions cover this domain.
Which of the following is a valid constraint expression for a Lakeflow SDP pipeline expectation?
In a Lakeflow Spark Declarative Pipeline, which decorator retains invalid records in the target dataset while logging the violation count as a data qu...
A data engineer enables Delta Change Data Feed on a table and reads the feed using `spark.read.format("delta").option("readChangeFeed", "true")`. Whic...
A Lakeflow SDP pipeline defines two independent parallel flows: `customer_flow` with an `@dp.expect_or_fail` expectation and `product_flow` without an...
A data engineer appends data to an existing Delta table using the following code. The DataFrame `df_new` has an extra column `discount_pct` that does ...
A data engineer is implementing a CDC pipeline for a `customers` table using Delta Change Data Feed and `MERGE INTO`. The source CDC feed provides row...
In a Lakeflow Spark Declarative Pipeline, what is the key behavioral difference between a streaming table and a materialized view?
A data engineer runs a pipeline that performs a `MERGE INTO` on a large Delta table to apply daily CDC updates. Over time, the table accumulates many ...
A Lakeflow SDP pipeline has a dataset `silver_orders` with the expectation `@dp.expect_or_drop("valid_amount", "amount > 0")`. After a pipeline run, t...
A data engineer defines a Lakeflow SDP pipeline expectation: @dp.expect_all_or_fail({"positive_amount": "amount > 0", "valid...
A developer adds two expectations named `valid_amount` to the same dataset definition in a pipeline. What is the documented issue?
A team wants to add data quality expectations to an `AUTO CDC FROM SNAPSHOT` flow. What should they expect?
A Lakeflow Spark Declarative Pipeline uses `@dp.expect("valid_price", "price >= 0")`. What happens to rows that fail the condition?
A pipeline has two parallel flows. One flow uses `@dp.expect_or_fail` and hits a violating record. Which outcome matches the documentation?
For which pipeline object types does Databricks document support for expectation metrics?
A pipeline has two parallel flows. One flow uses `@dp.expect_or_fail` and encounters a violating record. Which outcome matches Databricks documentatio...
A pipeline author accidentally gives two expectations on the same dataset the name `valid_amount`. What is the problem?
A team wants to add expectations to an `AUTO CDC FROM SNAPSHOT` flow. What should they expect?
A data engineer proposes this expectation clause in a Lakeflow pipeline: `status IN (SELECT valid_status FROM ref.status_lookup)`. Why is this not val...
A Lakeflow Spark Declarative Pipeline defines `@dp.expect("valid_price", "price >= 0")` on a dataset. What happens to rows that fail the condition?
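Several of the questions above turn on how the three expectation actions differ. The sketch below is a plain-Python model, not the `pyspark.pipelines` API: the helper `apply_expectations`, the `ExpectationFailed` exception, and the use of Python callables in place of SQL boolean constraint strings are all hypothetical, chosen so the behavior can be run without a pipeline. It assumes the documented semantics: `expect` keeps violating rows and counts them, `expect_or_drop` counts and removes them, `expect_or_fail` stops processing, and the `expect_all*` variants take a dict of name-to-constraint entries.

```python
# Plain-Python model of expectation actions; NOT the pyspark.pipelines API.
# Constraints are Python callables here, whereas real expectations use SQL
# boolean expressions such as "amount > 0".
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]
Constraint = Callable[[Row], bool]


class ExpectationFailed(Exception):
    """Hypothetical error standing in for a failed pipeline update."""


def apply_expectations(
    rows: List[Row],
    expectations: Dict[str, Constraint],
    action: str,
) -> Tuple[List[Row], Dict[str, int]]:
    """Return (output_rows, violation_counts) for one batch of rows.

    action="warn": keep every row and count violations (like expect).
    action="drop": count violations and remove failing rows (like expect_or_drop).
    action="fail": stop at the first failing row (like expect_or_fail).
    A dict with several entries mirrors the expect_all* variants.
    """
    if action not in {"warn", "drop", "fail"}:
        raise ValueError(f"unknown action: {action!r}")
    counts = {name: 0 for name in expectations}
    output: List[Row] = []
    for row in rows:
        failed = [name for name, check in expectations.items() if not check(row)]
        for name in failed:
            counts[name] += 1
        if failed and action == "fail":
            raise ExpectationFailed(f"{row} violated {failed}")
        if failed and action == "drop":
            continue  # violating row is excluded from the output
        output.append(row)
    return output, counts


if __name__ == "__main__":
    batch = [{"amount": 10}, {"amount": -5}, {"amount": 0}]
    checks = {"valid_amount": lambda r: r["amount"] > 0}
    print(apply_expectations(batch, checks, "warn"))
    print(apply_expectations(batch, checks, "drop"))
```

In this model, "warn" and "drop" report the same violation count; only the rows that reach the output differ.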
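The CDC questions rely on the metadata columns Delta Change Data Feed adds to each row (`_change_type`, `_commit_version`, `_commit_timestamp`) and on the fact that `MERGE INTO` fails when several source rows try to modify the same target row. The following is a minimal plain-Python model of the usual pattern, not Spark code: the key column name `id`, the helpers `latest_change_per_key` and `apply_changes`, and the dict-based target table are hypothetical. It skips `update_preimage` rows, keeps the highest `_commit_version` per key, then deletes or upserts.

```python
# Plain-Python model of applying a Delta Change Data Feed batch with MERGE-style
# logic. Hypothetical: the key column "id", the helper names, and the dict target.
from typing import Dict, List

Change = Dict[str, object]
Table = Dict[object, Dict[str, object]]


def latest_change_per_key(changes: List[Change], key: str = "id") -> Dict[object, Change]:
    """Keep the newest change per key so the merge source has one row per key.

    update_preimage rows hold the old values and are skipped. Ties on
    _commit_version go to the row that appears later in the list.
    """
    latest: Dict[object, Change] = {}
    for change in changes:
        if change["_change_type"] == "update_preimage":
            continue
        k = change[key]
        if k not in latest or change["_commit_version"] >= latest[k]["_commit_version"]:
            latest[k] = change
    return latest


def apply_changes(target: Table, changes: List[Change], key: str = "id") -> None:
    """Delete or upsert rows of target (keyed by key) in place."""
    for k, change in latest_change_per_key(changes, key).items():
        if change["_change_type"] == "delete":
            # WHEN MATCHED AND _change_type = 'delete' THEN DELETE
            target.pop(k, None)
        else:
            # insert / update_postimage: update if matched, insert if not.
            # Assumes data columns never start with "_" (CDF metadata columns do).
            target[k] = {c: v for c, v in change.items() if not c.startswith("_")}
```

Deduplicating before the merge is the design choice that matters here: without it, a key updated in two commits would give the merge two matching source rows.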
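To illustrate the streaming table versus materialized view distinction, here is a toy model under stated assumptions: the classes `StreamingTable` and `MaterializedView` are hypothetical and only mimic the documented semantics, not how Lakeflow executes either object. The streaming table processes each source row once, tracked by an offset, so later changes to already-processed rows are not reflected; the materialized view's result always matches the query over the current source.

```python
# Toy model of streaming table vs materialized view semantics (hypothetical
# classes; not how Lakeflow executes either object).
from typing import Callable, List


class StreamingTable:
    """Processes each source row once; already-read rows are never re-read."""

    def __init__(self, transform: Callable[[int], int]) -> None:
        self.transform = transform
        self.offset = 0  # number of source rows already processed
        self.rows: List[int] = []

    def update(self, source: List[int]) -> None:
        # Only rows appended since the previous update are processed.
        for value in source[self.offset:]:
            self.rows.append(self.transform(value))
        self.offset = len(source)


class MaterializedView:
    """Result always matches the query over the current source."""

    def __init__(self, transform: Callable[[int], int]) -> None:
        self.transform = transform
        self.rows: List[int] = []

    def refresh(self, source: List[int]) -> None:
        # Modeled as a full recompute; a real refresh may be incremental,
        # but its result is the same as recomputing from scratch.
        self.rows = [self.transform(value) for value in source]
```

This is why streaming tables suit append-only sources, while materialized views suit sources whose existing rows can change.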