Databricks Certified Data Engineer Professional Questions and Answers
200 questions organized by topic with detailed explanations
Databricks
DEP
200 questions
10 topics
Updated May 2026
Developing Code for Data Processing using Python and SQL
44 questions (11 easy, 21 medium, 12 hard), ~22% of exam
- When executing a `MERGE INTO` statement in Delta Lake, what happens if the source dataset contains multiple rows that ma...
- What is the primary performance advantage of using a pandas UDF (vectorized UDF) over a standard Python UDF in PySpark?
- What is the primary purpose of the `MERGE INTO` statement in Delta Lake?
Data Ingestion & Acquisition
14 questions (4 easy, 6 medium, 4 hard), ~7% of exam
- A data engineer ingests JSON files using Auto Loader with a fixed target schema. Incoming files occasionally contain une...
- What is the primary purpose of the `cloudFiles` source format in Databricks Auto Loader?
- A data engineer configures Auto Loader with `cloudFiles.schemaEvolutionMode` set to `addNewColumns`. A new field `discou...
Data Transformation, Cleansing, and Quality
20 questions (4 easy, 11 medium, 5 hard), ~10% of exam
- Which of the following is a valid constraint expression for a Lakeflow SDP pipeline expectation?
- In a Lakeflow Spark Declarative Pipeline, which decorator retains invalid records in the target dataset while logging th...
- A data engineer enables Delta Change Data Feed on a table and reads the feed using `spark.read.format("delta").option("r...
Data Sharing and Federation
10 questions (2 easy, 5 medium, 3 hard), ~5% of exam
- A company has an operational PostgreSQL database and wants Databricks users to run SQL queries against it without migrat...
- A data team wants to share a Unity Catalog-managed notebook with a partner organization that also uses Databricks. Which...
- What is a defining characteristic of the Delta Sharing open protocol?
Monitoring and Alerting
20 questions (4 easy, 11 medium, 5 hard), ~10% of exam
- A data engineer wants to monitor the number of records dropped by a `@dp.expect_or_drop` expectation in a Lakeflow SDP p...
- A data engineer streams incremental data from `system.billing.usage` into a Delta table for near-real-time cost monitori...
- A FinOps team wants to build a dashboard showing daily Databricks compute spend broken down by workspace and SKU, coveri...
Cost & Performance Optimisation
26 questions (7 easy, 12 medium, 7 hard), ~13% of exam
- Which statement best describes Databricks Predictive Optimization for Unity Catalog managed Delta tables?
- What is the primary purpose of running the `OPTIMIZE` command on a Delta table?
- A data engineering team switches their Delta Lake ETL workloads from standard Databricks Runtime to a Photon-enabled run...
Ensuring Data Security and Compliance
20 questions (6 easy, 10 medium, 4 hard), ~10% of exam
- A company has a single large orders table in Unity Catalog shared across regional business units. Each team should only ...
- Why are service principals preferred over personal access tokens (PATs) from a user account for automated production Dat...
- A data engineer creates a column mask for the `salary` column on an HR table. The mask should return `NULL` for all user...
Data Governance
14 questions (4 easy, 6 medium, 4 hard), ~7% of exam
- A user needs to query the table `main.sales.orders` in Unity Catalog. An admin has granted `SELECT` on `main.sales.order...
- Where does Unity Catalog automatically store column-level data lineage records that track read and write events on colum...
- When a Unity Catalog managed table is dropped using `DROP TABLE`, what happens to the underlying data files?
Debugging and Deploying
20 questions (4 easy, 9 medium, 7 hard), ~10% of exam
Data Modelling
12 questions (5 easy, 6 medium, 1 hard), ~6% of exam
- In the medallion architecture, what is the primary purpose of the silver layer?
- A data engineer needs to implement a Slowly Changing Dimension Type 2 (SCD Type 2) for a customer dimension table. Which...
- In the context of the Databricks Lakehouse medallion architecture, which layer is most appropriate for storing raw, unmo...
All Questions
| # | Question | Topic | Difficulty |
|---|---|---|---|
| 1 | What is the required root configuration file for a Databricks Asset Bundle? | Debugging and Deploying | easy |
| 2 | When executing a `MERGE INTO` statement in Delta Lake, what happens if the source dataset contains m... | Developing Code for Data Processing using Python and SQL | hard |
| 3 | What is the primary performance advantage of using a pandas UDF (vectorized UDF) over a standard Pyt... | Developing Code for Data Processing using Python and SQL | medium |
| 4 | Which of the following resource types can be defined and deployed using a Databricks Asset Bundle? | Debugging and Deploying | medium |
| 5 | A user needs to query the table `main.sales.orders` in Unity Catalog. An admin has granted `SELECT` ... | Data Governance | medium |