Productionizing Data Pipelines Questions
Practice questions for the Productionizing Data Pipelines topic of the Databricks Certified Data Engineer Associate exam: 36 questions covering this domain.
A data engineer is designing a Lakeflow Job with three tasks: Task A ingests data, Task B transforms it, and Task C validates the output. Tasks B and ...
A data engineer wants to programmatically create and deploy Lakeflow Jobs as part of a CI/CD pipeline using infrastructure-as-code. Which Databricks t...
A data engineering team stores their job notebooks in a Git repository. They want to ensure that each Lakeflow Job run uses a specific tagged version ...
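You pin notebook source to a Git tag in the job definition itself. The sketch below builds a job settings dictionary in the shape the Databricks Jobs API uses. A `git_source` block can carry `git_tag` (the alternatives are `git_branch` or `git_commit`), and a notebook task sets `source` to `GIT` so its path resolves inside that repository. The repository URL, tag, job name, task key, and notebook path are hypothetical placeholders. This illustrates the fields and is not a complete, deployable spec.

```python
from typing import Any


def build_job_spec(repo_url: str, tag: str, notebook_path: str) -> dict[str, Any]:
    """Build Jobs API-style settings that pin notebook source to one Git tag."""
    return {
        "name": "nightly-etl",  # hypothetical job name
        "git_source": {
            "git_url": repo_url,
            "git_provider": "gitHub",
            "git_tag": tag,  # every run checks out this tagged version
        },
        "tasks": [
            {
                "task_key": "ingest",  # hypothetical task key
                "notebook_task": {
                    "notebook_path": notebook_path,  # relative to the repo root
                    "source": "GIT",  # read the notebook from git_source
                },
            }
        ],
    }


# Hypothetical repository, tag, and path.
spec = build_job_spec("https://github.com/example-org/etl-pipelines", "v1.4.0", "notebooks/ingest")
```

Pinning a tag rather than a branch means a new commit to the branch cannot silently change what production runs.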
What does a Lakeflow Jobs trigger define?
A data engineer needs to pass a runtime parameter `run_date` to all tasks in a Lakeflow Job so each task can filter data for the correct date. Which L...
A data engineer wants to receive an email alert whenever a Lakeflow Job run fails. Which Lakeflow Jobs feature should they configure?
A data engineer is running a Lakeflow Job with 5 tasks. Task 3 fails intermittently due to transient network errors. The engineer wants to automatical...
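Lakeflow Jobs configures retries per task. The Jobs API exposes fields such as `max_retries` and `min_retry_interval_millis`, so only the failing task is re-attempted and the whole job does not restart. The following is a plain-Python model of that behavior, not Databricks code. `run_with_retries` and `flaky_ingest` are hypothetical, and the flaky function simply fails twice before it succeeds.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_retries(task: Callable[[], T], max_retries: int,
                     min_retry_interval_s: float = 0.0) -> T:
    """Call task; on an exception, retry up to max_retries more times."""
    attempt = 0
    while True:
        try:
            return task()
        except Exception:
            if attempt >= max_retries:
                raise  # retries exhausted: the task is reported as failed
            attempt += 1
            time.sleep(min_retry_interval_s)  # wait before the next attempt


# Hypothetical task that hits two transient errors, then succeeds.
calls = {"count": 0}


def flaky_ingest() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("transient network error")
    return "ok"


result = run_with_retries(flaky_ingest, max_retries=3)
```

Retries suit transient failures like this one. A failure that repeats deterministically will fail on every attempt.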
In Lakeflow Jobs, what is a task?
A Lakeflow Job runs daily and processes data from the previous day. The job has been running successfully for months. After a code change deployed on ...
A data engineer's production Lakeflow Job fails at Task C during an overnight run. Tasks A and B completed successfully. After fixing the bug in Task ...
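A repair run reruns the tasks that failed and the tasks downstream of them, and reuses the results of tasks that already succeeded. The function below models that selection over a dependency map. It illustrates the idea and is not the Databricks implementation. The status strings and the extra downstream task `D` are hypothetical. The sketch also assumes that `depends_on` lists tasks in dependency order.

```python
def tasks_to_repair(depends_on: dict[str, list[str]], status: dict[str, str]) -> list[str]:
    """Return failed tasks plus every task downstream of them, in dependency order."""
    # Invert the edges so each task maps to the tasks that depend on it.
    downstream: dict[str, list[str]] = {task: [] for task in depends_on}
    for task, upstream in depends_on.items():
        for up in upstream:
            downstream[up].append(task)

    # Walk downstream from every failed task.
    to_rerun: set[str] = set()
    stack = [task for task, state in status.items() if state == "FAILED"]
    while stack:
        task = stack.pop()
        if task not in to_rerun:
            to_rerun.add(task)
            stack.extend(downstream[task])

    # Assumes depends_on is listed in dependency order.
    return [task for task in depends_on if task in to_rerun]


# Hypothetical run: A and B succeeded, C failed, so downstream D never ran.
pipeline = {"A": [], "B": ["A"], "C": ["B"], "D": ["C"]}
statuses = {"A": "SUCCESS", "B": "SUCCESS", "C": "FAILED", "D": "UPSTREAM_FAILED"}
rerun = tasks_to_repair(pipeline, statuses)  # ["C", "D"]; A and B are reused
```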
A data engineer wants to programmatically monitor a Lakeflow Job run and check its completion status from outside Databricks (e.g., from a CI/CD pipel...
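From outside Databricks, you can check a run through the Jobs REST API (`GET /api/2.1/jobs/runs/get` with a `run_id`). The polling loop below assumes a response shaped like that endpoint's `state` object, with `life_cycle_state` and `result_state` fields. `fetch_run` is a hypothetical callable that you would wrap around the authenticated HTTP request. Here, canned responses replace it so the sketch runs without credentials.

```python
import time
from typing import Any, Callable

# Life-cycle states after which a run will not change again.
TERMINAL_STATES = {"TERMINATED", "SKIPPED", "INTERNAL_ERROR"}


def wait_for_run(fetch_run: Callable[[int], dict[str, Any]], run_id: int,
                 poll_interval_s: float = 30.0, timeout_s: float = 3600.0) -> str:
    """Poll until the run reaches a terminal state; return its result_state."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        state = fetch_run(run_id)["state"]
        if state["life_cycle_state"] in TERMINAL_STATES:
            # result_state may be absent, so fall back to the life-cycle state.
            return state.get("result_state", state["life_cycle_state"])
        time.sleep(poll_interval_s)
    raise TimeoutError(f"run {run_id} did not finish within {timeout_s} s")


# Canned responses standing in for real API calls (hypothetical run_id 42).
canned = iter([
    {"state": {"life_cycle_state": "RUNNING"}},
    {"state": {"life_cycle_state": "TERMINATED", "result_state": "SUCCESS"}},
])
final_state = wait_for_run(lambda _run_id: next(canned), run_id=42, poll_interval_s=0)
```

A CI/CD step would typically fail the pipeline when `final_state` is anything other than `SUCCESS`.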
A data engineer's Lakeflow Job fails intermittently during Task C with a transient `java.lang.OutOfMemoryError`. The task processes a large dataset an...
A data engineer wants to use Databricks Asset Bundles to manage a production Lakeflow Job as code. After setting up the bundle YAML, which CLI command...
A data engineer wants to pass a dynamic date parameter to all tasks in a Lakeflow Job so the job processes only data for the previous day. The enginee...
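Job-level parameters are passed down to the job's tasks. A notebook task typically reads them with `dbutils.widgets.get`. The sketch below shows only the date arithmetic a task might apply to a `run_date` value, using the standard library. `previous_day` is a hypothetical helper, and a literal date stands in for the parameter so the code runs outside Databricks.

```python
from datetime import date, timedelta


def previous_day(run_date: str) -> str:
    """Return the ISO date (YYYY-MM-DD) of the day before run_date."""
    return (date.fromisoformat(run_date) - timedelta(days=1)).isoformat()


# In a notebook task this value would typically come from the job parameter,
# e.g. dbutils.widgets.get("run_date"); a literal keeps the sketch runnable anywhere.
run_date = "2024-03-01"
target_date = previous_day(run_date)  # "2024-02-29" (2024 is a leap year)
# A task could then filter its input, e.g. df.filter(df.event_date == target_date),
# where df and event_date are hypothetical.
```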
A Lakeflow Job runs a multi-task ETL pipeline nightly. After a recent deployment, Task D (which runs a complex transformation notebook) starts failing...
In Lakeflow Jobs, what does setting a task's `depends_on` property do?
A Lakeflow Job has five tasks (A, B, C, D, E). Tasks B and C both depend on Task A. Task D depends on both B and C. Task E depends on D. Task B fails....
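Under the default run condition (`run_if` set to `ALL_SUCCESS`), a task runs only when every task in its `depends_on` list succeeded. Otherwise it is skipped and marked as upstream-failed. The simulation below applies that rule to the five-task graph from the question and orders the tasks with the standard library's `graphlib.TopologicalSorter`. The status strings are illustrative labels. Other `run_if` values, such as `ALL_DONE`, are not modeled.

```python
from graphlib import TopologicalSorter


def simulate_run(depends_on: dict[str, list[str]], failing: set[str]) -> dict[str, str]:
    """Run tasks in dependency order; a task runs only if all upstream tasks succeeded."""
    status: dict[str, str] = {}
    for task in TopologicalSorter(depends_on).static_order():
        upstream = depends_on.get(task, [])
        if any(status[up] != "SUCCESS" for up in upstream):
            status[task] = "UPSTREAM_FAILED"  # skipped, never started
        elif task in failing:
            status[task] = "FAILED"
        else:
            status[task] = "SUCCESS"
    return status


# B and C depend on A, D depends on B and C, E depends on D; B fails.
job = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"], "E": ["D"]}
outcome = simulate_run(job, failing={"B"})
```

In this model, C still runs because it depends only on A. D and E never start.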
A data engineer creates a Lakeflow Job with a cron trigger set to `0 0 * * *`. What does this cron expression mean?
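In standard five-field cron (minute, hour, day of month, month, day of week), `0 0 * * *` fires once a day at 00:00. Databricks schedules are written in Quartz cron syntax, which adds a leading seconds field. The daily-midnight equivalent there is `0 0 0 * * ?`, evaluated in the schedule's configured time zone. The matcher below is a minimal sketch that supports only `*` or a single integer per field. `next_run` is a hypothetical helper that steps forward one minute at a time.

```python
from datetime import datetime, timedelta


def _field_ok(spec: str, value: int) -> bool:
    """Match one cron field; this sketch only supports '*' or a single integer."""
    return spec == "*" or int(spec) == value


def matches(expr: str, dt: datetime) -> bool:
    minute, hour, dom, month, dow = expr.split()
    cron_dow = (dt.weekday() + 1) % 7  # cron: 0 = Sunday; Python: 0 = Monday
    if dom != "*" and dow != "*":
        # Standard cron: if both day fields are restricted, either may match.
        day_ok = _field_ok(dom, dt.day) or _field_ok(dow, cron_dow)
    else:
        day_ok = _field_ok(dom, dt.day) and _field_ok(dow, cron_dow)
    return (_field_ok(minute, dt.minute) and _field_ok(hour, dt.hour)
            and _field_ok(month, dt.month) and day_ok)


def next_run(expr: str, after: datetime) -> datetime:
    """Return the first matching minute strictly after `after` (searches about a year)."""
    candidate = after.replace(second=0, microsecond=0) + timedelta(minutes=1)
    for _ in range(366 * 24 * 60):
        if matches(expr, candidate):
            return candidate
        candidate += timedelta(minutes=1)
    raise ValueError(f"no match for {expr!r} within a year")


print(next_run("0 0 * * *", datetime(2024, 5, 1, 13, 45)))  # 2024-05-02 00:00:00
```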
Which Lakeflow Jobs trigger starts a run when new files appear in a monitored Unity Catalog storage location?
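A file arrival trigger starts a run when new files land in the monitored location, not on a clock. The function below is only a conceptual model of that idea: it finds files that have not been seen before by comparing directory listings. It is not how Databricks implements the trigger. The directory and file names are hypothetical.

```python
import os
import tempfile


def detect_new_files(directory: str, seen: set[str]) -> list[str]:
    """Return files in directory that were not seen before, and mark them as seen."""
    current = {name for name in os.listdir(directory)
               if os.path.isfile(os.path.join(directory, name))}
    new_files = sorted(current - seen)
    seen.update(new_files)
    return new_files


with tempfile.TemporaryDirectory() as landing_zone:
    seen: set[str] = set()
    first = detect_new_files(landing_zone, seen)  # empty directory -> nothing new
    open(os.path.join(landing_zone, "orders_001.json"), "w").close()
    second = detect_new_files(landing_zone, seen)  # new file -> would start a run
    third = detect_new_files(landing_zone, seen)  # already seen -> no run
```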
Several tasks in one job share the same jobs compute resource. Which behavior should the engineer keep in mind?