Development and Ingestion Questions

Practice questions for the Development and Ingestion topic of the Databricks Certified Data Engineer Associate exam. This domain has 60 questions.

60 questions: 15 easy, 31 medium, 14 hard

Q1

hard

A data engineer is troubleshooting a Structured Streaming pipeline that uses Auto Loader. The pipeline ran successfully for several months but suddenl...

Q2

medium

A data engineer needs to extract data from a source table only where the `updated_at` timestamp is after the last pipeline run timestamp. Which approa...
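
The filtering idea described in this stem can be illustrated in plain Python. This is a toy model, not Databricks code. The helper `extract_incremental`, the sample rows, and their timestamps are all hypothetical. A real pipeline would also need to store the last-run timestamp somewhere durable between runs.

```python
from datetime import datetime


def extract_incremental(rows, last_run_ts):
    """Keep only rows whose updated_at is strictly after the last run timestamp."""
    return [row for row in rows if row["updated_at"] > last_run_ts]


# Hypothetical sample rows standing in for the source table.
rows = [
    {"id": 1, "updated_at": datetime(2024, 1, 1, 8, 0)},
    {"id": 2, "updated_at": datetime(2024, 1, 2, 9, 30)},
    {"id": 3, "updated_at": datetime(2024, 1, 3, 12, 0)},
]
last_run = datetime(2024, 1, 2, 0, 0)

new_rows = extract_incremental(rows, last_run)
# Advance the watermark only if something new was found.
next_watermark = max(r["updated_at"] for r in new_rows) if new_rows else last_run
print([r["id"] for r in new_rows], next_watermark)
```

Rows stamped exactly at the previous watermark are excluded here because the comparison is strict. Whether that is the desired behavior depends on how the source assigns timestamps.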

Q3

easy

In the Databricks Lakehouse architecture, what is the recommended pattern for organizing data through successive stages of quality, from raw ingested ...

Q4

medium

A data engineer wants to ingest a large number of existing Parquet files from an S3 bucket into a Delta table in a one-time batch operation. After ing...

Q5

easy

What is the purpose of the checkpoint location in a Structured Streaming pipeline that uses Auto Loader?
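
As background, a streaming checkpoint records progress so that a restarted query can resume instead of starting over. The toy model below shows only that idea. It stores a JSON list of already-processed file names and skips them on later runs. This is not how Auto Loader stores its state, which uses its own internal format. The `run_micro_batch` helper and the file names are hypothetical.

```python
import json
import os
import tempfile


def run_micro_batch(available_files, checkpoint_path):
    """Toy model: process only files not yet recorded in the checkpoint file."""
    processed = set()
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as fh:
            processed = set(json.load(fh))

    new_files = [path for path in available_files if path not in processed]
    # (A real pipeline would write the rows from new_files to the target table here.)

    # Record progress so a restarted run skips files it has already handled.
    processed.update(new_files)
    with open(checkpoint_path, "w") as fh:
        json.dump(sorted(processed), fh)
    return new_files


checkpoint = os.path.join(tempfile.mkdtemp(), "checkpoint.json")
first_run = run_micro_batch(["a.json", "b.json"], checkpoint)
second_run = run_micro_batch(["a.json", "b.json", "c.json"], checkpoint)
third_run = run_micro_batch(["a.json", "b.json", "c.json"], checkpoint)
print(first_run, second_run, third_run)
```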

Q6

medium

An Auto Loader pipeline has been ingesting CSV files for six months. This week, the source team adds two new columns to all CSV files. The data engine...

Q7

medium

A data engineer writes the following PySpark code to read CSV files from a Unity Catalog volume:

```python
df = (spark.read
 .format("csv")
 ....
```

Q8

medium

A data engineer configures Auto Loader to ingest JSON files from an Azure Data Lake Storage Gen2 path. The engineer wants Auto Loader to automatically...

Q9

hard

A data engineering team uses Auto Loader to ingest JSON files. After several weeks, they notice that new files containing an additional field `new_col...

Q10

medium

A data engineer configures an Auto Loader stream with the option `.option("cloudFiles.schemaLocation", "/checkpoints/json_schema")`. What is the prima...

Q11

easy

Which Databricks feature provides a Structured Streaming source called `cloudFiles` and incrementally processes new files as they arrive in cloud stor...

Q12

hard

A company streams millions of small JSON files per hour from IoT devices into S3. A data engineer needs to ingest these files into a Delta table with ...

Q13

easy

Which file formats does Auto Loader natively support for ingestion? (Choose the best answer)

Q14

medium

A data engineer uses `COPY INTO` to load Parquet files from an S3 path into a Delta table. The first run completes successfully. The engineer then re-...

Q15

medium

A data engineer needs to read data incrementally from a Delta table using Structured Streaming. Which Spark read format should they use?

Q16

easy

A data engineer needs to read a multiline JSON file where each record spans multiple lines (the entire file is one JSON object per file, not one JSON ...
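
This stem contrasts JSON Lines, where every line is a complete record, with a file whose single record spans several lines. The plain-Python sketch below uses only the standard `json` module to show why a reader that parses one line at a time cannot handle the second layout. The sample strings and the `parse_line_by_line` helper are hypothetical.

```python
import json

# JSON Lines: each line is a complete, independently parseable record.
json_lines = '{"id": 1, "name": "a"}\n{"id": 2, "name": "b"}\n'

# Multiline JSON: one record spread across several lines.
multiline_doc = """{
  "id": 1,
  "name": "a"
}"""


def parse_line_by_line(text):
    """Parse each non-empty line as its own JSON value; return None on failure."""
    try:
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    except json.JSONDecodeError:
        return None


records_from_lines = parse_line_by_line(json_lines)    # works: two records
line_by_line_result = parse_line_by_line(multiline_doc)  # fails: "{" alone is not valid JSON
whole_document = json.loads(multiline_doc)              # works: parse the file as one unit
print(records_from_lines, line_by_line_result, whole_document)
```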

Q17

easy

What does the `trigger(availableNow=True)` option do when set on a Structured Streaming writeStream in Databricks?
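
As background, in Spark Structured Streaming `trigger(availableNow=True)` processes the data that is available when the query starts and then stops. It may split that work across several micro-batches that respect source rate limits, such as Auto Loader's `cloudFiles.maxFilesPerTrigger`. The plain-Python toy model below mimics only that scheduling behavior. The `run_available_now` helper and the file names are hypothetical and are not part of any Spark API.

```python
def run_available_now(source_files, max_files_per_batch):
    """Toy model: snapshot what is available at start, drain it in bounded batches, then stop."""
    snapshot = list(source_files)  # only files present when the query starts
    return [
        snapshot[i:i + max_files_per_batch]
        for i in range(0, len(snapshot), max_files_per_batch)
    ]


source = ["f1", "f2", "f3", "f4", "f5"]
batches = run_available_now(source, max_files_per_batch=2)
source.append("f6")  # arrives after the query started; left for the next run
print(batches)  # [['f1', 'f2'], ['f3', 'f4'], ['f5']]
```

The snapshot step makes the run finite, which suits scheduled, job-style execution of streaming code.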

Q18

medium

A data engineer writes the following PySpark code to configure an Auto Loader stream:

```python
df = (spark.readStream
 .format("cloudFiles")
 ....
```

Q19

medium

A data engineer uses Lakeflow Connect to ingest data from a SaaS application into Databricks. Which statement correctly describes the output of a Lake...

Q20

medium

A data engineer needs to load a local CSV file from their laptop into a Delta table in Databricks. The file is 5 MB. What is the recommended way to ma...
