Development and Ingestion Questions
Practice questions for the Development and Ingestion topic of the Databricks Certified Data Engineer Associate exam. 60 questions cover this domain.
A data engineer is troubleshooting a Structured Streaming pipeline that uses Auto Loader. The pipeline ran successfully for several months but suddenl...
A data engineer needs to extract data from a source table only where the `updated_at` timestamp is after the last pipeline run timestamp. Which approa...
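The scenario in the question above is often described as a high-water-mark (watermark) filter. The following is a minimal plain-Python sketch of that idea, not a Databricks API: the row layout and the helper names `rows_updated_since` and `next_watermark` are hypothetical and exist only for illustration.

```python
from datetime import datetime, timezone


def rows_updated_since(rows, last_run_ts):
    """Keep only rows whose updated_at is strictly after the last run."""
    return [row for row in rows if row["updated_at"] > last_run_ts]


def next_watermark(rows, last_run_ts):
    """Return the newest updated_at seen, or the old watermark if no rows."""
    return max((row["updated_at"] for row in rows), default=last_run_ts)


# Hypothetical sample data: one row older and one row newer than the last run.
last_run = datetime(2024, 1, 1, tzinfo=timezone.utc)
source_rows = [
    {"id": 1, "updated_at": datetime(2023, 12, 31, tzinfo=timezone.utc)},
    {"id": 2, "updated_at": datetime(2024, 1, 2, tzinfo=timezone.utc)},
]

changed_rows = rows_updated_since(source_rows, last_run)
new_last_run = next_watermark(changed_rows, last_run)
```

Storing `new_last_run` after each successful run is what makes the next run pick up only newer rows; in Spark the same comparison would typically be expressed as a filter on the `updated_at` column.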
In the Databricks Lakehouse architecture, what is the recommended pattern for organizing data through successive stages of quality, from raw ingested ...
A data engineer wants to ingest a large number of existing Parquet files from an S3 bucket into a Delta table in a one-time batch operation. After ing...
What is the purpose of the checkpoint location in a Structured Streaming pipeline that uses Auto Loader?
An Auto Loader pipeline has been ingesting CSV files for six months. This week, the source team adds two new columns to all CSV files. The data engine...
A data engineer writes the following PySpark code to read CSV files from a Unity Catalog volume:

```python
df = (spark.read
    .format("csv")
    ....
```
A data engineer configures Auto Loader to ingest JSON files from an Azure Data Lake Storage Gen2 path. The engineer wants Auto Loader to automatically...
A data engineering team uses Auto Loader to ingest JSON files. After several weeks, they notice that new files containing an additional field `new_col...
A data engineer configures an Auto Loader stream with the option `.option("cloudFiles.schemaLocation", "/checkpoints/json_schema")`. What is the prima...
Which Databricks feature provides a Structured Streaming source called `cloudFiles` and incrementally processes new files as they arrive in cloud stor...
A company streams millions of small JSON files per hour from IoT devices into S3. A data engineer needs to ingest these files into a Delta table with ...
Which file formats does Auto Loader natively support for ingestion? (Choose the best answer)
A data engineer uses `COPY INTO` to load Parquet files from an S3 path into a Delta table. The first run completes successfully. The engineer then re-...
A data engineer needs to read data incrementally from a Delta table using Structured Streaming. Which Spark read format should they use?
A data engineer needs to read a multiline JSON file where each record spans multiple lines (the entire file is one JSON object per file, not one JSON ...
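To make the distinction in the question above concrete, here is a small plain-Python sketch, using only the standard `json` module, of why a record that spans several lines cannot be parsed one line at a time. The sample document and the helper names are hypothetical. In Spark, the JSON reader's `multiLine` option addresses the same situation.

```python
import json

# Hypothetical file content: one JSON object spread over several lines.
pretty_printed = '{\n  "id": 1,\n  "tags": ["a", "b"]\n}'


def parse_json_lines(text):
    """Treat every non-empty line as its own JSON record (JSON Lines style)."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def parse_whole_document(text):
    """Treat the entire text as a single JSON document."""
    return json.loads(text)


record = parse_whole_document(pretty_printed)

try:
    parse_json_lines(pretty_printed)
    line_mode_failed = False
except json.JSONDecodeError:
    # The first line is just "{", which is not valid JSON by itself.
    line_mode_failed = True
```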
What does the `trigger(availableNow=True)` option do when set on a Structured Streaming writeStream in Databricks?
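For context on the question above, the following sketch shows where `trigger(availableNow=True)` sits in an Auto Loader pipeline. It is an illustration under assumptions, not a reference implementation: the function name, its parameters, and the JSON input format are hypothetical, and the `cloudFiles` source is only available when the code runs on Databricks with a live `spark` session passed in.

```python
def ingest_available_files(spark, source_path, schema_path, checkpoint_path, target_table):
    """Start an Auto Loader stream that processes the current backlog and then stops."""
    stream = (
        spark.readStream
        .format("cloudFiles")                              # Auto Loader source
        .option("cloudFiles.format", "json")               # assumed input format
        .option("cloudFiles.schemaLocation", schema_path)  # where the inferred schema is tracked
        .load(source_path)
    )
    return (
        stream.writeStream
        .option("checkpointLocation", checkpoint_path)     # progress is recorded here between runs
        .trigger(availableNow=True)                        # consume what is available, then stop
        .toTable(target_table)                             # returns the running StreamingQuery
    )
```

Because progress is stored at the checkpoint location, calling the function again on a schedule would pick up only the files that arrived since the previous run.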
A data engineer writes the following PySpark code to configure an Auto Loader stream:

```python
df = (spark.readStream
    .format("cloudFiles")
    ....
```
A data engineer uses Lakeflow Connect to ingest data from a SaaS application into Databricks. Which statement correctly describes the output of a Lake...
A data engineer needs to load a local CSV file from their laptop into a Delta table in Databricks. The file is 5 MB. What is the recommended way to ma...