Data Ingestion & Acquisition Questions
Practice questions for the Data Ingestion & Acquisition topic of the Databricks Certified Data Engineer Professional exam. 14 questions cover this domain.
A data engineer ingests JSON files using Auto Loader with a fixed target schema. Incoming files occasionally contain unexpected extra fields. The engi...
What is the primary purpose of the `cloudFiles` source format in Databricks Auto Loader?
A data engineer configures Auto Loader with `cloudFiles.schemaEvolutionMode` set to `addNewColumns`. A new field `discount_pct` appears in incoming JS...
A data engineer needs to perform a one-time batch load of Parquet files from an S3 bucket into an existing Delta table. Previously loaded files must n...
A data engineer needs to ingest data from both S3 and Azure Data Lake Storage Gen2 paths into a single Delta table using a unified pipeline. The S3 pa...
In Auto Loader, what is the difference between directory listing mode and file notification mode for file discovery?
A data engineer uses Auto Loader to ingest files from an Azure Data Lake Storage Gen2 path. The source team periodically adds files in large batches —...
A team switches an Auto Loader job from directory listing mode to file notification mode and asks whether that change will preserve source-file arriva...
A pipeline uses Auto Loader together with `AUTO CDC` to process deletes that may arrive well after related upserts. Which target-table setting is docu...
A standalone streaming table is created with `read_files(..., includeExistingFiles => false)`. What data is ingested?
How does Auto Loader persist the state it uses to avoid reprocessing the same files?
A production Auto Loader stream must resume after failures without reprocessing files it already handled. Where does Auto Loader persist the file-trac...
A standalone streaming table uses `read_files` with `includeExistingFiles => false` against a cloud-storage path. Which files are ingested?
A bronze ingestion pipeline uses Auto Loader together with Lakeflow `AUTO CDC`, and delete records can arrive well after corresponding upserts because...