Question

A data engineer needs to perform a one-time batch load of Parquet files from an S3 bucket into an existing Delta table. Previously loaded files must not be re-ingested on future runs of the same command. Which approach is most appropriate?

Accepted Answer

B. Use the `COPY INTO` SQL command. `COPY INTO` is an idempotent, retriable SQL command for batch-loading files from a cloud storage location into a Delta table. It records which source files it has already loaded and skips them on later runs, so re-running the same command ingests only files that are new since the previous run. This built-in deduplication makes it a good fit for one-time or repeated batch ingestion. Setting `COPY_OPTIONS ('force' = 'true')` disables the check and reloads every file.
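The skip-already-loaded behavior can be illustrated with a toy model in plain Python. This is not how Databricks implements `COPY INTO`. `ToyDeltaTable`, `copy_into`, and the `s3://example-bucket/...` paths are hypothetical, and a dict mapping each file path to its rows stands in for Parquet files in S3. Against a real table the statement would look roughly like `COPY INTO target_table FROM 's3://example-bucket/data/' FILEFORMAT = PARQUET`, where the table and path names are assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToyDeltaTable:
    """Hypothetical stand-in for a Delta table plus COPY INTO's load history."""
    rows: list = field(default_factory=list)
    loaded_files: set = field(default_factory=set)


def copy_into(table: ToyDeltaTable, source: dict[str, list], force: bool = False) -> list[str]:
    """Load every file in `source` that this table has not loaded before.

    `source` maps a file path to the rows it contains (a toy stand-in for
    Parquet files in S3). Returns the paths ingested on this run. With
    force=True, already-loaded files are ingested again, loosely imitating
    COPY_OPTIONS ('force' = 'true').
    """
    ingested = []
    for path in sorted(source):
        if path in table.loaded_files and not force:
            continue  # skip: an earlier run already loaded this file
        table.rows.extend(source[path])
        table.loaded_files.add(path)
        ingested.append(path)
    return ingested


# Usage with hypothetical paths
table = ToyDeltaTable()
bucket = {
    "s3://example-bucket/data/part-0.parquet": [1, 2],
    "s3://example-bucket/data/part-1.parquet": [3],
}
print(copy_into(table, bucket))  # first run: both files are loaded
print(copy_into(table, bucket))  # re-run: [] because nothing is new
bucket["s3://example-bucket/data/part-2.parquet"] = [4]
print(copy_into(table, bucket))  # only the newly arrived file is loaded
print(table.rows)                # [1, 2, 3, 4]
```

Keeping the load history alongside the table is what makes re-running the same command safe. A plain `INSERT INTO ... SELECT` over the same files has no such history and would append duplicate rows on every run.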
