DEA
Data Processing & Transformations
hard
Question 5 of 62

A data engineer runs the following PySpark code and observes that the resulting Delta table contains only one file:

    df.coalesce(1).write.format("delta").mode("overwrite").saveAsTable("catalog.schema.output")

After several weeks, queries against this table become slower than expected as data volume grows. What is the most likely cause, and what is the recommended fix?

A. The Delta table is missing column statistics; run ANALYZE TABLE to refresh them.
B. coalesce(1) causes all data to be written to a single file, creating a large file that is slow to read; remove coalesce(1) and let Delta Lake manage the file layout, or run OPTIMIZE afterward.
C. The table should use Parquet format instead of Delta for better read performance.
D. Increase spark.sql.shuffle.partitions to improve parallelism.
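
For readers unfamiliar with the two remedies named in option B, the following is a minimal sketch of what each might look like in PySpark. It assumes `df` is a PySpark DataFrame and `spark` is a SparkSession with Delta Lake enabled; the function names `write_table`, `compact_table`, and `count_files` are hypothetical helpers introduced only for illustration, not part of any library.

```python
# Sketch only: assumes `df` is a PySpark DataFrame and `spark` is a
# SparkSession with Delta Lake enabled. All function names are hypothetical.

def write_table(df, table_name):
    """Overwrite the table without forcing a single output partition."""
    # No coalesce(1): each partition of df is written by its own task,
    # so the number of output files follows the DataFrame's partitioning.
    df.write.format("delta").mode("overwrite").saveAsTable(table_name)


def compact_table(spark, table_name):
    """Run Delta Lake's OPTIMIZE command on an existing table."""
    return spark.sql(f"OPTIMIZE {table_name}")


def count_files(spark, table_name):
    """Return the number of data files in the current table version."""
    # DESCRIBE DETAIL returns one row that includes a numFiles column.
    return spark.sql(f"DESCRIBE DETAIL {table_name}").collect()[0]["numFiles"]
```

Calling `count_files(spark, "catalog.schema.output")` before and after a change is one way to check how a write or an OPTIMIZE run affected the table's file layout.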

