DEA
Data Processing & Transformations
medium
Question 16 of 62

A data engineer needs to join a large transactions DataFrame with a small lookup DataFrame that contains 500 rows. The join is causing a shuffle that is slowing down the pipeline. What optimization should the engineer apply?

A. Repartition the transactions DataFrame to match the number of partitions in lookup
B. Use a broadcast join hint to broadcast the small lookup DataFrame to all executors
C. Use a cross join instead of the original join type
D. Increase the default shuffle parallelism with spark.sql.shuffle.partitions
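
To make the mechanism behind a broadcast join concrete, here is a minimal plain-Python sketch of what happens conceptually: the small side is copied in full to every worker as a hash table, so each partition of the large side can be joined where it already sits, with no shuffle of the large side. The function name broadcast_hash_join, the column names (txn_id, country_code, amount, country), and the sample rows are all hypothetical and chosen only for illustration. The sketch also assumes the lookup keys are unique and performs an inner join. It simulates the idea and is not Spark's actual implementation.

```python
from typing import Dict, Iterable, List

Row = Dict[str, object]


def broadcast_hash_join(
    large_partitions: List[List[Row]],
    small_rows: Iterable[Row],
    key: str,
) -> List[List[Row]]:
    """Simulate an inner broadcast hash join (hypothetical helper)."""
    # "Broadcast" step: build one hash table from the small side.
    # In a cluster, every executor would receive its own full copy.
    # Assumption: each key appears at most once in the small side.
    lookup_table: Dict[object, Row] = {row[key]: row for row in small_rows}

    joined_partitions: List[List[Row]] = []
    for partition in large_partitions:
        # Each partition is processed in place; rows never move
        # between partitions, which is why no shuffle is needed.
        joined: List[Row] = []
        for row in partition:
            match = lookup_table.get(row[key])
            if match is not None:  # inner join: drop unmatched rows
                joined.append({**row, **match})
        joined_partitions.append(joined)
    return joined_partitions


# Hypothetical sample data: two partitions of a "large" transactions table.
transactions_partitions = [
    [
        {"txn_id": 1, "country_code": "US", "amount": 10.0},
        {"txn_id": 2, "country_code": "DE", "amount": 25.5},
    ],
    [
        {"txn_id": 3, "country_code": "US", "amount": 7.25},
        {"txn_id": 4, "country_code": "XX", "amount": 1.0},  # no match
    ],
]

# Hypothetical small lookup table (500 rows in the question, 2 here).
lookup = [
    {"country_code": "US", "country": "United States"},
    {"country_code": "DE", "country": "Germany"},
]

result = broadcast_hash_join(transactions_partitions, lookup, "country_code")

# In PySpark, the same intent would typically be expressed with the
# broadcast hint (DataFrame names here are assumptions):
#   from pyspark.sql.functions import broadcast
#   joined = transactions.join(broadcast(lookup), on="country_code", how="inner")
```

Spark can also choose a broadcast join on its own when it estimates that one side is smaller than spark.sql.autoBroadcastJoinThreshold (10 MB by default). The explicit hint is useful when that estimate is unavailable or wrong.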

