DEP
Cost & Performance Optimisation
Difficulty: medium
Question 6 of 26

A PySpark job joins a large transactions table (200 GB) with a small currency rates lookup table (10 MB). The job experiences slow performance due to shuffle overhead. Which optimization directly addresses this problem?

A. Repartition the transactions table to 2,000 partitions before the join
B. Use the broadcast() hint on the small currency rates DataFrame
C. Increase spark.sql.shuffle.partitions from 200 to 2,000
D. Cache the transactions table in memory using .cache() before the join
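
To reason about the options, it can help to estimate how much data each join strategy moves across the network. The sketch below is a toy model in plain Python, not Spark itself. The table sizes come from the question, while the executor count, the column names (`currency`, `rate`, `amount`, `id`), and all function names are assumptions made up for illustration.

```python
# Toy model of join data movement. This is NOT Spark code; it only
# illustrates the mechanics. All names and the executor count are hypothetical.

def shuffle_join_mb_moved(large_mb: int, small_mb: int) -> int:
    # A shuffle-based join repartitions BOTH sides by the join key, so in
    # the worst case every byte of both tables crosses the network.
    return large_mb + small_mb


def broadcast_join_mb_moved(small_mb: int, num_executors: int) -> int:
    # A broadcast join ships one full copy of the small table to each
    # executor; the large table stays where it already is.
    return small_mb * num_executors


def broadcast_hash_join(large_partition, small_table, key):
    # Per-partition work in a broadcast hash join: build an in-memory
    # lookup from the small table, then probe it for each large-side row.
    # Rows with no match are dropped (inner-join semantics).
    lookup = {row[key]: row for row in small_table}
    return [
        {**row, **lookup[row[key]]}
        for row in large_partition
        if row[key] in lookup
    ]


# Sizes from the question: 200 GB transactions, 10 MB currency rates.
TRANSACTIONS_MB = 200 * 1024
RATES_MB = 10
NUM_EXECUTORS = 50  # assumed cluster size, not given in the question

shuffle_mb = shuffle_join_mb_moved(TRANSACTIONS_MB, RATES_MB)
broadcast_mb = broadcast_join_mb_moved(RATES_MB, NUM_EXECUTORS)

# One partition of the large table joined locally against the small table.
rates = [{"currency": "EUR", "rate": 1.1}]
partition = [
    {"id": 1, "currency": "EUR", "amount": 10.0},
    {"id": 2, "currency": "XXX", "amount": 5.0},
]
joined = broadcast_hash_join(partition, rates, "currency")

# In PySpark the equivalent hint is written roughly as:
#   from pyspark.sql.functions import broadcast
#   transactions.join(broadcast(rates), "currency")
```

Under these assumptions the shuffle join moves about 204,810 MB while the broadcast join moves about 500 MB, because the 200 GB side is never repartitioned. Changing the partition count or caching the large table does not remove that movement in this model.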

Educational Content: CertQnA practice questions are written against official exam objectives, covering the same domains tested on the real exam. All content is original and independent. These are not actual exam questions, and CertQnA is not affiliated with any certification vendor.
