A PySpark job joins a large transactions table (200 GB) with a small currency rates lookup table (10 MB). The job experiences slow performance due to shuffle overhead. Which optimization directly addresses this problem?
broadcast() hint on the small currency rates DataFrame
spark.sql.shuffle.partitions from 200 to 2,000.
cache() before the join
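Of the options listed, the broadcast hint is the one that targets shuffle itself: the small table is copied to every executor, so the rows of the large table are never repartitioned for the join. In PySpark this is typically written as `transactions.join(broadcast(rates), "currency")` with `from pyspark.sql.functions import broadcast`, where `transactions`, `rates`, and the `currency` join column are assumed names, not taken from the question. The sketch below is a pure-Python model of that mechanism, not Spark code; the function name, table contents, and column names are all hypothetical.

```python
# Pure-Python model of a broadcast hash join (an illustration, not Spark itself).
# All names and sample rows here are hypothetical.

def broadcast_hash_join(large_partitions, small_rows, key):
    """Inner-join each large partition locally against a copied lookup table."""
    # Build the hash map once from the small table. In Spark, this is the
    # structure that would be shipped to every executor.
    lookup = {row[key]: row for row in small_rows}

    joined = []
    for partition in large_partitions:
        # Each partition is joined where it already sits: no rows of the
        # large table move between partitions, so there is no shuffle.
        local = [
            {**row, **lookup[row[key]]}
            for row in partition
            if row[key] in lookup
        ]
        joined.append(local)
    return joined


# Two "partitions" of the large transactions table.
transactions = [
    [
        {"txn_id": 1, "currency": "EUR", "amount": 100.0},
        {"txn_id": 2, "currency": "USD", "amount": 50.0},
    ],
    [
        {"txn_id": 3, "currency": "EUR", "amount": 20.0},
        {"txn_id": 4, "currency": "XXX", "amount": 5.0},  # no matching rate
    ],
]

# The small lookup table that gets broadcast.
rates = [
    {"currency": "EUR", "rate_to_usd": 1.1},
    {"currency": "USD", "rate_to_usd": 1.0},
]

result = broadcast_hash_join(transactions, rates, "currency")
```

The output keeps the same number of partitions as the input, which is the point of the technique: only the small table is moved. By contrast, changing the number of shuffle partitions or caching an input still leaves the large table to be shuffled for a sort-merge join.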