A data engineer needs to join a large `transactions` DataFrame with a small `lookup` DataFrame that contains 500 rows. The join is causing a shuffle that is slowing down the pipeline. What optimization should the engineer apply?
… the `transactions` DataFrame to match the number of partitions in `lookup`
… the `lookup` DataFrame to all executors
… `spark.sql.shuffle.partitions`
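For this scenario the standard fix is a broadcast join. Spark ships a full copy of the small `lookup` table to every executor, so each partition of `transactions` is joined locally and the large side is never shuffled. In PySpark this is the `broadcast()` hint from `pyspark.sql.functions`, for example `transactions.join(broadcast(lookup), "product_id")`, where `product_id` is a hypothetical join key. Spark also broadcasts automatically when one side's estimated size is below `spark.sql.autoBroadcastJoinThreshold` (10 MB by default). The sketch below is not Spark code. It is a plain-Python model built on simplified, assumed mechanics: round-robin input partitions, a key-modulo partitioner, and made-up column values. It contrasts a shuffle join with a broadcast join on toy data sized like the question's 500-row lookup.

```python
# Toy model (NOT Spark): shuffle join vs. broadcast join over in-memory "partitions".
# Rows are (key, value) tuples; all data and names here are hypothetical.


def partition_round_robin(rows, n):
    """Spread rows across n partitions in arrival order, like input splits on executors."""
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts


def shuffle_join(big_parts, small_parts):
    """Hash-partition BOTH sides by key, then join partition by partition.

    Returns (joined_rows, rows_shuffled), where rows_shuffled counts rows from
    either side that had to move to a different partition.
    """
    n = len(big_parts)
    big_by_key = [[] for _ in range(n)]
    small_by_key = [[] for _ in range(n)]
    rows_shuffled = 0
    for parts, target in ((big_parts, big_by_key), (small_parts, small_by_key)):
        for src, part in enumerate(parts):
            for row in part:
                dst = row[0] % n  # simplified partitioner: key modulo partition count
                if dst != src:
                    rows_shuffled += 1  # this row crosses the network in a real cluster
                target[dst].append(row)
    joined = []
    for p in range(n):
        table = dict(small_by_key[p])  # key -> lookup value, local to partition p
        joined.extend((k, v, table[k]) for k, v in big_by_key[p] if k in table)
    return joined, rows_shuffled


def broadcast_join(big_parts, small_rows):
    """Copy the whole small table to every partition; big-side rows never move.

    Returns (joined_rows, rows_broadcast), where rows_broadcast counts the
    lookup rows copied out (one full copy per partition).
    """
    table = dict(small_rows)  # built once, conceptually sent to every executor
    joined = []
    for part in big_parts:  # each partition joins locally against its own copy
        joined.extend((k, v, table[k]) for k, v in part if k in table)
    rows_broadcast = len(small_rows) * len(big_parts)
    return joined, rows_broadcast


if __name__ == "__main__":
    n = 3  # hypothetical number of partitions
    lookup_rows = [(pid, f"cat{pid % 5}") for pid in range(500)]  # 500-row lookup side
    txn_rows = [(i % 500, i * 1.5) for i in range(20_000)]  # hypothetical large side
    big_parts = partition_round_robin(txn_rows, n)
    small_parts = partition_round_robin(lookup_rows, n)

    s_rows, s_moved = shuffle_join(big_parts, small_parts)
    b_rows, b_copied = broadcast_join(big_parts, lookup_rows)
    assert sorted(s_rows) == sorted(b_rows)  # same result, different data movement
    print(f"shuffle join moved {s_moved} rows; broadcast join copied {b_copied} lookup rows")
```

Both strategies return the same rows. The difference is what moves: the shuffle join relocates rows of the large side between partitions, while the broadcast join only copies the small table once per partition. That trade-off favors broadcasting only while the broadcast side stays small, which is why Spark caps automatic broadcasting with a size threshold.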
Educational Content: CertQnA practice questions are written against official exam objectives, covering the same domains tested on the real exam. All content is original and independent: these are not actual exam questions, and they are not affiliated with any certification vendor.