A data engineer writes the following PySpark code to find the top 3 products by total revenue per category. Which statement best describes the result?
from pyspark.sql import functions as F
from pyspark.sql.window import Window

windowSpec = Window.partitionBy("category").orderBy(F.desc("total_revenue"))

result = (
    df.groupBy("category", "product_id")
    .agg(F.sum("revenue").alias("total_revenue"))
    .withColumn("rank", F.rank().over(windowSpec))
    .filter(F.col("rank") <= 3)
)
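
To reason about what the filter on rank keeps, it can help to emulate the ranking step by hand. The sketch below is plain Python, not PySpark: it mimics the standard behavior of rank(), where tied values share a rank and the next distinct value skips ahead to its row position. The sample rows and the helper name top_n_by_rank are hypothetical and stand in for the output of the groupBy/agg step.

```python
from collections import defaultdict

# Hypothetical aggregated rows: (category, product_id, total_revenue).
rows = [
    ("toys", "p1", 500),
    ("toys", "p2", 400),
    ("toys", "p3", 300),
    ("toys", "p4", 300),  # ties with p3 on total_revenue
    ("toys", "p5", 100),
    ("books", "b1", 50),
]


def top_n_by_rank(rows, n=3):
    """Emulate rank() over (category, total_revenue desc), then keep rank <= n."""
    by_category = defaultdict(list)
    for category, product_id, total in rows:
        by_category[category].append((product_id, total))

    result = []
    for category, items in by_category.items():
        # Order each partition by total revenue, highest first.
        items.sort(key=lambda item: item[1], reverse=True)
        rank = 0
        previous_total = None
        for position, (product_id, total) in enumerate(items, start=1):
            # Tied totals share a rank; a new total takes its row position,
            # which is what leaves gaps after a tie.
            if total != previous_total:
                rank = position
                previous_total = total
            if rank <= n:
                result.append((category, product_id, total, rank))
    return result


top3 = top_n_by_rank(rows)
for row in top3:
    print(row)
```

With this sample data the "toys" category keeps four rows, because p3 and p4 tie at rank 3. If exactly three rows per category were required, one common alternative is F.row_number(), which assigns distinct consecutive numbers and breaks ties arbitrarily unless the ordering is made deterministic.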
groupBy()
rank() requires ROW_NUMBER() to be calculated first