DEP
Developing Code for Data Processing using Python and SQL
Difficulty: hard

A data engineer writes the following PySpark code to find the top 3 products by total revenue per category. Which statement best describes the result?

from pyspark.sql import functions as F
from pyspark.sql.window import Window

windowSpec = Window.partitionBy("category").orderBy(F.desc("total_revenue"))
result = (
    df.groupBy("category", "product_id")
    .agg(F.sum("revenue").alias("total_revenue"))
    .withColumn("rank", F.rank().over(windowSpec))
    .filter(F.col("rank") <= 3)
)

A. The code fails because window functions cannot be applied after groupBy()
B. The code returns the top 3 products per category by total revenue, but ties in revenue may cause more than 3 rows to be returned per category
C. The code returns exactly 3 rows per category regardless of revenue ties
D. The code fails because rank() requires ROW_NUMBER() to be calculated first
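
To reason about how ties interact with the `rank <= 3` filter, it can help to simulate the ranking outside Spark. The sketch below is plain Python, not PySpark. The helper names `rank_desc` and `row_number_desc` and the revenue figures are hypothetical, chosen only to mimic the standard SQL semantics of RANK() (tied values share a rank and the following rank is skipped) and ROW_NUMBER() (every row gets a distinct number) within a single category.

```python
def rank_desc(values):
    """Mimic SQL RANK() over values ordered descending.

    Tied values share a rank, and the next distinct value skips ahead
    (for example 1, 2, 3, 3, 5).
    """
    ordered = sorted(values, reverse=True)
    ranks = []
    for i, value in enumerate(ordered):
        if i > 0 and value == ordered[i - 1]:
            ranks.append(ranks[-1])  # tie: reuse the previous rank
        else:
            ranks.append(i + 1)      # no tie: rank is the 1-based position
    return list(zip(ordered, ranks))


def row_number_desc(values):
    """Mimic SQL ROW_NUMBER(): a distinct sequential number per row."""
    ordered = sorted(values, reverse=True)
    return [(value, i + 1) for i, value in enumerate(ordered)]


# Hypothetical total_revenue values for the products of one category.
revenues = [500, 300, 200, 200, 100]

top_by_rank = [v for v, r in rank_desc(revenues) if r <= 3]
top_by_row_number = [v for v, r in row_number_desc(revenues) if r <= 3]

print(top_by_rank)        # [500, 300, 200, 200]
print(top_by_row_number)  # [500, 300, 200]
```

In PySpark itself, `F.row_number()` is the window function that caps the result at 3 rows per category. Which of the tied rows it keeps is arbitrary unless the window's ordering includes a tie-breaking column.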

