pyspark - how to find the sum of filtered rows
I have a dataset with two columns: Country and Ad Click. How do I find the country with the most ad clicks?
Country | Ad Click
--------+---------
USA     | 1
USA     | 0
USA     | 1
PR      | 0
PR      | 0
PR      | 1
Assuming your DataFrame is assigned to the variable df, something like the following works:
from pyspark.sql import SparkSession

# An active SparkSession; in an interactive pyspark shell this already exists as `spark`
spark = SparkSession.builder.getOrCreate()

# Sum the ad clicks per country
s = df.groupby("Country").agg({'Ad Click': 'sum'})

# Expose the aggregate to Spark SQL
# (registerTempTable is deprecated; createOrReplaceTempView replaces it)
s.createOrReplaceTempView("sums_table")

# Select every country whose click sum equals the overall maximum
query = """
SELECT Country
FROM sums_table
WHERE `sum(Ad Click)` = (
    SELECT MAX(`sum(Ad Click)`)
    FROM sums_table)
"""
top_country = spark.sql(query).collect()
print(top_country[0]["Country"])
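If you would rather stay in the DataFrame API and skip the temp view, a minimal sketch of the same lookup, assuming the same df with columns Country and Ad Click, could look like this:

import pyspark.sql.functions as psf

# Sum clicks per country, alias the aggregate to a friendlier name,
# sort descending, and keep the top row
top = (
    df.groupBy("Country")
      .agg(psf.sum("Ad Click").alias("total_clicks"))
      .orderBy(psf.col("total_clicks").desc())
      .first()
)
print(top["Country"])

Note that .first() returns a single row, so in the event of a tie this prints only one country, whereas the SQL version above returns every country tied for the maximum.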