How to sum a column based on another column?
I have the code below. How can I sum the num column while grouping by notification_date? I'd also like to sort by num in DESC order. Is this possible?
from pyspark.sql.functions import col, desc

# cases_df is assumed to be registered as a temporary view
results = spark.sql("SELECT lhd_2010_name, lhd_2010_code, notification_date, num FROM cases_df")
results.show()
spark.stop()
Your Spark syntax looks fine; you just need to modify the SQL query as follows:
If you only need the sum and the date column (note the table is cases_df, matching your view name):

select *
from ( select notification_date, sum(num) as s_num
       from cases_df
       group by notification_date ) a
order by s_num desc
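The first query is just a group-by aggregation followed by a descending sort. As a plain-Python sanity check of what it computes (the sample rows here are hypothetical):

```python
from collections import defaultdict

# Hypothetical rows of (notification_date, num).
rows = [
    ("2021-01-01", 3),
    ("2021-01-02", 7),
    ("2021-01-01", 2),
]

# GROUP BY notification_date with SUM(num).
sums = defaultdict(int)
for date, num in rows:
    sums[date] += num

# ORDER BY s_num DESC.
result = sorted(sums.items(), key=lambda kv: kv[1], reverse=True)
# result == [("2021-01-02", 7), ("2021-01-01", 5)]
```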
If you need all the columns:

select b.lhd_2010_name, b.lhd_2010_code, a.notification_date, a.s_num
from ( select notification_date, sum(num) as s_num
       from cases_df
       group by notification_date ) a
join cases_df b
  on a.notification_date = b.notification_date
order by s_num desc