How to sum a column based on another column?

I have the code below. I want to sum the num column while grouping by date on notification_date, and I'd also like to sort the num column in descending order (DESC). Is that possible?

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, desc

    # Create (or reuse) a SparkSession
    spark = SparkSession.builder.getOrCreate()

    # Pull the columns of interest from the cases_df view
    results = spark.sql("SELECT lhd_2010_name, lhd_2010_code, notification_date, num FROM cases_df")

    results.show()
    spark.stop()

Your Spark syntax looks fine; you just need to modify the SQL query as follows:

  1. If you only need the sum and the date column (a DataFrame-API version is sketched after the list):

    select * from
    ( select notification_date, sum(num) as s_num
      from cases_df
      group by notification_date ) a
    order by s_num desc
    
  2. If you need all the fields (again, see the DataFrame-API sketch after the list):

    select b.lhd_2010_name, b.lhd_2010_code, a.notification_date, a.s_num
    from (select notification_date, sum(num) as s_num
          from cases_df
          group by notification_date) a
    join cases_df b
    on a.notification_date = b.notification_date
    order by s_num desc
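
For reference, here is a minimal DataFrame-API sketch of option 1. It assumes the data behind the cases_df view can be read back into a DataFrame via spark.table; the variable name `cases` is just for illustration:

    from pyspark.sql import functions as F

    # Assumption: cases_df was registered as a temp view, so we can
    # load it back into a DataFrame with spark.table
    cases = spark.table("cases_df")

    daily_sums = (
        cases.groupBy("notification_date")        # one group per date
             .agg(F.sum("num").alias("s_num"))    # sum num within each date
             .orderBy(F.desc("s_num"))            # largest totals first
    )
    daily_sums.show()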
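
And a sketch of option 2 under the same assumption: the per-date totals are joined back onto the original rows so every row keeps its name and code columns:

    from pyspark.sql import functions as F

    cases = spark.table("cases_df")  # assumption: cases_df is a temp view

    # Per-date totals, mirroring the subquery above
    daily_sums = cases.groupBy("notification_date").agg(
        F.sum("num").alias("s_num")
    )

    # Join the totals back and sort by the total, descending
    result = (
        cases.select("lhd_2010_name", "lhd_2010_code", "notification_date")
             .join(daily_sums, on="notification_date")
             .orderBy(F.desc("s_num"))
    )
    result.show()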