Ranking date column in pyspark

I have the following DataFrame in PySpark:

>>> df.show()
+----------+------+
|  date_col|counts|
+----------+------+
|2022-02-05|350647|
|2022-02-06|313091|
+----------+------+

I want to create a resulting DataFrame that adds a rank column for date_col, where the most recent date gets rank 1:

>>> df.show()
+----------+------+---------+
|  date_col|counts|order_col|
+----------+------+---------+
|2022-02-05|350647|        2| 
|2022-02-06|313091|        1|
+----------+------+---------+

How can we achieve this?

The following script can be used to create the DataFrame df:

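from datetime import date
from pyspark.sql import Row, SparkSession

spark = SparkSession.builder.getOrCreate()

# Month/day literals must not have leading zeros (02, 05 is a SyntaxError in Python 3);
# the column is named counts to match the output shown above
df = spark.createDataFrame([
    Row(date_col=date(2022, 2, 5), counts=350647),
    Row(date_col=date(2022, 2, 6), counts=313091),
])
df.show()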

You can easily do this using rank in conjunction with a Window.
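Conceptually, RANK over a window ordered by date_col descending gives each row 1 plus the number of rows whose date is strictly later, so tied dates share a rank and the next rank is skipped. The plain-Python sketch below illustrates that rule only; the helper name rank_desc is hypothetical and this is not how Spark computes ranks internally.

```python
from datetime import date


def rank_desc(values):
    """Mimic SQL RANK() OVER (ORDER BY value DESC) for a list of values.

    Each value's rank is 1 + the number of values strictly greater than it,
    so equal values share a rank and the following rank is skipped.
    (Hypothetical helper for illustration; O(n^2), not meant for real data.)
    """
    return [1 + sum(1 for other in values if other > v) for v in values]


dates = [date(2022, 2, 5), date(2022, 2, 6), date(2022, 2, 7), date(2022, 2, 8)]
print(rank_desc(dates))  # [4, 3, 2, 1]

# Tied dates share rank 1, and rank 2 is skipped
print(rank_desc([date(2022, 2, 6), date(2022, 2, 6), date(2022, 2, 5)]))  # [1, 1, 3]
```

This matches the order_col values expected in the question: the latest date is 1 and the earliest is 4.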

Data Preparation

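from pyspark import SparkContext
from pyspark.sql import SQLContext
from pyspark.sql import Window
import pyspark.sql.functions as F
import pandas as pd  # needed for pd.DataFrame below

sc = SparkContext.getOrCreate()
sql = SQLContext(sc)  # SQLContext is deprecated since Spark 3.0; SparkSession replaces it

# date_col is kept as yyyy-MM-dd strings here (StringType, not DateType)
d = {
    'date_col': ['2022-02-05', '2022-02-06', '2022-02-07', '2022-02-08'],
    'counts': [350647, 313091, 317791, 312145],
}

sparkDF = sql.createDataFrame(pd.DataFrame(d))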

sparkDF.show()

+----------+------+
|  date_col|counts|
+----------+------+
|2022-02-05|350647|
|2022-02-06|313091|
|2022-02-07|317791|
|2022-02-08|312145|
+----------+------+
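Note that date_col in this example is a string column. Ordering it still gives chronological order because zero-padded yyyy-MM-dd strings sort lexicographically in the same order as the dates they represent. The plain-Python check below illustrates this, assuming the same format; for other formats (for example dd/MM/yyyy) you would want to convert the column with F.to_date(col, format) before ranking.

```python
from datetime import date

# Date strings in the same zero-padded yyyy-MM-dd format as date_col above
iso_strings = ["2022-02-07", "2022-02-05", "2022-02-08", "2022-02-06"]

# Sorting the raw strings descending (what ordering a string column does)...
by_string = sorted(iso_strings, reverse=True)
# ...gives the same order as sorting by the parsed dates
by_date = sorted(iso_strings, key=date.fromisoformat, reverse=True)

print(by_string == by_date)  # True

# A non-ISO format such as dd/MM/yyyy does NOT sort chronologically as text:
# ascending text order puts Feb 5 before Jan 28
non_iso = ["28/01/2022", "05/02/2022"]
print(sorted(non_iso))  # ['05/02/2022', '28/01/2022']
```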

Ranking

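# Rank by date_col in descending order, so the most recent date gets rank 1.
# With no partitionBy, Spark moves all rows into a single partition for this window.
window = Window.orderBy(F.col('date_col').desc())

sparkDF = sparkDF.withColumn('order_col', F.rank().over(window))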

sparkDF.show()

+----------+------+---------+
|  date_col|counts|order_col|
+----------+------+---------+
|2022-02-08|312145|        1|
|2022-02-07|317791|        2|
|2022-02-06|313091|        3|
|2022-02-05|350647|        4|
+----------+------+---------+

Ranking - SparkSQL

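# Register the DataFrame as a temporary view so the query can refer to it as TB1
sparkDF.createOrReplaceTempView("TB1")

sql.sql(
    """
    SELECT
         date_col
        ,counts
        ,RANK() OVER (ORDER BY date_col DESC) AS order_col
    FROM TB1
    """
).show()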

+----------+------+---------+
|  date_col|counts|order_col|
+----------+------+---------+
|2022-02-08|312145|        1|
|2022-02-07|317791|        2|
|2022-02-06|313091|        3|
|2022-02-05|350647|        4|
+----------+------+---------+
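RANK leaves gaps after ties. If two rows can share a date, DENSE_RANK (no gaps) or ROW_NUMBER (always unique) may fit better; PySpark exposes these as F.dense_rank() and F.row_number(). The sketch below compares the three with standard SQL window functions, using Python's built-in sqlite3 module purely as a stand-in for Spark SQL (it assumes SQLite 3.25 or newer, which added window functions). The duplicate 2022-02-06 row and its count are hypothetical, added only to create a tie.

```python
import sqlite3

# In-memory SQLite database, used only to illustrate standard SQL window
# functions; this is not Spark SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TB1 (date_col TEXT, counts INTEGER)")
conn.executemany(
    "INSERT INTO TB1 VALUES (?, ?)",
    [
        ("2022-02-05", 350647),
        ("2022-02-06", 313091),
        ("2022-02-06", 320000),  # hypothetical duplicate date to create a tie
        ("2022-02-08", 312145),
    ],
)

rows = conn.execute(
    """
    SELECT
         date_col
        ,RANK()       OVER (ORDER BY date_col DESC) AS rnk
        ,DENSE_RANK() OVER (ORDER BY date_col DESC) AS dense_rnk
        ,ROW_NUMBER() OVER (ORDER BY date_col DESC, counts DESC) AS row_num
    FROM TB1
    ORDER BY date_col DESC, counts DESC
    """
).fetchall()

for row in rows:
    print(row)
# ('2022-02-08', 1, 1, 1)
# ('2022-02-06', 2, 2, 2)
# ('2022-02-06', 2, 2, 3)
# ('2022-02-05', 4, 3, 4)
```

The counts DESC tiebreaker in ROW_NUMBER makes its numbering deterministic; without one, the order among tied rows is not guaranteed.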