
Transposing rows to columns in PySpark

How can I transpose a Spark DataFrame like this:

From:

Key Value
Key1 Value1
Key2 Value2
Key3 Value3

To:

Key1 Key2 Key3
Value1 Value2 Value3

Thanks!

You can use pyspark.sql.GroupedData.pivot:

import pyspark.sql.functions as F

df = spark.createDataFrame([('key1', 'value1'), ('key2', 'value2'), ('key3', 'value3')], ['key', 'value'])

# Either form works -- an explicit aggregate expression:
df.groupBy().pivot('key').agg(F.first('value')).show()

# ...or the dict shorthand for the same aggregation:
df.groupBy().pivot('key').agg({"value": "first"}).show()

+------+------+------+
|  key1|  key2|  key3|
+------+------+------+
|value1|value2|value3|
+------+------+------+
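The snippets above assume an existing `spark` session. Conceptually, `pivot('key')` with a `first` aggregate turns each distinct key into a column whose value is the first `value` seen for that key. A plain-Python sketch of that reshaping (no Spark needed; names are illustrative):

```python
# Plain-Python sketch of pivot-with-first: each distinct key becomes a
# "column" holding the first value observed for that key.
rows = [("key1", "value1"), ("key2", "value2"), ("key3", "value3")]

pivoted = {}
for key, value in rows:
    # setdefault keeps the earliest value per key, like F.first('value')
    pivoted.setdefault(key, value)

print(pivoted)  # {'key1': 'value1', 'key2': 'value2', 'key3': 'value3'}
```

Because there is no grouping column here (the `groupBy()` is empty), the whole DataFrame collapses into a single output row.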

You can apply the pivot operation to turn rows into columns:


from pyspark.sql import functions as F

data = [("Key1", "Value1"),
        ("Key2", "Value2"),
        ("Key3", "Value3")]

df = spark.createDataFrame(data, ("Key", "Value"))

df.groupBy().pivot("Key").agg(F.first("Value")).show()

"""
+------+------+------+
|  Key1|  Key2|  Key3|
+------+------+------+
|Value1|Value2|Value3|
+------+------+------+
"""