Transposing rows to columns in PySpark
How can I transpose a Spark DataFrame like this:
From:

Key | Value |
---|---|
Key1 | Value1 |
Key2 | Value2 |
Key3 | Value3 |

To:

Key1 | Key2 | Key3 |
---|---|---|
Value1 | Value2 | Value3 |
Thanks!
df = spark.createDataFrame([('key1','value1'),('key2','value2'),('key3','value3')], ['key', 'value'])
import pyspark.sql.functions as F
df.groupBy().pivot('key').agg(F.first('value')).show()
Or
df.groupBy().pivot('key').agg({"value":"first"}).show()
+------+------+------+
| key1| key2| key3|
+------+------+------+
|value1|value2|value3|
+------+------+------+
You can apply a pivot
operation to transpose rows into columns.
from pyspark.sql import functions as F
data = [("Key1", "Value1", ),
("Key2", "Value2", ),
("Key3", "Value3", ), ]
df = spark.createDataFrame(data, ("Key", "Value", ))
df.groupBy().pivot("Key").agg(F.first("Value")).show()
"""
+------+------+------+
| Key1| Key2| Key3|
+------+------+------+
|Value1|Value2|Value3|
+------+------+------+
"""