How to move a specific column of a PySpark DataFrame to the start of the DataFrame

Question

I have a PySpark DataFrame like the following (this is just a simplified example; my actual DataFrame has hundreds of columns):

col1,col2,......,col_with_fix_header
1,2,.......,3
4,5,.......,6
2,3,........,4

I want to move col_with_fix_header to the start, so that the output looks like this:

col_with_fix_header,col1,col2,............
3,1,2,..........
6,4,5,....
4,2,3,.......

I don't want to list all of the columns in the solution.

Answer 1

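If you don't want to list all of the DataFrame's columns, you can use the DataFrame's columns attribute. It gives you a Python list of the column names, which you can simply slice. The example below moves the last column to the front: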

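from pyspark.sql import SparkSession

# Reuse the active session (e.g. in the pyspark shell) or create one
spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame([
  ("a", "Alice", 34),
  ("b", "Bob", 36),
  ("c", "Charlie", 30),
  ("d", "David", 29),
  ("e", "Esther", 32),
  ("f", "Fanny", 36),
  ("g", "Gabby", 60)], ["id", "name", "age"])

# Select the last column first, then all other columns in their original order
df.select([df.columns[-1]] + df.columns[:-1]).show()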

Output:

+---+---+-------+
|age| id|   name|
+---+---+-------+
| 34|  a|  Alice|
| 36|  b|    Bob|
| 30|  c|Charlie|
| 29|  d|  David|
| 32|  e| Esther|
| 36|  f|  Fanny|
| 60|  g|  Gabby|
+---+---+-------+
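
The slice above relies on the target column being the last one. If the column can sit anywhere in the DataFrame, one possible approach is to build the reordered list by name instead of by position. The following is a minimal sketch, not part of the original answer; `columns_with_first` is a hypothetical helper name (not a PySpark function), and it works on a plain list of names such as `df.columns`:

```python
def columns_with_first(columns, first):
    """Return the column names with `first` moved to the front.

    Hypothetical helper for illustration; `columns` is a plain list of
    names, e.g. the value of `df.columns`.
    """
    if first not in columns:
        raise ValueError(f"column {first!r} not found")
    # Keep every other column in its original relative order
    return [first] + [c for c in columns if c != first]


# Assumed usage with the DataFrame from the answer:
# df.select(columns_with_first(df.columns, "age")).show()
```

With the example DataFrame, selecting `columns_with_first(df.columns, "age")` should give the same column order as the output above, and for the question's data the name to pass would be "col_with_fix_header".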

pyspark

pyspark-dataframes