Filling the missing data

I have two DataFrames:

df_1:

ID    |  title  |  name   |   age
----------------------------------
32    |  AA     | Alex    | 30
----------------------------------
4568  |  BB     |  Dom    |  35
----------------------------------
3804  |  CC     |  pascal |  58
----------------------------------




df_2:


ID   |  title   
--------------
288  |  AZERTY    
--------------
290  |  querty      
--------------

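I want to append the data of df_2 to df_1. For the rows that come from df_2, I want to fill the missing columns (name and age) with "right".

The resulting df_1 should be: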

ID    |  title  |  name   |   age
----------------------------------
32    |  AA     | Alex    | 30
----------------------------------
4568  |  BB     |  Dom    |  35
----------------------------------
3804  |  CC     |  pascal |  58
----------------------------------
288   |  AZERTY |  right  |  right
----------------------------------
290   |  querty |  right  |  right
----------------------------------

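How can I append the DataFrames and fill the missing columns in PySpark?

You need to union the tables, after adding the missing columns to df_2: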

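from pyspark.sql.functions import lit

# Add the columns df_2 lacks, filled with the literal "right"
df_2 = (df_2
  .withColumn("name", lit("right"))
  .withColumn("age", lit("right")))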

df_1.union(df_2).show()

+----+------+------+-----+
|  ID| title|  name|  age|
+----+------+------+-----+
|  32|    AA|  Alex|   30|
|4568|    BB|   Dom|   35|
|3804|    CC|pascal|   58|
| 288|AZERTY| right|right|
| 290|querty| right|right|
+----+------+------+-----+
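
If you would rather not hard-code the column names, the same idea can be generalized. The following is only a minimal sketch: `missing_columns` and `append_with_fill` are hypothetical helper names made up for illustration, and the code assumes both arguments are PySpark DataFrames.

```python
from typing import List, Sequence


def missing_columns(target: Sequence[str], source: Sequence[str]) -> List[str]:
    """Return the columns of `target` that `source` lacks, in target order."""
    present = set(source)
    return [name for name in target if name not in present]


def append_with_fill(base, extra, fill="right"):
    """Append the rows of `extra` to `base`, filling absent columns with `fill`.

    `base` and `extra` are assumed to be pyspark.sql.DataFrame objects.
    """
    # Imported here so that missing_columns() stays usable without Spark installed.
    from pyspark.sql.functions import lit

    for name in missing_columns(base.columns, extra.columns):
        extra = extra.withColumn(name, lit(fill))
    # union() matches columns by position, so reorder `extra` to match `base`.
    # Columns that exist only in `extra` are dropped by this select.
    return base.union(extra.select(*base.columns))
```

With the DataFrames from the question, `append_with_fill(df_1, df_2).show()` should give the same rows as the union above. If a filled column is not a string in `base` (for example an integer age), the union relies on Spark's type coercion, so casting that column to string first makes the intent explicit. On Spark 3.1 or later, `df_1.unionByName(df_2, allowMissingColumns=True)` is another option, but it fills the missing columns with nulls rather than a value you choose.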