How to merge two dataframes in pyspark when one column is an array and the other column is a string?

df1:

+---+------+
| id|  code|
+---+------+
|  1|[A, F]|
|  2|   [G]|
|  3|   [A]|
+---+------+

df2:

+--------+----+
|    col1|col2|
+--------+----+
|   Apple|   A|
|  Google|   G|
|Facebook|   F|
+--------+----+

I want df3 to look like this, built from the columns of df1 and df2:

+---+------+-----------------+
| id|  code|          changed|
+---+------+-----------------+
|  1|[A, F]|[Apple, Facebook]|
|  2|   [G]|         [Google]|
|  3|   [A]|          [Apple]|
+---+------+-----------------+

I know this can be achieved when the code column is not an array. I don't know how to iterate over the code array for this purpose.

Try:

import pyspark.sql.functions as f

res = (df1
     # explode: one row per element of the code array
     .select(f.col("id"), f.explode(f.col("code")).alias("code"))
     # inner join: look up the name for each single code
     .join(df2, f.col("code") == df2.col2)
     # rebuild both arrays per id
     .groupBy("id")
     .agg(f.collect_list(f.col("code")).alias("code"), f.collect_list(f.col("col1")).alias("changed"))
)
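
To see what each of the three steps does to the sample rows, here is a small plain-Python model of explode, join, and groupBy/collect_list. It is only an illustration and not Spark code: the lists of tuples stand in for df1 and df2, and the names `exploded`, `joined`, `grouped`, and `df3` are made up for this sketch.

```python
# Plain-Python model of explode -> join -> groupBy/collect_list (not Spark code).
from collections import defaultdict

df1 = [(1, ["A", "F"]), (2, ["G"]), (3, ["A"])]             # rows of (id, code)
df2 = [("Apple", "A"), ("Google", "G"), ("Facebook", "F")]  # rows of (col1, col2)

# explode: one (id, code) row per element of the array
exploded = [(row_id, c) for row_id, codes in df1 for c in codes]

# inner join on code == col2: attach the matching name to each single code
joined = [(row_id, c, col1)
          for row_id, c in exploded
          for col1, col2 in df2
          if c == col2]

# groupBy("id") + collect_list: gather codes and names back into lists per id
grouped = defaultdict(lambda: {"code": [], "changed": []})
for row_id, c, col1 in joined:
    grouped[row_id]["code"].append(c)
    grouped[row_id]["changed"].append(col1)

df3 = [(row_id, g["code"], g["changed"]) for row_id, g in sorted(grouped.items())]
for row in df3:
    print(row)
```

Two caveats when carrying this back to Spark: the inner join drops any code that has no match in df2, and collect_list does not guarantee element order after a shuffle, so the order inside code and changed may differ from the original array.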