Merging rows from different dataframes together in Scala

Question

For example, I first have a dataframe like this:

+----+-----+-----+--------------------+-----+
|year| make|model|             comment|blank|
+----+-----+-----+--------------------+-----+
|2012|Tesla|    S|          No comment|     |
|1997| Ford| E350|Go get one now th...|     |
|2015|Chevy| Volt|                null| null|
+----+-----+-----+--------------------+-----+

It has the years 2012, 1997, and 2015. I also have another dataframe like this:

+----+-----+-----+--------------------+-----+
|year| make|model|             comment|blank|
+----+-----+-----+--------------------+-----+
|2012|BMW  |    3|          No comment|     |
|1997|VW   | GTI |   get              |     |
|2015|MB   | C200|                good| null|
+----+-----+-----+--------------------+-----+

It also has the years 2012, 1997, and 2015. How can I merge the rows that share the same year? Thanks.

The output should look like this:

+----+-----+-----+--------------------+-----+-----+-----+--------------------+-----+
|year| make|model|             comment|blank| make|model|             comment|blank|
+----+-----+-----+--------------------+-----+-----+-----+--------------------+-----+
|2012|Tesla|    S|          No comment|     |  BMW|    3|          no comment|     |
|1997| Ford| E350|Go get one now th...|     |   VW|  GTI|                 get|     |
|2015|Chevy| Volt|                null| null|   MB| C200|                Good| null|
+----+-----+-----+--------------------+-----+-----+-----+--------------------+-----+

Answer 1

You can get the table you want with a simple join. Something like:

val joined = df1.join(df2, df1("year") === df2("year"))

I loaded your inputs, so I see the following:

scala> df1.show
...
year make  model comment
2012 Tesla S     No comment
1997 Ford  E350  Go get one now
2015 Chevy Volt  null

scala> df2.show
...
year make model comment
2012 BMW  3     No comment
1997 VW   GTI   get
2015 MB   C200  good

When I run the join, I get:

scala> val joined = df1.join(df2, df1("year") === df2("year"))
joined: org.apache.spark.sql.DataFrame = [year: string, make: string, model: string, comment: string, year: string, make: string, model: string, comment: string]

scala> joined.show
...
year make  model comment        year make model comment
2012 Tesla S     No comment     2012 BMW  3     No comment
2015 Chevy Volt  null           2015 MB   C200  good
1997 Ford  E350  Go get one now 1997 VW   GTI   get
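
Note that the result above carries two year columns, while the desired output in the question has only one. A minimal sketch of one way to get a single year column is to join on a sequence of column names instead of an equality expression. It assumes Spark 2.x or later (for SparkSession) and a local master; the app name and the hand-built sample rows are illustrative, not the asker's actual loading code. It can be pasted into spark-shell:

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session for illustration; in spark-shell this reuses the existing one.
val spark = SparkSession.builder().master("local[*]").appName("join-on-year").getOrCreate()
import spark.implicits._

// Sample rows copied from the question (the empty `blank` column is left out).
val rows1: Seq[(String, String, String, String)] = Seq(
  ("2012", "Tesla", "S", "No comment"),
  ("1997", "Ford", "E350", "Go get one now"),
  ("2015", "Chevy", "Volt", null)
)
val rows2: Seq[(String, String, String, String)] = Seq(
  ("2012", "BMW", "3", "No comment"),
  ("1997", "VW", "GTI", "get"),
  ("2015", "MB", "C200", "good")
)
val df1 = rows1.toDF("year", "make", "model", "comment")
val df2 = rows2.toDF("year", "make", "model", "comment")

// Joining on Seq("year") performs an inner equi-join and keeps one `year` column.
val joined = df1.join(df2, Seq("year"))
joined.show()
```

The remaining make, model, and comment columns still appear twice under the same names, so selecting them by name is ambiguous.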

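One thing to note is that your column names may be ambiguous, because they are the same in both dataframes (so you may want to rename them to make operations on the resulting dataframe easier to write).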
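
As a rough sketch of that renaming, one option is to prefix every non-key column on each side before joining. The helper `withPrefix` and the prefixes `left` and `right` are hypothetical names chosen for this example, and the snippet again assumes Spark 2.x or later with a local session:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Assumption: a local session for illustration; in spark-shell this reuses the existing one.
val spark = SparkSession.builder().master("local[*]").appName("rename-before-join").getOrCreate()
import spark.implicits._

// Hypothetical helper: prefix every column except the join key.
def withPrefix(df: DataFrame, prefix: String, key: String): DataFrame =
  df.columns.filter(_ != key).foldLeft(df) { (acc, name) =>
    acc.withColumnRenamed(name, s"${prefix}_$name")
  }

// Trimmed-down sample data based on the question.
val df1 = Seq(("2012", "Tesla", "S"), ("1997", "Ford", "E350")).toDF("year", "make", "model")
val df2 = Seq(("2012", "BMW", "3"), ("1997", "VW", "GTI")).toDF("year", "make", "model")

// After renaming, only `year` is shared, so joining on it leaves no duplicate names.
val renamedJoin = withPrefix(df1, "left", "year").join(withPrefix(df2, "right", "year"), Seq("year"))

// Every column can now be referenced by name without ambiguity.
renamedJoin.select("year", "left_make", "right_make").show()
```

Prefixing is only one choice; renaming individual columns by hand with withColumnRenamed works just as well when there are only a few of them.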

