Get the same result from `dataframe/Dataset` functions as returned by a SQL query
Data:
1,Coke
1,Beans
1,paper
2,Beans
2,Pen
2,Sheets
2,Banana
Expected output:
+---+------------------------+
|  1|Coke,Beans,paper        |
|  2|Beans,Pen,Sheets,Banana |
+---+------------------------+
I can achieve this by writing a SQL query.
val df = sparkSession.read.csv("file_location")
// createOrReplaceTempView replaces registerTempTable, which is deprecated since Spark 2.0
df.createOrReplaceTempView("data")
val result = sparkSession
  .sql("select _c0, concat_ws(',', collect_list(_c1)) as product from data group by _c0")
result.show
Please help me get the same result using dataframe/Dataset functions (select, groupBy, agg, etc.).
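It's simple, and the answer is already out there, but I hope I'm not just doing someone's graduate homework. A DataFrame is similar to a SQL table, so you can query it using its methods.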
import org.apache.spark.sql.functions._
// Outside spark-shell, toDF also needs: import sparkSession.implicits._
val df = sc.parallelize(List(
  (1, "Coke"),
  (1, "Beans"),
  (1, "paper"),
  (2, "Beans"),
  (2, "Pen"),
  (2, "Sheets"),
  (2, "Banana")
)).toDF("id", "product_name")
// Group by id and join each group's product names with commas
df.groupBy("id").agg(concat_ws(",", collect_list("product_name")).as("product_list")).show()
Output:
+---+-----------------------+
|id |product_list           |
+---+-----------------------+
|1  |Coke,Beans,paper       |
|2  |Beans,Pen,Sheets,Banana|
+---+-----------------------+
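The answer above builds its own DataFrame with columns named id and product_name, whereas the question reads a headerless CSV, which Spark loads as string columns named _c0 and _c1. The following is a minimal sketch of the same aggregation against those default column names, so it mirrors the SQL query more directly. The object name GroupConcatExample, the helper groupConcat, and the in-memory Seq standing in for the CSV file are assumptions made for illustration; they are not part of the original post.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, collect_list, concat_ws}

object GroupConcatExample {

  // Hypothetical helper: DataFrame-API equivalent of
  //   select _c0, concat_ws(',', collect_list(_c1)) as product from data group by _c0
  def groupConcat(df: DataFrame): DataFrame =
    df.groupBy(col("_c0"))
      .agg(concat_ws(",", collect_list(col("_c1"))).as("product"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("group-concat-example")
      .master("local[*]") // assumption: local run, for illustration only
      .getOrCreate()
    import spark.implicits._

    // Stand-in for spark.read.csv("file_location"): a headerless CSV is read
    // as string columns named _c0 and _c1, which this Seq imitates.
    val df = Seq(
      ("1", "Coke"), ("1", "Beans"), ("1", "paper"),
      ("2", "Beans"), ("2", "Pen"), ("2", "Sheets"), ("2", "Banana")
    ).toDF("_c0", "_c1")

    // Pass false so long product lists are not truncated in the printout
    groupConcat(df).show(false)

    spark.stop()
  }
}
```

Note that collect_list does not guarantee the order of elements after a shuffle, so on larger or partitioned inputs the products within a group may come out in a different order than in the file.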