What aggregation functions can be used with sdf_pivot in sparklyr?
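I am trying to use sdf_pivot with the development version of sparklyr. The only aggregation function that seems to work is count. If I try sum or avg, I get an exception stating: No matched method found for class org.apache.spark.sql.RelationalGroupedDataset.sum
Here is some code to reproduce the problem: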
iris_tbl <- copy_to(sc, iris)
iris_tbl %>% sdf_pivot(Species ~ Sepal_Width) # this works
iris_tbl %>% sdf_pivot(Species ~ Sepal_Width, "sum") # this doesn't
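I believe this is still undocumented, but you get this error because sdf_pivot expects the aggregation to be given as an R list or an R function, not as a bare string such as "sum".
Here are some examples.
Using an R list: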
> iris_tbl %>% sdf_pivot(Species ~ Sepal_Width, list(Sepal_Width="sum")) %>% head()
# Source: lazy query [?? x 24]
# Database: spark_connection
Species `2.0` `2.2` `2.3` `2.4` `2.5` `2.6` `2.7` `2.8` `2.9` `3.0` `3.1` `3.2` `3.3`
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 versicolor 2 4.4 6.9 7.2 10 7.8 13.5 16.8 20.3 24 9.3 9.6 3.3
2 virginica NaN 2.2 NaN NaN 10 5.2 10.8 22.4 5.8 36 12.4 16.0 9.9
3 setosa NaN NaN 2.3 NaN NaN NaN NaN NaN 2.9 18 12.4 16.0 6.6
# ... with 10 more variables: `3.4` <dbl>, `3.5` <dbl>, `3.6` <dbl>, `3.7` <dbl>,
# `3.8` <dbl>, `3.9` <dbl>, `4.0` <dbl>, `4.1` <dbl>, `4.2` <dbl>, `4.4` <dbl>
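Using an R function:
> sum_sepal_width <- function(gdf) {
    # Build the Spark SQL column expression sum(Sepal_Width)
    expr <- invoke_static(
      sc,
      "org.apache.spark.sql.functions",
      "expr",
      "sum(Sepal_Width)"
    )
    # Apply the expression to the grouped (pivoted) dataset
    gdf %>% invoke("agg", expr, list())
  }
> iris_tbl %>% sdf_pivot(Species ~ Sepal_Width, fun.aggregate = sum_sepal_width)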
# Source: table<sparklyr_tmp_4ee61c86311c> [?? x 24]
# Database: spark_connection
Species `2.0` `2.2` `2.3` `2.4` `2.5` `2.6` `2.7` `2.8` `2.9` `3.0` `3.1` `3.2` `3.3`
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 versicolor 2 4.4 6.9 7.2 10 7.8 13.5 16.8 20.3 24 9.3 9.6 3.3
2 virginica NaN 2.2 NaN NaN 10 5.2 10.8 22.4 5.8 36 12.4 16.0 9.9
3 setosa NaN NaN 2.3 NaN NaN NaN NaN NaN 2.9 18 12.4 16.0 6.6
# ... with 10 more variables: `3.4` <dbl>, `3.5` <dbl>, `3.6` <dbl>, `3.7` <dbl>,
# `3.8` <dbl>, `3.9` <dbl>, `4.0` <dbl>, `4.1` <dbl>, `4.2` <dbl>, `4.4` <dbl>
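As a sanity check on the sums above, the same pivot can be sketched locally in base R, without Spark. This is only an illustration, not part of sparklyr: `local_pivot` is a hypothetical name, and the local `iris` data frame uses the column name `Sepal.Width` where the Spark table uses `Sepal_Width`.

```r
# Local cross-check in base R (no Spark connection needed).
# Rows are Species, columns are the distinct Sepal.Width values,
# and each cell holds the sum of Sepal.Width for that combination.
local_pivot <- tapply(
  iris$Sepal.Width,
  list(Species = iris$Species, Sepal_Width = iris$Sepal.Width),
  sum
)

# Combinations with no rows come back as NA here,
# where the Spark output above shows NaN.
local_pivot["versicolor", c("2", "2.2", "2.3")]
```

The column labels come from the numeric values converted to text, so the value 2.0 is indexed as "2" rather than "2.0".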
Note: sdf_pivot is not available before sparklyr-0-6-0-unreleased.