data.table: Group by, then aggregate with custom function returning several new columns
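In data.table, how can I do the following:
- group the table by several columns
- then hand each group to a custom aggregation function, i.e.:
  - take all columns of the group's subset of the table and aggregate them by returning several new columns that are added to the table
The trick here is to produce several new columns without calling the aggregation function more than once.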
Example:
library(data.table)
mtcars_dt <- data.table(mtcars)
returnsOneColumn <- function(dt_group_all_columns){
"returned_value_1"
}
# works great, returns one new column as summary per group
mtcars_dt[,
list(new_column_1 = returnsOneColumn(dt_group_all_columns = .SD)),
by = c("mpg", "cyl"),
.SDcols = colnames(mtcars_dt)
]
returnsMultipleColumns <- function(dt_group_all_columns){
list("new_column_1" = "returned_value_1",
"new_column_2" = "returned_value_2")
}
# does not work: Ideally, I would like to have mpg, cyl, and several columns
# generated from once calling returnsMultipleColumns
mtcars_dt[,
list( returnsMultipleColumns(dt_group_all_columns = .SD) ),
by = c("mpg", "cyl"),
.SDcols = colnames(mtcars_dt)
]
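Why the wrapped call misbehaves is easiest to see in plain R. A minimal sketch, reusing returnsMultipleColumns from above (redefined here so the snippet runs on its own; the names unwrapped and wrapped are only illustrative): data.table turns each top-level element of the list returned by j into one column, and the extra list() leaves just one top-level element, which is itself a list.

```r
returnsMultipleColumns <- function(dt_group_all_columns) {
  list("new_column_1" = "returned_value_1",
       "new_column_2" = "returned_value_2")
}

# The argument is ignored by this toy function, so NULL is enough here
unwrapped <- returnsMultipleColumns(NULL)
wrapped   <- list(returnsMultipleColumns(NULL))

length(unwrapped)     # 2 top-level elements -> two columns in j
length(wrapped)       # 1 top-level element -> a single (list) column, V1
str(wrapped, max.level = 1)
```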
# desired output should look like this
#
# mpg cyl new_column_1 new_column_2
# 1: 21.0 6 returned_value_1 returned_value_2
# 2: 22.8 4 returned_value_1 returned_value_2
# 3: 21.4 6 returned_value_1 returned_value_2
# 4: 18.7 8 returned_value_1 returned_value_2
Related:
Assign multiple columns using := in data.table, by group
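For that related case, where the new columns should be added to the existing table instead of producing a summary, a sketch might assign the returned list to a character vector of column names with :=. This assumes the returnsMultipleColumns function from the question; the length-1 values it returns are recycled to every row of each group, and the table is modified by reference.

```r
library(data.table)

mtcars_dt <- data.table(mtcars)

returnsMultipleColumns <- function(dt_group_all_columns) {
  list("new_column_1" = "returned_value_1",
       "new_column_2" = "returned_value_2")
}

# One call per group yields both values; := writes them into two new columns
mtcars_dt[, c("new_column_1", "new_column_2") := returnsMultipleColumns(.SD),
          by = c("mpg", "cyl")]

head(mtcars_dt)
```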
You are already returning a list from the function, so there is no need to wrap it in list() again. Remove the list() and use code like this:
mtcars_dt[,
returnsMultipleColumns(dt_group_all_columns = .SD),
by = c("mpg", "cyl"),
.SDcols = colnames(mtcars_dt)
]
mpg cyl new_column_1 new_column_2
1: 21.0 6 returned_value_1 returned_value_2
2: 22.8 4 returned_value_1 returned_value_2
3: 21.4 6 returned_value_1 returned_value_2
4: 18.7 8 returned_value_1 returned_value_2
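A slightly more realistic sketch of the same pattern, in which the function actually reads the columns of .SD. The function summariseGroup and the column names n_cars, mean_hp and max_wt are hypothetical; grouping by cyl alone just keeps the output short.

```r
library(data.table)

mtcars_dt <- data.table(mtcars)

# Hypothetical aggregator: receives one group's subset (.SD) and returns a
# named list; each element becomes one column of the grouped result
summariseGroup <- function(dt_group_all_columns) {
  list(
    n_cars  = nrow(dt_group_all_columns),
    mean_hp = mean(dt_group_all_columns$hp),
    max_wt  = max(dt_group_all_columns$wt)
  )
}

# By default .SD holds every non-grouping column, so .SDcols is not needed here
res <- mtcars_dt[, summariseGroup(.SD), by = "cyl"]
res
```

The result has one row per cyl value, the grouping column first, followed by one column per element of the returned list.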