Spread or dcast and fill in counts
Probably a basic question.
I have a key-value `data.frame` (`df` below):
features <- paste0("f",1:5)
set.seed(1)
ids <- paste0("id",1:10)
df <- do.call(rbind,lapply(ids,function(i){
data.frame(id = i, feature = sample(features,3,replace = F))
}))
I want to `tidyr::spread` or `reshape2::dcast` it such that the rows are `id`, the columns are `feature`, but the values are the sum of `feature`s for each `id`.
A simple:
reshape2::dcast(df, id ~ feature)
doesn't achieve that. It just fills in `feature`s and `NA`s.
Adding `fun.aggregate = sum` to the command above results in an error:
> reshape2::dcast(df, id ~ feature, fun.aggregate = sum)
Using feature as value column: use value.var to override.
Error in .fun(.value[0], ...) : invalid 'type' (character) of argument
And `tidyr::spread` also results in an error:
tidyr::spread(df, key = id, value = feature)
Error: Each row of output must be identified by a unique combination of keys.
Keys are shared for 30 rows:
Any idea?
I think you want to count the features rather than `sum` them. Try using the `length` function:
tidyr::pivot_wider(df, names_from = feature,
values_from = feature, values_fn = length, values_fill = 0)
Or with `dcast`:
library(data.table)
dcast(setDT(df), id~feature, value.var = 'feature', fun.aggregate = length)
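Both calls above aggregate with `length`, which only counts elements, whereas `sum` is not defined for character vectors. A minimal base R sketch of that difference; the vector `x` and the name `res` are made up for illustration, and the link to how `dcast` fills empty cells is my reading of the `.fun(.value[0], ...)` error in the question, not something confirmed by the package documentation:

```r
# A made-up character vector standing in for one id's features
x <- c("f1", "f3")

# sum() on character input raises an error; capture its message instead of stopping
res <- tryCatch(sum(x), error = function(e) conditionMessage(e))
print(res)

# length() just counts elements, so it works for any vector type
print(length(x))

# On an empty vector length() returns 0, a sensible count for id/feature
# pairs that never occur
print(length(character(0)))
```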
In base R, `table(df)` gives the same output.
table(df)
#      feature
#id     f1 f2 f3 f4 f5
#  id1   1  0  1  1  0
#  id10  1  0  1  1  0
#  id2   1  1  0  0  1
#  id3   0  1  1  1  0
#  id4   1  0  1  0  1
#  id5   1  1  0  0  1
#  id6   1  1  1  0  0
#  id7   1  0  0  1  1
#  id8   1  1  0  0  1
#  id9   0  1  0  1  1
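If a plain `data.frame` is needed rather than a `table` object, the contingency table can be converted with `as.data.frame.matrix()`. This is a minimal sketch, assuming `df` has the two columns `id` and `feature` as in the question; the names `counts` and `wide` are made up for illustration:

```r
# Rebuild the example data from the question
features <- paste0("f", 1:5)
set.seed(1)
ids <- paste0("id", 1:10)
df <- do.call(rbind, lapply(ids, function(i) {
  data.frame(id = i, feature = sample(features, 3, replace = FALSE))
}))

# Cross-tabulate id against feature; each cell counts the matching rows
counts <- table(df$id, df$feature)

# Convert the table to a wide data.frame: one row per id, one column per feature
wide <- as.data.frame.matrix(counts)

# Move the ids from the row names into an ordinary column
wide <- cbind(id = rownames(wide), wide)
rownames(wide) <- NULL

print(head(wide))
```

Every id was sampled with three distinct features, so each row of `wide` should sum to 3 across the feature columns.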