Spread or dcast and fill in counts

Question

This is probably a basic question.

I have a key-value data.frame (df below):

features <- paste0("f",1:5)
set.seed(1)
ids <- paste0("id",1:10)

df <- do.call(rbind,lapply(ids,function(i){
  data.frame(id = i, feature = sample(features, 3, replace = FALSE))
}))

I would like to tidyr::spread or reshape2::dcast it so that the rows are id, the columns are feature, and the values are the sum of features for each id.

A simple call:

reshape2::dcast(df, id ~ feature)

doesn't achieve this. It just fills the cells with feature names and NAs.

Adding fun.aggregate = sum to the command above results in an error:

> reshape2::dcast(df, id ~ feature, fun.aggregate = sum)
Using feature as value column: use value.var to override.
Error in .fun(.value[0], ...) : invalid 'type' (character) of argument

And tidyr::spread also produces an error:

tidyr::spread(df, key = id, value = feature)

Error: Each row of output must be identified by a unique combination of keys.
Keys are shared for 30 rows:

Any ideas?

Answer 1

I think you want to count the features rather than sum them. Try using length as the aggregation function.

tidyr::pivot_wider(df, names_from = feature, 
            values_from = feature, values_fn = length, values_fill = 0)

Or with dcast from data.table:

library(data.table)
dcast(setDT(df), id~feature, value.var = 'feature', fun.aggregate = length)
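
Since the question used reshape2::dcast, here is a minimal sketch of the same idea with that package, assuming reshape2 is installed. It also shows why sum fails: the feature column is character, and sum is not defined for character vectors. The names sum_attempt and wide are only illustrative.

```r
# Rebuild the example data from the question.
features <- paste0("f", 1:5)
set.seed(1)
ids <- paste0("id", 1:10)
df <- do.call(rbind, lapply(ids, function(i) {
  data.frame(id = i, feature = sample(features, 3, replace = FALSE))
}))

# sum() is not defined for character vectors; capture the error message
# instead of stopping the script.
sum_attempt <- tryCatch(sum(df$feature), error = function(e) conditionMessage(e))

# length() only counts elements, so it works regardless of column type.
# Setting value.var explicitly also silences the "Using feature as value
# column" message.
wide <- reshape2::dcast(df, id ~ feature, value.var = "feature",
                        fun.aggregate = length)
```

Each id was given three distinct features, so every row of wide should sum to 3 across the feature columns.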

In base R, table(df) gives the same counts.

table(df)

#     feature
#id     f1 f2 f3 f4 f5
#  id1   1  0  1  1  0
#  id10  1  0  1  1  0
#  id2   1  1  0  0  1
#  id3   0  1  1  1  0
#  id4   1  0  1  0  1
#  id5   1  1  0  0  1
#  id6   1  1  1  0  0
#  id7   1  0  0  1  1
#  id8   1  1  0  0  1
#  id9   0  1  0  1  1
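
Note that table returns a table object rather than a data.frame. If a data.frame with an id column is needed, one possible base R sketch is below; the names counts and wide are illustrative, and as.data.frame.matrix is used here to keep the wide layout (plain as.data.frame on a table would return long format).

```r
# Rebuild the example data from the question.
features <- paste0("f", 1:5)
set.seed(1)
ids <- paste0("id", 1:10)
df <- do.call(rbind, lapply(ids, function(i) {
  data.frame(id = i, feature = sample(features, 3, replace = FALSE))
}))

# Cross-tabulate id against feature; each cell is a count.
counts <- table(df$id, df$feature)

# Convert the 2-d table to a wide data.frame and move the row names
# into an explicit id column.
wide <- as.data.frame.matrix(counts)
wide <- cbind(id = rownames(wide), wide)
rownames(wide) <- NULL
```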

