R: include factors with no entries when using dcast

Question

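I'm using the reshape2 function dcast on a data frame. One of the variables is a factor, and some of its levels do not appear in the data frame, but I would like the newly created columns to include all of its levels.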

For example, say I run the following

library(reshape2)
dataDF <- data.frame(
  id = 1:6,
  id2 = c(1,2,3,1,2,3),
  x = c(rep('t1', 3), rep('t2', 3)),
  y = factor(c('A', 'B', 'A', 'B', 'B', 'C'), levels = c('A', 'B', 'C', 'D')),
  value = rep(1)
)

dcast(dataDF, id + id2 ~ x + y, fill = 0)

I get the following

  id id2 t1_A t1_B t2_B t2_C
1  1   1    1    0    0    0
2  2   2    0    1    0    0
3  3   3    1    0    0    0
4  4   1    0    0    1    0
5  5   2    0    0    1    0
6  6   3    0    0    0    1

But I would also like to include the columns t1_C, t1_D, t2_A and t2_D, filled with 0s

i.e. I want the following

  id id2 t1_A t1_B t1_C t1_D t2_A t2_B t2_C t2_D
1  1   1    1    0    0    0    0    0    0    0
2  2   2    0    1    0    0    0    0    0    0
3  3   3    1    0    0    0    0    0    0    0
4  4   1    0    0    0    0    0    1    0    0
5  5   2    0    0    0    0    0    1    0    0
6  6   3    0    0    0    0    0    0    1    0

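Also, as an aside, is it possible to create the above without having the column 'value' filled with 1s in the initial data frame? Basically I just want to cast x and y into their own columns, with a 1 if that combination is present for that id.

Thanks in advance

EDIT: Originally I had a single variable on the LHS, which Jeremy answered below, but I actually have multiple variables on the LHS, so I have edited the question to reflect this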

Answer 1

Try adding drop = FALSE to your dcast call so that unused factor levels are not dropped:

dcast(dataDF, id ~ x + y, fill = 0, drop = FALSE)

  id t1_A t1_B t1_C t1_D t2_A t2_B t2_C t2_D
1  1    1    0    0    0    0    0    0    0
2  2    0    1    0    0    0    0    0    0
3  3    1    0    0    0    0    0    0    0
4  4    0    0    0    0    0    1    0    0
5  5    0    0    0    0    0    1    0    0
6  6    0    0    0    0    0    0    1    0

As for the aside: yes, we just need to tell dcast what you want via the fun.aggregate argument; in this case you want length:

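# Select by name: with id2 in the data, columns 1:3 would be id, id2, x and would miss y
data2 <- dataDF[, c("id", "x", "y")]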
dcast(data2, id ~ x + y, fill = 0, drop = FALSE, fun.aggregate = length)
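
A self-contained sketch of the same fun.aggregate = length idea applied to the two-variable LHS from the edited question, staying within reshape2. It rests on an assumption that is not part of the original answer: that drop = FALSE may also fill in id/id2 pairs that never occur together, so the result is filtered back to the observed pairs with merge(). value.var = "y" is named explicitly only to avoid the column-guessing message.

```r
library(reshape2)

# Same data as in the question, but without the 'value' column of 1s
dataDF <- data.frame(
  id  = 1:6,
  id2 = c(1, 2, 3, 1, 2, 3),
  x   = c(rep("t1", 3), rep("t2", 3)),
  y   = factor(c("A", "B", "A", "B", "B", "C"), levels = c("A", "B", "C", "D"))
)

# Count the rows in each cell; empty cells get length 0
wide <- dcast(dataDF, id + id2 ~ x + y,
              fun.aggregate = length, value.var = "y", drop = FALSE)

# Assumption: drop = FALSE may add id/id2 pairs that are absent from the data,
# so keep only the pairs that actually occur
wide <- merge(unique(dataDF[, c("id", "id2")]), wide, by = c("id", "id2"))
```

If the assumption does not hold in your version of reshape2, the merge() step simply leaves the rows unchanged.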

For your edit, I would use tidyr and dplyr rather than reshape2:

library(tidyr)
library(dplyr)

# unique() rather than levels(): x is a character column unless strings were converted to factors
dataDF %>% left_join(expand.grid(x = unique(dataDF$x), y = levels(dataDF$y)), .) %>%
           unite(z, x, y) %>%
           spread(z, value, fill = 0) %>%
           na.omit

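First we use expand.grid to build all combinations of x and y and join the data onto them, then unite merges x and y into a single column z, then spread spreads z out into columns, and finally na.omit drops the rows with NA in the id columns: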

  id id2 t1_A t1_B t1_C t1_D t2_A t2_B t2_C t2_D
1  1   1    1    0    0    0    0    0    0    0
2  2   2    0    1    0    0    0    0    0    0
3  3   3    1    0    0    0    0    0    0    0
4  4   1    0    0    0    0    0    1    0    0
5  5   2    0    0    0    0    0    1    0    0
6  6   3    0    0    0    0    0    0    1    0
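
As a hedged alternative to the expand.grid/unite/spread pipeline, newer versions of tidyr can do the expansion inside the pivot itself. This sketch assumes tidyr 1.2.0 or later, where pivot_wider() has a names_expand argument; it is not what the original answer used.

```r
library(tidyr)

# Same data as in the question
dataDF <- data.frame(
  id    = 1:6,
  id2   = c(1, 2, 3, 1, 2, 3),
  x     = c(rep("t1", 3), rep("t2", 3)),
  y     = factor(c("A", "B", "A", "B", "B", "C"), levels = c("A", "B", "C", "D")),
  value = 1
)

wide <- pivot_wider(
  dataDF,
  names_from   = c(x, y),   # column names become x_y, e.g. t1_A
  values_from  = value,
  names_expand = TRUE,      # also create columns for unused levels of y
  values_fill  = 0          # cells with no matching row become 0
)
```

Because only the names are expanded, the id/id2 rows stay exactly as they appear in the data, so no na.omit step is needed.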
