Contingency tables from data.frame columns

Question

I am trying to create a 4-way contingency table from my dataset. My dataset looks like this:

a <- c(1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 0, 1)
b <- c(1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1)
group1 <- sample(letters[25:26], 12, replace = TRUE)
group2 <- sample(letters[7:10], 12, replace = TRUE)

df <- data.frame(a, b, group1, group2)

I tried using the aggregate function. Everything works fine when creating a 3-way contingency table:

aggregate(cbind(a, b) ~ group1, data = df, FUN = table)
  group1 a.0 a.1 b.0 b.1
1      y   3   4   3   4
2      z   2   3   2   3

However, when I add a second grouping variable, the output is confusing and not what I want.

aggregate(. ~ group1 + group2, data = df, FUN = table)
  group1 group2    a    b
1      y      g    3    3
2      z      g    1    1
3      z      h    1    1
4      y      i    1    1
5      z      i    1    1
6      y      j 2, 1    3
7      z      j 1, 1 1, 1

Since my original dataset is very large, I am hoping for an elegant, automated way to handle this.

Answer 1

The expected output is not clear. Perhaps we need melt/dcast:

library(data.table)
dcast(melt(setDT(df), id.vars = c("group1", "group2")),
      group1 + group2 ~ variable + value, fun.aggregate = length)

Or use recast (a wrapper around melt/dcast from reshape2):

library(reshape2)
recast(df, measure.var = c("a", "b"), ... ~ variable + value, length)
#    group1 group2 a_0 a_1 b_0 b_1
#1      y      g   1   4   3   2
#2      y      h   1   0   1   0
#3      y      j   1   1   0   2
#4      z      g   2   0   0   2
#5      z      i   0   1   0   1
#6      z      j   0   1   1   0

The OP's aggregate call gives this output:

aggregate(. ~ group1 + group2, data = df, FUN = table)
#  group1 group2    a    b
#1      y      g 1, 4 3, 2
#2      z      g    2    2
#3      y      h    1    1
#4      z      i    1    1
#5      y      j 1, 1    2
#6      z      j    1    1

If we want aggregate to report both levels in every group, convert each column to a factor with the levels specified and then call table:

do.call(data.frame, aggregate(cbind(a, b) ~ group1 + group2, data = df, 
              FUN = function(x) table(factor(x, levels = 0:1))))
#  group1 group2 a.0 a.1 b.0 b.1
#1      y      g   1   4   3   2
#2      z      g   2   0   0   2
#3      y      h   1   0   1   0
#4      z      i   0   1   0   1
#5      y      j   1   1   0   2
#6      z      j   0   1   1   0
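
Why the factor conversion matters can be seen on a tiny vector. The following is a minimal sketch; the vector x is made up for illustration and is not part of the OP's data. table only reports values that actually occur, so a group containing a single distinct value yields a shorter result than the other groups, and aggregate then cannot lay the results out as equal-width columns.

```r
# Hypothetical vector in which only the value 1 occurs
x <- c(1, 1, 1)

# table() reports only the values that are present: a single cell
t_plain <- table(x)
length(t_plain)

# Fixing the levels forces a cell for 0 as well, with a count of zero
t_fixed <- table(factor(x, levels = 0:1))
length(t_fixed)
as.vector(t_fixed)
```

With the levels fixed, every group returns a result of the same length, which is what lets do.call(data.frame, ...) expand it into the a.0, a.1, b.0 and b.1 columns.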

If we want all the combinations, dcast has drop = FALSE:

dcast(melt(setDT(df), id.vars = c("group1", "group2")),
      group1 + group2 ~ variable + value, fun.aggregate = length, drop = FALSE)

Or with recast:

recast(df, measure.var = c("a", "b"), ... ~ variable + value, length, drop = FALSE)

Note: sample was called without set.seed, so the output shown here differs from the OP's output.
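
To make the example reproducible, the random draws can be pinned with set.seed. This is a minimal sketch; the seed value 42 is an arbitrary assumption and will not reproduce either of the outputs shown in this thread.

```r
# Fix the random number generator state so sample() is repeatable.
# The seed value 42 is arbitrary.
set.seed(42)
a <- c(1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 0, 1)
b <- c(1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1)
group1 <- sample(letters[25:26], 12, replace = TRUE)
group2 <- sample(letters[7:10], 12, replace = TRUE)
df <- data.frame(a, b, group1, group2)

# Resetting the same seed reproduces the same draw
set.seed(42)
group1_again <- sample(letters[25:26], 12, replace = TRUE)
identical(group1, group1_again)
```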

Answer 2

This may be a bit complicated, but as far as I understand you only want counts, so it might help:

#Creating data
a <- c(1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 0, 1)
b <- c(1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1)
group1 <- sample(letters[25:26], 12, replace = TRUE)
group2 <- sample(letters[7:10], 12, replace = TRUE)
df <- data.frame(a, b, group1, group2)

# Counting variables a and b separately, then converting each table to a data frame
counta <- xtabs( ~ group1 + group2 + a, data = df)
countb <- xtabs( ~ group1 + group2 + b, data = df)
df.a <- data.frame(counta)
df.b <- data.frame(countb)

#Now merging the data.frames:
result.df <- merge(df.a, df.b, by = c("group1", "group2"), all = TRUE)

# Result Looks like this:
result.df

#          group1 group2 a Freq.x    b Freq.y
#   1       y      g     0      2    0      1
#   2       y      g     0      2    1      1
#   3       y      g     1      0    0      1
#   4       y      g     1      0    1      1
#   5       y      h     0      1    0      0
#   6       y      h     0      1    1      1
#   7       y      h     1      0    0      0
#   8       y      h     1      0    1      1
#   9       y      i     0      1    0      2
#  10       y      i     0      1    1      1
#  11       y      i     1      2    0      2
#  12       y      i     1      2    1      1
#  13       y      j     0      0    0      0
#  14       y      j     0      0    1      0
#  15       y      j     1      0    0      0
#  16       y      j     1      0    1      0
#  17       z      g     0      0    0      1
#  18       z      g     0      0    1      0
#  19       z      g     1      1    0      1
#  20       z      g     1      1    1      0
#  21       z      h     0      0    0      1
#  22       z      h     0      0    1      1
#  23       z      h     1      2    0      1
#  24       z      h     1      2    1      1
#  25       z      i     0      1    0      0
#  26       z      i     0      1    1      1
#  27       z      i     1      0    0      0
#  28       z      i     1      0    1      1
#  29       z      j     0      0    0      0
#  30       z      j     0      0    1      2
#  31       z      j     1      2    0      0
#  32       z      j     1      2    1      2
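
For comparison, a literal 4-way contingency table can be sketched with a single xtabs call and flattened for printing with ftable. This is one possible reading of the question rather than part of the answer above, and the seed value is an arbitrary assumption added only so the sketch is repeatable.

```r
# Rebuild the example data; the seed value 1 is arbitrary
set.seed(1)
a <- c(1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 0, 1)
b <- c(1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1)
group1 <- sample(letters[25:26], 12, replace = TRUE)
group2 <- sample(letters[7:10], 12, replace = TRUE)
df <- data.frame(a, b, group1, group2)

# One cell per combination of group1, group2, a and b
tab4 <- xtabs(~ group1 + group2 + a + b, data = df)

# Flatten for printing: groups in the rows, a and b in the columns
ftable(tab4, row.vars = c("group1", "group2"))

# Every row of df falls into exactly one cell
sum(tab4) == nrow(df)
```

Unlike the merged result above, this keeps a and b crossed within one table, so each cell is the number of rows with that exact combination of values.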
