Number of value occurrences grouped by GroupID in R

I have a dataset with several columns, each holding several distinct values. I want the count of every value in every column, grouped by GroupId.

Example

    GroupId | C1           | C2
          1 | "valColOne1" | "valColTwo2"
          2 | "valColOne1" | "valColTwo2"
          2 | "valColOne1" | "valColTwo2"
          2 | "valColOne2" | "valColTwo1"
          1 | "valColOne1" | "valColTwo1"

The result should be

    GroupId | valColOne1 | valColOne2 | valColTwo1 | valColTwo2
         1  |    2       |     0      |    1       |  1
         2  |    2       |     1      |    1       |  2

Note that all values in the initial table are strings.

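Take your original data frame (which I've called dat) and melt it into long format. Then use dcast to count how many times each value occurs within each group.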

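library(reshape2)

# Reshape to long format: one row per original cell, keyed by GroupId
dat.m = melt(dat, id.vars = "GroupId")

# Count the rows for each GroupId/value pair
dcast(dat.m, GroupId ~ value, fun.aggregate = length)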

  GroupId   valColOne1    valColOne2   valColTwo1  valColTwo2
1       1             2             0           1           1
2       2             2             1           1           2

It's easiest to understand what each function does if you run them one at a time and look at the intermediate results. For examples, see here and here
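To make those intermediate results concrete, here is a small self-contained sketch that rebuilds the question's sample data and inspects each step. The data frame name dat follows the answer above; res is just a name chosen here for illustration, and the explicit fun.aggregate = length spells out the aggregation that the counting relies on.

```r
library(reshape2)

# Sample data copied from the question (all values are strings)
dat <- data.frame(
  GroupId = c(1, 2, 2, 2, 1),
  C1 = c("valColOne1", "valColOne1", "valColOne1", "valColOne2", "valColOne1"),
  C2 = c("valColTwo2", "valColTwo2", "valColTwo2", "valColTwo1", "valColTwo1"),
  stringsAsFactors = FALSE
)

# Step 1: long format, one row per original cell.
# 'variable' holds the source column name, 'value' holds the string.
dat.m <- melt(dat, id.vars = "GroupId")
str(dat.m)

# Step 2: one row per GroupId, one column per distinct value,
# each cell holding the number of matching rows in dat.m.
res <- dcast(dat.m, GroupId ~ value, fun.aggregate = length)
res
```

Looking at dat.m first shows why the column the value came from no longer matters for the count: dcast only uses GroupId and value in the formula.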

You can use table from base R
table(data.frame(GroupId= df1$GroupId, Val=unlist(df1[-1])))
#         Val
# GroupId valColOne1 valColOne2 valColTwo1 valColTwo2
#  1          2          0          1          1
#  2          2          1          1          2
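The call above returns a table object rather than a data frame. If you need GroupId as a regular column, as in the question's expected output, one possible sketch is below. The names tab, wide and out are chosen here for illustration, and the group IDs are repeated explicitly with rep() instead of relying on data.frame recycling.

```r
# Sample data copied from the question
df1 <- data.frame(
  GroupId = c(1, 2, 2, 2, 1),
  C1 = c("valColOne1", "valColOne1", "valColOne1", "valColOne2", "valColOne1"),
  C2 = c("valColTwo2", "valColTwo2", "valColTwo2", "valColTwo1", "valColTwo1"),
  stringsAsFactors = FALSE
)

# Stack all value columns into one vector and repeat GroupId once per column
tab <- table(
  GroupId = rep(df1$GroupId, times = ncol(df1) - 1),
  Val     = unlist(df1[-1], use.names = FALSE)
)

# Convert the contingency table to a wide data frame (group IDs become row names)
wide <- as.data.frame.matrix(tab)

# Move the row names into a proper GroupId column
out <- cbind(GroupId = as.numeric(rownames(wide)), wide)
rownames(out) <- NULL
out
```

This keeps everything in base R, at the cost of one extra conversion step compared with dcast.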

Data

df1 <- structure(list(
  GroupId = c(1, 2, 2, 2, 1),
  C1 = c("valColOne1", "valColOne1", "valColOne1", "valColOne2", "valColOne1"),
  C2 = c("valColTwo2", "valColTwo2", "valColTwo2", "valColTwo1", "valColTwo1")),
  .Names = c("GroupId", "C1", "C2"),
  row.names = c(NA, -5L), class = "data.frame")