Subtotals with ddply in R

Question

I am using ddply in R to break my data down by two variables, but I also want subtotals for each of them. This is the code I am using:

require(plyr)
dfx <- data.frame(
  group = c(rep('A', 8), rep('B', 15), rep('C', 6)),
  sex = sample(c("M", "F"), size = 29, replace = TRUE),
  age = runif(n = 29, min = 18, max = 54)
)

# Note the use of the '.' function to allow
# group and sex to be used without quoting
ddply(dfx, .(group, sex), summarize,
 mean = round(mean(age), 2),
 sd = round(sd(age), 2))

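I would also like the (mean, sd) summary by group alone and the (mean, sd) summary for the whole data set. Is there a way to include these in the same ddply call?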

Answer 1

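You can stack four copies of the data:
- one that keeps both sex and group
- one that keeps only sex
- one that keeps only group
- one that keeps neither column

In each copy, a column that is left out is overwritten with a constant label such as "all".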

require(plyr)
dfx <- data.frame(
  group = c(rep('A', 8), rep('B', 15), rep('C', 6)),
  sex = sample(c("M", "F"), size = 29, replace = TRUE),
  age = runif(n = 29, min = 18, max = 54)
)

# Replicate the data, overwriting one or both grouping columns with a label
dfAll <- dfx
dfAll$group <- "all"
dfAll$sex <- "all"
dfGroup <- dfx
dfGroup$group <- "all_group"  # per-sex totals across all groups
dfSex <- dfx
dfSex$sex <- "all_sex"        # per-group totals across both sexes
dfToGroup <- rbind(dfx, dfGroup, dfSex, dfAll)

# Note the use of the '.' function to allow
# group and sex to be used without quoting
ddply(dfToGroup, .(group, sex), summarize,
      mean = round(mean(age), 2),
      sd = round(sd(age), 2))
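
As a quick sanity check on the stacking idea, the sketch below rebuilds the stacked data with a fixed seed and compares two of the subtotal rows against values computed directly from dfx. The names res, overall, and groupA, the seed, and the stringsAsFactors = FALSE argument are assumptions added for illustration. It also assumes that the copy meant for per-group subtotals overwrites the sex column.

```r
library(plyr)

set.seed(42)  # assumption: fixed seed so the run is reproducible
dfx <- data.frame(
  group = c(rep("A", 8), rep("B", 15), rep("C", 6)),
  sex = sample(c("M", "F"), size = 29, replace = TRUE),
  age = runif(n = 29, min = 18, max = 54),
  stringsAsFactors = FALSE
)

# Stack the original rows with three relabelled copies
dfAll <- dfx
dfAll$group <- "all"
dfAll$sex <- "all"
dfGroup <- dfx
dfGroup$group <- "all_group"  # per-sex totals across all groups
dfSex <- dfx
dfSex$sex <- "all_sex"        # per-group totals across both sexes
dfToGroup <- rbind(dfx, dfGroup, dfSex, dfAll)

res <- ddply(dfToGroup, .(group, sex), summarize,
             mean = round(mean(age), 2),
             sd = round(sd(age), 2))

# The grand-total row should match the statistics of the whole data set
overall <- res[res$group == "all" & res$sex == "all", ]
stopifnot(overall$mean == round(mean(dfx$age), 2),
          overall$sd == round(sd(dfx$age), 2))

# The subtotal for group A should match a direct computation on group A
groupA <- res[res$group == "A" & res$sex == "all_sex", ]
stopifnot(groupA$mean == round(mean(dfx$age[dfx$group == "A"]), 2))
```

If both stopifnot calls pass silently, the stacked summary agrees with the direct computations for those rows.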

Answer 2

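This is a dplyr suggestion rather than a plyr one. If I understand correctly, you want the mean and sd for 1) group * sex, 2) group, and 3) the whole data set. If you do not want to enlarge your data, you could try something like this.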

library(dplyr)

bind_rows(summarise_each(group_by(dfx, group, sex), funs(mean, sd)), 
          summarise_each(group_by(dfx, group), funs(mean, sd), age),
          summarise_each(dfx, funs(mean, sd), age))

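The three summarise_each calls summarise the data at each level you want. bind_rows, which is available in the development version of dplyr (dplyr 0.4), then binds the results together. If you need to change the NA values, you can do that afterwards.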

#   group sex     mean        sd
#1      A   F 40.81629  9.190859
#2      A   M 34.27423 10.408674
#3      B   F 28.94309  9.002275
#4      B   M 37.70992 11.606198
#5      C   F 41.36827  8.796248
#6      C   M 38.16745  8.912859
#7      A  NA 36.72750  9.874593
#8      B  NA 34.20319 11.210715
#9      C  NA 39.76786  8.111645
#10    NA  NA 36.05086 10.192498
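
summarise_each() and funs() were deprecated in later dplyr releases, so here is a rough sketch of the same three-level summary written with plain summarise(), followed by one possible way to relabel the NA placeholders. It assumes dplyr 1.0.0 or later (for the .groups argument). The helper age_stats, the object res, the seed, and the "all" label are hypothetical choices made for this example, not part of the original answer.

```r
library(dplyr)

set.seed(42)  # assumption: fixed seed so the run is reproducible
dfx <- data.frame(
  group = c(rep("A", 8), rep("B", 15), rep("C", 6)),
  sex = sample(c("M", "F"), size = 29, replace = TRUE),
  age = runif(n = 29, min = 18, max = 54),
  stringsAsFactors = FALSE  # keep group and sex as character columns
)

# Hypothetical helper: mean and sd of age for whatever grouping is active
age_stats <- function(d) {
  summarise(d, mean = mean(age), sd = sd(age), .groups = "drop")
}

res <- bind_rows(
  dfx %>% group_by(group, sex) %>% age_stats(),  # group * sex
  dfx %>% group_by(group) %>% age_stats(),       # subtotal per group
  dfx %>% age_stats()                            # whole data set
)

# bind_rows fills the missing grouping columns with NA; relabel them
res <- res %>%
  mutate(group = coalesce(group, "all"),
         sex = coalesce(sex, "all"))

print(res)
```

Keeping the grouping columns as character makes the relabelling straightforward. With factor columns, they would need converting with as.character() first.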
