Inconsistent ddply multiple quantiles by group

Question

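I'm trying to use ddply to summarize the median and the 25th/75th percentiles for multiple groups in a relatively small dataset. I'm grouping the measured data points AUCINF_obs and Cmax by DoseWt. (Using R 4.0.4 in RStudio 1.3.1093 on Windows 10.) The results for AUCINF_obs agree whether I calculate them line by line (for DoseWt==0.3) or with ddply and summarize, but this is not the case for my Cmax data: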

median(NCAtrim$Cmax[NCAtrim$DoseWt==0.3])
quantile(NCAtrim$Cmax[NCAtrim$DoseWt==0.3], 0.25)
quantile(NCAtrim$Cmax[NCAtrim$DoseWt==0.3], 0.75)

library(plyr)  # provides ddply() and .()

NCA.by.Dose.25_75tile <- ddply(NCAtrim, .(DoseWt), summarize,
  AUC_inf = round(median(AUCINF_obs), 2),
  AUCinf25 = round(quantile(AUCINF_obs, 0.25), 2),
  AUCinf75 = round(quantile(AUCINF_obs, 0.75), 2),
  Cmax = round(median(Cmax), 2),
  Cmax_25 = round(quantile(Cmax, 0.25), 2),
  Cmax_75 = round(quantile(Cmax, 0.75), 2))
NCA.by.Dose.25_75tile

Can anyone explain why I'm unable to generate the 25th and 75th percentiles for Cmax using ddply with summarize here, while the 25th, 50th, and 75th percentiles for AUCINF_obs work? (I also tried quantile(Cmax, probs = 0.25).)

NCAtrim <- structure(list(Subject = c(103L, 103L, 103L, 105L, 105L, 107L, 
107L, 107L, 109L, 111L, 111L, 111L, 113L, 113L, 113L, 114L, 114L, 
114L, 117L, 117L, 117L, 124L, 124L, 124L, 126L, 126L, 126L, 127L, 
127L, 127L, 130L, 130L, 130L), DoseWt = c(0.3, 0.45, 0.6, 0.3, 
0.45, 0.3, 0.45, 0.6, 0.3, 0.3, 0.45, 0.6, 0.3, 0.45, 0.6, 0.3, 
0.45, 0.6, 0.3, 0.45, 0.6, 0.3, 0.45, 0.6, 0.3, 0.45, 0.6, 0.3, 
0.45, 0.6, 0.3, 0.45, 0.6), AUCINF_obs = c(75.57957417, 104.7376298, 
193.1863023, 150.8553768, 231.6657641, 97.55371159, 153.2804929, 
213.179011, 90.84944244, 54.65739998, 93.3108462, 78.07527241, 
61.31713576, 89.91275385, 126.6723822, 94.02414615, 166.3379068, 
227.4162735, 98.84793101, 172.1750658, 149.2339892, 79.45304645, 
142.0389319, 171.7761067, 44.36951602, 86.64275743, 107.4389943, 
56.42917332, 112.4691754, 144.4193233, 87.22135293, 137.3190569, 
151.0853702), Cmax = c(17.2, 22.7, 54.1, 16, 43.3, 19.8, 35.1, 
48, 30.6, 12.4, 18.2, 16.4, 16, 27.8, 31.3, 14.5, 24.6, 37.6, 
15.3, 26, 27.7, 16.5, 24.3, 19.7, 11, 15.8, 43.2, 14.6, 29.8, 
35.6, 19, 38.1, 39)), class = "data.frame", row.names = c(NA, 
-33L))

Answer 1

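That's because the value of Cmax changes when you run Cmax = round(median(Cmax), 2). summarize evaluates its expressions in order, and each result is visible to the expressions that follow it. The next expression, Cmax_25 = round(quantile(Cmax, 0.25), 2), therefore sees this changed Cmax value (the single per-group median) and not the original column, so both quantiles come out equal to the median. The AUCINF_obs results are unaffected because none of the output names (AUC_inf, AUCinf25, AUCinf75) overwrite the AUCINF_obs column.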
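The overwrite can be reproduced on a small example. The sketch below is only an illustration: the data frame toy and its columns g and x are made up and are not the poster's NCAtrim data. It assumes the plyr package is installed and calls it with the plyr:: prefix so that nothing else in the session is masked.

```r
# Hypothetical toy data: two groups of four values each
toy <- data.frame(
  g = c("a", "a", "a", "a", "b", "b", "b", "b"),
  x = c(1, 2, 3, 10, 4, 5, 6, 20)
)

# Median first: the output column x replaces the input column x,
# so quantile() only sees the one-element median of each group
bad <- plyr::ddply(toy, "g", plyr::summarise,
  x = median(x),
  x_25 = quantile(x, 0.25)
)

# Quantile first: quantile() still sees the original values of x
good <- plyr::ddply(toy, "g", plyr::summarise,
  x_25 = quantile(x, 0.25),
  x = median(x)
)

print(bad)
print(good)
```

In bad, x_25 should be identical to x in every group, which is the same symptom as in the question. In good, x_25 is computed from the raw values.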

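You can move that line to the end so that Cmax is not changed before the quantiles are computed. plyr has also been retired, so you might want to switch to dplyr.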

library(dplyr)

NCAtrim %>%
  group_by(DoseWt) %>%
  summarise(AUC_inf = round(median(AUCINF_obs),2), 
            AUCinf25 = round(quantile(AUCINF_obs, 0.25),2), 
            AUCinf75 = round(quantile(AUCINF_obs, 0.75),2),
            Cmax_25 = round(quantile(Cmax, 0.25), 2), 
            Cmax_75 = round(quantile(Cmax, 0.75), 2), 
            Cmax = round(median(Cmax), 2)) -> NCA.by.Dose.25_75tile

NCA.by.Dose.25_75tile
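Another option, not part of the answer above, is to give the median an output name that differs from the input column, so that the order of the expressions no longer matters. This is a minimal sketch on made-up data. The names toy, g, x, x_med, x_25 and x_75 are hypothetical, and it assumes dplyr is installed.

```r
library(dplyr)

# Hypothetical toy data: two groups of four values each
toy <- data.frame(
  g = c("a", "a", "a", "a", "b", "b", "b", "b"),
  x = c(1, 2, 3, 10, 4, 5, 6, 20)
)

# x_med does not overwrite x, so the later quantile() calls
# still work on the original column whatever their position
res <- toy %>%
  group_by(g) %>%
  summarise(
    x_med = median(x),
    x_25 = quantile(x, 0.25),
    x_75 = quantile(x, 0.75)
  )

print(res)
```

Applied to the real data, this would mean naming the median column something like Cmax_med instead of Cmax. The trade-off is a different column name in the output.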
