Confidence intervals for a tibble in wide format

Question

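I have a large tibble; an example is shown below. It has seven predictors (V4 to V10) and nine outcomes (w1, w2, w3, mw, i1, i2, i3, mi, p2).
What I want to do is create confidence intervals for the outcomes in column 2 (w1) through column 10 (p2).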

vars    w1    w2      w3      mw      i1      i2      i3      mi      p2
V4  0.084   0.017   0.061   0.054   22.800  4.570   16.700  14.700  0.367
V5  0.032   0.085   0.039   0.052   8.840   23.100  10.700  14.200  0.367
V6  0.026   0.066   0.022   0.038   7.030   18.000  6.070   10.400  0.367
V7  0.097   0.020   0.066   0.061   26.300  5.420   18.100  16.600  0.367
V8  0.048   0.071   0.043   0.054   13.100  19.300  11.800  14.700  0.367
V9  0.018   0.111   0.020   0.050   4.800   30.300  5.440   13.500  0.367
V10 0.053   0.020   0.103   0.058   14.300  5.330   28.000  15.900  0.367
V4  0.084   0.017   0.060   0.054   22.400  4.420   16.200  14.300  0.373
V5  0.032   0.072   0.036   0.047   8.630   19.300  9.760   12.500  0.373
V6  0.030   0.076   0.023   0.043   8.080   20.500  6.070   11.500  0.373
V7  0.080   0.021   0.087   0.063   21.500  5.720   23.300  16.800  0.373
V8  0.053   0.090   0.034   0.059   14.100  24.000  9.110   15.700  0.373
V9  0.016   0.101   0.025   0.048   4.410   27.100  6.790   12.800  0.373
V10 0.060   0.022   0.100   0.061   16.000  5.950   26.800  16.300  0.373

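When I group_by the variable (vars) in dplyr and run quantile on three of the outcomes (as a test), it does not give me what I am looking for. Instead of a confidence interval for each of the three outcomes, it gives me a single confidence interval per variable, as shown below: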

+   group_by(vars) %>% 
+   do(data.frame(t(quantile(c(.$w1, .$w2, .$w3), probs = c(0.025, 0.975)))))
# A tibble: 7 x 3
# Groups:   variables [7]
  variables  X2.5 X97.5
1 V10       0.0202 0.103 
2 V4        0.017  0.084 
3 V5        0.032  0.0834
4 V6        0.0221 0.0748
5 V7        0.0201 0.0958
6 V8        0.0351 0.0876
7 V9        0.0162 0.110

In short, what I am looking for is something like the table below, where I get a confidence interval for each outcome.

         w1                w2                    w3 
vars X2.5   X97.5   vars  X2.5  X97.5  vars X2.5    X97.5
V10 0.020   0.103   V10 0.020   0.103   V10 0.020   0.103
V4  0.017   0.084   V4  0.017   0.084   V4  0.017   0.084
V5  0.032   0.083   V5  0.032   0.083   V5  0.032   0.083
V6  0.022   0.075   V6  0.022   0.075   V6  0.022   0.075
V7  0.020   0.096   V7  0.020   0.096   V7  0.020   0.096
V8  0.035   0.088   V8  0.035   0.088   V8  0.035   0.088
V9  0.016   0.110   V9  0.016   0.110   V9  0.016   0.110

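Any pointers in the right direction would be greatly appreciated. I have read through Stack Overflow but cannot seem to find an answer that addresses what I am trying to do.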

Answer 1

Here are two ways.

Base R.

aggregate(df1[-1], list(df1[[1]]), quantile, probs = c(0.025, 0.975))
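
One thing to be aware of with the base R call: because quantile returns two values per group, aggregate stores each outcome as a two-column matrix inside the result rather than as two ordinary columns. The sketch below shows one common way to flatten that. It uses a small hypothetical data frame `toy` (two variables and two outcomes copied from the question's data) in place of df1, so the names `toy`, `agg` and `flat` are illustrative only.

```r
# Toy subset of the question's data (hypothetical name `toy`, standing in for df1)
toy <- data.frame(
  vars = c("V4", "V5", "V4", "V5"),
  w1   = c(0.084, 0.032, 0.084, 0.032),
  w2   = c(0.017, 0.085, 0.017, 0.072)
)

# Same idea as the call above: each outcome becomes a 2-column matrix
# holding the 2.5% and 97.5% quantiles for every group
agg <- aggregate(toy[-1], list(vars = toy[[1]]), quantile,
                 probs = c(0.025, 0.975))

# Flatten the matrix columns into ordinary columns, two per outcome
flat <- do.call(data.frame, agg)
flat
```

After flattening, each outcome contributes one lower and one upper column, which is close to the layout requested in the question.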

With the tidyverse.

library(dplyr)

df1 %>%
  group_by(vars) %>%
  mutate_at(vars(w1:p2), quantile, probs = c(0.025, 0.975))

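Note that with the second way the output format is different: within each group, the first quantile (0.025) is in the first row and the second (0.975) is in the last row. This relies on each group in the example data having exactly two rows, because mutate requires each result to have length 1 or the size of the group.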
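
If one row per variable with separate lower and upper columns is preferred, a possible alternative is summarise() with across(). This is only a sketch, assuming dplyr 1.0.0 or later; the data frame `toy` and the suffixes `lwr` and `upr` are illustrative choices, not part of the original answer. Unlike the mutate approach, it does not depend on how many rows each group has.

```r
library(dplyr)  # assumes dplyr >= 1.0.0, which provides across()

# Toy subset of the question's data (hypothetical name `toy`, standing in for df1)
toy <- data.frame(
  vars = c("V4", "V5", "V4", "V5"),
  w1   = c(0.084, 0.032, 0.084, 0.032),
  w2   = c(0.017, 0.085, 0.017, 0.072)
)

ci <- toy %>%
  group_by(vars) %>%
  summarise(
    across(
      w1:w2,                                   # with df1 this would be w1:p2
      list(lwr = ~ quantile(.x, 0.025, names = FALSE),
           upr = ~ quantile(.x, 0.975, names = FALSE)),
      .names = "{.col}_{.fn}"                  # e.g. w1_lwr, w1_upr
    ),
    .groups = "drop"
  )
ci
```

The columns come out grouped by outcome (lower bound, then upper bound), which matches the side-by-side layout sketched in the question.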

Data.

df1 <-
structure(list(vars = structure(c(2L, 3L, 4L, 
5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L), 
.Label = c("V10", "V4", "V5", "V6", "V7", "V8", 
"V9"), class = "factor"), w1 = c(0.084, 0.032, 
0.026, 0.097, 0.048, 0.018, 0.053, 0.084, 
0.032, 0.03, 0.08, 0.053, 0.016, 0.06), 
w2 = c(0.017, 0.085, 0.066, 0.02, 0.071, 0.111, 
0.02, 0.017, 0.072, 0.076, 0.021, 0.09, 0.101, 
0.022), w3 = c(0.061, 0.039, 0.022, 0.066, 
0.043, 0.02, 0.103, 0.06, 0.036, 0.023, 0.087, 
0.034, 0.025, 0.1), mw = c(0.054, 0.052, 0.038, 
0.061, 0.054, 0.05, 0.058, 0.054, 0.047, 0.043, 
0.063, 0.059, 0.048, 0.061), i1 = c(22.8, 8.84, 
7.03, 26.3, 13.1, 4.8, 14.3, 22.4, 8.63, 8.08, 
21.5, 14.1, 4.41, 16), i2 = c(4.57, 23.1, 18, 5.42, 
19.3, 30.3, 5.33, 4.42, 19.3, 20.5, 5.72, 24, 27.1, 
5.95), i3 = c(16.7, 10.7, 6.07, 18.1, 11.8, 5.44, 
28, 16.2, 9.76, 6.07, 23.3, 9.11, 6.79, 26.8), 
mi = c(14.7, 14.2, 10.4, 16.6, 14.7, 13.5, 15.9, 
14.3, 12.5, 11.5, 16.8, 15.7, 12.8, 16.3), 
p2 = c(0.367, 0.367, 0.367, 0.367, 0.367, 0.367, 
0.367, 0.373, 0.373, 0.373, 0.373, 0.373, 0.373, 
0.373)), class = "data.frame", 
row.names = c(NA, -14L))

Answer 2

Another possibility: melt/pivot to long format, compute the summaries, then cast/pivot back to wide format.

library(tidyverse)
df2 <- (df1 
     %>% pivot_longer(-vars, names_to = "outcome", values_to = "value") 
     %>% group_by(vars,outcome) 
     %>% summarise(lwr=quantile(value,0.025),upr=quantile(value,0.975))
)

df2 %>% pivot_wider(names_from=outcome,values_from=c(lwr,upr))

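Unfortunately, the column order is not what you asked for; I can't think of a quick fix (you could use select() to put the variables in the order you want ...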
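
For the column order, one option that may help is the names_vary argument of pivot_wider. The sketch below assumes tidyr 1.2.0 or later (where that argument is available) and uses a hypothetical data frame `toy` in place of df1; it is an illustration, not part of the original answer.

```r
library(dplyr)
library(tidyr)  # names_vary assumes tidyr >= 1.2.0

# Toy subset of the question's data (hypothetical name `toy`, standing in for df1)
toy <- data.frame(
  vars = c("V4", "V5", "V4", "V5"),
  w1   = c(0.084, 0.032, 0.084, 0.032),
  w2   = c(0.017, 0.085, 0.017, 0.072)
)

wide <- toy %>%
  pivot_longer(-vars, names_to = "outcome", values_to = "value") %>%
  group_by(vars, outcome) %>%
  summarise(lwr = quantile(value, 0.025, names = FALSE),
            upr = quantile(value, 0.975, names = FALSE),
            .groups = "drop") %>%
  # "slowest" keeps the lwr/upr pair of each outcome next to each other
  pivot_wider(names_from = outcome, values_from = c(lwr, upr),
              names_vary = "slowest")
wide
```

With the default names_vary = "fastest", all lwr columns would come first and all upr columns after them, which is the ordering the answer describes as not matching the request.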
