获取由 R 中数据框中的另一列中的值分隔的列的箱线图统计信息

Get boxplot stats of a column separated by values in another column in dataframe in R

我有这样一个数据框:

distance exclude   
1.1    F
1.5    F
3      F
2      F
1      F
5      T
3      F
63     F
32     F
21     F
15     F
1      T

我想获取距离列中每个数据段的四个箱线图统计数据,在排除列中用 "T" 分隔,这里 "T" 作为分隔符。

有人可以帮忙吗?非常感谢!

首先,让我们创建一些假数据:

library(dplyr)

# Fake data
set.seed(49349)
dat = data.frame(distance=rnorm(500, 50, 10), 
                 exclude=sample(c("T","F"), 500, replace=TRUE, prob=c(0.03,0.95)))

现在每次 exclude = "T" 时创建一个新组。然后,对于每个组,计算你想要的任何统计数据和 return 数据框中的结果:

box.stats = dat %>% 
  mutate(group = cumsum(exclude=="T")) %>%
  group_by(group) %>%
  do(data.frame(n=length(.$distance), 
                out_90 = sum(.$distance > quantile(.$distance, 0.9)),
                out_10 = sum(.$distance < quantile(.$distance, 0.1)),
                MEAN = round(mean(.$distance),2), 
                SD = round(sd(.$distance),2),
                out_2SD_high = sum(.$distance > mean(.$distance) + 2*sd(.$distance)),
                round(t(quantile(.$distance, probs=c(0,0.1,0.25,0.5,0.75,0.9,1))),2)))

names(box.stats) = gsub("X(.*)\.$", "p\1", names(box.stats))

box.stats
   group     n out_90 out_10  MEAN    SD out_2SD_high    p0   p10   p25   p50   p75   p90  p100
1      0    15      2      2 46.21  8.78            0 28.66 36.03 41.88 46.04 52.33 56.30 61.98
2      1    36      4      4 50.03 10.01            0 21.71 38.78 44.63 51.13 56.66 61.58 67.84
3      2    80      8      8 50.36  9.00            1 20.30 38.10 45.95 51.28 56.51 61.74 70.44
4      3     9      1      1 55.62  8.58            0 42.11 47.10 49.19 54.54 63.63 65.84 67.88
5      4    16      2      2 47.70  7.79            0 29.03 39.89 43.60 49.26 52.92 56.97 58.02
6      5    66      7      7 49.86  9.93            2 24.84 36.00 45.05 50.51 55.65 61.41 75.27
7      6    44      5      5 50.35 10.39            1 31.72 36.36 43.49 50.95 55.78 64.88 73.64
8      7    80      8      8 49.18  9.24            1 27.62 37.86 42.06 50.34 56.60 59.66 72.13
9      8    31      3      3 52.56 11.18            0 25.78 39.94 44.10 51.32 62.02 66.35 70.40
10     9    60      6      6 50.31  9.82            1 25.43 37.44 44.53 50.31 56.78 62.36 71.77
11    10    33      4      4 49.99  9.78            2 32.74 38.72 42.56 49.60 55.75 62.86 72.20
12    11    30      3      3 48.26 11.47            1 30.03 37.68 40.24 45.65 55.42 60.18 79.36