R summary for a multilevel factor with many checkpoints

I have a long dataset with one factor variable that has many levels. For simplicity, I only use two of them here (Curve and Line).

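Each value (Score) is either Above (above a threshold) or Below (below the threshold), and it is measured at three different time points. Not every Id is measured at all three time points: some are measured only at t1, some at t1 and t2, some at t1 and t3, and some at all of t1, t2 and t3.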

Id  Factor  Level   Time    Score
1   Curve   Above   t1  16.11
1   Curve   Above   t2  15.67
1   Curve   Above   t3  11.24
2   Curve   Above   t1  17.93
2   Curve   Above   t2  11.82
2   Curve   Above   t3  12.95
3   Curve   Above   t1  12.68
4   Curve   Above   t1  11.53
4   Curve   Above   t2  11.74
4   Curve   Above   t3  14.40
5   Curve   Above   t1  14.48
5   Curve   Above   t3  17.32
6   Curve   Above   t1  11.61
6   Curve   Above   t2  14.96
7   Curve   Above   t1  14.00
7   Curve   Above   t2  10.02
7   Curve   Above   t3  14.52
8   Curve   Above   t1  11.85
8   Curve   Below   t3  3.26
9   Curve   Below   t1  2.49
9   Curve   Below   t3  7.00
10  Curve   Below   t2  3.68
10  Curve   Below   t3  1.62
11  Curve   Below   t1  8.08
11  Curve   Below   t2  1.59
11  Curve   Below   t3  1.59
1   Line    Above   t1  10.20
1   Line    Above   t2  13.20
1   Line    Above   t3  15.85
2   Line    Above   t1  19.80
2   Line    Above   t2  11.99
3   Line    Above   t3  17.32
3   Line    Above   t1  10.43
4   Line    Above   t1  12.34
4   Line    Above   t2  14.25
5   Line    Above   t3  14.72
5   Line    Above   t1  15.02
6   Line    Above   t3  17.94
6   Line    Above   t1  19.65
7   Line    Above   t1  18.75
7   Line    Below   t3  3.25
9   Line    Below   t1  2.43
10  Line    Below   t1  7.51
11  Line    Below   t3  2.15
11  Line    Below   t1  7.47
12  Line    Below   t1  1.56
12  Line    Below   t3  6.03
13  Line    Below   t1  4.98
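The measurement patterns described above (t1 only, t1 and t2, and so on) can be checked directly by collapsing each Id's time points into one string and counting the distinct patterns. A minimal sketch, assuming dplyr is installed; `df_small` and `patterns` are hypothetical names, and the rows are a small subset of the Curve data above:

```r
library(dplyr)

# Hypothetical subset of the Curve rows above (Ids 1, 3, 5 and 6)
df_small <- data.frame(
  Id     = c(1, 1, 1, 3, 5, 5, 6, 6),
  Factor = "Curve",
  Time   = c("t1", "t2", "t3", "t1", "t1", "t3", "t1", "t2")
)

# One pattern string per Factor/Id, then count how many Ids share each pattern
patterns <- df_small %>%
  group_by(Factor, Id) %>%
  summarise(pattern = paste(sort(Time), collapse = ","), .groups = "drop") %>%
  count(Factor, pattern)

patterns
```

On the full dataset the same pipeline lists every time-point combination that occurs for each factor level, however many levels there are.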

What I would like to do is create a summary like this:

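I can do this manually or in Excel, but my actual dataset has more than 20 levels, which makes that tedious and inefficient. I am looking for a faster, more efficient way to do this in R. Thanks in advance for your time and help.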

I think this is what you want:

library(tidyverse)
#> Warning: package 'tidyverse' was built under R version 4.1.3

df <- read.table(text = 'Id  Factor  Level   Time    Score
1   Curve   Above   t1  16.11
1   Curve   Above   t2  15.67
1   Curve   Above   t3  11.24
2   Curve   Above   t1  17.93
2   Curve   Above   t2  11.82
2   Curve   Above   t3  12.95
3   Curve   Above   t1  12.68
4   Curve   Above   t1  11.53
4   Curve   Above   t2  11.74
4   Curve   Above   t3  14.40
5   Curve   Above   t1  14.48
5   Curve   Above   t3  17.32
6   Curve   Above   t1  11.61
6   Curve   Above   t2  14.96
7   Curve   Above   t1  14.00
7   Curve   Above   t2  10.02
7   Curve   Above   t3  14.52
8   Curve   Above   t1  11.85
8   Curve   Below   t3  3.26
9   Curve   Below   t1  2.49
9   Curve   Below   t3  7.00
10  Curve   Below   t2  3.68
10  Curve   Below   t3  1.62
11  Curve   Below   t1  8.08
11  Curve   Below   t2  1.59
11  Curve   Below   t3  1.59
1   Line    Above   t1  10.20
1   Line    Above   t2  13.20
1   Line    Above   t3  15.85
2   Line    Above   t1  19.80
2   Line    Above   t2  11.99
3   Line    Above   t3  17.32
3   Line    Above   t1  10.43
4   Line    Above   t1  12.34
4   Line    Above   t2  14.25
5   Line    Above   t3  14.72
5   Line    Above   t1  15.02
6   Line    Above   t3  17.94
6   Line    Above   t1  19.65
7   Line    Above   t1  18.75
7   Line    Below   t3  3.25
9   Line    Below   t1  2.43
10  Line    Below   t1  7.51
11  Line    Below   t3  2.15
11  Line    Below   t1  7.47
12  Line    Below   t1  1.56
12  Line    Below   t3  6.03
13  Line    Below   t1  4.98', header = TRUE)

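df %>% 
  group_by(Factor, Time) %>%                          # one summary row per Factor x Time
  summarise(median  = median(Score),
            first   = quantile(Score, 0.25),          # first quartile
            third   = quantile(Score, 0.75),          # third quartile
            average = mean(Score[Level == 'Below']),  # mean of the Below scores only
            perc    = sum(Level == 'Below')*100/n(),  # percentage of rows that are Below
            .groups = 'drop') %>% 
  # one column per statistic and time point; names_vary requires tidyr >= 1.2.0
  pivot_wider(id_cols = Factor, names_from = Time, values_from = median:perc, names_vary = 'slowest')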
#> # A tibble: 2 x 16
#>   Factor median_t1 first_t1 third_t1 average_t1 perc_t1 median_t2 first_t2
#>   <chr>      <dbl>    <dbl>    <dbl>      <dbl>   <dbl>     <dbl>    <dbl>
#> 1 Curve       12.3    11.5      14.4       5.28    20        11.7     6.85
#> 2 Line        10.3     6.85     16.0       4.79    41.7      13.2    12.6 
#> # ... with 8 more variables: third_t2 <dbl>, average_t2 <dbl>, perc_t2 <dbl>,
#> #   median_t3 <dbl>, first_t3 <dbl>, third_t3 <dbl>, average_t3 <dbl>,
#> #   perc_t3 <dbl>
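One caveat: when a Factor/Time group contains no Below rows, `mean(Score[Level == 'Below'])` takes the mean of an empty vector and returns `NaN`. In the data above, Line at t2 has only Above rows, so `average_t2` for Line is `NaN` (that column is truncated in the printout). If `NA` is preferred, a minimal sketch, assuming dplyr is installed; `df_sub` and `res` are hypothetical names and the rows are a subset of the t2 data:

```r
library(dplyr)

# Hypothetical subset of the t2 rows: Line has no Below rows, Curve has one
df_sub <- data.frame(
  Factor = c("Line", "Line", "Line", "Curve", "Curve"),
  Level  = c("Above", "Above", "Above", "Above", "Below"),
  Time   = "t2",
  Score  = c(13.20, 11.99, 14.25, 15.67, 3.68)
)

res <- df_sub %>%
  group_by(Factor, Time) %>%
  summarise(
    # mean(numeric(0)) is NaN, so return NA_real_ when the group has no Below rows
    average = if (any(Level == "Below")) mean(Score[Level == "Below"]) else NA_real_,
    perc    = sum(Level == "Below") * 100 / n(),
    .groups = "drop"
  )

res
```

The same `if (any(...))` expression can replace the `average` line in the full pipeline before `pivot_wider()`.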

Created on 2022-04-08 by the reprex package (v2.0.1)