r 具有许多检查点的多级因素摘要
r summary for multilevel factor with many checkpoints
我有一个包含一个因子变量和多个水平的长数据集,为简单起见,我在这里只使用两个(曲线、直线)。
值(分数)可以是高于(高于阈值)或低于(低于阈值),它是在三个不同的时间点测量的。并非所有 ID 都在所有 3 个时间点进行测量。有些仅在 t1 期间测量,有些在 t1、t2 期间测量,有些在 t1、t3 期间测量,有些在所有时间点测量,t1、t2、t3。
Id Factor Level Time Score
1 Curve Above t1 16.11
1 Curve Above t2 15.67
1 Curve Above t3 11.24
2 Curve Above t1 17.93
2 Curve Above t2 11.82
2 Curve Above t3 12.95
3 Curve Above t1 12.68
4 Curve Above t1 11.53
4 Curve Above t2 11.74
4 Curve Above t3 14.40
5 Curve Above t1 14.48
5 Curve Above t3 17.32
6 Curve Above t1 11.61
6 Curve Above t2 14.96
7 Curve Above t1 14.00
7 Curve Above t2 10.02
7 Curve Above t3 14.52
8 Curve Above t1 11.85
8 Curve Below t3 3.26
9 Curve Below t1 2.49
9 Curve Below t3 7.00
10 Curve Below t2 3.68
10 Curve Below t3 1.62
11 Curve Below t1 8.08
11 Curve Below t2 1.59
11 Curve Below t3 1.59
1 Line Above t1 10.20
1 Line Above t2 13.20
1 Line Above t3 15.85
2 Line Above t1 19.80
2 Line Above t2 11.99
3 Line Above t3 17.32
3 Line Above t1 10.43
4 Line Above t1 12.34
4 Line Above t2 14.25
5 Line Above t3 14.72
5 Line Above t1 15.02
6 Line Above t3 17.94
6 Line Above t1 19.65
7 Line Above t1 18.75
7 Line Below t3 3.25
9 Line Below t1 2.43
10 Line Below t1 7.51
11 Line Below t3 2.15
11 Line Below t1 7.47
12 Line Below t1 1.56
12 Line Below t3 6.03
13 Line Below t1 4.98
我想做的是创建这样的摘要。
我可以手动或在 excel 中执行此操作,但事实上我的实际数据集中有 20 多个级别,这使得此过程乏味且效率低下。我正在寻求帮助,以便使此过程在 r 中更高效、更快速。感谢您的时间和帮助。提前致谢。
我想你想要这个?
library(tidyverse)
#> Warning: package 'tidyverse' was built under R version 4.1.3
df <- read.table(text = 'Id Factor Level Time Score
1 Curve Above t1 16.11
1 Curve Above t2 15.67
1 Curve Above t3 11.24
2 Curve Above t1 17.93
2 Curve Above t2 11.82
2 Curve Above t3 12.95
3 Curve Above t1 12.68
4 Curve Above t1 11.53
4 Curve Above t2 11.74
4 Curve Above t3 14.40
5 Curve Above t1 14.48
5 Curve Above t3 17.32
6 Curve Above t1 11.61
6 Curve Above t2 14.96
7 Curve Above t1 14.00
7 Curve Above t2 10.02
7 Curve Above t3 14.52
8 Curve Above t1 11.85
8 Curve Below t3 3.26
9 Curve Below t1 2.49
9 Curve Below t3 7.00
10 Curve Below t2 3.68
10 Curve Below t3 1.62
11 Curve Below t1 8.08
11 Curve Below t2 1.59
11 Curve Below t3 1.59
1 Line Above t1 10.20
1 Line Above t2 13.20
1 Line Above t3 15.85
2 Line Above t1 19.80
2 Line Above t2 11.99
3 Line Above t3 17.32
3 Line Above t1 10.43
4 Line Above t1 12.34
4 Line Above t2 14.25
5 Line Above t3 14.72
5 Line Above t1 15.02
6 Line Above t3 17.94
6 Line Above t1 19.65
7 Line Above t1 18.75
7 Line Below t3 3.25
9 Line Below t1 2.43
10 Line Below t1 7.51
11 Line Below t3 2.15
11 Line Below t1 7.47
12 Line Below t1 1.56
12 Line Below t3 6.03
13 Line Below t1 4.98', header = TRUE)
df %>%
group_by(Factor, Time) %>%
summarise(median= median(Score),
first = quantile(Score, 0.25),
third = quantile(Score, 0.75),
average = mean(Score[Level == 'Below']),
perc = sum(Level == 'Below')*100/n(),
.groups = 'drop') %>%
pivot_wider(id_cols = Factor, names_from = Time, values_from = median:perc, names_vary = 'slowest')
#> # A tibble: 2 x 16
#> Factor median_t1 first_t1 third_t1 average_t1 perc_t1 median_t2 first_t2
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 Curve 12.3 11.5 14.4 5.28 20 11.7 6.85
#> 2 Line 10.3 6.85 16.0 4.79 41.7 13.2 12.6
#> # ... with 8 more variables: third_t2 <dbl>, average_t2 <dbl>, perc_t2 <dbl>,
#> # median_t3 <dbl>, first_t3 <dbl>, third_t3 <dbl>, average_t3 <dbl>,
#> # perc_t3 <dbl>
由 reprex package (v2.0.1)
于 2022-04-08 创建
我有一个包含一个因子变量和多个水平的长数据集,为简单起见,我在这里只使用两个(曲线、直线)。
值(分数)可以是高于(高于阈值)或低于(低于阈值),它是在三个不同的时间点测量的。并非所有 ID 都在所有 3 个时间点进行测量。有些仅在 t1 期间测量,有些在 t1、t2 期间测量,有些在 t1、t3 期间测量,有些在所有时间点测量,t1、t2、t3。
Id Factor Level Time Score
1 Curve Above t1 16.11
1 Curve Above t2 15.67
1 Curve Above t3 11.24
2 Curve Above t1 17.93
2 Curve Above t2 11.82
2 Curve Above t3 12.95
3 Curve Above t1 12.68
4 Curve Above t1 11.53
4 Curve Above t2 11.74
4 Curve Above t3 14.40
5 Curve Above t1 14.48
5 Curve Above t3 17.32
6 Curve Above t1 11.61
6 Curve Above t2 14.96
7 Curve Above t1 14.00
7 Curve Above t2 10.02
7 Curve Above t3 14.52
8 Curve Above t1 11.85
8 Curve Below t3 3.26
9 Curve Below t1 2.49
9 Curve Below t3 7.00
10 Curve Below t2 3.68
10 Curve Below t3 1.62
11 Curve Below t1 8.08
11 Curve Below t2 1.59
11 Curve Below t3 1.59
1 Line Above t1 10.20
1 Line Above t2 13.20
1 Line Above t3 15.85
2 Line Above t1 19.80
2 Line Above t2 11.99
3 Line Above t3 17.32
3 Line Above t1 10.43
4 Line Above t1 12.34
4 Line Above t2 14.25
5 Line Above t3 14.72
5 Line Above t1 15.02
6 Line Above t3 17.94
6 Line Above t1 19.65
7 Line Above t1 18.75
7 Line Below t3 3.25
9 Line Below t1 2.43
10 Line Below t1 7.51
11 Line Below t3 2.15
11 Line Below t1 7.47
12 Line Below t1 1.56
12 Line Below t3 6.03
13 Line Below t1 4.98
我想做的是创建这样的摘要。
我可以手动或在 excel 中执行此操作,但事实上我的实际数据集中有 20 多个级别,这使得此过程乏味且效率低下。我正在寻求帮助,以便使此过程在 r 中更高效、更快速。感谢您的时间和帮助。提前致谢。
我想你想要这个?
library(tidyverse)
#> Warning: package 'tidyverse' was built under R version 4.1.3
df <- read.table(text = 'Id Factor Level Time Score
1 Curve Above t1 16.11
1 Curve Above t2 15.67
1 Curve Above t3 11.24
2 Curve Above t1 17.93
2 Curve Above t2 11.82
2 Curve Above t3 12.95
3 Curve Above t1 12.68
4 Curve Above t1 11.53
4 Curve Above t2 11.74
4 Curve Above t3 14.40
5 Curve Above t1 14.48
5 Curve Above t3 17.32
6 Curve Above t1 11.61
6 Curve Above t2 14.96
7 Curve Above t1 14.00
7 Curve Above t2 10.02
7 Curve Above t3 14.52
8 Curve Above t1 11.85
8 Curve Below t3 3.26
9 Curve Below t1 2.49
9 Curve Below t3 7.00
10 Curve Below t2 3.68
10 Curve Below t3 1.62
11 Curve Below t1 8.08
11 Curve Below t2 1.59
11 Curve Below t3 1.59
1 Line Above t1 10.20
1 Line Above t2 13.20
1 Line Above t3 15.85
2 Line Above t1 19.80
2 Line Above t2 11.99
3 Line Above t3 17.32
3 Line Above t1 10.43
4 Line Above t1 12.34
4 Line Above t2 14.25
5 Line Above t3 14.72
5 Line Above t1 15.02
6 Line Above t3 17.94
6 Line Above t1 19.65
7 Line Above t1 18.75
7 Line Below t3 3.25
9 Line Below t1 2.43
10 Line Below t1 7.51
11 Line Below t3 2.15
11 Line Below t1 7.47
12 Line Below t1 1.56
12 Line Below t3 6.03
13 Line Below t1 4.98', header = TRUE)
df %>%
group_by(Factor, Time) %>%
summarise(median= median(Score),
first = quantile(Score, 0.25),
third = quantile(Score, 0.75),
average = mean(Score[Level == 'Below']),
perc = sum(Level == 'Below')*100/n(),
.groups = 'drop') %>%
pivot_wider(id_cols = Factor, names_from = Time, values_from = median:perc, names_vary = 'slowest')
#> # A tibble: 2 x 16
#> Factor median_t1 first_t1 third_t1 average_t1 perc_t1 median_t2 first_t2
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 Curve 12.3 11.5 14.4 5.28 20 11.7 6.85
#> 2 Line 10.3 6.85 16.0 4.79 41.7 13.2 12.6
#> # ... with 8 more variables: third_t2 <dbl>, average_t2 <dbl>, perc_t2 <dbl>,
#> # median_t3 <dbl>, first_t3 <dbl>, third_t3 <dbl>, average_t3 <dbl>,
#> # perc_t3 <dbl>
由 reprex package (v2.0.1)
于 2022-04-08 创建