Calculate summary statistics of measurements and pivot them to columns in R
I have a data frame like this:
Step <- c("1","1","4","3","2","2","3","4","4","3","1","3","2","4","3","1","2")
Length <- c(0.1,0.5,0.7,0.8,0.2,0.1,0.3,0.8,0.9,0.15,0.25,0.27,0.28,0.61,0.15,0.37,0.18)
Breadth <- c(0.13,0.35,0.87,0.38,0.52,0.71,0.43,0.8,0.9,0.15,0.45,0.7,0.8,0.11,0.11,0.47,0.28)
Height <- c(0.31,0.35,0.37,0.38,0.32,0.51,0.53,0.48,0.9,0.15,0.35,0.32,0.22,0.11,0.17,0.27,0.38)
Width <- c(0.21,0.25,0.27,0.8,0.2,0.21,0.3,0.28,0.29,0.65,0.55,0.37,0.26,0.31,0.5,0.7,0.8)
df <- data.frame(Step,Length,Breadth,Height,Width)
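I am trying to calculate the max, min, mean, median and standard deviation of each measurement, grouped by Step, and then reshape the result so that the measurements become rows and the statistics for each Step become columns.
My desired output is: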
Measurement max_1 min_1 mean_1 median_1 sd_1 max_2 min_2 mean_2 median_2 sd_2 max_3 min_3 mean_3 median_3 sd_3 max_4 min_4 mean_4 median_4 sd_4
Length 0.50 0.10 0.3050 0.31 0.17058722 0.28 0.10 0.1900 0.190 0.07393691 0.80 0.15 0.334 0.27 0.2693139 0.90 0.61 0.7525 0.750 0.12526638
Breadth 0.47 0.13 0.3500 0.40 0.15577760 0.80 0.28 0.5775 0.615 0.23012680 0.70 0.11 0.354 0.38 0.2383904 0.90 0.11 0.6700 0.835 0.37567720
Height 0.35 0.27 0.3200 0.33 0.03829708 0.51 0.22 0.3575 0.350 0.12120919 0.53 0.15 0.310 0.32 0.1570032 0.90 0.11 0.4650 0.425 0.32888701
Width 0.70 0.21 0.4275 0.40 0.23669601 0.80 0.20 0.3675 0.235 0.28952547 0.80 0.30 0.524 0.50 0.2040343 0.31 0.27 0.2875 0.285 0.01707825
I am currently computing the summary statistics like this, but it is not an efficient approach:
library(dplyr)
df1 <- df %>%
  group_by(Step) %>%
  summarise(Length_Mean = mean(Length),
            Breadth_Mean = mean(Breadth),
            Height_Mean = mean(Height),
            Width_Mean = mean(Width))
How can I get my desired output efficiently, with as little code as possible? Can someone point me in the right direction?
You can use the "scoped" versions of summarize to compute the same summary statistics for several columns at once. From ?scoped:
The variants suffixed with _if, _at or _all apply an expression
(sometimes several) to all variables within a specified subset. This
subset can contain all variables (_all variants), a vars() selection
(_at variants), or variables selected with a predicate (_if variants).
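Here summarize_all is probably a good choice: it selects all variables except the grouping columns. You can also supply several summary functions, and each one is computed for every variable in the selection.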
library(tidyverse)
# Calculate the summary statistics
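sums <- df %>%
  group_by(Step) %>%
  # funs() is deprecated in newer dplyr; a named list gives the same column names
  summarize_all(list(max = max, min = min, mean = mean, median = median, sd = sd))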
sums
#> # A tibble: 4 x 21
#> Step Length_max Breadth_max Height_max Width_max Length_min Breadth_min
#> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 0.5 0.47 0.35 0.7 0.1 0.13
#> 2 2 0.28 0.8 0.51 0.8 0.1 0.28
#> 3 3 0.8 0.7 0.53 0.8 0.15 0.11
#> 4 4 0.9 0.9 0.9 0.31 0.61 0.11
#> # ... with 14 more variables: Height_min <dbl>, Width_min <dbl>,
#> # Length_mean <dbl>, Breadth_mean <dbl>, Height_mean <dbl>,
#> # Width_mean <dbl>, Length_median <dbl>, Breadth_median <dbl>,
#> # Height_median <dbl>, Width_median <dbl>, Length_sd <dbl>,
#> # Breadth_sd <dbl>, Height_sd <dbl>, Width_sd <dbl>
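For reference, dplyr 1.0.0 and later mark the scoped verbs (summarize_all and friends) as superseded by across() inside an ordinary summarise(). A minimal sketch of the same summary step in that style is below, assuming dplyr 1.0.0 or later is installed. One difference worth knowing: across() orders the result by measurement first (Length_max, Length_min, ...) rather than by statistic first as in the output above, but the names follow the same "<measurement>_<stat>" pattern, so the same reshaping step can be applied to it.

```r
library(dplyr)  # assumes dplyr >= 1.0.0, where across() is available

# Same example data as in the question
Step <- c("1","1","4","3","2","2","3","4","4","3","1","3","2","4","3","1","2")
Length <- c(0.1,0.5,0.7,0.8,0.2,0.1,0.3,0.8,0.9,0.15,0.25,0.27,0.28,0.61,0.15,0.37,0.18)
Breadth <- c(0.13,0.35,0.87,0.38,0.52,0.71,0.43,0.8,0.9,0.15,0.45,0.7,0.8,0.11,0.11,0.47,0.28)
Height <- c(0.31,0.35,0.37,0.38,0.32,0.51,0.53,0.48,0.9,0.15,0.35,0.32,0.22,0.11,0.17,0.27,0.38)
Width <- c(0.21,0.25,0.27,0.8,0.2,0.21,0.3,0.28,0.29,0.65,0.55,0.37,0.26,0.31,0.5,0.7,0.8)
df <- data.frame(Step, Length, Breadth, Height, Width)

# across(everything(), ...) skips the grouping column Step and applies every
# function in the named list to each remaining column; the output columns
# are named "<column>_<function>", e.g. Length_max
sums <- df %>%
  group_by(Step) %>%
  summarise(across(everything(),
                   list(max = max, min = min, mean = mean,
                        median = median, sd = sd)))

sums
```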
Now that we have the summary statistics, all that is left is to reshape the data into the desired output. For this, gather, spread, separate and unite from tidyr come in handy:
sums %>%
  # Reshape to long format
  gather(col, val, -Step) %>%
  # Separate the measurement and the summary statistic
  separate(col, into = c("Measurement", "stat")) %>%
  arrange(Step) %>%
  # Create the desired column headings
  unite(col, stat, Step) %>%
  # Need to use factors to preserve order
  mutate_at(vars(col, Measurement), fct_inorder) %>%
  # Reshape back to wide format
  spread(col, val)
#> # A tibble: 4 x 21
#> Measurement max_1 min_1 mean_1 median_1 sd_1 max_2 min_2 mean_2
#> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 Length 0.5 0.1 0.305 0.31 0.171 0.28 0.1 0.19
#> 2 Breadth 0.47 0.13 0.35 0.4 0.156 0.8 0.28 0.578
#> 3 Height 0.35 0.27 0.32 0.330 0.0383 0.51 0.22 0.358
#> 4 Width 0.7 0.21 0.428 0.4 0.237 0.8 0.2 0.368
#> # ... with 12 more variables: median_2 <dbl>, sd_2 <dbl>, max_3 <dbl>,
#> # min_3 <dbl>, mean_3 <dbl>, median_3 <dbl>, sd_3 <dbl>, max_4 <dbl>,
#> # min_4 <dbl>, mean_4 <dbl>, median_4 <dbl>, sd_4 <dbl>
Created on 2018-05-24 by the reprex package (v0.2.0).
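Similarly, tidyr 1.0.0 and later supersede gather() and spread() with pivot_longer() and pivot_wider(). The sketch below redoes the whole pipeline in that style, assuming dplyr 1.0.0 and tidyr 1.0.0 or later. pivot_longer() splits a name such as Length_max into Measurement and stat through names_sep, which takes the place of separate(); pivot_wider() glues stat and Step into names such as max_1, which takes the place of unite(). Because pivot_wider() orders new columns by first appearance, and summarise() returns the groups sorted by Step, no factor conversion should be needed to get max_1, ..., sd_4 in order.

```r
library(dplyr)  # assumes dplyr >= 1.0.0 (across())
library(tidyr)  # assumes tidyr >= 1.0.0 (pivot_longer(), pivot_wider())

# Same example data as in the question
Step <- c("1","1","4","3","2","2","3","4","4","3","1","3","2","4","3","1","2")
Length <- c(0.1,0.5,0.7,0.8,0.2,0.1,0.3,0.8,0.9,0.15,0.25,0.27,0.28,0.61,0.15,0.37,0.18)
Breadth <- c(0.13,0.35,0.87,0.38,0.52,0.71,0.43,0.8,0.9,0.15,0.45,0.7,0.8,0.11,0.11,0.47,0.28)
Height <- c(0.31,0.35,0.37,0.38,0.32,0.51,0.53,0.48,0.9,0.15,0.35,0.32,0.22,0.11,0.17,0.27,0.38)
Width <- c(0.21,0.25,0.27,0.8,0.2,0.21,0.3,0.28,0.29,0.65,0.55,0.37,0.26,0.31,0.5,0.7,0.8)
df <- data.frame(Step, Length, Breadth, Height, Width)

wide <- df %>%
  group_by(Step) %>%
  # One column per measurement/statistic pair, e.g. Length_max
  summarise(across(everything(),
                   list(max = max, min = min, mean = mean,
                        median = median, sd = sd))) %>%
  # Long format: "Length_max" becomes Measurement = "Length", stat = "max"
  pivot_longer(-Step, names_to = c("Measurement", "stat"), names_sep = "_") %>%
  # Wide format: one row per Measurement, columns named "<stat>_<Step>"
  pivot_wider(names_from = c(stat, Step), values_from = value)

wide
```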