dplyr: group mean centering (mutate + summarize)
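What is the efficient/preferred way to do group mean centering with dplyr, i.e., take each element of a group (mutate) and combine it with a summary statistic (summarize) of that same group? Here is how to do it in base R, centering mpg within each cyl group of mtcars: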
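# Split mtcars by cyl, center mpg on each group's own mean, then stack the pieces
# (note: the rows come back ordered by cyl, not in the original order)
do.call(rbind, lapply(split(mtcars, mtcars$cyl), function(x){
  x[["cent"]] <- x$mpg - mean(x$mpg)  # subtract this group's mean mpg
  x
}))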
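For comparison, base R can also do this without the split/rbind round trip above. This is a minimal sketch using `ave()`, which is not part of the original question: it applies `FUN` (mean by default) within each level of the grouping variable and returns a vector aligned with the input rows, so the original row order is preserved. The name `centered` is illustrative only.

```r
# Copy mtcars so the built-in dataset is left untouched
centered <- mtcars

# ave() returns each row's group mean (FUN = mean is the default),
# aligned with the original rows, so we can subtract it directly
centered$cent <- centered$mpg - ave(centered$mpg, centered$cyl)

# Sanity check: within each cyl group the centered values should average to ~0
tapply(centered$cent, centered$cyl, mean)
```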
You could try:
library(dplyr)
mtcars %>%
  tibble::rownames_to_column() %>% # if the row names are needed as a column (add_rownames() is deprecated)
  group_by(cyl) %>%
  mutate(cent = mpg - mean(mpg))
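Because the question frames this as mutate + summarize, here is a sketch of doing it literally in two steps: compute one summary row per group, then join it back and subtract. It gives the same `cent` values as the grouped `mutate()` above; it assumes dplyr 1.0 or later (for the `.groups` argument), and the names `group_means` and `centered_join` are hypothetical.

```r
library(dplyr)

# Step 1 (summarize): one row per cyl level holding that group's mean mpg
group_means <- mtcars %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg), .groups = "drop")

# Step 2 (mutate): attach each row's group mean, then center on it.
# left_join() keeps the row order of mtcars.
centered_join <- mtcars %>%
  tibble::rownames_to_column() %>%
  left_join(group_means, by = "cyl") %>%
  mutate(cent = mpg - mean_mpg)
```

The grouped `mutate()` is usually the more direct choice; the join form is handy when the group summaries are needed as a table in their own right.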
The code above seems to center mpg on the global mean. What if I want to center on the within-group mean, i.e., a different mean for each level of cyl?
> mtcars %>%
+ add_rownames()%>% #if the rownames are needed as a column
+ group_by(cyl) %>%
+ mutate(cent= mpg-mean(mpg))%>%
+ dplyr ::select(cent)
Adding missing grouping variables: `cyl`
# A tibble: 32 x 2
# Groups: cyl [3]
cyl cent
<dbl> <dbl>
1 6 0.909
2 6 0.909
3 4 2.71
4 6 1.31
5 8 -1.39
6 6 -1.99
7 8 -5.79
8 4 4.31
9 4 2.71
10 6 -0.891
# … with 22 more rows
Warning message:
Deprecated, use tibble::rownames_to_column() instead.
> mtcars$mpg[1:5]-mean(mtcars$mpg)
[1] 0.909375 0.909375 2.709375 1.309375 -1.390625
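The transcript above shows `cent` equal to `mpg - mean(mtcars$mpg)`, so the grouping was ignored even though the tibble reports `Groups: cyl [3]`. Since the session's loaded packages are not shown, this is an assumption, but a common cause is another package's `mutate()` (for example plyr's, if plyr was attached after dplyr) masking `dplyr::mutate()`; the masking version does not know about dplyr groups. A sketch of checking for this and sidestepping it with an explicit namespace prefix (`by_cyl` is an illustrative name):

```r
library(dplyr)

# List the attached packages that define mutate(); the first one wins
find("mutate")

# Namespace-qualify mutate() so a masking function cannot ignore the groups
by_cyl <- mtcars %>%
  tibble::rownames_to_column() %>%
  group_by(cyl) %>%
  dplyr::mutate(cent = mpg - mean(mpg)) %>%
  ungroup()

# Compare with the global-mean centering from the transcript
by_cyl$cent[1:5]
mtcars$mpg[1:5] - mean(mtcars$mpg)
```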
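You can try this (the new variable is displayed under a slightly different name, because scale() returns a one-column matrix):
mtcars %>%
  group_by(cyl) %>%
  mutate(gpcent = scale(mpg, scale = FALSE))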
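If a plain numeric column is preferred over the matrix column that `scale()` produces, a small variant is to wrap it in `as.numeric()`. This sketch (the name `scaled` is illustrative) also checks that it matches explicit mean subtraction within each group, with `dplyr::mutate()` namespace-qualified in case another package masks it:

```r
library(dplyr)

# scale(x, scale = FALSE) only subtracts the mean of x; inside a grouped
# mutate() that is the mean of the current cyl group. as.numeric() drops
# the one-column matrix structure that scale() returns.
scaled <- mtcars %>%
  group_by(cyl) %>%
  dplyr::mutate(gpcent = as.numeric(scale(mpg, scale = FALSE)),
                cent   = mpg - mean(mpg)) %>%
  ungroup()

# Both columns should agree up to floating-point error (returns TRUE)
all.equal(scaled$gpcent, scaled$cent)
```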