R dplyr mutate: calculating the standard deviation for each row
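I am trying to calculate the mean and standard deviation of several columns in a data frame and store the results in new columns of the same data frame. I can get this to work for the mean: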
library(dplyr)
mtcars = mutate(mtcars, mean=(hp+drat+wt)/3)
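However, when I try to do the same for the standard deviation, I run into a problem, because I cannot easily hard-code the formula the way I did for the mean. So I tried using a function, as follows: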
mtcars = mutate(mtcars, mean=(hp+drat+wt)/3, stdev = sd(hp,drat,wt))
This results in the error "Error in sd(hp, drat, wt) : unused argument (wt)". How can I correct my syntax? Thanks.
You could try rowMeans() together with rowSds() from the matrixStats package:
library(dplyr)
library(matrixStats)
nm1 <- c('hp', 'drat', 'wt')
res1 <- mtcars %>%
  mutate(Mean = rowMeans(.[nm1]), stdev = rowSds(as.matrix(.[nm1])))
head(res1,3)
# mpg cyl disp hp drat wt qsec vs am gear carb Mean stdev
#1 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 38.84000 61.62969
#2 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 38.92500 61.55489
#3 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 33.05667 51.91809
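If installing matrixStats is not an option, roughly the same result can be sketched in base R with apply(). This is a minimal sketch rather than part of the original answer: the names cols and res_base are made up for illustration, and apply() will typically be slower than rowSds() on large data.

```r
# Base R sketch: row-wise mean and standard deviation without extra packages.
# `cols` and `res_base` are illustrative names, not from the original answer.
cols <- c("hp", "drat", "wt")
res_base <- mtcars
res_base$Mean <- rowMeans(mtcars[cols])
# apply(..., 1, sd) calls sd() once per row of the selected columns
res_base$stdev <- apply(mtcars[cols], 1, sd)
head(res_base, 3)
```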
Or use do():
res2 <- mtcars %>%
  rowwise() %>%
  do(data.frame(., Mean = mean(unlist(.[nm1])),
                stdev = sd(unlist(.[nm1]))))
head(res2,3)
# mpg cyl disp hp drat wt qsec vs am gear carb Mean stdev
#1 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 38.84000 61.62969
#2 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 38.92500 61.55489
#3 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 33.05667 51.91809
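In dplyr 1.0.0 and later, do() is marked as superseded, and the same row-wise calculation can be sketched with c_across(). The block below assumes such a dplyr version is installed; res3 is an illustrative name, and nm1 is redefined only so the snippet runs on its own.

```r
library(dplyr)

# Sketch assuming dplyr >= 1.0.0, where c_across() is available.
# `res3` is an illustrative name; `nm1` mirrors the column vector used above.
nm1 <- c("hp", "drat", "wt")
res3 <- mtcars %>%
  rowwise() %>%
  mutate(Mean  = mean(c_across(all_of(nm1))),
         stdev = sd(c_across(all_of(nm1)))) %>%
  ungroup()  # drop the row-wise grouping so later verbs act on whole columns
head(res3, 3)
```

Calling ungroup() at the end is a design choice: without it, later mutate() or summarise() calls would keep operating one row at a time.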
You could also write your own vectorized RowSD function, such as
RowSD <- function(x) {
  # sample standard deviation of each row: sqrt(sum of squared deviations / (n - 1))
  sqrt(rowSums((x - rowMeans(x))^2)/(dim(x)[2] - 1))
}
and then
mtcars %>%
mutate(mean = (hp + drat + wt)/3, stdev = RowSD(cbind(hp, drat, wt)))
## mpg cyl disp hp drat wt qsec vs am gear carb mean stdev
## 1 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 38.84000 61.62969
## 2 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 38.92500 61.55489
## 3 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 33.05667 51.91809
## 4 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 38.76500 61.69136
## 5 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 60.53000 99.13403
## 6 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1 37.07333 58.82726
## ...
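As a quick sanity check, the hand-written function can be compared against calling sd() on every row. This is only a sketch: m, by_formula and by_apply are illustrative names, and RowSD is repeated so the snippet runs on its own.

```r
# Same definition as above, repeated so this snippet is self-contained.
RowSD <- function(x) {
  sqrt(rowSums((x - rowMeans(x))^2) / (dim(x)[2] - 1))
}

# `m`, `by_formula` and `by_apply` are illustrative names.
m <- as.matrix(mtcars[, c("hp", "drat", "wt")])
by_formula <- RowSD(m)          # vectorized: one pass over the whole matrix
by_apply   <- apply(m, 1, sd)   # reference: sd() called once per row

# all.equal() compares with a small numeric tolerance
all.equal(by_formula, by_apply)
```

The subtraction x - rowMeans(x) works because R recycles the vector of row means down the columns of the matrix, so each row has its own mean removed.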
Not much needs to change: just add rowwise() (thanks to @akrun for the comment) and wrap your column names in c(...) (to fix the error):
library(dplyr)
mtcars %>%
  rowwise() %>%
  mutate(mean = (hp + drat + wt)/3, stdev = sd(c(hp, drat, wt)))
## Source: local data frame [32 x 13]
## Groups: <by row>
## mpg cyl disp hp drat wt qsec vs am gear carb mean stdev
## 1 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 38.84000 61.62969
## 2 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 38.92500 61.55489
## 3 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 33.05667 51.91809
## 4 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 38.76500 61.69136
## 5 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 60.53000 99.13403
## 6 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1 37.07333 58.82726
## 7 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4 83.92667 139.49371
## 8 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2 22.96000 33.81056
## 9 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2 34.02333 52.80875
## 10 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4 43.45333 68.88985
## .. ... ... ... ... ... ... ... .. .. ... ... ... ...
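To see why both changes matter, it may help to look at the pieces in base R. sd() is defined as sd(x, na.rm = FALSE), so a third positional argument has nowhere to go; and without rowwise(), c(hp, drat, wt) pools all three whole columns into one long vector, which yields a single number rather than one value per row. The names err, pooled and per_row below are illustrative only.

```r
# sd() accepts only x and na.rm, so a third positional argument raises an error.
err <- tryCatch(sd(1, 2, 3), error = function(e) conditionMessage(e))
print(err)

# Without row-wise evaluation, c() concatenates the three full columns,
# so sd() returns one pooled value for all 96 numbers.
pooled <- with(mtcars, sd(c(hp, drat, wt)))
length(pooled)

# Evaluating one row at a time gives one standard deviation per row,
# which is what rowwise() arranges inside mutate().
per_row <- mapply(function(h, d, w) sd(c(h, d, w)),
                  mtcars$hp, mtcars$drat, mtcars$wt)
head(per_row, 3)
```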