R Dplyr mutate, calculating standard deviation for each row

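I'm trying to calculate the mean and standard deviation of certain columns in a data frame and store these values in new columns of the same data frame. I can get it to work for the mean: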

library(dplyr)
mtcars = mutate(mtcars, mean=(hp+drat+wt)/3)

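However, when I try to do the same for the standard deviation I run into a problem, because I can't hard-code the formula as easily as I did for the mean. So I tried using a function instead: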

mtcars = mutate(mtcars, mean=(hp+drat+wt)/3, stdev = sd(hp,drat,wt))

This results in the error "Error in sd(hp, drat, wt) : unused argument (wt)". How can I correct my syntax? Thanks.
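The error comes from R's argument matching: sd() has the signature sd(x, na.rm = FALSE), so in sd(hp, drat, wt) the value of drat is matched to na.rm and wt has no formal argument left to match. A minimal base-R sketch on the first row of mtcars (row1 and s are just illustrative names) shows the difference between passing one combined vector and passing three separate arguments:

```r
# hp, drat and wt for the first row of mtcars (Mazda RX4)
row1 <- mtcars[1, c("hp", "drat", "wt")]

# sd() expects a single numeric vector, so combine the three values with c()
s <- sd(c(row1$hp, row1$drat, row1$wt))
print(s)

# Three separate arguments fail: the third has no matching formal in sd()
res <- tryCatch(sd(row1$hp, row1$drat, row1$wt),
                error = function(e) conditionMessage(e))
print(res)
```

The first value should agree with the stdev reported for row 1 in the answers (61.62969).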

You could try

library(dplyr)
library(matrixStats)
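nm1 <- c('hp', 'drat', 'wt')   # columns to summarise per row
# '.' is the piped data frame; rowSds() needs a matrix, hence as.matrix()
res1 <- mtcars %>% 
           mutate(Mean= rowMeans(.[nm1]), stdev=rowSds(as.matrix(.[nm1])))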

head(res1,3)
#   mpg cyl disp  hp drat    wt  qsec vs am gear carb     Mean    stdev
#1 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4 38.84000 61.62969
#2 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4 38.92500 61.55489
#3 22.8   4  108  93 3.85 2.320 18.61  1  1    4    1 33.05667 51.91809
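If you would rather not depend on matrixStats, a base-R sketch with apply() (MARGIN = 1, i.e. one sd() call per row) should give the same numbers. It loops over rows at the R level, so expect it to be slower than rowSds() on large data; res_base is a hypothetical name.

```r
# Columns to summarise per row (same as nm1 in the answer above)
nm1 <- c("hp", "drat", "wt")

# Base R only: rowMeans() is vectorised, apply() calls sd() once per row
res_base <- transform(mtcars,
                      Mean  = rowMeans(mtcars[nm1]),
                      stdev = apply(mtcars[nm1], 1, sd))
head(res_base, 3)
```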

Or use do():

res2 <- mtcars %>% 
             rowwise() %>%
             do(data.frame(., Mean=mean(unlist(.[nm1])),
                         stdev=sd(unlist(.[nm1]))))

head(res2,3)
#   mpg cyl disp  hp drat    wt  qsec vs am gear carb     Mean    stdev
#1 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4 38.84000 61.62969
#2 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4 38.92500 61.55489
#3 22.8   4  108  93 3.85 2.320 18.61  1  1    4    1 33.05667 51.91809
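do() has since been superseded in dplyr. Assuming dplyr 1.0.0 or later, a sketch of the same row-wise calculation uses c_across(), which gathers the selected columns of the current row into one vector inside a rowwise mutate(); res3 is an illustrative name.

```r
library(dplyr)

nm1 <- c("hp", "drat", "wt")

# c_across() collects the named columns of the current row into one vector,
# which is the shape mean() and sd() expect
res3 <- mtcars %>%
  rowwise() %>%
  mutate(Mean  = mean(c_across(all_of(nm1))),
         stdev = sd(c_across(all_of(nm1)))) %>%
  ungroup()  # drop the row-wise grouping afterwards

head(res3, 3)
```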

You could also write your own vectorized RowSD function, such as

RowSD <- function(x) {
  # sample SD per row: squared deviations from each row mean, divided by (ncol - 1)
  sqrt(rowSums((x - rowMeans(x))^2)/(ncol(x) - 1))
}
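One way to check that RowSD() computes the sample (n - 1) standard deviation, the same quantity as sd(), is to compare it with apply(..., 1, sd) on a small made-up matrix (m is purely illustrative). The subtraction x - rowMeans(x) works because R recycles the row means down each column.

```r
RowSD <- function(x) {
  # sample SD per row: divide squared deviations by (number of columns - 1)
  sqrt(rowSums((x - rowMeans(x))^2) / (ncol(x) - 1))
}

m <- matrix(c(1, 2, 3,
              2, 4, 6), nrow = 2, byrow = TRUE)
RowSD(m)         # returns c(1, 2)
apply(m, 1, sd)  # same values, computed one row at a time
```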

Then:

mtcars %>% 
  mutate(mean = (hp + drat + wt)/3, stdev = RowSD(cbind(hp, drat, wt)))
##     mpg cyl  disp  hp drat    wt  qsec vs am gear carb      mean     stdev
## 1  21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4  38.84000  61.62969
## 2  21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4  38.92500  61.55489
## 3  22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1  33.05667  51.91809
## 4  21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1  38.76500  61.69136
## 5  18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2  60.53000  99.13403
## 6  18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1  37.07333  58.82726
## ...

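Not much needs to change: just add rowwise() (thanks to @akrun's comment) so the calculation runs once per row, and wrap your column names in c(...), because sd() expects a single vector. That fixes the error: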

library(dplyr)
mtcars %>%
    rowwise() %>%
    mutate(mean=(hp+drat+wt)/3, stdev = sd(c(hp,drat,wt)))
## Source: local data frame [32 x 13]
## Groups: <by row>
##     mpg cyl  disp  hp drat    wt  qsec vs am gear carb     mean     stdev
## 1  21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4 38.84000  61.62969
## 2  21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4 38.92500  61.55489
## 3  22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1 33.05667  51.91809
## 4  21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1 38.76500  61.69136
## 5  18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2 60.53000  99.13403
## 6  18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1 37.07333  58.82726
## 7  14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4 83.92667 139.49371
## 8  24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2 22.96000  33.81056
## 9  22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2 34.02333  52.80875
## 10 19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4 43.45333  68.88985
## ..  ... ...   ... ...  ...   ...   ... .. ..  ...  ...      ...       ...
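The result above stays grouped by row (note the "Groups: <by row>" line), so any later mutate() or summarise() would also run per row. A sketch, assuming only dplyr and base R, that adds ungroup() and cross-checks the values against a base-R apply() computation (res4 and vec_sd are illustrative names):

```r
library(dplyr)

res4 <- mtcars %>%
  rowwise() %>%
  mutate(mean = (hp + drat + wt) / 3, stdev = sd(c(hp, drat, wt))) %>%
  ungroup()  # later verbs now operate on whole columns again

# Cross-check against base R; unname() drops the car names apply() keeps
vec_sd <- apply(as.matrix(mtcars[c("hp", "drat", "wt")]), 1, sd)
all.equal(res4$stdev, unname(vec_sd))
```

rowwise() evaluates the expression once per row, so on large data the vectorized rowSds() or RowSD() approaches are likely to be faster.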