How to reproduce this "apply" example using dplyr package in R?

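I want to use the informative stat.desc function from the pastecs package to describe many columns of my data frame by group. Let's use the iris dataset as an MWE. So for each column I do this: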

by(iris$Sepal.Length,list(iris$Species),pastecs::stat.desc,norm = TRUE)
by(iris$Sepal.Width,list(iris$Species),pastecs::stat.desc,norm = TRUE)
by(iris$Petal.Length,list(iris$Species),pastecs::stat.desc,norm = TRUE)
by(iris$Petal.Width,list(iris$Species),pastecs::stat.desc,norm = TRUE)

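But this is clearly tedious when you have many columns, so you would usually want to vectorize it. After many trials, I found a way using the apply() and by() functions, as follows: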

apply(iris[, 1:4], 2, function(x) by(x, list(iris$Species), pastecs::stat.desc, norm = TRUE))

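The list argument defines the grouping, and norm = TRUE is an argument of stat.desc (passed through by()) that adds statistics describing the normality of the data: skewness, kurtosis and a Shapiro-Wilk test.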

Result

$Sepal.Length
: setosa
     nbr.val     nbr.null       nbr.na          min          max        range          sum       median         mean      SE.mean 
    50.00000      0.00000      0.00000      4.30000      5.80000      1.50000    250.30000      5.00000      5.00600      0.04985 
CI.mean.0.95          var      std.dev     coef.var     skewness     skew.2SE     kurtosis     kurt.2SE   normtest.W   normtest.p 
     0.10018      0.12425      0.35249      0.07041      0.11298      0.16782     -0.45087     -0.34059      0.97770      0.45951 
------------------------------------------------------------------------------------------------------ 
: versicolor
     nbr.val     nbr.null       nbr.na          min          max        range          sum       median         mean      SE.mean 
    50.00000      0.00000      0.00000      4.90000      7.00000      2.10000    296.80000      5.90000      5.93600      0.07300 
CI.mean.0.95          var      std.dev     coef.var     skewness     skew.2SE     kurtosis     kurt.2SE   normtest.W   normtest.p 
     0.14669      0.26643      0.51617      0.08696      0.09914      0.14727     -0.69391     -0.52418      0.97784      0.46474 
------------------------------------------------------------------------------------------------------ 
: virginica
     nbr.val     nbr.null       nbr.na          min          max        range          sum       median         mean      SE.mean 
    50.00000      0.00000      0.00000      4.90000      7.90000      3.00000    329.40000      6.50000      6.58800      0.08993 
CI.mean.0.95          var      std.dev     coef.var     skewness     skew.2SE     kurtosis     kurt.2SE   normtest.W   normtest.p 
     0.18071      0.40434      0.63588      0.09652      0.11103      0.16493     -0.20326     -0.15354      0.97118      0.25831 

$Sepal.Width
: setosa
     nbr.val     nbr.null       nbr.na          min          max        range          sum       median         mean      SE.mean 
    50.00000      0.00000      0.00000      2.30000      4.40000      2.10000    171.40000      3.40000      3.42800      0.05361 
CI.mean.0.95          var      std.dev     coef.var     skewness     skew.2SE     kurtosis     kurt.2SE   normtest.W   normtest.p 
     0.10773      0.14369      0.37906      0.11058      0.03873      0.05753      0.59595      0.45018      0.97172      0.27153 
------------------------------------------------------------------------------------------------------ 
: versicolor
     nbr.val     nbr.null       nbr.na          min          max        range          sum       median         mean      SE.mean 
    50.00000      0.00000      0.00000      2.00000      3.40000      1.40000    138.50000      2.80000      2.77000      0.04438 
CI.mean.0.95          var      std.dev     coef.var     skewness     skew.2SE     kurtosis     kurt.2SE   normtest.W   normtest.p 
     0.08918      0.09847      0.31380      0.11328     -0.34136     -0.50708     -0.54932     -0.41495      0.97413      0.33800 
------------------------------------------------------------------------------------------------------ 
: virginica
     nbr.val     nbr.null       nbr.na          min          max        range          sum       median         mean      SE.mean 
    50.00000      0.00000      0.00000      2.20000      3.80000      1.60000    148.70000      3.00000      2.97400      0.04561 
CI.mean.0.95          var      std.dev     coef.var     skewness     skew.2SE     kurtosis     kurt.2SE   normtest.W   normtest.p 
     0.09165      0.10400      0.32250      0.10844      0.34428      0.51141      0.38038      0.28734      0.96739      0.18090 

$Petal.Length
: setosa
     nbr.val     nbr.null       nbr.na          min          max        range          sum       median         mean      SE.mean 
    50.00000      0.00000      0.00000      1.00000      1.90000      0.90000     73.10000      1.50000      1.46200      0.02456 
CI.mean.0.95          var      std.dev     coef.var     skewness     skew.2SE     kurtosis     kurt.2SE   normtest.W   normtest.p 
     0.04935      0.03016      0.17366      0.11879      0.10010      0.14869      0.65393      0.49397      0.95498      0.05481 
------------------------------------------------------------------------------------------------------ 
: versicolor
     nbr.val     nbr.null       nbr.na          min          max        range          sum       median         mean      SE.mean 
    50.00000      0.00000      0.00000      3.00000      5.10000      2.10000    213.00000      4.35000      4.26000      0.06646 
CI.mean.0.95          var      std.dev     coef.var     skewness     skew.2SE     kurtosis     kurt.2SE   normtest.W   normtest.p 
     0.13355      0.22082      0.46991      0.11031     -0.57060     -0.84760     -0.19026     -0.14372      0.96600      0.15848 
------------------------------------------------------------------------------------------------------ 
: virginica
     nbr.val     nbr.null       nbr.na          min          max        range          sum       median         mean      SE.mean 
    50.00000      0.00000      0.00000      4.50000      6.90000      2.40000    277.60000      5.55000      5.55200      0.07805 
CI.mean.0.95          var      std.dev     coef.var     skewness     skew.2SE     kurtosis     kurt.2SE   normtest.W   normtest.p 
     0.15685      0.30459      0.55189      0.09940      0.51692      0.76785     -0.36512     -0.27581      0.96219      0.10978 

$Petal.Width
: setosa
     nbr.val     nbr.null       nbr.na          min          max        range          sum       median         mean      SE.mean 
   5.000e+01    0.000e+00    0.000e+00    1.000e-01    6.000e-01    5.000e-01    1.230e+01    2.000e-01    2.460e-01    1.490e-02 
CI.mean.0.95          var      std.dev     coef.var     skewness     skew.2SE     kurtosis     kurt.2SE   normtest.W   normtest.p 
   2.995e-02    1.111e-02    1.054e-01    4.284e-01    1.180e+00    1.752e+00    1.259e+00    9.508e-01    7.998e-01    8.659e-07 
------------------------------------------------------------------------------------------------------ 
: versicolor
     nbr.val     nbr.null       nbr.na          min          max        range          sum       median         mean      SE.mean 
    50.00000      0.00000      0.00000      1.00000      1.80000      0.80000     66.30000      1.30000      1.32600      0.02797 
CI.mean.0.95          var      std.dev     coef.var     skewness     skew.2SE     kurtosis     kurt.2SE   normtest.W   normtest.p 
     0.05620      0.03911      0.19775      0.14913     -0.02933     -0.04357     -0.58731     -0.44365      0.94763      0.02728 
------------------------------------------------------------------------------------------------------ 
: virginica
     nbr.val     nbr.null       nbr.na          min          max        range          sum       median         mean      SE.mean 
    50.00000      0.00000      0.00000      1.40000      2.50000      1.10000    101.30000      2.00000      2.02600      0.03884 
CI.mean.0.95          var      std.dev     coef.var     skewness     skew.2SE     kurtosis     kurt.2SE   normtest.W   normtest.p 
     0.07805      0.07543      0.27465      0.13556     -0.12181     -0.18094     -0.75396     -0.56953      0.95977      0.08695 

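An equivalent variant passes the grouping vector as a default argument and gives the same result:

apply(iris[, 1:4], 2, function(x, y = iris$Species) by(x, list(y), pastecs::stat.desc, norm = TRUE))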
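A possible variant (not part of the original question): apply() first converts the data frame to a matrix, which is harmless here because all four columns are numeric, but it would turn everything into character if a factor column were included. Looping over the columns with lapply() avoids that coercion and should give the same statistics as the apply() call above, as a named list of by objects:

```r
library(pastecs)

# lapply() iterates over the data frame's columns directly (no matrix coercion);
# by() splits each column by species and calls stat.desc on every piece
res <- lapply(iris[1:4], function(x) by(x, list(iris$Species), stat.desc, norm = TRUE))

res$Sepal.Length
```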

Question

How can I reproduce these results with the dplyr package?

My failed attempt is:

iris %>%
  group_by (Species) %>%
  summarise_each(funs(pastecs::stat.desc,norm=TRUE))
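A likely reason the attempt fails is that summarise() and summarise_each() were built around one value per group, while stat.desc(norm = TRUE) returns 20 values per column. As a sketch, assuming dplyr >= 1.1.0 (where reframe() allows any number of rows per group), the statistics can be kept in long form with one row per statistic; the column name measure is an illustrative choice:

```r
library(dplyr)
library(pastecs)

res <- iris %>%
  group_by(Species) %>%
  reframe(
    # names of the 20 statistics, identical for every column
    measure = names(stat.desc(Sepal.Length, norm = TRUE)),
    # one column of 20 values per measurement; the names now live in `measure`
    across(Sepal.Length:Petal.Width, ~ unname(stat.desc(.x, norm = TRUE)))
  )

res
```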

Here is an option using dplyr:

library(pastecs)
library(dplyr)
res <- iris %>% 
          group_by(Species) %>% 
          do(data.frame(lapply(.[setdiff(names(.), 'Species')],
                           stat.desc, norm = TRUE))) %>%
          mutate(measure = names(stat.desc(Sepal.Length, norm = TRUE)))

Edit: added the names of the statistics returned by stat.desc (based on @Jaap's suggestion).
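do() has since been superseded in dplyr. A sketch of the same per-group computation with group_modify(), assuming dplyr >= 0.8.0; inside the lambda, .x holds one group's rows without the Species column, and measure is again an illustrative column name:

```r
library(dplyr)
library(pastecs)

res <- iris %>%
  group_by(Species) %>%
  group_modify(~ {
    # 20 x 4 matrix: statistics in rows, measurements in columns
    stats <- sapply(.x, stat.desc, norm = TRUE)
    # turn the row names into a regular column
    data.frame(measure = rownames(stats), stats, row.names = NULL)
  })

res
```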

Here is a solution using dplyr:

library("pastecs")
library(dplyr)
res <- iris %>% group_by(Species) %>% do(summary = stat.desc(., norm = TRUE))

The results of stat.desc() are stored in a list column and can be accessed like this:

res$summary[res$Species=="setosa"]
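Single-bracket indexing returns a list of length one. A sketch, assuming the same do() pattern but dropping Species before calling stat.desc so that only the numeric columns are summarised, which extracts the setosa table itself with [[ and reads one statistic from it (setosa_stats is a hypothetical name):

```r
library(dplyr)
library(pastecs)

res <- iris %>%
  group_by(Species) %>%
  do(summary = stat.desc(.[setdiff(names(.), "Species")], norm = TRUE))

# [[ returns the stat.desc table itself rather than a one-element list
setosa_stats <- res$summary[[which(res$Species == "setosa")]]
setosa_stats["mean", "Sepal.Length"]
```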

I know you asked specifically for "dplyr" here, but I thought I would also share a "data.table" approach (as already mentioned in the comments).

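The basic idea is to first reshape the data into "long" format, and then wrap your stat.desc call in as.list so that the results come out in "wide" format, one column per statistic.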

library(data.table)
library(pastecs)

melt(setDT(iris), id.vars = "Species")[
  , as.list(stat.desc(value)), .(Species, variable)]

#        Species     variable nbr.val nbr.null nbr.na min max range   sum median
#  1:     setosa Sepal.Length      50        0      0 4.3 5.8   1.5 250.3   5.00
#  2: versicolor Sepal.Length      50        0      0 4.9 7.0   2.1 296.8   5.90
#  3:  virginica Sepal.Length      50        0      0 4.9 7.9   3.0 329.4   6.50
#  4:     setosa  Sepal.Width      50        0      0 2.3 4.4   2.1 171.4   3.40
#  5: versicolor  Sepal.Width      50        0      0 2.0 3.4   1.4 138.5   2.80
#  6:  virginica  Sepal.Width      50        0      0 2.2 3.8   1.6 148.7   3.00
#  7:     setosa Petal.Length      50        0      0 1.0 1.9   0.9  73.1   1.50
#  8: versicolor Petal.Length      50        0      0 3.0 5.1   2.1 213.0   4.35
#  9:  virginica Petal.Length      50        0      0 4.5 6.9   2.4 277.6   5.55
# 10:     setosa  Petal.Width      50        0      0 0.1 0.6   0.5  12.3   0.20
# 11: versicolor  Petal.Width      50        0      0 1.0 1.8   0.8  66.3   1.30
# 12:  virginica  Petal.Width      50        0      0 1.4 2.5   1.1 101.3   2.00
#      mean    SE.mean CI.mean.0.95        var   std.dev   coef.var
#  1: 5.006 0.04984957   0.10017646 0.12424898 0.3524897 0.07041344
#  2: 5.936 0.07299762   0.14669422 0.26643265 0.5161711 0.08695606
#  3: 6.588 0.08992695   0.18071498 0.40434286 0.6358796 0.09652089
#  4: 3.428 0.05360780   0.10772890 0.14368980 0.3790644 0.11057887
#  5: 2.770 0.04437778   0.08918050 0.09846939 0.3137983 0.11328459
#  6: 2.974 0.04560791   0.09165253 0.10400408 0.3224966 0.10843868
#  7: 1.462 0.02455980   0.04935476 0.03015918 0.1736640 0.11878522
#  8: 4.260 0.06645545   0.13354722 0.22081633 0.4699110 0.11030774
#  9: 5.552 0.07804970   0.15684674 0.30458776 0.5518947 0.09940466
# 10: 0.246 0.01490377   0.02995025 0.01110612 0.1053856 0.42839670
# 11: 1.326 0.02796645   0.05620069 0.03910612 0.1977527 0.14913475
# 12: 2.026 0.03884138   0.07805468 0.07543265 0.2746501 0.13556271

I guess you could "dplyr-ify" the answer like this:

library(data.table)
library(dplyr)
library(pastecs)

tbl_dt(iris) %>%
  melt(id.vars = "Species") %>%
  .[, as.list(stat.desc(value)), .(Species, variable)]
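tbl_dt() was later moved out of dplyr (into the dtplyr package), so the snippet above may not run on current versions. The same melt-then-widen idea can be sketched with dplyr and tidyr alone, assuming dplyr >= 1.0.0 (an unnamed data frame returned inside summarise() is unpacked into columns) and tidyr >= 1.0.0 for pivot_longer(). Here as_tibble(as.list(...)) plays the role of as.list() in the data.table version:

```r
library(dplyr)
library(tidyr)
library(pastecs)

res <- iris %>%
  # long format: one row per (Species, variable, value)
  pivot_longer(-Species, names_to = "variable", values_to = "value") %>%
  group_by(Species, variable) %>%
  # a one-row tibble per group; its 14 columns are unpacked into the result
  summarise(as_tibble(as.list(stat.desc(value))), .groups = "drop")

res
```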

Update

If you want to stay in the Hadleyverse, you can use a combination of "purrr" and "broom" instead of "dplyr":

library(purrr)
library(pastecs)
library(broom)

iris[-5] %>%
  split(iris[5]) %>%
  map_df(~ fix_data_frame(sapply(., stat.desc)), .id = "Species")
# Source: local data frame [42 x 6]
# 
#    Species     term Sepal.Length Sepal.Width Petal.Length Petal.Width
#      (chr)    (chr)        (dbl)       (dbl)        (dbl)       (dbl)
# 1   setosa  nbr.val  50.00000000  50.0000000   50.0000000 50.00000000
# 2   setosa nbr.null   0.00000000   0.0000000    0.0000000  0.00000000
# 3   setosa   nbr.na   0.00000000   0.0000000    0.0000000  0.00000000
# 4   setosa      min   4.30000000   2.3000000    1.0000000  0.10000000
# 5   setosa      max   5.80000000   4.4000000    1.9000000  0.60000000
# 6   setosa    range   1.50000000   2.1000000    0.9000000  0.50000000
# 7   setosa      sum 250.30000000 171.4000000   73.1000000 12.30000000
# 8   setosa   median   5.00000000   3.4000000    1.5000000  0.20000000
# 9   setosa     mean   5.00600000   3.4280000    1.4620000  0.24600000
# 10  setosa  SE.mean   0.04984957   0.0536078    0.0245598  0.01490377
# ..     ...      ...          ...         ...          ...         ...
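If fix_data_frame() is unavailable or deprecated in your broom version, here is a base R sketch (pastecs is the only package) that builds the same 42 x 6 long table; the column names Species and term mirror the output above:

```r
library(pastecs)

# One 14 x 4 matrix per species: statistics in rows, measurements in columns
per_species <- lapply(split(iris[-5], iris$Species),
                      function(d) sapply(d, stat.desc))

# Stack the matrices, keeping the species and statistic names as columns
res <- do.call(rbind, lapply(names(per_species), function(sp) {
  m <- per_species[[sp]]
  data.frame(Species = sp, term = rownames(m), m,
             row.names = NULL, stringsAsFactors = FALSE)
}))

head(res)
```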