How to reproduce this "apply" example using the dplyr package in R?
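I would like to use the informative stat.desc function from the pastecs package to describe many columns of my data frame by group. Let's take the iris dataset as an MWE.
So I do this for each column: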
by(iris$Sepal.Length,list(iris$Species),pastecs::stat.desc,norm = TRUE)
by(iris$Sepal.Width,list(iris$Species),pastecs::stat.desc,norm = TRUE)
by(iris$Petal.Length,list(iris$Species),pastecs::stat.desc,norm = TRUE)
by(iris$Petal.Width,list(iris$Species),pastecs::stat.desc,norm = TRUE)
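But this is definitely tedious when you have many columns, so you usually want to vectorize it. After many trials, I found a way to do so using the apply and by() functions, as follows: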
apply (iris[,1:4],2,function (x) by (x,list (iris$Species),pastecs::stat.desc,norm=TRUE))
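The list argument specifies the grouping variable, and norm=TRUE is an argument passed on to stat.desc that adds normality statistics (skewness, kurtosis, and the Shapiro-Wilk test).
Result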
$Sepal.Length
: setosa
nbr.val nbr.null nbr.na min max range sum median mean SE.mean
50.00000 0.00000 0.00000 4.30000 5.80000 1.50000 250.30000 5.00000 5.00600 0.04985
CI.mean.0.95 var std.dev coef.var skewness skew.2SE kurtosis kurt.2SE normtest.W normtest.p
0.10018 0.12425 0.35249 0.07041 0.11298 0.16782 -0.45087 -0.34059 0.97770 0.45951
------------------------------------------------------------------------------------------------------
: versicolor
nbr.val nbr.null nbr.na min max range sum median mean SE.mean
50.00000 0.00000 0.00000 4.90000 7.00000 2.10000 296.80000 5.90000 5.93600 0.07300
CI.mean.0.95 var std.dev coef.var skewness skew.2SE kurtosis kurt.2SE normtest.W normtest.p
0.14669 0.26643 0.51617 0.08696 0.09914 0.14727 -0.69391 -0.52418 0.97784 0.46474
------------------------------------------------------------------------------------------------------
: virginica
nbr.val nbr.null nbr.na min max range sum median mean SE.mean
50.00000 0.00000 0.00000 4.90000 7.90000 3.00000 329.40000 6.50000 6.58800 0.08993
CI.mean.0.95 var std.dev coef.var skewness skew.2SE kurtosis kurt.2SE normtest.W normtest.p
0.18071 0.40434 0.63588 0.09652 0.11103 0.16493 -0.20326 -0.15354 0.97118 0.25831
$Sepal.Width
: setosa
nbr.val nbr.null nbr.na min max range sum median mean SE.mean
50.00000 0.00000 0.00000 2.30000 4.40000 2.10000 171.40000 3.40000 3.42800 0.05361
CI.mean.0.95 var std.dev coef.var skewness skew.2SE kurtosis kurt.2SE normtest.W normtest.p
0.10773 0.14369 0.37906 0.11058 0.03873 0.05753 0.59595 0.45018 0.97172 0.27153
------------------------------------------------------------------------------------------------------
: versicolor
nbr.val nbr.null nbr.na min max range sum median mean SE.mean
50.00000 0.00000 0.00000 2.00000 3.40000 1.40000 138.50000 2.80000 2.77000 0.04438
CI.mean.0.95 var std.dev coef.var skewness skew.2SE kurtosis kurt.2SE normtest.W normtest.p
0.08918 0.09847 0.31380 0.11328 -0.34136 -0.50708 -0.54932 -0.41495 0.97413 0.33800
------------------------------------------------------------------------------------------------------
: virginica
nbr.val nbr.null nbr.na min max range sum median mean SE.mean
50.00000 0.00000 0.00000 2.20000 3.80000 1.60000 148.70000 3.00000 2.97400 0.04561
CI.mean.0.95 var std.dev coef.var skewness skew.2SE kurtosis kurt.2SE normtest.W normtest.p
0.09165 0.10400 0.32250 0.10844 0.34428 0.51141 0.38038 0.28734 0.96739 0.18090
$Petal.Length
: setosa
nbr.val nbr.null nbr.na min max range sum median mean SE.mean
50.00000 0.00000 0.00000 1.00000 1.90000 0.90000 73.10000 1.50000 1.46200 0.02456
CI.mean.0.95 var std.dev coef.var skewness skew.2SE kurtosis kurt.2SE normtest.W normtest.p
0.04935 0.03016 0.17366 0.11879 0.10010 0.14869 0.65393 0.49397 0.95498 0.05481
------------------------------------------------------------------------------------------------------
: versicolor
nbr.val nbr.null nbr.na min max range sum median mean SE.mean
50.00000 0.00000 0.00000 3.00000 5.10000 2.10000 213.00000 4.35000 4.26000 0.06646
CI.mean.0.95 var std.dev coef.var skewness skew.2SE kurtosis kurt.2SE normtest.W normtest.p
0.13355 0.22082 0.46991 0.11031 -0.57060 -0.84760 -0.19026 -0.14372 0.96600 0.15848
------------------------------------------------------------------------------------------------------
: virginica
nbr.val nbr.null nbr.na min max range sum median mean SE.mean
50.00000 0.00000 0.00000 4.50000 6.90000 2.40000 277.60000 5.55000 5.55200 0.07805
CI.mean.0.95 var std.dev coef.var skewness skew.2SE kurtosis kurt.2SE normtest.W normtest.p
0.15685 0.30459 0.55189 0.09940 0.51692 0.76785 -0.36512 -0.27581 0.96219 0.10978
$Petal.Width
: setosa
nbr.val nbr.null nbr.na min max range sum median mean SE.mean
5.000e+01 0.000e+00 0.000e+00 1.000e-01 6.000e-01 5.000e-01 1.230e+01 2.000e-01 2.460e-01 1.490e-02
CI.mean.0.95 var std.dev coef.var skewness skew.2SE kurtosis kurt.2SE normtest.W normtest.p
2.995e-02 1.111e-02 1.054e-01 4.284e-01 1.180e+00 1.752e+00 1.259e+00 9.508e-01 7.998e-01 8.659e-07
------------------------------------------------------------------------------------------------------
: versicolor
nbr.val nbr.null nbr.na min max range sum median mean SE.mean
50.00000 0.00000 0.00000 1.00000 1.80000 0.80000 66.30000 1.30000 1.32600 0.02797
CI.mean.0.95 var std.dev coef.var skewness skew.2SE kurtosis kurt.2SE normtest.W normtest.p
0.05620 0.03911 0.19775 0.14913 -0.02933 -0.04357 -0.58731 -0.44365 0.94763 0.02728
------------------------------------------------------------------------------------------------------
: virginica
nbr.val nbr.null nbr.na min max range sum median mean SE.mean
50.00000 0.00000 0.00000 1.40000 2.50000 1.10000 101.30000 2.00000 2.02600 0.03884
CI.mean.0.95 var std.dev coef.var skewness skew.2SE kurtosis kurt.2SE normtest.W normtest.p
0.07805 0.07543 0.27465 0.13556 -0.12181 -0.18094 -0.75396 -0.56953 0.95977 0.08695
Question
How can I reproduce these results using the dplyr package?
My failed attempt was:
iris %>%
group_by (Species) %>%
summarise_each(funs(pastecs::stat.desc,norm=TRUE))
Here is an option using dplyr:
library(pastecs)
library(dplyr)
res <- iris %>%
group_by(Species) %>%
do(data.frame(lapply(.[setdiff(names(.), 'Species')],
stat.desc, norm = TRUE))) %>%
mutate(measure = names(stat.desc(Sepal.Length, norm = TRUE)))
Edit: added the names corresponding to stat.desc (based on @Jaap's suggestion).
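In current dplyr releases, summarise_each() and funs() are deprecated and do() is superseded, so the same idea can be written with reframe() and across(). The following is only a minimal sketch, assuming dplyr 1.1.0 or later (where reframe() is available) and that stat.desc() returns a named numeric vector when given a vector; the names stat_names and res_reframe are illustrative and do not appear in the original answers.

```r
library(dplyr)
library(pastecs)

# Names of the statistics returned by stat.desc() with norm = TRUE;
# computed once, outside the pipeline, so they can label the rows.
stat_names <- names(stat.desc(iris$Sepal.Length, norm = TRUE))

res_reframe <- iris %>%
  group_by(Species) %>%
  reframe(
    # one row per statistic within each group
    measure = stat_names,
    # apply stat.desc() to every numeric column; unname() drops the
    # vector names so the columns are plain numeric vectors
    across(where(is.numeric), ~ unname(stat.desc(.x, norm = TRUE)))
  )

print(head(res_reframe))
```

The result has one row per species and statistic, with the four measurement columns side by side, which is the same layout as the do() version above.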
Here is a solution using dplyr:
library("pastecs")
library(dplyr)
res <- iris %>% group_by(Species) %>% do( summary = stat.desc(. ,norm=TRUE) )
The results of stat.desc() are stored as a list and can be accessed like this:
res$summary[res$Species=="setosa"]
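I know you specifically asked for "dplyr" here, but I thought I'd also share a "data.table" approach (as already mentioned in the comments).
The basic idea is to first make the data "long", and then wrap the call to your stat.desc function in as.list so that the statistics are spread into a "wide" format.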
library(data.table)
library(pastecs)
melt(setDT(iris), id.vars = "Species")[
, as.list(stat.desc(value)), .(Species, variable)]
# Species variable nbr.val nbr.null nbr.na min max range sum median
# 1: setosa Sepal.Length 50 0 0 4.3 5.8 1.5 250.3 5.00
# 2: versicolor Sepal.Length 50 0 0 4.9 7.0 2.1 296.8 5.90
# 3: virginica Sepal.Length 50 0 0 4.9 7.9 3.0 329.4 6.50
# 4: setosa Sepal.Width 50 0 0 2.3 4.4 2.1 171.4 3.40
# 5: versicolor Sepal.Width 50 0 0 2.0 3.4 1.4 138.5 2.80
# 6: virginica Sepal.Width 50 0 0 2.2 3.8 1.6 148.7 3.00
# 7: setosa Petal.Length 50 0 0 1.0 1.9 0.9 73.1 1.50
# 8: versicolor Petal.Length 50 0 0 3.0 5.1 2.1 213.0 4.35
# 9: virginica Petal.Length 50 0 0 4.5 6.9 2.4 277.6 5.55
# 10: setosa Petal.Width 50 0 0 0.1 0.6 0.5 12.3 0.20
# 11: versicolor Petal.Width 50 0 0 1.0 1.8 0.8 66.3 1.30
# 12: virginica Petal.Width 50 0 0 1.4 2.5 1.1 101.3 2.00
# mean SE.mean CI.mean.0.95 var std.dev coef.var
# 1: 5.006 0.04984957 0.10017646 0.12424898 0.3524897 0.07041344
# 2: 5.936 0.07299762 0.14669422 0.26643265 0.5161711 0.08695606
# 3: 6.588 0.08992695 0.18071498 0.40434286 0.6358796 0.09652089
# 4: 3.428 0.05360780 0.10772890 0.14368980 0.3790644 0.11057887
# 5: 2.770 0.04437778 0.08918050 0.09846939 0.3137983 0.11328459
# 6: 2.974 0.04560791 0.09165253 0.10400408 0.3224966 0.10843868
# 7: 1.462 0.02455980 0.04935476 0.03015918 0.1736640 0.11878522
# 8: 4.260 0.06645545 0.13354722 0.22081633 0.4699110 0.11030774
# 9: 5.552 0.07804970 0.15684674 0.30458776 0.5518947 0.09940466
# 10: 0.246 0.01490377 0.02995025 0.01110612 0.1053856 0.42839670
# 11: 1.326 0.02796645 0.05620069 0.03910612 0.1977527 0.14913475
# 12: 2.026 0.03884138 0.07805468 0.07543265 0.2746501 0.13556271
I guess you could dplyr-ify the answer like this:
library(data.table)
library(dplyr)
library(pastecs)
tbl_dt(iris) %>%
melt(id.vars = "Species") %>%
.[, as.list(stat.desc(value)), .(Species, variable)]
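The same long-then-wide idea can probably also be expressed without data.table. The following is a hedged sketch, assuming tidyr (for pivot_longer()) and dplyr 1.0.0 or later (where summarise() unpacks an unnamed data frame into columns); the name long_wide is illustrative only.

```r
library(dplyr)
library(tidyr)
library(pastecs)

long_wide <- iris %>%
  # "long" step: one row per Species / variable / value
  pivot_longer(-Species, names_to = "variable", values_to = "value") %>%
  group_by(Species, variable) %>%
  # "wide" step: turn the named vector from stat.desc() into a
  # one-row tibble, which summarise() spreads into separate columns
  summarise(as_tibble(as.list(stat.desc(value))), .groups = "drop")

print(long_wide)
```

Each Species and variable combination ends up in a single row, with one column per statistic, mirroring the data.table output shown above.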
Update
If you want to stay within the Hadleyverse, you could use a combination of "purrr" and "broom" instead of "dplyr":
library(purrr)
library(pastecs)
library(broom)
iris[-5] %>%
split(iris[5]) %>%
map_df(~ fix_data_frame(sapply(., stat.desc)), .id = "Species")
# Source: local data frame [42 x 6]
#
# Species term Sepal.Length Sepal.Width Petal.Length Petal.Width
# (chr) (chr) (dbl) (dbl) (dbl) (dbl)
# 1 setosa nbr.val 50.00000000 50.0000000 50.0000000 50.00000000
# 2 setosa nbr.null 0.00000000 0.0000000 0.0000000 0.00000000
# 3 setosa nbr.na 0.00000000 0.0000000 0.0000000 0.00000000
# 4 setosa min 4.30000000 2.3000000 1.0000000 0.10000000
# 5 setosa max 5.80000000 4.4000000 1.9000000 0.60000000
# 6 setosa range 1.50000000 2.1000000 0.9000000 0.50000000
# 7 setosa sum 250.30000000 171.4000000 73.1000000 12.30000000
# 8 setosa median 5.00000000 3.4000000 1.5000000 0.20000000
# 9 setosa mean 5.00600000 3.4280000 1.4620000 0.24600000
# 10 setosa SE.mean 0.04984957 0.0536078 0.0245598 0.01490377
# .. ... ... ... ... ... ...