Subsetting datasets into a list

Question

I'm new to R. I've spent hours trying to solve this and searching Google and SO, but I can't seem to find what I'm looking for. Hopefully you can help?

My dataset looks like this:

Site(factor)    Species           Date               Mass       GDD
1               cockerelli      0017-03-14           2.73       252.1
2               doddsii         0017-01-12           3.73       583.4
4               cockerelli      0017-03-14           2.71       385.4
4               doddsii         0018-05-16           2.22       783.2
1               infrequens      0018-05-16           2.89       583.0
etc.

I split() my data frame into a list of data frames, which I can then pass to an apply() function:

splitdata = split(data, paste(data$Species,data$Site))
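To see what split() produces here, a small sketch that rebuilds only the five rows shown above (Date left out for brevity, so every group ends up with a single row) and inspects the list names:

```r
# Toy data: the five rows shown in the question (Date omitted for brevity)
data <- data.frame(
  Site    = factor(c(1, 2, 4, 4, 1)),
  Species = c("cockerelli", "doddsii", "cockerelli", "doddsii", "infrequens"),
  Mass    = c(2.73, 3.73, 2.71, 2.22, 2.89),
  GDD     = c(252.1, 583.4, 385.4, 783.2, 583.0)
)

# split() converts the pasted labels to a factor and uses its levels as list names
splitdata <- split(data, paste(data$Species, data$Site))
names(splitdata)
# "cockerelli 1" "cockerelli 4" "doddsii 2" "doddsii 4" "infrequens 1"
```

So each element is already labelled with its species and site; the labels just are not shown when tables are printed from inside the function.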

However, when I use code like this:

grmodel = lapply(splitdata, function(x){
  grmodel = aov(x$Mass~x$GDD)
  print(summary(grmodel))
 })

I get a long list of ANOVA summaries (like the ones below), but I can't tell which species and site each one belongs to.

            Df   Sum Sq   Mean Sq F value Pr(>F)
x$GDD        1 0.000022 0.0000216   0.044  0.838
Residuals    9 0.004396 0.0004884
1 observation deleted due to missingness
            Df    Sum Sq   Mean Sq F value Pr(>F)
x$GDD        1 0.0002526 0.0002526    0.65  0.451
Residuals    6 0.0023319 0.0003887
1 observation deleted due to missingness

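Does anyone know how to change the code so that it tells me which species and site each ANOVA table belongs to? I've found some answers involving paste() and other functions, but nothing I've tried has worked.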

Thanks in advance!

Answer 1

As far as I can tell, the names should be visible. I'm not sure exactly what you're seeing, but a reprex (reproducible example) might help.

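You could also try broom::tidy() to see the results more clearly. First, an example without it, using the built-in iris data: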

lapply(split(iris,iris$Species),
       function(x) aov(Petal.Length ~ Petal.Width,x))

# $`setosa`
# Call:
#   aov(formula = Petal.Length ~ Petal.Width, data = x)
# 
# Terms:
#   Petal.Width Residuals
# Sum of Squares    0.1625262 1.3152738
# Deg. of Freedom           1        48
# 
# Residual standard error: 0.1655341
# Estimated effects may be unbalanced
# 
# $versicolor
# Call:
#   aov(formula = Petal.Length ~ Petal.Width, data = x)
# 
# Terms:
#   Petal.Width Residuals
# Sum of Squares     6.695921  4.124079
# Deg. of Freedom           1        48
# 
# Residual standard error: 0.2931183
# Estimated effects may be unbalanced
# 
# $virginica
# Call:
#   aov(formula = Petal.Length ~ Petal.Width, data = x)
# 
# Terms:
#   Petal.Width Residuals
# Sum of Squares     1.548503 13.376297
# Deg. of Freedom           1        48
# 
# Residual standard error: 0.5278947
# Estimated effects may be unbalanced

With broom::tidy():

library(broom)  # tidy() turns each aov fit into a data frame

lapply(split(iris, iris$Species),
       function(x) tidy(aov(Petal.Length ~ Petal.Width, data = x)))

# $`setosa`
#          term df     sumsq     meansq statistic    p.value
# 1 Petal.Width  1 0.1625262 0.16252620   5.93128 0.01863892
# 2   Residuals 48 1.3152738 0.02740154        NA         NA
# 
# $versicolor
#          term df    sumsq     meansq statistic      p.value
# 1 Petal.Width  1 6.695921 6.69592109  77.93357 1.271916e-11
# 2   Residuals 48 4.124079 0.08591831        NA           NA
# 
# $virginica
#          term df     sumsq    meansq statistic    p.value
# 1 Petal.Width  1  1.548503 1.5485033  5.556707 0.02253577
# 2   Residuals 48 13.376297 0.2786728        NA         NA
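If broom isn't installed, a base-R sketch along the same lines (a swapped-in technique, not broom's API) stacks the per-species ANOVA tables into one data frame, with the group label as an ordinary column so no table can lose track of where it came from. It relies on summary(<aov>)[[1]] being the ANOVA table as a data frame.

```r
fits <- lapply(split(iris, iris$Species),
               function(x) aov(Petal.Length ~ Petal.Width, data = x))

# Map() passes each fit together with its name (the species)
rows <- Map(function(fit, sp) {
  tab <- as.data.frame(summary(fit)[[1]])   # columns: Df, Sum Sq, Mean Sq, F value, Pr(>F)
  data.frame(Species = sp,
             term = trimws(rownames(tab)),  # summary.aov pads the row names
             tab, row.names = NULL, check.names = FALSE)
}, fits, names(fits))

anova_table <- do.call(rbind, rows)
anova_table
```

The design choice here is to keep the label in the data rather than in the printed output, which also makes the results easy to filter or export.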

Answer 2

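The names of the list that split() returns are the values of its second argument, coerced to character, and lapply() preserves those names, so you shouldn't need to add any names back. Just look at: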

 names(grmodel)
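A quick way to check this, sketched with iris standing in for the asker's data: the code below mirrors the question's pattern (print() inside lapply()). Because print() returns its argument invisibly, each element of grmodel is a summary.aov object, and the list still carries the names from split(). The printed tables are unlabelled only because the function receives each element without its name.

```r
# Mirrors the question's pattern, with iris standing in for the real data
grmodel <- lapply(split(iris, iris$Species), function(x) {
  print(summary(aov(Petal.Length ~ Petal.Width, data = x)))  # prints unlabelled tables
})

names(grmodel)              # "setosa" "versicolor" "virginica"
grmodel[["versicolor"]]     # look up one table by its label
```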

Perhaps you want to do something like this with the output:

for (i in names(grmodel)) {
  cat(i)                  # the species/site label
  cat(" : : :\n")
  print(grmodel[[i]])     # the ANOVA table for that group
  cat("\n\n")
}

This simply prints each item in the grmodel list preceded by its name, with some spacing between items.
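If you would rather see the labels while the models are being fitted, one alternative (a sketch, not the only way) is to pass each element's name into the function along with its data. Below, base R's Map() does that; mtcars, with am and vs playing the roles of Species and Site, is only an assumed stand-in for the real data. Writing the formula with data = d also gives cleaner term labels than x$Mass ~ x$GDD.

```r
# mtcars stands in for the question's data; am and vs play Species and Site
groups <- split(mtcars, paste0("am", mtcars$am, "_vs", mtcars$vs))

# Map() supplies each data frame and its name; the result keeps names(groups)
grmodel <- Map(function(d, nm) {
  fit <- aov(mpg ~ wt, data = d)   # data = d gives the term label "wt"
  cat("==", nm, "==\n")            # label printed right above its table
  print(summary(fit))
  cat("\n")
  fit                              # keep the fitted model, not the printout
}, groups, names(groups))
```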


Tags: r, paste, lapply, anova