Format data to run ANOVA in R

Question

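I am trying to run a 3-way ANOVA in R, but the values for each observation are packed into a single column instead of being separated into rows. Currently, my data frame looks like this: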

Season  Site  Location  Replicate  Lengths
Jan_16  MI    Adj       1.00       ,
Jan_16  MI    Adj       2.00       ,
Jan_16  MI    Adj       3.00       ,
Jan_16  MI    Away      1.00       3,4,
Jan_16  MI    Away      2.00       ,
Jan_16  MI    Away      3.00       ,
Jan_16  MP    Adj       1.00       4,5,6,5,4,5,4,4,4,4,5,4,6,4,
Jan_16  MP    Adj       2.00       4,4,3,3,5,4,3,4,5,3,4,3,4,3,4,6,
Jan_16  MP    Adj       3.00       4,6,5,5,4,
Jan_16  MP    Away      1.00       ,4,4,10,4,5,4,6,5,5,
Jan_16  MP    Away      2.00       3,4,4,4,5,5,4,5,
Jan_16  MP    Away      3.00       4,4,13,4,

Lengths is the response variable I want to run the ANOVA on. How do I do that? A lone "," means there is no data.

**** Edit

I tried separate_rows:

library(tidyr)

separate_rows(data.frame, Season:Replicate, Lengths, convert=numeric )


#Error: All nested columns must have the same number of elements

Lengths holds a different number of values in each row, so is there a way to unnest it?

Answer 1

Unnesting the data is the best way to solve the problem.

Run this code:

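library(dplyr)
library(tidyr)   # unnest() comes from tidyr, not dplyr

# Unnest everything so that each comma-separated value gets its own row
# (Lengths must be a character column, not a factor, for strsplit to work)
data.frame.new <- data.frame %>%
  transform(Lengths = strsplit(Lengths, ",")) %>%
  unnest(Lengths)

# Get rid of blanks where there are no data
Set.unnest <- subset(data.frame.new, Lengths != "")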

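This gives one row per data point in Lengths, with the Season, Site, Location, and Replicate values repeated on each of those rows.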
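For readers without tidyr installed, the same reshaping can be sketched in base R. This is only an illustration, not the method used above: the small `df` below is a hypothetical stand-in for a few rows of the question's data, and `parts` and `long` are names made up for this example. It also converts Lengths to numeric, which an ANOVA response needs.

```r
# Hypothetical stand-in for a few rows of the question's data frame
df <- data.frame(
  Season    = "Jan_16",
  Site      = c("MI", "MI", "MP", "MP"),
  Location  = c("Adj", "Away", "Adj", "Away"),
  Replicate = c(1, 1, 3, 3),
  Lengths   = c(",", "3,4,", "4,6,5,5,4,", " 4,4,13,4,"),
  stringsAsFactors = FALSE
)

# Split each Lengths string on commas: a list with one character vector per row
parts <- strsplit(df$Lengths, ",", fixed = TRUE)

# Repeat each original row once per extracted value
id_cols <- c("Season", "Site", "Location", "Replicate")
long <- df[rep(seq_len(nrow(df)), lengths(parts)), id_cols]

# Empty strings (the "no data" placeholders) become NA here
long$Lengths <- as.numeric(unlist(parts))

# Drop the rows that carried no measurement
long <- long[!is.na(long$Lengths), ]
rownames(long) <- NULL

print(long)
```

Converting with as.numeric before filtering means the "no data" rows are dropped by an NA check rather than by comparing strings.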

Answer 2

It is not clear from your question what your independent variables are. In the example below, I assume Site, Location, and Replicate are your IVs.

Let's first split the entries in Lengths into separate rows and remove the rows with missing/no Lengths.

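library(tidyverse)
df.aov <- df %>%
    mutate(Lengths = str_split(Lengths, ",")) %>%   # list column: one vector per row
    unnest(Lengths) %>%                             # one row per value
    mutate(Lengths = as.numeric(Lengths)) %>%       # "" becomes NA
    filter(!is.na(Lengths))                         # drop rows with no length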

We can now run the 3-way ANOVA using aov:

res <- aov(Lengths ~ Site * Location * Replicate, data = df.aov)
res
#Call:
#   aov(formula = Lengths ~ Site * Location * Replicate, data = df.aov)
#
#Terms:
#                     Site  Location Replicate Location:Replicate Residuals
#Sum of Squares    2.21675   7.61905   0.11491            0.89526 131.58506
#Deg. of Freedom         1         1         1                  1        53
#
#Residual standard error: 1.57567
#3 out of 8 effects not estimable
#Estimated effects may be unbalanced

Note that the results are not very sensible. I assume your actual dataset is larger.
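The "3 out of 8 effects not estimable" message appears to stem from empty or nearly empty cells in the sample data: site MI has only two measurements, both from the same Location and Replicate. A minimal base-R sketch for checking this follows. It rebuilds the question's twelve rows so that it is self-contained (Season is omitted because it is constant), and `parts` and `cell_counts` are names made up for this example.

```r
# The question's twelve rows, rebuilt so the sketch runs on its own
df <- data.frame(
  Site      = rep(c("MI", "MP"), each = 6),
  Location  = rep(rep(c("Adj", "Away"), each = 3), times = 2),
  Replicate = rep(1:3, times = 4),
  Lengths   = c(",", ",", ",", "3,4,", ",", ",",
                "4,5,6,5,4,5,4,4,4,4,5,4,6,4,",
                "4,4,3,3,5,4,3,4,5,3,4,3,4,3,4,6,",
                "4,6,5,5,4,",
                ",4,4,10,4,5,4,6,5,5,",
                "3,4,4,4,5,5,4,5,",
                "4,4,13,4,"),
  stringsAsFactors = FALSE
)

# One row per measurement, dropping the "no data" placeholders
parts <- strsplit(df$Lengths, ",", fixed = TRUE)
df.aov <- df[rep(seq_len(nrow(df)), lengths(parts)),
             c("Site", "Location", "Replicate")]
df.aov$Lengths <- as.numeric(unlist(parts))
df.aov <- df.aov[!is.na(df.aov$Lengths), ]

# Number of observations in each Site x Location cell
cell_counts <- table(df.aov$Site, df.aov$Location)
print(cell_counts)

# Same model as above; summary() adds F statistics and p-values
res <- aov(Lengths ~ Site * Location * Replicate, data = df.aov)
summary(res)
```

A cell with zero observations means the interactions involving that combination cannot be estimated, so a table like this is worth inspecting on the full dataset before fitting the model.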
