Applying consecutive functions to a data frame and outputting the results of each into a table

Question

I have a large data frame formatted like the following (it runs to ~200 compounds).

+-----------+----------+------------+
| Treatment | Compound | Proportion |
+-----------+----------+------------+
| A         | wax      | 0.095      |
| A         | alcohol  | 0.077      |
| A         | ketone   | 0.066      |
| B         | wax      | 0.067      |
| B         | alcohol  | 0.071      |
| B         | ketone   | 0.073      |
| C         | wax      | 0.051      |
| C         | alcohol  | 0.019      |
| C         | ketone   | 0.07       |
| D         | wax      | 0.033      |
| D         | alcohol  | 0.082      |
| D         | ketone   | 0.019      |
+-----------+----------+------------+

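I have run an ANOVA on the linear model

lm(Proportion ~ Treatment)

for each compound using a data.table approach. That gave me a list of compounds for which Treatment is a significant factor, and I used this list to subset my data into "t.df".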

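I would now like to use TukeyHSD to determine, for each of these compounds, which treatments differ significantly from one another. I realise that TukeyHSD needs "aov" output, which I have to include in my code. I think what I want is a "tapply"-style approach that runs through my list of compounds, fits the model, runs the ANOVA, then performs Tukey's test and saves the results as a list of matrices.

I have been trying something like the following, without success: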

mytest <- function(x) { 
  model<-lm(Proportion ~ Treatment, data=t.df)
  aovmodel<-aov(model)
  tuks<-TukeyHSD(aovmodel) 
  } 
tapply((t.df[unique(t.df$Compound)]),mytest)

This returns the error:

"Error in `[.data.frame`(t.df, unique(t.df$Compound)) : 
  undefined columns selected"

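I suspect this is the least of the problems in this code.

Is there any way to extract the returned Tukey "p adj" values for each compound tested? I would really like to avoid doing this the long way, because my list contains a large number of compounds and I expect to run similar analyses on several future datasets with different compound names.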

Answer 1

To get the Tukey HSD for each of the compounds you specified, try the following:

lapply(unique(t.df$Compound),
       function(x, df)
           TukeyHSD(aov(glm(Proportion ~ Treatment,
                            data = df,
                            subset = Compound == x)))[[1]],
       df = t.df)

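For each unique compound, this calls TukeyHSD() on an analysis of variance of the general linear model fitted to the subset of the data corresponding to that compound. It returns a list in which each element corresponds to one compound.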
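To collect just the "p adj" values into one table, something along the following lines may work. This is only a sketch under some assumptions: it needs replicate measurements for every treatment within each compound (with a single value per treatment, as in the sample table, there are no residual degrees of freedom for the test), so the toy data below, including the Rep column, is made up. The names tukey_by_compound and p_adj are also hypothetical, and the model is fitted with aov() directly on the formula instead of going through glm().

```r
# Made-up example data: 4 treatments x 3 compounds x 3 replicates.
# Replace this with your own t.df; Rep is a hypothetical replicate index.
set.seed(1)
t.df <- expand.grid(Treatment = c("A", "B", "C", "D"),
                    Compound  = c("wax", "alcohol", "ketone"),
                    Rep       = 1:3)
t.df$Proportion <- runif(nrow(t.df), min = 0.01, max = 0.10)

# Split the data by compound and run aov + TukeyHSD on each piece.
# Each element is the Tukey matrix with columns diff, lwr, upr and "p adj".
tukey_by_compound <- lapply(split(t.df, t.df$Compound), function(d) {
  d$Treatment <- factor(d$Treatment)  # TukeyHSD needs a factor
  TukeyHSD(aov(Proportion ~ Treatment, data = d))$Treatment
})

# Keep only the adjusted p-values: one row per pairwise comparison,
# one column per compound.
p_adj <- sapply(tukey_by_compound, function(m) m[, "p adj"])
print(round(p_adj, 3))
```

Because split() names the list by compound, the columns of p_adj are labelled automatically, so the same code should carry over to datasets with different compound names.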

