How to loop through all possible factor level comparisons in R
Consider the following data frame:
type = c('A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'D', 'D', 'D')
val1 = c(.35, .36, .35, .22, .27, .25, .88, .9, .87, .35, .35, .36)
val2 = c(.35, .35, .37, .40, .42, .46, .9, .91, .82, .36, .36, .36)
df = data.frame (type, val1, val2)
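I have four categories (called types: A, B, C, and D). The three observations for each type can be averaged to create a multivariate mean for that type (made up of the means of val1 and val2). I want to use Hotelling's test to compare all possible pairs of types (AB, AC, AD, BC, BD, CD) to determine which type means, if any, are the same. I could hard-code this as: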
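library(dplyr)  # filter() comes from dplyr
# keep only the two value columns (val1, val2) for each type
a = filter (df, type == "A") [,2:3]
b = filter (df, type == "B") [,2:3]
c = filter (df, type == "C") [,2:3]
d = filter (df, type == "D") [,2:3]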
and then run Hotelling's T2 test for each specified pair of types:
library('Hotelling')
hotelling.test(a, b, shrinkage=FALSE)
hotelling.test(b, c, shrinkage=FALSE)
hotelling.test(a, c, shrinkage=FALSE)
#And so on
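Given that my actual dataset has 55 different types, this is clearly inefficient and impractical. I know the answer lies in a for loop, but I am struggling to work out how to tell hotelling.test to compare the val1/val2 multivariate means for every possible pair of types. I am fairly new to writing for loops and hope someone can point me in the right direction.
Once all types have been compared, I would ideally get an output showing the pairs of types whose Hotelling test p-value is greater than 0.05, which would suggest that the two types may be duplicates. In the example data frame, types A and D return a p-value > 0.05, while the other comparisons have p < 0.05.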
We can use combn to create the pairwise combinations, subset the dataset for each pair, and apply the function:
library(Hotelling)
outlst <- combn(as.character(unique(df$type)), 2,
    FUN = function(x) hotelling.test(subset(df, type == x[1], select = -1),
                                     subset(df, type == x[2], select = -1)),
    simplify = FALSE)
names(outlst) <- combn(as.character(unique(df$type)), 2, FUN = paste, collapse = "_")
outlst[1]
#$A_B
#Test stat: 36.013
#Numerator df: 2
#Denominator df: 3
#P-value: 0.007996
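To get from the list of test results to the pairs the question asks about (p-value above 0.05), one option is to pull the p-value out of each result and filter. This is only a sketch: it assumes the object returned by hotelling.test stores the p-value in an element called pval, which is worth confirming with str(outlst[[1]]) on your installed version of the package. The names pairs, pvals and possible_dups are made up for this example.

```r
library(Hotelling)

# Example data from the question
type <- rep(c("A", "B", "C", "D"), each = 3)
val1 <- c(.35, .36, .35, .22, .27, .25, .88, .9, .87, .35, .35, .36)
val2 <- c(.35, .35, .37, .40, .42, .46, .9, .91, .82, .36, .36, .36)
df <- data.frame(type, val1, val2)

# All unordered pairs of types, as a list of length-2 character vectors
pairs <- combn(as.character(unique(df$type)), 2, simplify = FALSE)

# One Hotelling test per pair; column 1 (type) is dropped before testing
outlst <- lapply(pairs, function(x) {
  hotelling.test(df[df$type == x[1], -1], df[df$type == x[2], -1])
})
names(outlst) <- vapply(pairs, paste, character(1), collapse = "_")

# Assumption: the p-value is stored in the `pval` element of each result
pvals <- vapply(outlst, function(res) res$pval, numeric(1))

# Pairs that cannot be distinguished at the 0.05 level
possible_dups <- pvals[pvals > 0.05]
print(possible_dups)
```

With 55 types this runs choose(55, 2) = 1485 tests, so it may be worth thinking about a multiple-comparison adjustment before reading too much into individual p-values.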
If you want to use a for loop:
type = c('A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'D', 'D', 'D')
val1 = c(.35, .36, .35, .22, .27, .25, .88, .9, .87, .35, .35, .36)
val2 = c(.35, .35, .37, .40, .42, .46, .9, .91, .82, .36, .36, .36)
df = data.frame (type, val1, val2)
for (first in unique(df$type)) {
  for (second in unique(df$type)) {
    if (first != second) {
      print(c(first, second))
    }
  }
}
[1] "A" "B"
[1] "A" "C"
[1] "A" "D"
[1] "B" "A"
[1] "B" "C"
[1] "B" "D"
[1] "C" "A"
[1] "C" "B"
[1] "C" "D"
[1] "D" "A"
[1] "D" "B"
[1] "D" "C"
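Note that the loop above prints every pair in both orders (A B as well as B A) and does not call the test yet. A possible variation, sketched below, indexes the types by position so that each unordered pair is visited once, runs hotelling.test inside the loop, and collects the p-values in a data frame. It assumes, as before, that the result object exposes the p-value as pval (check with str() on one result); the names types, results, first, second and p_value are invented for this example.

```r
library(Hotelling)

# Example data from the question
type <- rep(c("A", "B", "C", "D"), each = 3)
val1 <- c(.35, .36, .35, .22, .27, .25, .88, .9, .87, .35, .35, .36)
val2 <- c(.35, .35, .37, .40, .42, .46, .9, .91, .82, .36, .36, .36)
df <- data.frame(type, val1, val2)

types <- as.character(unique(df$type))
n_pairs <- choose(length(types), 2)

# Preallocate one slot per unordered pair
first <- character(n_pairs)
second <- character(n_pairs)
p_value <- numeric(n_pairs)

k <- 0
for (i in seq_len(length(types) - 1)) {
  for (j in (i + 1):length(types)) {   # j > i, so each pair appears once
    x <- df[df$type == types[i], c("val1", "val2")]
    y <- df[df$type == types[j], c("val1", "val2")]
    res <- hotelling.test(x, y, shrinkage = FALSE)
    k <- k + 1
    first[k] <- types[i]
    second[k] <- types[j]
    p_value[k] <- res$pval             # assumption: p-value lives in `pval`
  }
}

results <- data.frame(first, second, p_value, stringsAsFactors = FALSE)

# Pairs that may be duplicates (p-value above 0.05)
print(subset(results, p_value > 0.05))
```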