Apply function taking matching columns in 2 data frames, looping over columns
I have the following two data frames:
df1 <- as.data.frame(matrix(runif(50), nrow = 10, byrow = TRUE))
colnames(df1) <- c("x1", "x2", "x3", "x4", "x5")
df2 <- as.data.frame(matrix(runif(100), nrow = 20, byrow = TRUE))
colnames(df2) <- c("x1", "x2", "x3", "x4", "x5")
I want to test whether the means of column x_j are the same in the two data frames, for j = 1, ..., 5, and record the test statistic and the p-value.
t.test(df1$x1, df2$x1)$statistic
t.test(df1$x1, df2$x1)$p.value
apply() seems to take only one data frame as input. What is the best way to loop the two lines above over j?
Thanks in advance!
You can do this with a regular for loop in R, iterating over the column names.
cols <- c("x1", "x2", "x3", "x4", "x5")
df1 <- as.data.frame(matrix(runif(50), nrow = 10, byrow = TRUE))
colnames(df1) <- cols
df2 <- as.data.frame(matrix(runif(100), nrow = 20, byrow = TRUE))
colnames(df2) <- cols
for (col in cols) {
  # Run the test once per column; [[col]] extracts the column as a numeric vector
  res <- t.test(df1[[col]], df2[[col]])
  message(paste("Testing column", col))
  print(paste("t-statistic: ", res$statistic[["t"]]))
  print(paste("p-value: ", res$p.value))
}
#> Testing column x1
#> [1] "t-statistic: 0.419581290015361"
#> [1] "p-value: 0.68029340912263"
#> Testing column x2
#> [1] "t-statistic: -0.343435717107623"
#> [1] "p-value: 0.7361266387073"
#> Testing column x3
#> [1] "t-statistic: 0.248037735890824"
#> [1] "p-value: 0.807107717907307"
#> Testing column x4
#> [1] "t-statistic: 0.992363174130968"
#> [1] "p-value: 0.333989277352541"
#> Testing column x5
#> [1] "t-statistic: 2.06600413500528"
#> [1] "p-value: 0.0527652252424411"
Created on 2020-11-02 by the reprex package (v0.3.0)
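Since the question asks to record the results rather than print them, the loop could also fill a small results table. This is only a sketch: the `set.seed(42)` call and the `results` data frame are assumptions added for illustration and are not part of the original answer.

```r
set.seed(42)  # assumed seed, only so the sketch is reproducible

cols <- c("x1", "x2", "x3", "x4", "x5")
df1 <- as.data.frame(matrix(runif(50), nrow = 10, byrow = TRUE))
colnames(df1) <- cols
df2 <- as.data.frame(matrix(runif(100), nrow = 20, byrow = TRUE))
colnames(df2) <- cols

# One row per column; the NA placeholders are filled in by the loop below
results <- data.frame(column = cols, statistic = NA_real_, p.value = NA_real_)

for (i in seq_along(cols)) {
  res <- t.test(df1[[cols[i]]], df2[[cols[i]]])
  results$statistic[i] <- res$statistic[["t"]]
  results$p.value[i] <- res$p.value
}

results
```

Keeping the numbers in a data frame makes later steps, such as sorting by p-value or applying `p.adjust()`, straightforward.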
apply, lapply, vapply, and sapply all loop over a single object. If you have multiple objects to iterate over in parallel, you want mapply or Map:
mapply(function(x,y) t.test(x,y)[c("statistic","p.value")], df1, df2)
#          x1        x2        x3         x4        x5
#statistic 0.6816886 -1.408304 -0.2598513 -0.890468 -1.097354
#p.value   0.5028386 0.1721202 0.7982655  0.3825847 0.2851621
This assumes that the columns of df1 and df2 are in the same order.
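If the column order might differ, one option is to align both data frames on their shared column names before calling mapply. The following is a sketch under stated assumptions: the seed, the deliberately reversed column order in `df2`, and the names `shared` and `out` are made up for illustration.

```r
set.seed(1)  # assumed seed, only so the sketch is reproducible

df1 <- as.data.frame(matrix(runif(50), nrow = 10))
colnames(df1) <- paste0("x", 1:5)
df2 <- as.data.frame(matrix(runif(100), nrow = 20))
colnames(df2) <- paste0("x", 5:1)  # deliberately reversed column order

# Columns present in both data frames, in df1's order
shared <- intersect(names(df1), names(df2))

out <- mapply(
  function(x, y) {
    res <- t.test(x, y)
    # Return a plain numeric vector so mapply simplifies to a numeric matrix
    c(statistic = unname(res$statistic), p.value = res$p.value)
  },
  df1[shared], df2[shared]  # indexing by name puts both in the same order
)

out
```

Here `out` is a numeric matrix with one column per shared variable, so a single value can be read with, for example, `out["p.value", "x3"]`.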