Using combn() in R to find all possible t-test relationships, how to access the variables compared?

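I have a data frame with a large number of variables, and I want to cross-check every variable against every other one with a t-test.

A sample of my data, named trust_news: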

row mean polity2 web rsf civil_liberties freedom_of_expression vdem_gov_censorship_effort vdem_self_censorship_effort vdem_freedom_of_expression ciri_freedom_of_speech_and_press media_integrity vdem_critical_press vdem_media_perspective vdem_media_bias vdem_media_corruption vdem_media_freedom
1 2.68 8 87.2661 25.69 0.785599008 0.758906967 0.731895466 0.742219428 1 1 0.81449235 0.889046047 0.782079459 0.693825991 0.733503755 1
2 2.8 8 94.8967 22.23 0.810742702 0.832891911 0.8447733 0.831499528 1 1 0.88417386 0.868772592 0.881994928 0.835622928 0.828566864 1
3 3.22 10 89.7391 14.6 0.821268417 0.83327835 0.883343829 0.805721471 1 1 0.829951651 0.917491749 0.725950972 0.709774199 0.874261064 1
5 2.96 10 74.3872 24.98 0.813949794 0.781986225 0.844615869 0.729330399 0.666666667 0.5 0.878769429 0.872387239 0.919019442 0.841939049 0.810193322 0.5

Then I run this code:

trust_news_combos <- combn(trust_news, 1, t.test, simplify = TRUE)

First, is the code correct? I don't know what to pass as m in the combn() function. Anyway, that line gives me this:

V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16
1 c(t = 85.1670166474227) c(t = 15.9614095646055) c(t = 29.2365516170159) c(t = 11.0778062107689) c(t = 30.4673329981756) c(t = 26.8521522144486) c(t = 23.160185720972) c(t = 25.1063414199952) c(t = 17.1830959329723) c(t = 11.06502519693) c(t = 33.0841916129404) c(t = 29.3707961673045) c(t = 31.2455551028106) c(t = 39.1490231250879) c(t = 27.6089179039943) c(t = 14.0719508946058)
2 c(df = 32) c(df = 32) c(df = 32) c(df = 32) c(df = 32) c(df = 32) c(df = 32) c(df = 32) c(df = 32) c(df = 32) c(df = 32) c(df = 32) c(df = 32) c(df = 32) c(df = 32) c(df = 32)
3 2.69E-39 8.55E-17 1.18E-24 1.75E-12 3.29E-25 1.61E-23 1.46E-21 1.26E-22 1.03E-17 1.80E-12 2.55E-26 1.02E-24 1.51E-25 1.32E-28 6.88E-24 2.96E-15
4 c(3.00189912275063 3.14900996815846) c(7.56066019283154 9.77267314050179) c(73.5097801046279 84.5198259559781) c(19.628297122971 28.4729149982411) c(0.682586494865725 0.780396107679729) c(0.639468676034051 0.744449016935646) c(0.664192511270674 0.792289818305084) c(0.665160025455844 0.782621785210823) c(0.676674167771883 0.858679367682662) c(0.543941635486123 0.78939169784721) c(0.739756992152986 0.836824222392469) c(0.730937293702635 0.839876930600395) c(0.729509614919607 0.831257822777363) c(0.709894349786553 0.787820841122538) c(0.708427672557418 0.821287114048642) c(0.647915673315896 0.867235841835619)
5 c(mean of x = 3.07545454545455) c(mean of x = 8.66666666666667) c(mean of x = 79.014803030303) c(mean of x = 24.0506060606061) c(mean of x = 0.731491301272727) c(mean of x = 0.691958846484849) c(mean of x = 0.728241164787879) c(mean of x = 0.723890905333333) c(mean of x = 0.767676767727273) c(mean of x = 0.666666666666667) c(mean of x = 0.788290607272727) c(mean of x = 0.785407112151515) c(mean of x = 0.780383718848485) c(mean of x = 0.748857595454545) c(mean of x = 0.76485739330303) c(mean of x = 0.757575757575758)
6 c(mean = 0) c(mean = 0) c(mean = 0) c(mean = 0) c(mean = 0) c(mean = 0) c(mean = 0) c(mean = 0) c(mean = 0) c(mean = 0) c(mean = 0) c(mean = 0) c(mean = 0) c(mean = 0) c(mean = 0) c(mean = 0)
7 0.036110864 0.542976272 2.702603374 2.171062176 0.024009036 0.025769214 0.031443667 0.028832991 0.044676278 0.0602499 0.023826806 0.02674109 0.024975831 0.019128385 0.027703273 0.053835873
8 two.sided two.sided two.sided two.sided two.sided two.sided two.sided two.sided two.sided two.sided two.sided two.sided two.sided two.sided two.sided two.sided
9 One Sample t-test One Sample t-test One Sample t-test One Sample t-test One Sample t-test One Sample t-test One Sample t-test One Sample t-test One Sample t-test One Sample t-test One Sample t-test One Sample t-test One Sample t-test One Sample t-test One Sample t-test One Sample t-test
10 x[a] x[a] x[a] x[a] x[a] x[a] x[a] x[a] x[a] x[a] x[a] x[a] x[a] x[a] x[a] x[a]

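It gives me the p-values I am looking for in row 3, but how do I check which two columns are being compared?

Any help is appreciated, and will be acknowledged in my final code!

You should write a small function that computes exactly what you need, and use it in place of the standard function t.test. For example: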

# get four column names
cols <- names(mtcars)[1:4]   # use trust_news instead of mtcars, and keep all the names

# compute the pval for a pair of names
pval <- function(pair) {
  value <- t.test(mtcars[, pair[1]], mtcars[, pair[2]])$p.value
  names(value) <- paste(pair, collapse = " vs. ")
  value
}

# do it for all pairs.  Don't simplify, and it will keep the names
combn(cols, 2, pval, simplify = FALSE)
#> [[1]]
#>  mpg vs. cyl 
#> 9.507708e-15 
#> 
#> [[2]]
#> mpg vs. disp 
#> 7.978234e-11 
#> 
#> [[3]]
#>   mpg vs. hp 
#> 1.030354e-11 
#> 
#> [[4]]
#> cyl vs. disp 
#> 1.774454e-11 
#> 
#> [[5]]
#>   cyl vs. hp 
#> 8.321996e-13 
#> 
#> [[6]]
#> disp vs. hp 
#> 0.001545647

Created on 2021-05-22 by the reprex package (v2.0.0)
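
Because each p-value carries its pair label as a name, the list can be flattened into one named vector and then queried by pair. The following is a minimal sketch of that idea, not part of the original answer; it reuses the pval helper from above, and pvals is a name introduced here only for illustration.

```r
# Sketch: flatten the named results so a pair can be looked up by its label.
# mtcars stands in for trust_news, as in the answer above.
cols <- names(mtcars)[1:4]

pval <- function(pair) {
  value <- t.test(mtcars[, pair[1]], mtcars[, pair[2]])$p.value
  names(value) <- paste(pair, collapse = " vs. ")
  value
}

# unlist() keeps the element names because the list itself is unnamed
pvals <- unlist(combn(cols, 2, pval, simplify = FALSE))

pvals["mpg vs. cyl"]          # p-value for one specific pair
names(pvals)[pvals < 0.001]   # labels of the pairs below a chosen threshold
```

The threshold of 0.001 is arbitrary and only shows how the names can be used for filtering.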

One approach is to create a second combn() on the column names:

nm1 <-  combn(names(trust_news), 2, FUN = paste, collapse= '-', simplify = TRUE)

Then, we do:

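# pass the two columns explicitly; t.test() on a two-column data frame pools them into a one-sample test
trust_news_combos <- combn(trust_news, 2, function(d) t.test(d[[1]], d[[2]]), simplify = FALSE)
names(trust_news_combos) <- nm1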
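
The naming step relies on combn() visiting the pairs in the same order whether it is given the column names or the data frame itself. A small check along these lines can confirm that; it is only a sketch, with mtcars standing in for trust_news and the names df, nm and chk chosen here for illustration.

```r
# Sketch: confirm that labels built from the names line up with the
# column pairs that combn() hands to the function.
df <- mtcars[1:4]

# labels built from the column names alone
nm <- combn(names(df), 2, FUN = paste, collapse = "-")

# labels built from the two-column data frames combn() passes to FUN
chk <- combn(df, 2, function(d) paste(names(d), collapse = "-"))

same_order <- identical(as.vector(nm), as.vector(chk))
same_order
```

If same_order is TRUE, assigning nm as the names of the list of test results labels each result with the pair it was computed from.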

It may be better to get the output in a data.frame/tibble structure, using tidy() from broom (run in R 4.1.0):

library(broom)
lst1 <- combn(trust_news, 2, \(y) t.test(y[[1]], y[[2]]) |>
                             tidy(), simplify = FALSE) |>
         setNames(nm1)

out <- Map(cbind, comparison = names(lst1), lst1) |>
       {\(x) do.call(rbind, x)}()
row.names(out) <- NULL

         

-output

head(out)
                 comparison   estimate estimate1  estimate2   statistic      p.value parameter    conf.low
1                  row-mean  -0.165000      2.75  2.9150000  -0.1914478 0.8599889461  3.112075  -2.8527609
2               row-polity2  -6.250000      2.75  9.0000000  -6.0633906 0.0014638846  5.268737  -8.8595564
3                   row-web -83.822275      2.75 86.5722750 -18.8602012 0.0002049939  3.229641 -97.4140679
4                   row-rsf -19.125000      2.75 21.8750000  -7.1441517 0.0027953086  3.671029 -26.8277783
5       row-civil_liberties   1.942110      2.75  0.8078900   2.2742727 0.1074862571  3.000494  -0.7752796
6 row-freedom_of_expression   1.948234      2.75  0.8017659   2.2809921 0.1067532047  3.002873  -0.7684766
   conf.high                  method alternative
1   2.522761 Welch Two Sample t-test   two.sided
2  -3.640444 Welch Two Sample t-test   two.sided
3 -70.230482 Welch Two Sample t-test   two.sided
4 -11.422222 Welch Two Sample t-test   two.sided
5   4.659500 Welch Two Sample t-test   two.sided
6   4.664945 Welch Two Sample t-test   two.sided
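
If broom is not available, a similar table can be put together in base R. The following is a rough sketch rather than the approach used above: it swaps tidy() for a hand-built data.frame, keeps only the statistic and p-value, and uses mtcars in place of trust_news. The names dat, pair_row and out_base are hypothetical.

```r
# Base R sketch: one row per pair, with the compared variables as columns.
dat <- mtcars[1:4]

pair_row <- function(d) {
  # d is the two-column data frame that combn() passes in
  res <- t.test(d[[1]], d[[2]])   # Welch two-sample t-test (the t.test default)
  data.frame(var1 = names(d)[1],
             var2 = names(d)[2],
             statistic = unname(res$statistic),
             p.value = res$p.value)
}

out_base <- do.call(rbind, combn(dat, 2, pair_row, simplify = FALSE))

# sort so the smallest p-values come first
out_base[order(out_base$p.value), ]
```

Keeping var1 and var2 as separate columns makes it straightforward to filter for every comparison that involves a given variable.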

data

trust_news <- structure(list(row = c(1L, 2L, 3L, 5L), mean = c(2.68, 2.8, 3.22, 
2.96), polity2 = c(8L, 8L, 10L, 10L), web = c(87.2661, 94.8967, 
89.7391, 74.3872), rsf = c(25.69, 22.23, 14.6, 24.98), civil_liberties = c(0.785599008, 
0.810742702, 0.821268417, 0.813949794), freedom_of_expression = c(0.758906967, 
0.832891911, 0.83327835, 0.781986225), vdem_gov_censorship_effort = c(0.731895466, 
0.8447733, 0.883343829, 0.844615869), vdem_self_censorship_effort = c(0.742219428, 
0.831499528, 0.805721471, 0.729330399), vdem_freedom_of_expression = c(1, 
1, 1, 0.666666667), ciri_freedom_of_speech_and_press = c(1, 1, 
1, 0.5), media_integrity = c(0.81449235, 0.88417386, 0.829951651, 
0.878769429), vdem_critical_press = c(0.889046047, 0.868772592, 
0.917491749, 0.872387239), vdem_media_perspective = c(0.782079459, 
0.881994928, 0.725950972, 0.919019442), vdem_media_bias = c(0.693825991, 
0.835622928, 0.709774199, 0.841939049), vdem_media_corruption = c(0.733503755, 
0.828566864, 0.874261064, 0.810193322), vdem_media_freedom = c(1, 
1, 1, 0.5)), class = "data.frame", row.names = c(NA, -4L))