How to apply t.test() to multiple pairs of columns after mutate across

Question

This question is related to an earlier question.

Data:

df <- structure(list(Subject = 1:3, PreScoreTestA = c(30L, 15L, 20L
), PostScoreTestA = c(40L, 12L, 22L), PreScoreTestB = c(6L, 9L, 
11L), PostScoreTestB = c(8L, 13L, 12L), PreScoreTestC = c(12L, 
7L, 9L), PostScoreTestC = c(10L, 7L, 10L)), class = "data.frame", row.names = c(NA, 
-3L))

> df
  Subject PreScoreTestA PostScoreTestA PreScoreTestB PostScoreTestB PreScoreTestC PostScoreTestC
1       1            30             40             6              8            12             10
2       2            15             12             9             13             7              7
3       3            20             22            11             12             9             10

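There, the OP asks whether t.test can be applied to pairs of columns in a wide-format data frame. A solution using the long format has already been provided.

However, I tried the following code as an answer that runs t.test directly on the wide format.

My code using + as the function (this works well):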

library(dplyr)
library(stringr)
df %>%
  mutate(across(starts_with('PreScore'), ~ . +
                  get(str_replace(cur_column(), "^PreScore", "PostScore")), .names = "{.col}_TTest")) %>%
  rename_at(vars(ends_with('TTest')), ~ str_remove(., "PreScore"))

# gives:
  Subject PreScoreTestA PostScoreTestA PreScoreTestB PostScoreTestB PreScoreTestC PostScoreTestC
1       1            30             40             6              8            12             10
2       2            15             12             9             13             7              7
3       3            20             22            11             12             9             10
  TestA_TTest TestB_TTest TestC_TTest
1          70          14          22
2          27          22          14
3          42          23          19

Now I want to replace the function + with t.test (this does not work, and I have tried many variants):

library(dplyr)
library(stringr)
df %>%
  mutate(across(starts_with('PreScore'), ~ . t.test
                  get(str_replace(cur_column(), "^PreScore", "PostScore")), .names = "{.col}_TTest")) %>%
  rename_at(vars(ends_with('TTest')), ~ str_remove(., "PreScore"))

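I would like to know:

Is it possible to apply the t.test function to a predefined set of column pairs inside across, in the same way as -, +, /, etc.?

More resources I have looked at: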

dplyr summarise multiple columns using t.test

Answer 1

The output of t.test is a list (an object of class htest), so we need to wrap it in list() so that mutate can store it as a list column:

library(dplyr)
library(stringr)
out <- df %>%
  mutate(across(starts_with('PreScore'), 
    ~list(t.test(.,
         get(str_replace(cur_column(), "^PreScore", "PostScore")))), 
        .names = "{.col}_TTest")) %>%
     rename_at(vars(ends_with('TTest')), ~ str_remove(., "PreScore"))

Checking the structure with str:

> str(out)
'data.frame':   3 obs. of  10 variables:
 $ Subject       : int  1 2 3
 $ PreScoreTestA : int  30 15 20
 $ PostScoreTestA: int  40 12 22
 $ PreScoreTestB : int  6 9 11
 $ PostScoreTestB: int  8 13 12
 $ PreScoreTestC : int  12 7 9
 $ PostScoreTestC: int  10 7 10
 $ TestA_TTest   :List of 3
  ..$ :List of 10
  .. ..$ statistic  : Named num -0.322
  .. .. ..- attr(*, "names")= chr "t"
  .. ..$ parameter  : Named num 3.07
  .. .. ..- attr(*, "names")= chr "df"
  .. ..$ p.value    : num 0.768
  .. ..$ conf.int   : num  -32.2 26.2
  .. .. ..- attr(*, "conf.level")= num 0.95
  .. ..$ estimate   : Named num  21.7 24.7
  .. .. ..- attr(*, "names")= chr [1:2] "mean of x" "mean of y"
  .. ..$ null.value : Named num 0
  .. .. ..- attr(*, "names")= chr "difference in means"
  .. ..$ stderr     : num 9.3
  .. ..$ alternative: chr "two.sided"
  .. ..$ method     : chr "Welch Two Sample t-test"
  .. ..$ data.name  : chr "PreScoreTestA and get(str_replace(cur_column(), \"^PreScore\", \"PostScore\"))"
  .. ..- attr(*, "class")= chr "htest"
  ..$ :List of 10
...
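
To see why the list() wrapper is needed, it can help to look at a single test result outside of dplyr. The following is a minimal base R sketch; it rebuilds the question's data with data.frame() so that it runs on its own, and `res` is just an illustrative name.

```r
# Same values as the question's df, rebuilt so the sketch is self-contained.
df <- data.frame(
  Subject = 1:3,
  PreScoreTestA = c(30, 15, 20), PostScoreTestA = c(40, 12, 22),
  PreScoreTestB = c(6, 9, 11),   PostScoreTestB = c(8, 13, 12),
  PreScoreTestC = c(12, 7, 9),   PostScoreTestC = c(10, 7, 10)
)

# A single Welch two-sample test on one Pre/Post pair.
res <- t.test(df$PreScoreTestA, df$PostScoreTestA)

class(res)     # "htest"
is.list(res)   # TRUE: several components, so it cannot fill an atomic column
names(res)     # component names such as statistic, p.value, conf.int

# Individual components are extracted with $ like any other list element.
res$p.value
unname(res$statistic)
```

Because the result has several components rather than one value per row, mutate can only hold it in a list column, or we reduce it to a single number first.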

If we only need to extract a specific list element, e.g. the p.value:

df %>%
   mutate(across(starts_with('PreScore'),
      ~  t.test(.,
         get(str_replace(cur_column(), "^PreScore", "PostScore")))$p.value, 
     .names = "{.col}_TTest"))
  Subject PreScoreTestA PostScoreTestA PreScoreTestB PostScoreTestB PreScoreTestC PostScoreTestC PreScoreTestA_TTest
1       1            30             40             6              8            12             10            0.767827
2       2            15             12             9             13             7              7            0.767827
3       3            20             22            11             12             9             10            0.767827
  PreScoreTestB_TTest PreScoreTestC_TTest
1            0.330604           0.8604162
2            0.330604           0.8604162
3            0.330604           0.8604162

Note that with mutate the same value is stored in every row. Instead, we can use summarise to get a single row:

df %>%
   summarise(across(starts_with('PreScore'), ~  t.test(.,
         get(str_replace(cur_column(), "^PreScore", "PostScore")))$p.value, 
      .names = "{.col}_TTest"))
  PreScoreTestA_TTest PreScoreTestB_TTest PreScoreTestC_TTest
1            0.767827            0.330604           0.8604162
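
The same pairing can also be written without get() and cur_column(), by matching column names explicitly. The sketch below is one possible base R variant, not the method used above; `pair_p_values` is a hypothetical helper introduced only for illustration. It also exposes t.test's `paired` argument, on the assumption that the Pre and Post scores come from the same subjects, in which case a paired test may be more appropriate than the Welch two-sample test that t.test runs by default.

```r
# Same values as the question's df, rebuilt so the sketch is self-contained.
df <- data.frame(
  Subject = 1:3,
  PreScoreTestA = c(30, 15, 20), PostScoreTestA = c(40, 12, 22),
  PreScoreTestB = c(6, 9, 11),   PostScoreTestB = c(8, 13, 12),
  PreScoreTestC = c(12, 7, 9),   PostScoreTestC = c(10, 7, 10)
)

# Hypothetical helper: one p-value per Pre/Post pair, matched by column name.
pair_p_values <- function(data, paired = FALSE) {
  pre_cols  <- grep("^PreScore", names(data), value = TRUE)
  post_cols <- sub("^PreScore", "PostScore", pre_cols)
  # Fail early if any Pre column lacks its Post counterpart.
  stopifnot(all(post_cols %in% names(data)))

  p <- mapply(
    function(pre, post) {
      t.test(data[[pre]], data[[post]], paired = paired)$p.value
    },
    pre_cols, post_cols
  )
  # Label results by test name, e.g. "TestA".
  names(p) <- sub("^PreScore", "", pre_cols)
  p
}

welch_p  <- pair_p_values(df)                 # default Welch two-sample test
paired_p <- pair_p_values(df, paired = TRUE)  # assumes rows are matched subjects

welch_p
paired_p
```

With the default `paired = FALSE`, the p-values should agree with the summarise output above; the paired version answers a different question (whether the mean within-subject change is zero), so its values will generally differ.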
