How to create a data frame of various test results?

Question

In our experiment, we have a data frame with the following columns:

Participant	Condition	parametricVariables	nonParametricVariables	orderNumber
1	Condition 1	14.7	4	1
1	Condition 2	11.4	1	2
2	Condition 1	8.2	7	2
2	Condition 2	13.0	6	1
...	...	...	...	...

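We have multiple parametric variables and multiple non-parametric variables, but only two conditions. The orderNumber column indicates the order in which a given participant tested a given condition: participant 1 tested Condition 1 first and then Condition 2, while participant 2 tested them in the reverse order.

Despite our best efforts, we still want to check whether there is unsystematic variation based on the order of the conditions. So far, we have been using function calls and reading the results from the output, like this: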

library(dplyr)

ParticipantOrder1 <- gameSummary %>% filter(orderNumber == 1)
Condition1Order1 <- ParticipantOrder1 %>% filter(Condition == condition1_label)
Condition2Order1 <- ParticipantOrder1 %>% filter(Condition == condition2_label)

ParticipantOrder2 <- gameSummary %>% filter(orderNumber == 2)
Condition1Order2 <- ParticipantOrder2 %>% filter(Condition == condition1_label)
Condition2Order2 <- ParticipantOrder2 %>% filter(Condition == condition2_label)

# Check normality of the parametric variables
# ...

# Check for differences in the parametric variables between the two orders using Welch's t-test
t.test(ParticipantOrder1$parametric, ParticipantOrder2$parametric)
t.test(Condition1Order1$parametric, Condition1Order2$parametric)
t.test(Condition2Order1$parametric, Condition2Order2$parametric)

# Check for differences in the non-parametric variables between the two orders using the Wilcoxon signed-rank test
wilcox.test(ParticipantOrder1$nonParametric, ParticipantOrder2$nonParametric, paired=TRUE, exact=FALSE)
wilcox.test(Condition1Order1$nonParametric, Condition1Order2$nonParametric, paired=TRUE, exact=FALSE)
wilcox.test(Condition2Order1$nonParametric, Condition2Order2$nonParametric, paired=TRUE, exact=FALSE)

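As you can see, this approach becomes quite unwieldy when there are multiple parametric and non-parametric variables. I was wondering whether there is a better way to collect all of these test results into a table like the following: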

Variable	Condition	TestType	statistic	p-value
parametric1	Both	Welch Two Sample t-test	0.10317	0.9185
parametric1	Condition 1	Welch Two Sample t-test	0.625	0.5462
parametric1	Condition 2	Welch Two Sample t-test	-0.69369	0.503
nonParametric1	Both	Wilcoxon signed rank test with continuity correction	18	0.6295
...	...	...	...	...

Answer 1

Grouping the data

First, we should group the data by groupingVariable.

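Group all the data:

library(dplyr)
library(tidyr)

analysisSummary <- gameSummary %>%
  # Keep only the variables to compare and the grouping column
  select(parametric1, parametric2, nonparametric1, groupingVariable) %>%
  # Long format: one row per (variable, value) observation
  gather(key = variable, value = value, -groupingVariable) %>%
  group_by(variable, groupingVariable) %>%
  # Collapse each group's observations into a single list element
  summarise(value = list(value)) %>%
  # One column per value of groupingVariable
  spread(groupingVariable, value) %>%
  group_by(variable)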

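If you want to understand how this query is built, I recommend checking out this tutorial by Sebastian Sauer.

This gives us the following table, in which the groupingValue columns correspond to the values of groupingVariable: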

variable	groupingValue1	groupingValue2
parametric1	<dbl [X]>	<dbl [Y]>
parametric2	<dbl [X]>	<dbl [Y]>
nonparametric1	<dbl [X]>	<dbl [Y]>

parametric1, parametric2, and nonparametric1 are the variables you want to compare between the two groups.
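To make the shape of this table concrete, here is a minimal sketch that builds the same kind of list-column table from simulated data. The simulated `gameSummary`, its sample size, and the use of `pivot_longer`/`pivot_wider` (the newer tidyr counterparts of `gather`/`spread`) are assumptions for illustration, not part of the original answer; `orderNumber` plays the role of groupingVariable.

```r
library(dplyr)
library(tidyr)

set.seed(1)  # make the simulated data reproducible

# Hypothetical stand-in for gameSummary: 10 participants x 2 conditions
gameSummary <- tibble(
  Participant    = rep(1:10, each = 2),
  Condition      = rep(c("Condition 1", "Condition 2"), times = 10),
  parametric1    = rnorm(20, mean = 12, sd = 3),
  parametric2    = rnorm(20, mean = 50, sd = 10),
  nonparametric1 = sample(1:7, 20, replace = TRUE),
  # Odd participants test Condition 1 first, even participants test it second
  orderNumber    = ifelse((Participant %% 2 == 1) == (Condition == "Condition 1"), 1, 2)
)

analysisSummary <- gameSummary %>%
  select(parametric1, parametric2, nonparametric1, orderNumber) %>%
  # Long format: one row per (variable, value) observation
  pivot_longer(-orderNumber, names_to = "variable", values_to = "value") %>%
  pivot_wider(
    names_from   = orderNumber,
    values_from  = value,
    values_fn    = list,     # keep all observations of a cell as one list element
    names_prefix = "order"   # new columns get syntactic names: order1, order2
  )

print(analysisSummary)
```

Each cell of `order1` and `order2` holds the full vector of observations for that variable and order, which is what the tests later unpack with `unlist()`.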

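groupingVariable is the indicator by which you split your population. For example, it could be sex, in which case the groupingValues could be male and female [1]. Or, following the example in the question, groupingVariable could be orderNumber and the groupingValues could be 1 and 2. Note that these values are numeric, which poses a problem.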

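Numeric `groupingVariable`s

R does not treat a bare number as a column name, but as the ordinal position of a column in the table. If you want readable code, you can use rename to rename these columns to order1 and order2: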

analysisSummary <- analysisSummary %>% rename(order1 = 2, order2 = 3)

This assumes that the groupingValue1 and groupingValue2 columns are in the 2nd and 3rd positions of the table, respectively.
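If relying on column positions feels fragile, a possible alternative is to refer to the numeric column names explicitly with backticks. The sketch below uses a small hypothetical table (`wide`) whose columns are literally named `1` and `2`, as they would be after spreading on orderNumber; the values are made up for illustration.

```r
library(dplyr)

# Toy table with columns literally named "1" and "2"
wide <- tibble(
  variable = "parametric1",
  `1`      = list(c(14.7, 13.0)),
  `2`      = list(c(11.4, 8.2))
)

# Backticks make R read 1 and 2 as column names rather than as numbers
renamed <- wide %>% rename(order1 = `1`, order2 = `2`)

names(renamed)
```

This version keeps working if the column order changes, at the cost of slightly noisier syntax.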

Running the tests

We can use case_when to run different tests on different variables conditionally.

analysisSummary %>% mutate(
    # Save the name of the test for convenient reference later
    test = case_when(
        isVariableParametric(variable) ~ "Welch's t test",
        TRUE ~ "Wilcoxon test"
    ),
    # Run Welch's t-test for parametric variables and the Wilcoxon rank-sum test for non-parametric ones; save the p-value
    p_value = case_when(
        isVariableParametric(variable) ~ t.test(unlist(groupingValue1), unlist(groupingValue2))$p.value,
        TRUE ~ wilcox.test(unlist(groupingValue1), unlist(groupingValue2), paired=FALSE, exact=FALSE)$p.value
    ),
    # Run the test again, but now save the test statistic
    statistic = case_when(
        isVariableParametric(variable) ~ t.test(unlist(groupingValue1), unlist(groupingValue2))$statistic,
        TRUE ~ wilcox.test(unlist(groupingValue1), unlist(groupingValue2), paired=FALSE, exact=FALSE)$statistic
    )
)

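You should also define a function that decides whether a variable is parametric. In my case I hard-coded it (though a long-term, reusable solution would be to determine this dynamically):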

isVariableParametric <- function(variable) {
  variable %in% c('parametric1', 'parametric2')
}
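One way the dynamic version might look is sketched below, under the assumption that "parametric" here means "no group's data fail a Shapiro-Wilk normality test at a chosen alpha". The helper names `looksNormal` and `isVariableParametricIn`, the default `alpha`, and the demo data are hypothetical, and a normality test is only one of several possible criteria.

```r
# Hypothetical helper: TRUE when a Shapiro-Wilk test does not reject normality
looksNormal <- function(x, alpha = 0.05) {
  x <- x[!is.na(x)]
  # shapiro.test() needs 3 to 5000 non-missing values that are not all identical
  if (length(x) < 3 || length(x) > 5000 || length(unique(x)) < 2) {
    return(FALSE)
  }
  shapiro.test(x)$p.value > alpha
}

# Hypothetical helper: treat a variable as parametric only if every group looks normal
isVariableParametricIn <- function(data, variable, group, alpha = 0.05) {
  groups <- split(data[[variable]], data[[group]])
  all(vapply(groups, looksNormal, logical(1), alpha = alpha))
}

# Demo data: "a" follows normal quantiles, "b" is dominated by one extreme value
demo <- data.frame(
  g = rep(1:2, each = 50),
  a = rep(qnorm(ppoints(50)), times = 2),
  b = rep(c(rep(0, 49), 100), times = 2)
)

isVariableParametricIn(demo, "a", "g")
isVariableParametricIn(demo, "b", "g")
```

Unlike the hard-coded version, this helper needs the data as well as the variable name, so it would have to be evaluated before the values are collapsed into list columns or adapted to work on them.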

This gives us the results in a tabular, easy-to-browse form:

variable	groupingValue1	groupingValue2	test	p_value	statistic
parametric1	<dbl [X]>	<dbl [Y]>	Welch's t test	0.19081	0.23504
parametric2	<dbl [X]>	<dbl [Y]>	Welch's t test	0.16398	0.00014
nonparametric1	<dbl [X]>	<dbl [Y]>	Wilcoxon test	0.78727	87.5000

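[1] For simplicity, this sticks to two groups, since testing for differences among more than two groups requires additional statistical corrections (such as the Bonferroni correction) or a different approach (such as ANOVA).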
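As a variation on the approach above, the following sketch runs each test only once per variable and takes the test name from the test object itself, which yields a table close to the one requested in the question. The function name `compareGroups`, its arguments, and the simulated data are assumptions for illustration; it uses only base R and handles exactly two groups.

```r
# Hypothetical helper: compare each variable between the two levels of `group`
compareGroups <- function(data, variables, group, parametric) {
  rows <- lapply(variables, function(v) {
    values <- split(data[[v]], data[[group]])
    stopifnot(length(values) == 2)  # this sketch only handles two groups
    result <- if (v %in% parametric) {
      t.test(values[[1]], values[[2]])  # Welch's t-test is the default
    } else {
      wilcox.test(values[[1]], values[[2]], paired = FALSE, exact = FALSE)
    }
    data.frame(
      variable  = v,
      test      = result$method,
      statistic = unname(result$statistic),
      p_value   = result$p.value
    )
  })
  do.call(rbind, rows)
}

set.seed(42)  # make the simulated data reproducible

# Simulated stand-in for gameSummary
gameSummary <- data.frame(
  orderNumber    = rep(1:2, times = 10),
  parametric1    = rnorm(20, mean = 12, sd = 3),
  parametric2    = rnorm(20, mean = 50, sd = 10),
  nonparametric1 = sample(1:7, 20, replace = TRUE)
)

results <- compareGroups(
  gameSummary,
  variables  = c("parametric1", "parametric2", "nonparametric1"),
  group      = "orderNumber",
  parametric = c("parametric1", "parametric2")
)
print(results)
```

Because `case_when` evaluates every right-hand side, the `mutate` version runs both tests for every variable and runs each test twice; a helper like this avoids that, and the same function can be applied to a subset of the data filtered to a single condition.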
