Which population does t.test in R treat as the larger one? How do I tell the function?

Question

I have a question about using t.test to check whether one population mean is greater than another.

Suppose I have 2 variables in a data frame d:

Weight: Numerical variable (weight of people).
Anykids: Categorical variable that can be yes or no.

The data frame looks like this:

Anykids Weight
yes     70
yes     84
no      66
...     ..

I want to check whether the mean weight of people with Anykids = yes is greater than the mean weight of people with Anykids = no. So I would have:

H0: m(weight_yes) = m(weight_no)
H1: m(weight_yes) > m(weight_no)

The function call would be:

t.test(Weight ~ Anykids, data = d, alternative = 'greater')

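How does the function know that the argument 'greater' refers to the group with Anykids = yes and not the group with Anykids = no?

If I wanted to test the hypotheses: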

H0: m(weight_no) = m(weight_yes)
H1: m(weight_no) > m(weight_yes)

The function would be called with the same arguments. How do I know whether 'greater' refers to Anykids = yes or to Anykids = no?

Answer 1

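As with many things involving factors, R decides based on the order of the factor levels. The first level is used as x and the second as y, and alternative = "greater" tests whether the mean of x is greater than the mean of y. In your case, you can check levels(d$Anykids) beforehand to see which group t.test() will use as x and which as y, and change the order with relevel() if needed.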
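As a minimal sketch of that check-and-relevel step, the example below uses a small made-up data frame (the values in `d` are hypothetical and only mirror the shape of the data in the question). When the grouping column is converted to a factor, its levels default to alphabetical order, so "no" would come first unless the order is changed.

```r
# Hypothetical toy data shaped like the question's data frame `d`.
d <- data.frame(
  Anykids = c("yes", "yes", "no", "no", "yes", "no"),
  Weight  = c(70, 84, 66, 72, 79, 68)
)

# Make the grouping variable an explicit factor.
# Levels default to alphabetical order, so "no" is first here.
d$Anykids <- factor(d$Anykids)
levels(d$Anykids)

# Move "yes" to the front so that alternative = "greater"
# tests H1: mean(Weight | yes) > mean(Weight | no).
d$Anykids <- relevel(d$Anykids, ref = "yes")
res_yes <- t.test(Weight ~ Anykids, data = d, alternative = "greater")

# The first name in the estimates is the group treated as x.
names(res_yes$estimate)
```

To test the opposite hypothesis, H1: m(weight_no) > m(weight_yes), the same call can be used with "no" as the first level instead.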

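The t.test() output will also tell you which group was considered. Here, in the iris dataset, the versicolor level comes first, so the test considers whether the mean Sepal.Width of versicolor is greater than that of virginica.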

levels(iris$Species)
#> [1] "setosa"     "versicolor" "virginica"
test_data <- iris[iris$Species != 'setosa', ]
t.test(data = test_data, Sepal.Width ~ Species, alternative = "greater")
#> 
#>  Welch Two Sample t-test
#> 
#> data:  Sepal.Width by Species
#> t = -3.2058, df = 97.927, p-value = 0.9991
#> alternative hypothesis: true difference in means is greater than 0
#> 95 percent confidence interval:
#>  -0.3096707        Inf
#> sample estimates:
#> mean in group versicolor  mean in group virginica 
#>                    2.770                    2.974
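If relying on level order feels fragile, one alternative is to pass the two samples explicitly as x and y, since alternative = "greater" then means that x has the larger mean. The sketch below reverses the direction of the iris test above; the variable names `virginica`, `versicolor`, `res_xy` and `res_formula` are just illustrative.

```r
test_data <- iris[iris$Species != "setosa", ]

# Extract the two samples explicitly.
virginica  <- test_data$Sepal.Width[test_data$Species == "virginica"]
versicolor <- test_data$Sepal.Width[test_data$Species == "versicolor"]

# H1: mean(virginica) > mean(versicolor), because virginica is passed as x.
res_xy <- t.test(x = virginica, y = versicolor, alternative = "greater")
res_xy$p.value

# Formula version of the same test: drop the unused "setosa" level,
# then make "virginica" the first level.
test_data$Species <- relevel(droplevels(test_data$Species), ref = "virginica")
res_formula <- t.test(Sepal.Width ~ Species, data = test_data,
                      alternative = "greater")
res_formula$p.value
```

Both calls test the same hypothesis, so they should report the same t statistic and p-value; the explicit x/y form just makes the direction visible in the call itself.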

r

mean

population