Performing a t-test in R with categorical variables

Hi everyone, I am trying to run a t-test, but something seems to be going wrong. The data look like this:

pot pair    type    height
I   1   Cross   23,5
I   1   Self    17,375
I   2   Cross   12
I   2   Self    20,375

The code I ran for the t-test is:

    darwin <- read.table("darwin.txt", header=T)
    plot(darwin$type, darwin$height, ylab="Height")
    darwin.no.outlier = subset(darwin, height>13)
    tapply(darwin.no.outlier$height, darwin.no.outlier$type, var) 
    t.test(darwin$height ~ darwin$type)

R gives me the following error:

Error in if (stderr < 10 * .Machine$double.eps * max(abs(mx), abs(my))) stop("data are essentially constant") : 
  missing value where TRUE/FALSE needed

In addition: Warning messages:
1: In mean.default(x) : argument is not numeric or logical: returning NA
2: In var(x) :
  Calling var(x) on a factor x is deprecated and will become an error.
  Use something like 'all(duplicated(x)[-1L])' to test for a constant vector.
3: In mean.default(y) : argument is not numeric or logical: returning NA
4: In var(y) :
  Calling var(x) on a factor x is deprecated and will become an error.
  Use something like 'all(duplicated(x)[-1L])' to test for a constant vector.

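The problem is the decimal separator: in your height column the decimal mark is a comma rather than a point. Because of the comma, read.table cannot parse the values as numbers and converts the column to a factor, so t.test ends up calling mean() and var() on non-numeric data, which produces the error and warnings you see.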
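One way to confirm this diagnosis is to check the column type right after import. The following is a minimal sketch, assuming the four sample rows from the question are supplied inline through the text argument instead of darwin.txt; the names raw, bad and good are used only for illustration. Which non-numeric class appears depends on the R version, since stringsAsFactors defaults to FALSE from R 4.0.0 onward.

```r
# Sample rows from the question, supplied inline (illustrative stand-in for darwin.txt)
raw <- "pot pair type height
I 1 Cross 23,5
I 1 Self 17,375
I 2 Cross 12
I 2 Self 20,375"

# With the default dec = ".", values such as "23,5" cannot be parsed as numbers
bad <- read.table(text = raw, header = TRUE)
class(bad$height)        # "factor" in older R, "character" in R >= 4.0.0
is.numeric(bad$height)   # FALSE

# Declaring the comma as the decimal mark yields a numeric column
good <- read.table(text = raw, header = TRUE, dec = ",")
is.numeric(good$height)  # TRUE
```

Running str(darwin) on the real data frame gives the same information for every column at once.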

When importing the data, pass dec = "," to read.table (dec is the character used in the file for decimal points). Using your data as an example:

darwin <- read.table(text = "pot pair    type    height
I   1   Cross   23,5
           I   1   Self    17,375
           I   2   Cross   12
           I   2   Self    20,375", header = TRUE, dec = ",")

Then the output of

t.test(darwin$height ~ darwin$type)

is:

    Welch Two Sample t-test

data:  darwin$height by darwin$type
t = -0.18932, df = 1.1355, p-value = 0.878
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -58.34187  56.09187
sample estimates:
mean in group Cross  mean in group Self 
             17.750              18.875
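If the data frame has already been imported with the wrong separator, re-reading the file is not the only option. As an alternative sketch (not part of the answer above), the column can be repaired in place by replacing the comma and coercing to numeric. The vectors height_raw, type and height below are illustrative stand-ins for the columns of darwin. Going through as.character first matters when the column is a factor, because as.numeric applied directly to a factor returns the internal level codes rather than the values.

```r
# Simulate the mis-parsed column as a factor, plus the grouping variable
height_raw <- factor(c("23,5", "17,375", "12", "20,375"))
type <- factor(c("Cross", "Self", "Cross", "Self"))

# Replace the decimal comma with a point, then coerce the strings to numeric
height <- as.numeric(sub(",", ".", as.character(height_raw), fixed = TRUE))
is.numeric(height)   # TRUE

# Welch two-sample t-test on the repaired values
res <- t.test(height ~ type)
res$estimate         # group means for Cross and Self
```

sub() is sufficient here because each value contains at most one comma; a column that also used a thousands separator would need additional cleaning.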