Errors in independent t-test in R

Question

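I just started using R and I need your help running an independent-samples t-test. I have tried different code, but I keep getting errors. The dataset is large and was provided by my teacher; it is essentially about how people perceive different types of humor. My task is to find out whether men (coded as 5) and women (coded as 4) differ on the imgagg1 variable. Here is what I tried: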

Xdata<-Xdata[-c(1,2,311,312,313,614,619,808,815),] # I eliminated these rows because of this error that I keep getting even after removing the rows: In mean.default(x) : argument is not numeric or logical: returning NA

Women<-Xdata[which(Xdata$gender=="4"),"imgagg1"]

Men<-Xdata[which(Xdata$gender=="5"),"imgagg1"]

t.test(Xdata$Women,Xdata$Men)

I get the following error:

Error in if (stderr < 10 * .Machine$double.eps * max(abs(mx), abs(my))) stop("data are essentially constant") : 
  missing value where TRUE/FALSE needed
In addition: Warning messages:
1: In mean.default(x) : argument is not numeric or logical: returning NA
2: In mean.default(y) : argument is not numeric or logical: returning NA

I also tried this, but got the same error:

Xdata<-Xdata[-c(1,2,311,312,313,614,619,808,815),]
Women<-Xdata%>%
  filter(gender=="4")%>%
  pull(imgagg1)
Men<-Xdata%>%
  filter(gender=="5")%>%
  pull(imgagg1)
t.test(Women,Men)

Can someone tell me what I'm doing wrong? I've been racking my brain over this, but I can't seem to get it right.

Answer 1

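I believe two things are going on. First, if the data structure you posted is accurate, R is treating your numbers as characters (text). Calling mean() on character values, which t.test() does internally, produces exactly the warning in your output: argument is not numeric or logical: returning NA. Second, your call to t.test() is a bit mixed up. You created two separate objects, Men and Women, and then called t.test(Xdata$Women,Xdata$Men). That looks for columns named Women and Men inside the data set Xdata, but no such columns exist (Men and Women are objects of their own, holding only the imgagg1 values), so Xdata$Women returns NULL instead of your data.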
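A minimal sketch of both problems, using a small hypothetical data frame `toy` (not your real data) with the same column names, stored as character the way your gender and imgagg1 columns appear to be:

```r
# Hypothetical stand-in for Xdata: both columns stored as character
toy <- data.frame(gender  = c("4", "5", "4", "5"),
                  imgagg1 = c("5", "3", "4", "6"),
                  stringsAsFactors = FALSE)

# Women's scores as a separate object (a character vector)
women <- toy[toy$gender == "4", "imgagg1"]

# Problem 1: `women` is not a column of `toy`, so toy$women is NULL
is.null(toy$women)               # TRUE

# Problem 2: the scores are text, so mean() warns and returns NA
class(women)                     # "character"
suppressWarnings(mean(women))    # NA
mean(as.numeric(women))          # 4.5 once converted to numbers
```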

To run t.test() on your sample data, I did the following:

Xdata <- structure(list(gender = c(NA, "7", NA, "4", "4", "4", "5", "4",  "4", "4", "5", "5", "5", "4", "4", "4", "4", "4", "4", "5", "5",  "4", "6", "4", "4"), imgagg1 = c(NA, NA, NA, "5", "5", "4", "3",  "4", "1", "5", "4", "5", "6", "7", "4", "6", "3", "1", "5", "2",  "5", "6", "5", "7", "2")), row.names = c(NA, 25L), class = c("tbl_df",  "tbl", "data.frame"))

# The columns are currently character; convert both to numeric. Selecting them by name works the same in this simplified sample and in your full dataset, whatever their column positions. Any entry that is not a number (e.g. stray text) becomes NA, with a "NAs introduced by coercion" warning.
Xdata[c("gender", "imgagg1")] <- lapply(Xdata[c("gender", "imgagg1")], as.numeric)

# Extract imgagg1 as plain numeric vectors; which() skips rows where gender is NA
Women <- Xdata$imgagg1[which(Xdata$gender == 4)]

Men <- Xdata$imgagg1[which(Xdata$gender == 5)]

t.test(Women,Men)

# > t.test(Women,Men)
# 
# Welch Two Sample t-test
# 
# data:  Women and Men
# t = 0.21418, df = 12.083, p-value = 0.834
# alternative hypothesis: true difference in means is not equal to 0
# 95 percent confidence interval:
#   -1.527540  1.860873
# sample estimates:
#   mean of x mean of y 
# 4.333333  4.166667
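As an alternative sketch, t.test() also has a formula interface (response ~ group) that avoids building Women and Men by hand. The block below re-creates the sample data as a plain data.frame so it runs on its own, converts the two columns by name, and keeps only genders 4 and 5, since the formula interface needs a grouping variable with exactly two levels. Because group 4 sorts first, the result should match t.test(Women, Men).

```r
# Re-create the sample data from above as a plain data.frame
Xdata <- data.frame(
  gender  = c(NA, "7", NA, "4", "4", "4", "5", "4", "4", "4", "5", "5", "5",
              "4", "4", "4", "4", "4", "4", "5", "5", "4", "6", "4", "4"),
  imgagg1 = c(NA, NA, NA, "5", "5", "4", "3", "4", "1", "5", "4", "5", "6",
              "7", "4", "6", "3", "1", "5", "2", "5", "6", "5", "7", "2"),
  stringsAsFactors = FALSE
)

# Convert the two columns to numeric by name
cols <- c("gender", "imgagg1")
Xdata[cols] <- lapply(Xdata[cols], as.numeric)

# Keep only the two groups being compared (drops NA, 6 and 7)
two_groups <- subset(Xdata, gender %in% c(4, 5))

# Welch two-sample t-test of imgagg1 by gender (4 = women, 5 = men)
res <- t.test(imgagg1 ~ gender, data = two_groups)
res
```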

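You also don't need the step Xdata[-c(1,2,311,312,313,614,619,808,815),] to remove missing data: t.test() drops NA values on its own. Most summary functions also take an na.rm = TRUE argument that (as you might guess!) removes NA values first, e.g. sum(x, na.rm = TRUE) or mean(x, na.rm = TRUE). Note that the argument is na.rm, not na.omit; na.omit() is a separate function that returns its input with the NA entries removed.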
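A short illustration, on toy vectors rather than your data, of the na.rm argument versus the na.omit() function, and of t.test() dropping missing values by itself:

```r
x <- c(5, NA, 4, 7)

sum(x)                  # NA: a single NA propagates
sum(x, na.rm = TRUE)    # 16
mean(x, na.rm = TRUE)   # 5.333333
as.vector(na.omit(x))   # 5 4 7 (na.omit() is a function, not an argument)

# Pitfall: sum() has no `na.omit` argument, so TRUE is summed as 1
sum(c(1, 2, 3), na.omit = TRUE)   # 7, not 6

# t.test() drops NAs by itself: both calls give the same statistic
a <- c(5, NA, 4, 7, 6)
b <- c(3, 4, NA, 5, 2)
t.test(a, b)$statistic
t.test(a[!is.na(a)], b[!is.na(b)])$statistic
```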

Hope this helps, and good luck!

