Having issues running a group-wise Shapiro-Wilk test for RMANOVA

Question

I am currently using the "weightloss" dataset from the datarium package to get started with running an RMANOVA. Here is the dput output:

dput(head(weightloss))
structure(list(id = structure(1:6, .Label = c("1", "2", "3", 
"4", "5", "6", "7", "8", "9", "10", "11", "12"), class = "factor"), 
    diet = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = c("no", 
    "yes"), class = "factor"), exercises = structure(c(1L, 1L, 
    1L, 1L, 1L, 1L), .Label = c("no", "yes"), class = "factor"), 
    t1 = c(10.43, 11.59, 11.35, 11.12, 9.5, 9.5), t2 = c(13.21, 
    10.66, 11.12, 9.5, 9.73, 12.74), t3 = c(11.59, 13.21, 11.35, 
    11.12, 12.28, 10.43)), row.names = c(NA, -6L), class = c("tbl_df", 
"tbl", "data.frame"))

Here is the script I have come up with so far:

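# Load Packages (assumed from the functions used below):

library(tidyverse) # %>%, pivot_longer(), ggplot()
library(rstatix)   # identify_outliers(), shapiro_test()
library(ggpubr)    # ggqqplot()
data("weightloss", package = "datarium")

# Create Data Frame for Dataset:

weight <- weightloss
weight

# Pivot Longer Data to Create Factors and Scores: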

weight <- weight %>% 
  pivot_longer(names_to = 'trial', # creates factor (x)
               values_to = 'value', # creates value (y)
               cols = t1:t3) # finds which cols to factor

# Plot Means in Boxplot:

ggplot(weight,
       aes(x=trial,y=value))+
  geom_boxplot()+
  labs(title = "Trial Means") # As can be predicted, inc w/time

This gave me a boxplot that looks fairly normal:

Now it's time to identify outliers and test for normality.

# Identify Outliers (Should be None Given Boxplot):

outlier <- weight %>% 
  group_by(trial) %>% 
  identify_outliers(value)
outlier_frame <- data.frame(outlier) 
outlier_frame # none found :)

# Normality (Shapiro-Wilk and QQPlot):

model <- lm(value~trial,
            data = weight) # creates model
shapiro_test(residuals(model)) # measures Shapiro
ggqqplot(residuals(model))+
  labs(title = "QQ Plot of Residuals") # creates QQ

This again gave me a very normal-looking QQ plot:

Then I faceted the data by trial:

ggqqplot(weight, "value", ggtheme = theme_bw())+
  facet_wrap(~trial)+
  labs(title = "QQPlot of Each Trial") # looks normal

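These also look normal, as far as I can tell:

However, when I try to run the Shapiro-Wilk test by group, I keep running into problems with this code: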

shapiro_group <- weight %>%
  group_by(trial) %>%
  shapiro_test(value)

It gives me this error:

Error: Problem with mutate() column data.
i data = map(.data$data, .f, ...).
x Must group by variables found in .data.
* Column variable is not found.

I have also tried this:

shapiro_test(weight, trial$value)

and got this error:

Error: Can't subset columns that don't exist.
x Column trial$value doesn't exist.

If anyone knows why this happens, I would greatly appreciate the help!

Answer 1

The reason you get the error from shapiro_test is this line in its implementation:

shapiro_test
function (data, ..., vars = NULL) 
{
....
....
 data <- data %>% gather(key = "variable", value = "value") %>% 
        filter(!is.na(value))
....
....
}

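It uses gather to reshape the data into long format, creating new columns named variable and value. Because your data already has a column named value, the names collide and this step does not work.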
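To make the collision concrete, here is a minimal sketch based on my understanding of gather(): when no columns are selected, it gathers every column except those already named after the key or value arguments, so a lone column called value leaves nothing to gather and no variable column is created. Treat that as an assumption to confirm on your installed tidyr version. The data frames toy_clash and toy_ok are hypothetical, built from the first three t1 values in the question.

```r
library(tidyr)

# Hypothetical toy data: same numbers, different column names.
toy_clash <- data.frame(value  = c(10.43, 11.59, 11.35))
toy_ok    <- data.frame(value1 = c(10.43, 11.59, 11.35))

# Same gather() call as in the shapiro_test() excerpt above.
out_clash <- gather(toy_clash, key = "variable", value = "value")
out_ok    <- gather(toy_ok,    key = "variable", value = "value")

# Compare the resulting column names. Judging by the error message,
# a later step in shapiro_test() groups by a column called "variable".
names(out_clash)
names(out_ok)
"variable" %in% names(out_clash)
"variable" %in% names(out_ok)
```

If the first check returns FALSE and the second returns TRUE, that would be consistent with the "Column variable is not found" error in the question.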

If you rename the value column to something else, it works:

library(dplyr)
library(rstatix)

weight %>%
  rename(value1 = value) %>%
  group_by(trial) %>%
  shapiro_test(value1)

#  trial variable statistic     p
#  <chr> <chr>        <dbl> <dbl>
#1 t1    value1       0.869 0.222
#2 t2    value1       0.910 0.440
#3 t3    value1       0.971 0.897
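If you would rather not rename the column, one possible alternative is to call base R's shapiro.test() per group yourself, since it takes a plain numeric vector and does not care what the column is called. The sketch below is only an illustration: weight_long is a hypothetical stand-in built from the six rows shown in the question's dput(head(weightloss)) output, so its statistics will not match the full-data results above.

```r
# Hypothetical stand-in for the long-format data, using only the six
# rows from dput(head(weightloss)) in the question.
weight_long <- data.frame(
  trial = rep(c("t1", "t2", "t3"), each = 6),
  value = c(10.43, 11.59, 11.35, 11.12, 9.5, 9.5,     # t1
            13.21, 10.66, 11.12, 9.5, 9.73, 12.74,    # t2
            11.59, 13.21, 11.35, 11.12, 12.28, 10.43) # t3
)

# Run stats::shapiro.test() on the values of each trial separately.
tests <- tapply(weight_long$value, weight_long$trial, shapiro.test)

# Collect the W statistic and p-value into one row per trial.
shapiro_by_trial <- data.frame(
  trial     = names(tests),
  statistic = sapply(tests, function(x) unname(x$statistic)),
  p         = sapply(tests, function(x) x$p.value),
  row.names = NULL
)
shapiro_by_trial
```

With the real data, replace weight_long with the pivoted weight data frame from the question; the rest of the code should carry over unchanged.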
