I am receiving an error when running a t-test + tidy() on the output

I am trying to run a t-test on the following data, but it returns an error message:

Error in var(if (is.vector(x) || is.factor(x)) x else as.double(x), na.rm = na.rm) : Calling var(x) on a factor x is defunct. Use something like 'all(duplicated(x)[-1L])' to test for a constant vector.

library(tidyverse)
library(broom)

food_consumption <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-02-18/food_consumption.csv')

food_consumption %>% 
  mutate(vegan = if_else(food_category %in% c("Wheat and Wheat Products", "Rice", "Soybeans", "Nuts inc. Peanut Butter"), "Non-Animal Product", "Animal Product")) %>% 
  select(consumption, co2_emmission, vegan) %>% 
  pivot_longer(!vegan, names_to = "type", values_to = "value") %>%
  mutate(type = as.factor(type),
         vegan = as.factor(vegan)) %>%
  group_by(type) %>% 
  do(test = t.test(value~vegan, data = (.))) %>% 
  tidy(test)

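Does anyone know what is going on here, and how I can tidy the t-test output without the error? If I leave out the tidy(test) bit at the end, the t-test returns two list objects as expected, but as soon as I call tidy() it returns the error above.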

I am following a tutorial that runs exactly the same code (except that it uses gather instead of pivot_longer, but both produce the same dataset). Timestamped link here.

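  1. group_nest(type) - creates one group per value of the type variable

  2. map(data, ~t.test(value~vegan, data = .x) %>% tidy) - computes the t.test for each group and tidies it

  3. unnest(test) - expands the results into columns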

      food_consumption %>% 
      mutate(vegan = if_else(food_category %in% c("Wheat and Wheat Products", "Rice", "Soybeans", "Nuts inc. Peanut Butter"), "Non-Animal Product", "Animal Product")) %>% 
      select(consumption, co2_emmission, vegan) %>% 
      pivot_longer(!vegan, names_to = "type", values_to = "value") %>%
      mutate(type = as.factor(type),
             vegan = as.factor(vegan)) %>%
      group_nest(type) %>% 
      transmute(type, test = map(data, ~t.test(value~vegan, data = .x) %>% tidy)) %>% 
      unnest(test)
    
        # A tibble: 2 x 11
          type      estimate estimate1 estimate2 statistic  p.value parameter conf.low conf.high
          <fct>        <dbl>     <dbl>     <dbl>     <dbl>    <dbl>     <dbl>    <dbl>     <dbl>
        1 co2_emmi~    93.7      108.       14.7     15.3  1.25e-47      984.    81.7     106.  
        2 consumpt~     2.56      29.0      26.5      1.01 3.12e- 1     1334.    -2.40      7.53
        # ... with 2 more variables: method <chr>, alternative <chr>
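
To make the three steps easier to inspect one at a time, here is a minimal sketch on made-up data that needs no download. The `toy` tibble, its values, and the intermediate names `nested`, `tested` and `result` are hypothetical stand-ins for the reshaped food_consumption data; only the function calls mirror the code above.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

# Hypothetical toy data: two types, two vegan levels, 10 values per cell
set.seed(1)
toy <- tibble(
  type  = factor(rep(c("a", "b"), each = 20)),
  vegan = factor(rep(c("Animal Product", "Non-Animal Product"), times = 20)),
  value = rnorm(40)
)

# Step 1: one row per type; that group's rows sit in the list column `data`
nested <- toy %>% group_nest(type)

# Step 2: run t.test on each nested data frame and tidy each result
tested <- nested %>%
  transmute(type, test = map(data, ~ tidy(t.test(value ~ vegan, data = .x))))

# Step 3: expand the one-row tidy tibbles into ordinary columns
result <- tested %>% unnest(test)
result
```

Printing `nested` and `tested` before the final step shows that tidy() is applied to each htest object inside map(), rather than to the data frame as a whole.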

You can try this. I only changed the last two lines of your code.

food_consumption %>% 
  mutate(vegan = if_else(food_category %in% c("Wheat and Wheat Products", "Rice", "Soybeans", "Nuts inc. Peanut Butter"), "Non-Animal Product", "Animal Product")) %>% 
  select(consumption, co2_emmission, vegan) %>% 
  pivot_longer(!vegan, names_to = "type", values_to = "value") %>%
  mutate(type = as.factor(type),
         vegan = as.factor(vegan)) %>%
  
  # here is what I changed
  nest_by(type) %>% 
  summarise(tidy(t.test(value~vegan, data = data)), .groups = "drop")

#> # A tibble: 2 x 11
#>   type    estimate estimate1 estimate2 statistic  p.value parameter conf.low conf.high method   alternative
#>   <fct>      <dbl>     <dbl>     <dbl>     <dbl>    <dbl>     <dbl>    <dbl>     <dbl> <chr>    <chr>      
#> 1 co2_em~    93.7      108.       14.7     15.3  1.25e-47      984.    81.7     106.   Welch T~ two.sided  
#> 2 consum~     2.56      29.0      26.5      1.01 3.12e- 1     1334.    -2.40      7.53 Welch T~ two.sided 

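As you can see, with nest_by I nested the data frame into two rows (one per type). You then have a nested data frame with two columns; the second is a list column of data frames named data.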
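
If it helps to see that structure directly, here is a small sketch on made-up data. The `toy` tibble and its values are hypothetical and only illustrate what nest_by returns.

```r
library(dplyr)

# Hypothetical toy data with two types
toy <- tibble(
  type  = c("a", "a", "b", "b"),
  vegan = c("x", "y", "x", "y"),
  value = c(1, 2, 3, 4)
)

nested <- toy %>% nest_by(type)

nested            # one row per type, plus a list column called `data`
class(nested)     # a row-wise data frame, so later verbs work one row at a time
nested$data[[1]]  # the rows for the first type (the type column itself is dropped)
```

Because the result is row-wise, a bare `data` inside a later summarise() call refers to that single row's data frame rather than to the whole list column.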

As of dplyr version 1.0, summarise is more flexible, and you can use it directly for operations that return more than one value per row. See ?dplyr for more information.

So you can apply tidy directly to the result, as I did.

.groups = "drop" is only there to suppress the annoying message from summarise.
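
For a version you can run without downloading anything, here is the same nest_by + summarise pattern as a sketch on made-up data. The `toy` tibble and the name `dropped` are hypothetical; the comment about the default grouping reflects my understanding of how summarise treats row-wise input, so treat it as an assumption and check it on your dplyr version.

```r
library(dplyr)
library(broom)

# Hypothetical toy data: two types, two vegan levels, 10 values per cell
set.seed(1)
toy <- tibble(
  type  = factor(rep(c("a", "b"), each = 20)),
  vegan = factor(rep(c("Animal Product", "Non-Animal Product"), times = 20)),
  value = rnorm(40)
)

# tidy() returns a one-row data frame; because the expression is unnamed,
# summarise() spreads its columns next to `type`
dropped <- toy %>%
  nest_by(type) %>%
  summarise(tidy(t.test(value ~ vegan, data = data)), .groups = "drop")

dropped

# Assumption: without .groups = "drop", the result would stay grouped by
# `type` and summarise() would print a message saying so
is_grouped_df(dropped)
```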