R 数据框中不同组的卡方检验

Chi-square tests for different groups in a R dataframe

我有一个具有以下基本结构的巨大数据框:

data <- data.frame(species = factor(c(rep("species1", 4), rep("species2", 4), rep("species3", 4))),
                 trap = c(rep(c("A","B","C","D"), 3)),
                 count=c(6,3,7,9,5,3,6,6,5,8,1,3))
data

我想同时对每个单独物种的四个陷阱之间的物种计数数据进行卡方检验,但不是在它们之间。它可以通过以下代码为每个单独的物种解决,但由于我庞大的原始数据框,它不是适合我的解决方案。

chi_species1 <- xtabs(count~trap, data, 
                       subset = species=="species1")
chi_species1
chisq.test(chi_species1)

感谢您的帮助!!

你想要这样的东西:

library(dplyr)
data %>% 
  group_by(species) %>% 
  summarise(pvalue= chisq.test(count, trap)$p.value) 

输出:

# A tibble: 3 × 2
  species  pvalue
  <fct>     <dbl>
1 species1  0.213
2 species2  0.238
3 species3  0.213

基础

df <- data.frame(species = factor(c(rep("species1", 4), rep("species2", 4), rep("species3", 4))),
                   trap = c(rep(c("A","B","C","D"), 3)),
                   count=c(6,3,7,9,5,3,6,6,5,8,1,3))
df
#>     species trap count
#> 1  species1    A     6
#> 2  species1    B     3
#> 3  species1    C     7
#> 4  species1    D     9
#> 5  species2    A     5
#> 6  species2    B     3
#> 7  species2    C     6
#> 8  species2    D     6
#> 9  species3    A     5
#> 10 species3    B     8
#> 11 species3    C     1
#> 12 species3    D     3

species <- unique(df$species)

chi_species <- lapply(species, function(x) xtabs(count~trap, df, 
                      subset = species== x))

chi_species <- setNames(chi_species, species)

lapply(chi_species, chisq.test)

#> $species1
#> 
#>  Chi-squared test for given probabilities
#> 
#> data:  X[[i]]
#> X-squared = 3, df = 3, p-value = 0.3916
#> 
#> 
#> $species2
#> 
#>  Chi-squared test for given probabilities
#> 
#> data:  X[[i]]
#> X-squared = 1.2, df = 3, p-value = 0.753
#> 
#> 
#> $species3
#> 
#>  Chi-squared test for given probabilities
#> 
#> data:  X[[i]]
#> X-squared = 6.2941, df = 3, p-value = 0.09815

reprex package (v2.0.1)

于 2022-04-25 创建

tidyverse

df %>% 
  group_by(species, trap) %>% 
  summarise(count = sum(count)) %>% 
  summarise(pvalue= chisq.test(count)$p.value) 

# A tibble: 3 × 2
  species  pvalue
  <fct>     <dbl>
1 species1 0.392 
2 species2 0.753 
3 species3 0.0981