Doing chisq.test on a data frame for multiple pairwise comparisons
I have the following data frame:
species <- c("a","a","a","b","b","b","c","c","c","d","d","d","e","e","e","f","f","f","g","h","h","h","i","i","i")
category <- c("h","l","m","h","l","m","h","l","m","h","l","m","h","l","m","h","l","m","l","h","l","m","h","l","m")
minus <- c(31,14,260,100,70,200,91,152,842,16,25,75,60,97,300,125,80,701,104,70,7,124,24,47,251)
plus <- c(2,0,5,0,1,1,4,4,30,1,0,0,2,0,5,0,0,3,0,0,0,0,0,0,4)
df <- cbind(species, category, minus, plus)
df<-as.data.frame(df)
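I want to do a chisq.test for each species, comparing its categories pairwise, like this:
species a, categories h and l: p-value
species a, categories h and m: p-value
species a, categories l and m: p-value
species b, ... and so on
using the following chisq.test (dummy code):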
chisq.test(c(minus(cat1, cat2),plus(cat1, cat2)))$p.value
I would like to end up with a table that shows the chisq.test p-value for each comparison, like this:
Species Category1 Category2 p-value
a h l 0.05
a h m 0.2
a l m 0.1
b...
where Category1 and Category2 are the categories compared in the chisq.test.
Can this be done with dplyr? I have tried adapting what is described here and here, but as far as I can see they don't really apply to this problem.
EDIT: I would also like to see how this can be done for the following dataset:
species <- c(1:11)
minus <- c(132,78,254,12,45,76,89,90,100,42,120)
plus <- c(1,2,0,0,0,3,2,5,6,4,0)
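I want to do a chisq.test comparing each species in the table with every other species in the table (pairwise comparisons between all species). I want to end up with something like this: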
species1 species2 p-value
1 2 0.5
1 3 0.7
1 4 0.2
...
11 10 0.02
I tried changing the code above to the following:
species_chisq %>%
do(data_frame(species1 = first(.$species),
species2 = last(.$species),
data = list(matrix(c(.$minus, .$plus), ncol = 2)))) %>%
mutate(chi_test = map(data, chisq.test, correct = FALSE)) %>%
mutate(p.value = map_dbl(chi_test, "p.value")) %>%
ungroup() %>%
select(species1, species2, p.value)
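However, this only creates a table in which each species is compared with itself, not with the other species. I don't quite understand where in the original code from @ycw the comparisons are specified.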
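Here is a minimal base-R sketch for this second dataset. It is not code from the thread, and df_sp, pairs, p_values and result are illustrative names. The pairs are enumerated explicitly with combn(), and each pair is tested as a 2 x 2 table with chisq.test(..., correct = FALSE), which is the matrix form used in the dplyr/purrr answer. Grouping by species alone cannot produce cross-species pairs, because each group then holds only one species.

```r
# Second data set from the question: one row per species
species <- 1:11
minus <- c(132, 78, 254, 12, 45, 76, 89, 90, 100, 42, 120)
plus  <- c(1, 2, 0, 0, 0, 3, 2, 5, 6, 4, 0)
df_sp <- data.frame(species, minus, plus)

# Every unordered pair of rows: a 2 x 55 matrix, one pair per column
pairs <- combn(nrow(df_sp), 2)

# For each pair, build a 2 x 2 table (rows = the two species,
# columns = minus/plus) and keep the p-value of the test of independence.
# suppressWarnings() hides the "approximation may be incorrect" warning
# that chisq.test() gives when expected counts are small.
p_values <- apply(pairs, 2, function(idx) {
  tab <- as.matrix(df_sp[idx, c("minus", "plus")])
  suppressWarnings(chisq.test(tab, correct = FALSE)$p.value)
})

result <- data.frame(species1 = df_sp$species[pairs[1, ]],
                     species2 = df_sp$species[pairs[2, ]],
                     p.value  = p_values)
head(result)
```

Each unordered pair appears only once (1 and 2, but not 2 and 1). That is enough because the test does not depend on the order of the two rows. Pairs where both species have plus = 0 (species 3, 4, 5 and 11) come out as NaN, because the plus column of their table sums to zero. With 55 tests, you may also want to adjust the p-values, for example with p.adjust(result$p.value, method = "holm").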
EDIT 2:
I managed to do it using code I found.
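First, you should create the data.frame with data.frame(). Otherwise cbind() turns everything into character, and the minus and plus columns end up as factors (character columns in R 4.0 and later).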
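A quick way to see the coercion is to use a cut-down, illustrative two-row version of the question's vectors:

```r
species <- c("a", "a")
minus <- c(31, 14)
plus <- c(2, 0)

m <- cbind(species, minus, plus)
typeof(m)                     # "character": every cell was coerced
df_bad <- as.data.frame(m)
is.numeric(df_bad$minus)      # FALSE (factor before R 4.0, character after)

df_good <- data.frame(species, minus, plus, stringsAsFactors = FALSE)
is.numeric(df_good$minus)     # TRUE
```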
species <- c("a","a","a","b","b","b","c","c","c","d","d","d","e","e","e","f","f","f","g","h","h","h","i","i","i")
category <- c("h","l","m","h","l","m","h","l","m","h","l","m","h","l","m","h","l","m","l","h","l","m","h","l","m")
minus <- c(31,14,260,100,70,200,91,152,842,16,25,75,60,97,300,125,80,701,104,70,7,124,24,47,251)
plus <- c(2,0,5,0,1,1,4,4,30,1,0,0,2,0,5,0,0,3,0,0,0,0,0,0,4)
df <- data.frame(species=species, category=category, minus=minus, plus=plus)
Next, I'm not sure there is a pure dplyr way to do this (I'd be happy to be shown one), but here is a partly-dplyr way of doing it:
library(dplyr)
df_combinations <-
# create a df with all combinations of species, category1 and category2
expand.grid(df$species, df$category, df$category) %>%
# rename columns
`colnames<-`(c("species", "category1", "category2")) %>%
# the 3 lines below keep, within each species,
# only rows where category1 and category2 differ
unique %>%
group_by(species) %>%
filter(category1 != category2) %>%
# cosmetics
arrange(species, category1, category2) %>%
ungroup() %>%
# prepare an empty column
mutate(p.value=NA)
# now we loop to fill your result data.frame
for (i in 1:nrow(df_combinations)){
# filter appropriate lines
cat1 <- filter(df,
species==df_combinations$species[i],
category==df_combinations$category1[i])
cat2 <- filter(df,
species==df_combinations$species[i],
category==df_combinations$category2[i])
# calculate the chisq.test and assign its p-value to the right line
df_combinations$p.value[i] <- chisq.test(c(cat1$minus, cat2$minus,
cat1$plus, cat2$plus))$p.value
}
Let's look at the resulting data.frame:
head(df_combinations)
# A tibble: 6 x 4
# Groups: species [1]
species category1 category2 p.value
<fctr> <fctr> <fctr> <dbl>
1 a h l 3.290167e-11
2 a h m 1.225872e-134
3 a l h 3.290167e-11
4 a l m 5.824842e-150
5 a m h 1.225872e-134
6 a m l 5.824842e-150
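The expand.grid() step produces every ordered pair, so each comparison appears twice (h-l and l-h in the output above). It also produces pairs for categories a species does not have, since species g only has l. The base-R alternative below is a sketch, not a drop-in replacement. It enumerates only the categories present in each species using combn(), and it keeps the same vector-style chisq.test() call as the loop. The names pair_tests and res are illustrative.

```r
species <- c("a","a","a","b","b","b","c","c","c","d","d","d","e","e","e","f","f","f","g","h","h","h","i","i","i")
category <- c("h","l","m","h","l","m","h","l","m","h","l","m","h","l","m","h","l","m","l","h","l","m","h","l","m")
minus <- c(31,14,260,100,70,200,91,152,842,16,25,75,60,97,300,125,80,701,104,70,7,124,24,47,251)
plus <- c(2,0,5,0,1,1,4,4,30,1,0,0,2,0,5,0,0,3,0,0,0,0,0,0,4)
df <- data.frame(species, category, minus, plus, stringsAsFactors = FALSE)

# One small data frame of results per species
pair_tests <- lapply(split(df, df$species), function(d) {
  if (nrow(d) < 2) return(NULL)      # species "g" has a single category
  idx <- combn(nrow(d), 2)           # unordered row pairs: h-l, h-m, l-m
  data.frame(
    species   = d$species[1],
    category1 = d$category[idx[1, ]],
    category2 = d$category[idx[2, ]],
    # same call as the loop: c(minus1, minus2, plus1, plus2)
    p.value   = apply(idx, 2, function(k) chisq.test(c(d$minus[k], d$plus[k]))$p.value),
    stringsAsFactors = FALSE
  )
})
res <- do.call(rbind, pair_tests)    # NULL entries (species "g") are dropped
rownames(res) <- NULL
head(res, 3)
```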
Checking the first row:
chisq.test(c(31, 14, 2, 0))$p.value
[1] 3.290167e-11
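Calling chisq.test() on a plain vector, as in this check, runs a goodness-of-fit test of the four counts against equal proportions. It does not compare the two categories. If you pass the same counts as a 2 x 2 matrix (rows = categories, columns = minus/plus), you get a test of independence instead. That is what the matrix(..., ncol = 2) call in the dplyr/purrr answer does, and it explains why the two answers report very different p-values for species a, h vs l. The variable names below are illustrative:

```r
counts <- c(31, 14, 2, 0)  # minus_h, minus_l, plus_h, plus_l for species a

# Vector input: goodness-of-fit test against equal expected counts (47/4 each)
p_gof <- chisq.test(counts)$p.value
p_gof   # 3.290167e-11, as in the check above

# Matrix input: rows = categories h and l, columns = minus and plus
tab <- matrix(counts, ncol = 2)
# Small expected counts trigger an "approximation may be incorrect" warning
p_ind <- suppressWarnings(chisq.test(tab, correct = FALSE)$p.value)
p_ind   # 0.3465104, matching the first row of the dplyr/purrr answer
```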
Is this what you want?
A solution using dplyr and purrr. Note that I'm not familiar with the chi-squared test, but I followed the approach you specified in @Vincent Bonhomme's post: chisq.test(test, correct = FALSE).
Also, you don't need cbind to create the example data frame; data.frame alone is enough. stringsAsFactors = FALSE is important to keep the character columns from becoming factors.
# Create example data frame
species <- c("a","a","a","b","b","b","c","c","c","d","d","d","e","e","e","f","f","f","g","h","h","h","i","i","i")
category <- c("h","l","m","h","l","m","h","l","m","h","l","m","h","l","m","h","l","m","l","h","l","m","h","l","m")
minus <- c(31,14,260,100,70,200,91,152,842,16,25,75,60,97,300,125,80,701,104,70,7,124,24,47,251)
plus <- c(2,0,5,0,1,1,4,4,30,1,0,0,2,0,5,0,0,3,0,0,0,0,0,0,4)
df <- data.frame(species, category, minus, plus, stringsAsFactors = FALSE)
# Load packages
library(dplyr)
library(purrr)
# Process the data
df2 <- df %>%
group_by(species) %>%
slice(c(1, 2, 1, 3, 2, 3)) %>%
mutate(test = rep(1:(n()/2), each = 2)) %>%
group_by(species, test) %>%
do(tibble(species = first(.$species),
test = first(.$test),
category1 = first(.$category),
category2 = last(.$category),
data = list(matrix(c(.$minus, .$plus), ncol = 2)))) %>%
mutate(chi_test = map(data, chisq.test, correct = FALSE)) %>%
mutate(p.value = map_dbl(chi_test, "p.value")) %>%
ungroup() %>%
select(species, category1, category2, p.value)
df2
# A tibble: 25 x 4
species category1 category2 p.value
<chr> <chr> <chr> <dbl>
1 a h l 0.3465104
2 a h m 0.1354680
3 a l m 0.6040227
4 b h l 0.2339414
5 b h m 0.4798647
6 b l m 0.4399181
7 c h l 0.4714005
8 c h m 0.6987413
9 c l m 0.5729834
10 d h l 0.2196806
# ... with 15 more rows
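In this answer, the comparisons are specified by slice(c(1, 2, 1, 3, 2, 3)). That call duplicates each species' three rows into consecutive pairs, and rep(1:(n()/2), each = 2) then labels those pairs as tests 1 to 3. The indices are exactly the column-wise output of combn(3, 2). The base-R snippet below only illustrates that equivalence, using illustrative variable names:

```r
idx <- c(1, 2, 1, 3, 2, 3)                 # the rows picked by slice()
test <- rep(1:(length(idx) / 2), each = 2) # the 'test' labels: 1 1 2 2 3 3

# Rows grouped by test label: (1, 2), (1, 3), (2, 3)
split(idx, test)

# combn() enumerates the same unordered pairs in the same order
all(idx == as.vector(combn(3, 2)))         # TRUE
```

Replacing the literal indices with something like as.vector(combn(n(), 2)) is an untested idea. It would also need a guard for species with a single row, such as g, because combn() stops with an error when asked for pairs from fewer than two rows.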