Isolating graphs with significant p-values
First, the data come from the us_contagious_diseases dataset (in the dslabs package); the other packages are tidyverse and ggpubr:
library(dslabs)
library(ggpubr)
library(tidyverse)
data("us_contagious_diseases")
I modified this dataset with the following code:
sdf <- us_contagious_diseases %>% filter(., disease == 'Rubella' | disease == 'Mumps') %>% transmute(., disease, count, population, state)
Then I created a boxplot comparing the Rubella and Mumps case counts in each state:
sdf_plot <- ggplot(sdf, mapping = aes(x = disease, y = count)) + geom_boxplot(outlier.shape = NA) + facet_wrap('state', scales = 'free') + stat_compare_means(method = 't.test', label.y.npc = 0.8)
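The problem is that this figure has fifty-one panels! That is far too large to include in my report. More importantly, many of these comparisons do not have significant p-values. Is there a way to keep only the panels whose p-value is below 0.01?
I think you need to pre-calculate the p-values, one t-test per state: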
library(broom)
res = sdf %>% group_by(state) %>% do(tidy(t.test(count~disease,data=.)))
head(res)
# A tibble: 6 x 11
# Groups: state [6]
state estimate estimate1 estimate2 statistic p.value parameter conf.low
<fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Alab… 125. 180. 55.4 2.46 0.0181 42.7 22.4
2 Alas… 66.1 104. 38.2 1.52 0.136 45.4 -21.7
3 Ariz… 78.6 266. 187. 0.657 0.513 68.0 -160.
4 Arka… 84.3 113. 28.4 2.87 0.00628 45.5 25.1
5 Cali… 386. 1915. 1529. 0.540 0.592 59.3 -1046.
6 Colo… 95.0 314. 219. 0.762 0.449 62.6 -154.
keep = res$state[res$p.value<0.01]
[1] Arkansas District Of Columbia Georgia
[4] Kansas Maryland Nevada
[7] Ohio
Then plot using only the states in keep:
sdf_plot <- ggplot(subset(sdf,state %in% keep),aes(x = disease, y = count)) +
geom_boxplot(outlier.shape = NA) +
facet_wrap('state', scales = 'free') +
stat_compare_means(method = 't.test', label.y.npc = 0.8)
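If you would rather not depend on broom and do(), the same "one test per group, then filter" step can be sketched in base R. This is only an illustration: the toy data frame and the names toy, pvals and keep_toy are made up here and are not part of the original data. The values are deterministic so the result is easy to check.

```r
# Hypothetical stand-in for sdf: two states, two diseases, 20 rows per pair.
toy <- data.frame(
  state   = rep(c("A", "B"), each = 40),
  disease = rep(rep(c("Mumps", "Rubella"), each = 20), times = 2),
  count   = c(101:120, 51:70,  # state A: group means differ by 50
              1:20,    1:20)   # state B: both groups identical
)

# Split by state and run one Welch t-test per state, keeping only the p-value.
pvals <- sapply(
  split(toy, toy$state),
  function(d) unname(t.test(count ~ disease, data = d)$p.value)
)

# Names of the states whose p-value falls below the cutoff.
keep_toy <- names(pvals)[pvals < 0.01]
```

With the real data you would pass sdf instead of toy and use the resulting vector in place of keep in the subset() call.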