Multivariate statistical analysis using R: how to see significant differences when rows and columns are both groups (ordered categories)
My data looks like this, where DFD is my data frame.
DFD
  Names  BP  jobcode bp_Category
1     A 100   Doctor      low_BP
2     B 150   Doctor   Medium_BP
3     C 200 Engineer     High_BP
4     D 110 Engineer      low_BP
5     E 160  Student   Medium_BP
Here is how I get the percentage of each job code with low, medium, and high BP.
tabLE <- table(DFD$bp_Category, DFD$jobcode)
prop.table(tabLE, 2) * 100
            Doctor Engineer Student
  low_BP        50       50       0
  Medium_BP     50        0     100
  High_BP        0       50       0
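I would like to ask how, and with which statistical test, I can check for significant differences among the three job codes, separately for each of the three bp_categories. For example, I want to see whether Engineer has the significantly highest percentage of Medium_BP compared with Doctor and Student.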
Data
Names <- c("A", "B", "C", "D", "E")
BP <- c(100, 150, 200, 110, 160)
jobcode <- c("Doctor", "Doctor", "Engineer", "Engineer", "Student")
jobcode <- ordered(jobcode)
DFD <- data.frame(Names, BP, jobcode)
# Bin BP into categories; >= closes the gaps at exactly 140 and 170
DFD$bp_Category[DFD$BP < 140] <- "low_BP"
DFD$bp_Category[DFD$BP >= 140 & DFD$BP < 170] <- "Medium_BP"
DFD$bp_Category[DFD$BP >= 170 & DFD$BP < 201] <- "High_BP"
DFD$bp_Category <- ordered(DFD$bp_Category, levels = c("low_BP", "Medium_BP", "High_BP"))
# Cross-tabulate BP category (rows) by job code (columns)
tabLE <- table(DFD$bp_Category, DFD$jobcode)
# Column percentages: share of each BP category within each job code
prop.table(tabLE, 2) * 100
Using a simulated dataset in which the BP categories and the occupations are represented in roughly equal proportions:
set.seed(111)
DFD = data.frame(
  jobcode = sample(c("Doctor", "Engineer", "Student"), 10000, replace = TRUE),
  bp_Category = sample(c("low_BP", "Medium_BP", "High_BP"), 10000, replace = TRUE)
)
Because this is simulated under the null, you will see that each percentage is roughly 33%:
tabLE <- table(DFD$bp_Category, DFD$jobcode)
prop.table(tabLE, 2) * 100
              Doctor Engineer  Student
  High_BP   32.81156 33.89058 32.96930
  low_BP    33.68453 32.73556 33.82527
  Medium_BP 33.50391 33.37386 33.20543
We can run a chi-square test on each row, but we need to know the expected proportions of Doctor, Engineer, and Student, so we compute:
probs = colSums(tabLE)/sum(tabLE)
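To make the null hypothesis concrete: the expected count in each cell is the row total multiplied by the overall share of that job. The following is a minimal sketch, assuming the simulated DFD from above; the names `row_totals` and `expected` are illustrative and not part of the original answer.

```r
# Recreate the simulated data so the block runs on its own
set.seed(111)
DFD <- data.frame(
  jobcode     = sample(c("Doctor", "Engineer", "Student"), 10000, replace = TRUE),
  bp_Category = sample(c("low_BP", "Medium_BP", "High_BP"), 10000, replace = TRUE)
)
tabLE <- table(DFD$bp_Category, DFD$jobcode)

# Overall share of each job, ignoring BP category
probs <- colSums(tabLE) / sum(tabLE)

# Expected counts under the null: each row total split according to probs
row_totals <- rowSums(tabLE)
expected   <- outer(row_totals, probs)

# Observed minus expected; values near zero mean the cell matches the null
round(tabLE - expected, 1)
```

These expected counts should match what `chisq.test(tabLE)$expected` reports for the full table, which is a quick way to check the calculation.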
Then, for each row, we test how far each cell deviates from what we expect:
library(broom)
library(purrr)
# Split the counts by BP category (row), run a goodness-of-fit test on each,
# and stack the tidy results into one data frame
results = split(as.matrix(tabLE), rownames(tabLE)) %>%
  map_dfr(~tidy(chisq.test(.x, p = probs)), .id = "BP")
results
# A tibble: 3 x 5
  BP        statistic p.value parameter method
  <chr>         <dbl>   <dbl>     <dbl> <chr>
1 High_BP      0.676    0.713         2 Chi-squared test for given probabilities
2 low_BP       0.697    0.706         2 Chi-squared test for given probabilities
3 Medium_BP    0.0451   0.978         2 Chi-squared test for given probabilities
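Because three tests are run, one per BP category, it may be worth adjusting the p-values for multiple comparisons, and looking at which cells drive any difference. The sketch below is not part of the original answer: it repeats the per-row test in base R, applies a Holm correction with `p.adjust`, and prints the standardized residuals from the overall test of independence. The names `p_raw`, `p_holm`, and `overall` are illustrative.

```r
# Recreate the simulated data so the block runs on its own
set.seed(111)
DFD <- data.frame(
  jobcode     = sample(c("Doctor", "Engineer", "Student"), 10000, replace = TRUE),
  bp_Category = sample(c("low_BP", "Medium_BP", "High_BP"), 10000, replace = TRUE)
)
tabLE <- table(DFD$bp_Category, DFD$jobcode)
probs <- colSums(tabLE) / sum(tabLE)

# One goodness-of-fit test per BP category (row), base R only
p_raw <- apply(tabLE, 1, function(counts) chisq.test(counts, p = probs)$p.value)

# Holm correction, since three tests are run on the same table
p_holm <- p.adjust(p_raw, method = "holm")
data.frame(BP = names(p_raw), p_raw = p_raw, p_holm = p_holm, row.names = NULL)

# Standardized residuals from the overall chi-square test of independence:
# a large positive value marks a cell with more observations than expected
overall <- chisq.test(tabLE)
round(overall$stdres, 2)
```

For a question such as whether Engineer has an unusually high share of Medium_BP, the residual in the Medium_BP row and Engineer column is the one to inspect; as a rough guide, absolute values above about 2 suggest a cell that departs from the null.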