pairwise.prop.test in R with Multiple Categories
I have the following dataset, which contains the number of employees by race at 19 companies.
data <- matrix(c(6073,1033,1711,3920,3431,2178,357,757,301,332,4204,
364,1006,337,553,7352,690,1356,1910,2066,4695,776,
1267,575,454,3761,352,529,130,658,5523,468,652,146,
312,5027,657,356,107,804,4650,311,674,78,599,4581,
192,581,114,335,1176,65,121,67,195,3841,274,289,71,
425,6489,1912,1784,1041,1434,1487,148,121,62,72,
4130,170,365,353,479,5181,2260,1023,219,502,1286,
1288,890,423,285,2536,289,359,61,424,6237,1504,
1117,179,911),ncol=5,byrow=TRUE)
colnames(data) <- c("White","Black","Hispanic","Asian","Unknown")
rownames(data) <- c("A","B","C","D","E","F","G","H","I","J","K","L","M","N","O","P","Q","R","S")
data <- as.table(data)
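I am trying to use pairwise.prop.test in R to test for differences in racial composition between companies and see which differences are significant.
When I run:
pairwise.prop.test(data[, c("White","Black","Hispanic","Asian","Unknown")])
I get:
Error in pairwise.prop.test(smoke[, c("WHITE_COUNT", "BLACK_COUNT", "HISP_COUNT", : 'x' must have 2 columns
Is there another function I can use? I want to compare all 5 races for each pair of companies.
Any help would be appreciated. Thanks!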
As the pairwise.prop.test documentation says, your data must be a
Vector of counts of successes or a matrix with 2 columns giving the
counts of successes and failures, respectively
If you reduce the number of columns to two, as the error message indicates, you get a result.
pairwise.prop.test(data[,c("White","Black")])
This results in:
Pairwise comparisons using Pairwise comparison of proportions
data: data[, c("White", "Black")]
A B C D E F G H I J K L M N
B 1.00000 - - - - - - - - - - - - -
C < 2e-16 3.2e-14 - - - - - - - - - - - -
D < 2e-16 6.1e-14 1.00000 - - - - - - - - - - -
E 1.00000 1.00000 < 2e-16 < 2e-16 - - - - - - - - - -
F < 2e-16 1.2e-10 1.00000 1.00000 2.8e-15 - - - - - - - - -
G < 2e-16 < 2e-16 1.00000 1.00000 < 2e-16 1.00000 - - - - - - - -
H 4.2e-05 0.04460 1.2e-07 5.2e-07 0.00159 7.6e-05 5.6e-10 - - - - - - -
I < 2e-16 < 2e-16 0.04410 8.2e-05 < 2e-16 0.00152 0.05631 < 2e-16 - - - - - -
J < 2e-16 < 2e-16 8.0e-14 < 2e-16 < 2e-16 < 2e-16 4.1e-14 < 2e-16 3.4e-05 - - - - -
K < 2e-16 6.1e-14 0.04410 0.00308 1.0e-15 0.00616 0.05631 3.6e-09 1.00000 1.00000 - - - -
L < 2e-16 < 2e-16 0.50026 0.00834 < 2e-16 0.04410 0.70329 3.3e-14 1.00000 2.0e-06 1.00000 - - -
M < 2e-16 < 2e-16 < 2e-16 < 2e-16 < 2e-16 < 2e-16 < 2e-16 < 2e-16 < 2e-16 < 2e-16 < 2e-16 < 2e-16 - -
N 3.7e-07 6.8e-05 1.00000 1.00000 4.2e-06 1.00000 1.00000 0.12875 0.00597 5.4e-13 0.00571 0.05631 < 2e-16 -
O < 2e-16 < 2e-16 2.0e-13 < 2e-16 < 2e-16 2.5e-16 1.2e-13 < 2e-16 3.4e-05 1.00000 1.00000 2.1e-06 < 2e-16 7.2e-13
P < 2e-16 < 2e-16 < 2e-16 < 2e-16 < 2e-16 < 2e-16 < 2e-16 < 2e-16 < 2e-16 < 2e-16 < 2e-16 < 2e-16 < 2e-16 < 2e-16
Q < 2e-16 < 2e-16 < 2e-16 < 2e-16 < 2e-16 < 2e-16 < 2e-16 < 2e-16 < 2e-16 < 2e-16 < 2e-16 < 2e-16 < 2e-16 < 2e-16
R 8.3e-07 0.00079 0.03436 0.23508 2.0e-05 0.48752 0.00659 1.00000 2.4e-08 < 2e-16 1.4e-05 5.8e-06 < 2e-16 1.00000
S 2.1e-13 9.6e-08 < 2e-16 < 2e-16 3.2e-13 < 2e-16 < 2e-16 < 2e-16 < 2e-16 < 2e-16 < 2e-16 < 2e-16 1.3e-05 < 2e-16
O P Q R
B - - - -
C - - - -
D - - - -
E - - - -
F - - - -
G - - - -
H - - - -
I - - - -
J - - - -
K - - - -
L - - - -
M - - - -
N - - - -
O - - - -
P < 2e-16 - - -
Q < 2e-16 < 2e-16 - -
R < 2e-16 < 2e-16 < 2e-16 -
S < 2e-16 < 2e-16 < 2e-16 < 2e-16
P value adjustment method: holm
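The two-column call above treats White as "successes" and Black as "failures", so it only compares those two groups. One possible way to cover all five races with the same function is to test, for each race, that race's count out of each firm's total headcount. The sketch below uses only the first three firms from the question to stay short; the names counts, totals and per_race are illustrative, not from the original post.

```r
# First three firms from the question's data (rows A, B, C); the full
# 19 x 5 matrix can be used in exactly the same way.
counts <- matrix(c(6073, 1033, 1711, 3920, 3431,
                   2178,  357,  757,  301,  332,
                   4204,  364, 1006,  337,  553),
                 ncol = 5, byrow = TRUE,
                 dimnames = list(c("A", "B", "C"),
                                 c("White", "Black", "Hispanic", "Asian", "Unknown")))

totals <- rowSums(counts)  # total employees per firm

# For each race: successes = employees of that race, trials = all employees
# of the firm. pairwise.prop.test then compares every pair of firms.
per_race <- lapply(colnames(counts), function(race) {
  pairwise.prop.test(x = counts[, race], n = totals, p.adjust.method = "holm")
})
names(per_race) <- colnames(counts)

# Lower-triangular matrix of Holm-adjusted p-values for one race
per_race$Asian$p.value
```

Note that the Holm adjustment here is applied within each race separately. Adjusting across all five races at once would mean pooling the raw p-values and calling p.adjust on them, which is a separate design choice.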
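This is something of a shot in the dark, but I hope it helps. With this approach you should be able to test pairwise comparisons between firms for each race. In effect, you need to perform multiple comparisons among multinomial distributions.
Steps:
- the data are converted from wide to long format;
- a Poisson GLM is fitted with the frequency as the outcome and firm and race as covariates;
- the emmeans package is used for the pairwise comparisons.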
The final output is the set of pairwise differences between firms, on the log scale, for each race.
data <- matrix(c(6073,1033,1711,3920,3431,2178,357,757,301,332,4204,
364,1006,337,553,7352,690,1356,1910,2066,4695,776,
1267,575,454,3761,352,529,130,658,5523,468,652,146,
312,5027,657,356,107,804,4650,311,674,78,599,4581,
192,581,114,335,1176,65,121,67,195,3841,274,289,71,
425,6489,1912,1784,1041,1434,1487,148,121,62,72,
4130,170,365,353,479,5181,2260,1023,219,502,1286,
1288,890,423,285,2536,289,359,61,424,6237,1504,
1117,179,911),ncol=5,byrow=TRUE)
colnames(data) <- c("White","Black","Hispanic","Asian","Unknown")
rownames(data) <- c("A","B","C","D","E","F","G","H","I","J","K","L","M","N","O","P","Q","R","S")
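data                          # inspect the count matrix
typeof(data)
data <- as.data.frame(data)   # matrix -> data frame; firm names stay as row names
library(tidyverse)
# Wide to long: one row per firm/race combination, with the count in n
data2 <- data %>%
  rownames_to_column(var = "Firm") %>%
  gather(key = "Race", value = "n", White:Unknown, factor_key = FALSE)
data2
# Poisson GLM with the cell count as outcome and firm and race as covariates
fit <- glm(n ~ Firm + Race, data = data2, family = poisson)
fit
library(emmeans)
# Pairwise comparisons between firms within each race
pairs(emmeans(fit, ~ Firm | Race))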
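To see what a Poisson GLM with only main effects for firm and race estimates, here is a small base-R check on the first three firms from the question. It is a sketch rather than part of the original answer, and the names counts, long, expected and fitted_mat are illustrative. It relies on a standard property of log-linear models: with main effects only, the fitted counts are the expected counts under independence of the two factors.

```r
# First three firms from the question's data (rows A, B, C).
counts <- matrix(c(6073, 1033, 1711, 3920, 3431,
                   2178,  357,  757,  301,  332,
                   4204,  364, 1006,  337,  553),
                 ncol = 5, byrow = TRUE,
                 dimnames = list(c("A", "B", "C"),
                                 c("White", "Black", "Hispanic", "Asian", "Unknown")))

# Wide to long in base R: one row per firm/race cell, count stored in n.
long <- as.data.frame(as.table(counts), responseName = "n")
names(long)[1:2] <- c("Firm", "Race")

fit <- glm(n ~ Firm + Race, data = long, family = poisson)

# Expected counts under independence: row total * column total / grand total.
expected <- outer(rowSums(counts), colSums(counts)) / sum(counts)
fitted_mat <- matrix(fitted(fit), nrow = nrow(counts), dimnames = dimnames(counts))
all.equal(fitted_mat, expected, tolerance = 1e-6)

# The firm coefficient is the log ratio of firm totals (here B versus A),
# so under this model it does not depend on race.
unname(coef(fit)["FirmB"])
log(sum(counts["B", ]) / sum(counts["A", ]))
```

Because there is no Firm:Race term, the estimated firm differences are the same within every race. If the differences are meant to vary by race, one option (an assumption on my part, not something the answer states) would be to fit n ~ Firm * Race before calling emmeans.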