How do I crosstab observations with membership in multiple categories?
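I have a dataset whose observations fall into a mix of mutually exclusive and non-mutually exclusive categories. For example, assuming nobody is of mixed race but people can hold multiple citizenships, the dataset looks like this: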
id  white  hispanic  asian  usa  canada  uk
1   0      1         0      1    0       1
2   1      0         0      0    1       0
3   0      0         1      1    0       1
4   1      0         0      1    1       0
5   0      1         0      0    0       1
6   0      0         1      0    0       1
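As you can see, any person/observation has exactly one race but can hold multiple citizenships. I would like to break down race by citizenship and produce something like this: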
          usa      canada   uk       total
white     1 (33%)  2 (66%)  0        3
hispanic  1 (33%)  0        2 (66%)  3
asian     1 (33%)  0        2 (66%)  3
total     3        2        3
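How can I write a loop that tallies all the categories so that I can crosstab race against citizenship (double counting is fine)?
Any advice/suggestions on visualizing this kind of data would be appreciated. Thank you very much for your help!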
As I understand it, you can reshape your data into a tidy (long) format and then use janitor to get the cross table:
Data:
df <- data.frame(id = seq(1,6),
                 white = c(0,1,0,1,0,0),
                 hispanic = c(1,0,0,0,1,0),
                 asian = c(0,0,1,0,0,1),
                 usa = c(1,0,1,1,0,0),
                 canada = c(0,1,0,1,0,0),
                 uk = c(1,0,1,0,1,1))
Code:
library(tidyverse)
library(janitor)
df %>%
  # stack the race indicators and keep the single race each person has
  pivot_longer(cols = c(white, hispanic, asian), names_to = "race") %>%
  filter(value == 1) %>%
  select(-value) %>%
  # stack the citizenship indicators: one row per citizenship held
  pivot_longer(cols = c(usa, canada, uk), names_to = "citizenship") %>%
  filter(value == 1) %>%
  select(-value) %>%
  tabyl(race, citizenship) %>%
  adorn_totals(where = c("row", "col")) %>%
  # "col" gives shares within each citizenship; use "row" for shares within each race
  adorn_percentages(denominator = "col") %>%
  adorn_pct_formatting(digits = 0) %>%
  adorn_ns(position = "front")
Output:
race      canada    uk        usa       Total
asian     0 (0%)    2 (50%)   1 (33%)   3 (33%)
hispanic  0 (0%)    2 (50%)   1 (33%)   3 (33%)
white     2 (100%)  0 (0%)    1 (33%)   3 (33%)
Total     2 (100%)  4 (100%)  3 (100%)  9 (100%)
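If you would rather not reshape at all, the same counts can probably be had in base R with a matrix cross-product, since every column is a 0/1 indicator. This is only a minimal sketch under that assumption; the names `race_cols`, `cit_cols`, `counts` and `row_pct` are mine for illustration and do not come from the question or the answer above.

```r
# Same toy data as in the question.
df <- data.frame(
  id       = 1:6,
  white    = c(0, 1, 0, 1, 0, 0),
  hispanic = c(1, 0, 0, 0, 1, 0),
  asian    = c(0, 0, 1, 0, 0, 1),
  usa      = c(1, 0, 1, 1, 0, 0),
  canada   = c(0, 1, 0, 1, 0, 0),
  uk       = c(1, 0, 1, 0, 1, 1)
)

# Illustrative groupings of the indicator columns (assumed names).
race_cols <- c("white", "hispanic", "asian")
cit_cols  <- c("usa", "canada", "uk")

# crossprod(A, B) is t(A) %*% B: cell [r, c] counts the people who have both
# race r and citizenship c, so a dual citizen is counted once per citizenship.
counts <- crossprod(as.matrix(df[race_cols]), as.matrix(df[cit_cols]))

# Share of each race's citizenships, in whole percent (each row sums to about 100).
row_pct <- round(100 * prop.table(counts, margin = 1))

print(addmargins(counts))  # counts plus row and column totals
print(row_pct)
```

With this data the uk column sums to 4, because four people hold UK citizenship; that agrees with the janitor output. `prop.table(counts, margin = 1)` gives the within-race shares the question asks for, while `margin = 2` would reproduce the column percentages shown in the answer.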
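On the visualization part of the question, one option that might work is a simple heatmap of the race by citizenship counts. The sketch below assumes ggplot2 is installed (it ships with the tidyverse) and uses illustrative names (`counts`, `long`, `p`) that are not from the original posts; treat it as a starting point rather than a recommendation for every dataset.

```r
library(ggplot2)

# Same toy data as in the question.
df <- data.frame(
  id       = 1:6,
  white    = c(0, 1, 0, 1, 0, 0),
  hispanic = c(1, 0, 0, 0, 1, 0),
  asian    = c(0, 0, 1, 0, 0, 1),
  usa      = c(1, 0, 1, 1, 0, 0),
  canada   = c(0, 1, 0, 1, 0, 0),
  uk       = c(1, 0, 1, 0, 1, 1)
)

# Count race/citizenship pairs; dual citizens are counted once per citizenship.
counts <- crossprod(as.matrix(df[c("white", "hispanic", "asian")]),
                    as.matrix(df[c("usa", "canada", "uk")]))
names(dimnames(counts)) <- c("race", "citizenship")

# One row per race/citizenship cell, with the count in column n.
long <- as.data.frame(as.table(counts), responseName = "n")

# Heatmap with the count printed inside each tile.
p <- ggplot(long, aes(x = citizenship, y = race, fill = n)) +
  geom_tile() +
  geom_text(aes(label = n), colour = "white") +
  labs(x = "Citizenship", y = "Race", fill = "Count")

print(p)
```

A grouped bar chart (`geom_col(position = "dodge")` on the same `long` data) would be a reasonable alternative if exact comparisons between bars matter more than the overall pattern.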