Is there a way to plot the number of instances in which two variables occur together in R?
I have a dataset that looks like this:
english math science history art geography
<fct> <fct> <fct> <fct> <fct> <fct>
1 1 1 0 1 1 0
2 0 0 0 1 0 1
3 1 0 1 0 0 1
4 0 1 0 1 1 0
5 1 1 0 0 0 0
6 1 1 1 0 1 1
7 1 1 0 0 1 1
8 1 1 0 0 0 1
9 0 0 0 1 0 0
10 1 0 1 1 1 0
11 1 0 0 1 1 0
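I am trying to count the instances across the whole data frame in which two variables occur together. For example, math and english both have a value of 1 in 5 instances.
I can count these instances with the code below, and could repeat it for every pair of subjects: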
sum(df$english==1 & df$math==1)
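Repeating that line for every pair gets tedious, so here is a minimal base R sketch that computes all pairwise counts at once. It assumes the columns are 0/1 factors as in the table above; the names `m` and `co` are made up for illustration. After converting the factors to integers, `crossprod(m)` (that is, `t(m) %*% m`) gives a matrix whose entry `[i, j]` is the number of rows in which both subject `i` and subject `j` equal 1.

```r
# Rebuild the example data as 0/1 factors (same values as the table above)
df <- data.frame(
  english   = factor(c(1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1)),
  math      = factor(c(1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0)),
  science   = factor(c(0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0)),
  history   = factor(c(1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1)),
  art       = factor(c(1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1)),
  geography = factor(c(0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0))
)

# Convert each factor to its 0/1 integer value. Go through as.character():
# as.integer() on a factor would return the level codes 1/2 instead.
m <- sapply(df, function(x) as.integer(as.character(x)))

# Co-occurrence matrix: entry [i, j] counts rows where both columns are 1
co <- crossprod(m)

co["english", "math"]     # 5, same as sum(df$english==1 & df$math==1)
co["english", "english"]  # 8, the diagonal holds each subject's own count
```

The diagonal is not a pairwise count, so it would normally be dropped or ignored before plotting.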
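However, I am trying to create a graph that looks like this: [graph]. Is this possible in R? I have tried using ggplot, but I am not sure how to create it.
The code for the data frame is: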
structure(list(english = structure(c(2L, 1L, 2L, 1L, 2L, 2L,
2L, 2L, 1L, 2L, 2L), .Label = c("0", "1"), class = "factor"),
math = structure(c(2L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 1L,
1L), .Label = c("0", "1"), class = "factor"), science = structure(c(1L,
1L, 2L, 1L, 1L, 2L, 1L, 1L, 1L, 2L, 1L), .Label = c("0",
"1"), class = "factor"), history = structure(c(2L, 2L, 1L,
2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L), .Label = c("0", "1"), class = "factor"),
art = structure(c(2L, 1L, 1L, 2L, 1L, 2L, 2L, 1L, 1L, 2L,
2L), .Label = c("0", "1"), class = "factor"), geography = structure(c(1L,
2L, 2L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L), .Label = c("0",
"1"), class = "factor")), row.names = c(NA, -11L), class = c("tbl_df",
"tbl", "data.frame"))
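One option to achieve your desired result is the widyr package, which makes it easy to compute the counts via widyr::pairwise_count and returns the result in a tidy data format that can easily be plotted via ggplot2:
- Add an identifier variable for the observations
- Convert your data frame to long (tidy) format, e.g. using tidyr::pivot_longer
- Filter the data for value == 1 and compute the pairwise counts
- Plot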
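library(widyr)
library(dplyr)
library(tidyr)
library(ggplot2)
# `d` is the data frame from the question (the structure(...) output above)
dd <- d %>%
  mutate(id = row_number()) %>%   # one id per observation (row)
  pivot_longer(-id) %>%           # long format: id, name (subject), value
  filter(value == 1) %>%          # keep only the subjects flagged 1
  pairwise_count(name, id)        # count ids shared by each pair of subjects
# Bubble plot: point area and text label both show the pairwise count n
ggplot(dd, aes(item1, item2)) +
  geom_point(aes(size = n), color = "steelblue") +
  geom_text(aes(label = n), show.legend = FALSE) +
  scale_size_area(max_size = 10) +
  guides(size = "none")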
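Note that by default pairwise_count returns both orderings of each pair (english/math and math/english), so the plot is mirrored across the diagonal. If each pair should appear only once, passing `upper = FALSE` should do it. The sketch below uses a small made-up data frame `toy` (not the data from the question) to keep it short; `pairs_once` is likewise just an illustrative name.

```r
library(dplyr)
library(tidyr)
library(widyr)

# Hypothetical toy data: 4 observations, 3 subjects, 1 = subject present
toy <- tibble(
  english = c(1, 1, 0, 1),
  math    = c(1, 0, 1, 1),
  art     = c(0, 0, 1, 1)
)

pairs_once <- toy %>%
  mutate(id = row_number()) %>%
  pivot_longer(-id) %>%
  filter(value == 1) %>%
  # upper = FALSE keeps a single row per unordered pair of subjects
  pairwise_count(name, id, upper = FALSE)

pairs_once
```

The result has the same columns (item1, item2, n) as before, so it can be passed to the same ggplot code, which then draws only one triangle of the grid.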