Show percentage of presence of one variable given other variables
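I'm currently working on some plots, but I've run into a problem that I can't solve with my current knowledge of ggplot2.
I'll try to explain my problem using made-up data that I created in R. Below is the output of str() on my made-up data frame: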
'data.frame': 15 obs. of 4 variables:
$ x: Factor w/ 2 levels "0","1": 2 1 2 1 1 1 2 2 1 2 ...
$ y: Factor w/ 2 levels "0","1": 1 2 2 1 2 2 2 1 1 1 ...
$ w: Factor w/ 2 levels "0","1": 2 1 2 2 1 1 1 1 2 2 ...
$ z: Factor w/ 2 levels "0","1": 2 1 2 2 1 1 2 2 1 2 ...
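As you can see, these are all dichotomous variables. Let's take y as my dependent variable. The plot I want to make is a bar chart like the one shown below:
I'd love to make a plot like this one. A second idea is the same plot, but also adding bars with the prevalence of y that compare the group where the independent variable (x, w and z) is 1 with the group where it is 0. So the second idea would have 6 bars instead of 3. Either of these two ideas would work fine for what I need. Thanks in advance, community, you are always very helpful.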
Example data:
d <- structure(list(x = structure(c(2L, 1L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 1L, 2L), .Label = c("0", "1"), class = "factor"), y = structure(c(1L, 2L, 2L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L), .Label = c("0", "1"), class = "factor"), w = structure(c(2L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 2L, 2L), .Label = c("0", "1"), class = "factor"), z = structure(c(2L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 1L, 1L, 1L), .Label = c("0", "1"), class = "factor")), row.names = c(NA, -15L), class = "data.frame")
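Both plots only need one number per group: the share of rows with y == "1" within each level of x, w and z. The sketch below computes those numbers in base R, assuming that the three-bar version shows the prevalence of y where each predictor is 1. The data are the example above, rebuilt from 0/1 vectors for brevity. The name prev is chosen for illustration.

```r
# Example data from the question, rebuilt from 0/1 vectors (same values as the dput above)
d <- data.frame(
  x = factor(c(1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1), levels = c(0, 1)),
  y = factor(c(0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0), levels = c(0, 1)),
  w = factor(c(1, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1), levels = c(0, 1)),
  z = factor(c(1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 0), levels = c(0, 1))
)

# Rows: level of the predictor ("0", "1"); columns: predictor (x, w, z);
# cells: share of observations with y == "1" in that group
prev <- sapply(c("x", "w", "z"), function(v) tapply(d$y == "1", d[[v]], mean))
print(prev)

# Three-bar version: prevalence of y where each predictor is 1
barplot(prev["1", ], ylab = "Share with y = 1")

# Six-bar version: predictor = 0 next to predictor = 1
barplot(prev, beside = TRUE, legend.text = c("predictor = 0", "predictor = 1"),
        ylab = "Share with y = 1")
```

The "1" row alone gives the three-bar chart, and the full 2 x 3 matrix with beside = TRUE gives the six-bar chart. The ggplot2 answers below build the same numbers in long format.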
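Update: I actually misunderstood the question at first. zx8754's answer is great, and my table approach (left below for posterity) is too clunky.
Below is zx8754's answer translated into the tidyverse.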
library(tidyverse)
d <- structure(list(x = structure(c(2L, 1L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 1L, 2L), .Label = c("0", "1"), class = "factor"), y = structure(c(1L, 2L, 2L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L), .Label = c("0", "1"), class = "factor"), w = structure(c(2L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 2L, 2L), .Label = c("0", "1"), class = "factor"), z = structure(c(2L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 1L, 1L, 1L), .Label = c("0", "1"), class = "factor")), row.names = c(NA, -15L), class = "data.frame")
d %>%
  pivot_longer(-y, names_to = "var", values_to = "val") %>%
  group_by(var, val) %>%
  # share of y == 1 within each predictor/level group; .groups = "drop" returns an ungrouped result
  summarise(perc = sum(y == 1) / n(), .groups = "drop") %>%
  ggplot(aes(var, perc)) +
  geom_col(aes(fill = as.factor(val)), position = "dodge") +
  scale_y_continuous(labels = scales::percent)
Created on 2021-04-07 by the reprex package (v1.0.0)
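Printing the summary before passing it to ggplot makes it easier to check the bar heights. The sketch below uses the same pipeline without the plot. The extra n column (group size) and the name perc_tbl are additions for illustration. It also uses mean(y == "1"), which is equivalent to sum(y == 1) / n().

```r
library(dplyr)
library(tidyr)

# Example data from the question, rebuilt from 0/1 vectors
d <- data.frame(
  x = factor(c(1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1), levels = c(0, 1)),
  y = factor(c(0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0), levels = c(0, 1)),
  w = factor(c(1, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1), levels = c(0, 1)),
  z = factor(c(1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 0), levels = c(0, 1))
)

# One row per predictor/level pair: group size and share of y == "1"
perc_tbl <- d %>%
  pivot_longer(-y, names_to = "var", values_to = "val") %>%
  group_by(var, val) %>%
  summarise(n = n(), perc = mean(y == "1"), .groups = "drop")

print(perc_tbl)
```

Each of the six rows is one bar in the dodged chart. The n column shows that every bar rests on only 7 or 8 observations.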
I would build a table first and then use it to draw your bar chart.
library(tidyverse)
d <- structure(list(x = structure(c(2L, 1L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 1L, 2L), .Label = c("0", "1"), class = "factor"), y = structure(c(1L, 2L, 2L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L), .Label = c("0", "1"), class = "factor"), w = structure(c(2L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 2L, 2L), .Label = c("0", "1"), class = "factor"), z = structure(c(2L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 1L, 1L, 1L), .Label = c("0", "1"), class = "factor")), row.names = c(NA, -15L), class = "data.frame")
y <- d$y
tab_df <- data.frame(apply(d[c("x", "w", "z")], 2, function(x) {
tab <- table(x[x != 0], y[x != 0]) # rows where this variable is 1; first entry is y = 0, second is y = 1
tab / sum(tab) # proportions of y among those rows
}))
tab_df %>%
mutate(y = 0:1) %>%
pivot_longer(-y, names_to = "var", values_to = "percentage") %>%
ggplot(aes(var, percentage)) +
geom_col(aes(fill = as.factor(y))) +
scale_y_continuous(labels = scales::percent)
Created on 2021-04-07 by the reprex package (v1.0.0)
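The table idea can also give both groups at once. The sketch below uses base R's prop.table() with margin = 1, which divides each row by its own total, instead of dividing by the grand total of a subset. It is shown for x only. The names tab_x and prop_x are illustrative.

```r
# x and y from the example data, rebuilt from 0/1 vectors
d <- data.frame(
  x = factor(c(1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1), levels = c(0, 1)),
  y = factor(c(0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0), levels = c(0, 1))
)

# Two-way table: rows = levels of x, columns = levels of y
tab_x <- table(x = d$x, y = d$y)

# margin = 1: each row sums to 1, so row "1" is the distribution of y where x is 1
# and row "0" is the distribution of y where x is 0
prop_x <- prop.table(tab_x, margin = 1)
print(prop_x)
```

The y = "1" column of prop_x holds the two bar heights for x in the six-bar version.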
Convert from wide to long, then, for each variable and value, compute the percentage of rows where y is 1:
library(data.table)
library(ggplot2)
# wide to long
setDT(d)
# percentage of rows with y == "1" for each variable/value pair
plotDat <- melt(d, id.vars = "y")[
  , .(yPC = sum(y == "1") / .N * 100),
  by = .(variable, value)]
ggplot(plotDat, aes(variable, yPC, fill = value)) +
  geom_col(position = "dodge")
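With only 15 rows, each percentage rests on 7 or 8 observations, so it can help to keep the group size next to it. The sketch below is a data.table version of the same idea. The N column, the use of mean() in place of sum() / .N, and the setorder() call are additions. The data are the example rebuilt as a data.table.

```r
library(data.table)

# Example data from the question, rebuilt from 0/1 vectors
d <- data.table(
  x = factor(c(1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1), levels = c(0, 1)),
  y = factor(c(0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0), levels = c(0, 1)),
  w = factor(c(1, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1), levels = c(0, 1)),
  z = factor(c(1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 0), levels = c(0, 1))
)

# Long format: one row per (observation, predictor); y stays as the id column
long <- melt(d, id.vars = "y")

# Per predictor and level: group size and percentage of y == "1"
plotDat <- long[, .(N = .N, yPC = mean(y == "1") * 100), by = .(variable, value)]
setorder(plotDat, variable, value)
print(plotDat)
```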
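I thought it would be fun to do the first part of the question with stat_summary. There already seem to be some good answers, so I'm only adding this out of interest.
(Edited to better position the percentage labels vertically.)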
library(ggplot2)
library(dplyr)
library(tidyr)
d %>%
  pivot_longer(-y, names_to = "variable", values_to = "values") %>%
  # as.character() first: as.numeric() on a factor would return the codes 1/2, not the labels 0/1
  ggplot(aes(variable, y = as.numeric(as.character(values)))) +
  stat_summary(
    aes(label = scales::percent(after_stat(y))),
    geom = "text",
    fun = ~ mean(.x == 1), # share of 1s, without hard-coding the number of rows
    vjust = -1
  ) +
  stat_summary(geom = "bar", fun = ~ mean(.x == 1)) +
  scale_y_continuous("Prevalence of 1s", labels = scales::percent,
                     expand = expansion(add = c(NA, 0.05)))
Created on 2021-04-07 by the reprex package (v2.0.0)
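Comparing with 1 after as.numeric() only counts the intended rows if the factor labels, not the factor codes, are converted. The short base-R sketch below shows that pitfall and then computes the prevalence of 1s in each predictor directly on the labels. The names f and prev_1s are chosen for illustration.

```r
# as.numeric() on a factor returns its internal codes, not its labels
f <- factor(c("0", "1", "1"))
as.numeric(f)                # codes: 1 2 2
as.numeric(as.character(f))  # labels: 0 1 1

# The three predictors from the example data, rebuilt from 0/1 vectors
d <- data.frame(
  x = factor(c(1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1), levels = c(0, 1)),
  w = factor(c(1, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1), levels = c(0, 1)),
  z = factor(c(1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 0), levels = c(0, 1))
)

# Share of 1s in each predictor, compared on the labels
prev_1s <- sapply(d, function(v) mean(v == "1"))
print(prev_1s)
```

Comparing the factor with the label "1" avoids the conversion altogether.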