R & ggplot2 - how to plot relative frequency of a categorical split by a binary variable

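I can easily plot a relative frequency chart with one 'base' category along the x-axis and the relative frequency of another category on the y-axis: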

library(ggplot2)
ggplot(diamonds) +
  aes(x = cut, fill = color) +
  geom_bar(position = "fill")

Now suppose I have split the categorical variable in some way using a binary variable:

diamonds <- data.frame(diamonds)
diamonds$binary_dummy <- sample(c(0, 1), nrow(diamonds), replace = TRUE)

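How can I plot the original categories but now also show the split within the 'color' variable? Ideally, the split would be represented by two different shades of each original color.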

Basically, I am trying to reproduce this plot:

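As you can see from the legend, each category is split into "NonSyn"/"Syn", and each half is drawn in a dark or light shade of that category's own color (e.g. "regulatory proteins NonSyn" = dark pink, "regulatory proteins Syn" = light pink).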

If you don't mind setting the palette manually, you could do this:

library(ggplot2)
library(colorspace)

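df <- data.frame(diamonds)
# Random 0/1 split, for illustration only
df$binary_dummy <- sample(c(0, 1), nrow(df), replace = TRUE)

# One base colour per level of `color`
pal <- scales::brewer_pal(palette = "Set1")(nlevels(df$color))
# Interleave each base colour with a darkened copy: base1, dark1, base2, dark2, ...
pal <- c(rbind(pal, darken(pal, amount = 0.2)))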

ggplot(df, aes(x = cut, fill = interaction(binary_dummy, color))) +
  geom_bar(position = "fill") +
  scale_fill_manual(values = pal)
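
The interleaved palette only lines up because of the order in which `interaction()` builds its levels. Here is a minimal base-R sketch for checking that order. The toy vectors `base_cols` and `dark_cols` are hypothetical stand-ins for the real palette and its darkened copy, not part of the code above:

```r
# Hypothetical stand-ins for the base palette and its darkened copy
base_cols <- c("red", "blue")
dark_cols <- c("darkred", "darkblue")

# rbind() stacks the two vectors as rows; c() then reads the matrix
# column by column, so the result alternates base, dark, base, dark
interleaved <- c(rbind(base_cols, dark_cols))

# By default interaction() varies its first argument fastest
lv <- levels(interaction(c(0, 1), c("D", "E")))

# Pair each level with the colour it would receive by position
pairing <- setNames(interleaved, lv)
print(pairing)
```

With this ordering, the 0 group of each colour level should receive the base shade and the 1 group the darkened shade.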

Created on 2020-04-14 by the reprex package (v0.3.0)

EDIT: To pin each interaction level to its color, you can use a named palette, for example:

pal <- setNames(pal, levels(interaction(df$binary_dummy, df$color)))

# Remove one combination so that a level is missing from the data
df <- df[!(df$binary_dummy == 0 & df$color == "E"),]

ggplot(df, aes(x = cut, fill = interaction(binary_dummy, color))) +
  geom_bar(position = "fill") +
  scale_fill_manual(values = pal)
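
To see why the names matter, here is a rough base-R illustration that uses hypothetical toy levels and colours. It only mimics positional versus by-name assignment and is not ggplot2's internal code:

```r
# Hypothetical toy levels and colours, in interleaved order
lv   <- c("0.D", "1.D", "0.E", "1.E")
cols <- c("red", "darkred", "blue", "darkblue")

# Levels still present after one combination is removed from the data
present <- setdiff(lv, "0.E")

# Positional assignment: colours are handed out in order, so everything
# after the missing level shifts by one
by_position <- setNames(cols[seq_along(present)], present)

# Named assignment: each level looks up its own colour
named_pal <- setNames(cols, lv)
by_name <- named_pal[present]

print(by_position)
print(by_name)
```

In the positional version "1.E" picks up the colour meant for "0.E", while the named lookup keeps it on its own dark shade.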

Alternatively, you can also set the breaks of the scale:

ggplot(df, aes(x = cut, fill = interaction(binary_dummy, color))) +
  geom_bar(position = "fill") +
  scale_fill_manual(values = pal, breaks = levels(interaction(df$binary_dummy, df$color)))