ggplot: showing % instead of counts in charts of categorical variables with multiple levels

Question

I want to create a bar chart like this:

library(ggplot2)

# Dodged bar charts
ggplot(diamonds, aes(clarity, fill=cut)) + geom_bar(position="dodge")

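However, instead of counts, I'd like the percentage of observations falling into each 'clarity' category, computed separately for each cut category ('fair', 'good', 'very good', ...).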

With this...

# Dodged bar charts
ggplot(diamonds, aes(clarity, fill=cut)) + 
  geom_bar(aes(y = (..count..)/sum(..count..)), position="dodge")

I get percentages on the y-axis, but these percentages ignore the cut factor. I want all the red bars to sum to 1, all the yellow bars to sum to 1, and so on.
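Why the percentages ignore cut: as far as I understand ggplot2's computed-variable mechanism, an expression like `sum(..count..)` (spelled `after_stat()` in ggplot2 3.3.0 and later) is evaluated over the whole layer at once, so every bar is divided by the same grand total. A minimal check, assuming ggplot2 >= 3.3.0, uses `layer_data()` to inspect the bar heights ggplot2 actually draws:

```r
library(ggplot2)

# The attempt above, written with after_stat() instead of the older ..count.. syntax
p <- ggplot(diamonds, aes(clarity, fill = cut)) +
  geom_bar(aes(y = after_stat(count / sum(count))), position = "dodge")

# layer_data() returns the computed data for the first layer
bars <- layer_data(p)

# All bars together sum to 1 ...
total <- sum(bars$y)
# ... while the bars of a single fill colour (one cut level) do not
per_fill <- tapply(bars$y, bars$fill, sum)

print(total)
print(per_fill)
```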

Is there an easy way to do this without having to prepare the data manually?

Thanks!

P.S.: This is a follow-up to this Stack Overflow question.

Answer 1

You can use sjp.xtab from the sjPlot package for this:

library(sjPlot)
sjp.xtab(diamonds$clarity, 
         diamonds$cut, 
         showValueLabels = FALSE, 
         tableIndex = "row", 
         barPosition = "stack")

To prepare the data for stacked group percentages that sum to 100%, use:

data.frame(prop.table(table(diamonds$clarity, diamonds$cut),1))
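The second argument of `prop.table()` is the margin: `1` divides each cell by its row total (here, each clarity level) and `2` divides it by its column total (each cut level). A small base R check using the same `table()` call:

```r
library(ggplot2)  # only needed for the diamonds data set

tab <- table(diamonds$clarity, diamonds$cut)

# margin = 1: proportions within each clarity level (rows)
by_clarity <- prop.table(tab, 1)
# margin = 2: proportions within each cut level (columns)
by_cut <- prop.table(tab, 2)

print(rowSums(by_clarity))  # every row sums to 1
print(colSums(by_cut))      # every column sums to 1
```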

So you could write:

mydf <- data.frame(prop.table(table(diamonds$clarity, diamonds$cut),1))
ggplot(mydf, aes(Var1, Freq, fill = Var2)) + 
  geom_bar(position = "stack", stat = "identity") +
  scale_y_continuous(labels=scales::percent)
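If you prefer dplyr (which this answer also uses further down), the same within-clarity percentages can be computed from the raw data with `count()` and a grouped `mutate()`. This is a sketch, not part of the original answer; the names `pct_df` and `pct` are my own choices:

```r
library(ggplot2)
library(dplyr)

# Count diamonds per clarity/cut combination, then divide by each clarity total
pct_df <- diamonds %>%
  count(clarity, cut) %>%
  group_by(clarity) %>%
  mutate(pct = n / sum(n)) %>%
  ungroup()

# geom_col() is shorthand for geom_bar(stat = "identity")
p <- ggplot(pct_df, aes(clarity, pct, fill = cut)) +
  geom_col(position = "stack") +
  scale_y_continuous(labels = scales::percent)
print(p)
```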

Edit: this version makes each cut category (Fair, Good, ...) sum to 100%, by using 2 as the margin in prop.table together with position = "dodge":

mydf <- data.frame(prop.table(table(diamonds$clarity, diamonds$cut),2))
ggplot(mydf, aes(Var1, Freq, fill = Var2)) + 
    geom_bar(position = "dodge", stat = "identity") +
    scale_y_continuous(labels=scales::percent)
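For the original request (no manual data preparation), ggplot2 can also compute the per-cut percentages itself. `geom_bar()` uses `stat_count()`, which provides a computed `prop` variable: the proportion of each bar within its group. Setting `group = cut` makes each cut level one group, so each colour's bars sum to 100%. A minimal sketch, assuming ggplot2 >= 3.3.0 for `after_stat()`; it is not part of the original answer:

```r
library(ggplot2)

# group = cut overrides the default grouping (the interaction of all discrete
# variables, here clarity x cut), so prop is computed within each cut level
p <- ggplot(diamonds, aes(clarity, fill = cut)) +
  geom_bar(aes(y = after_stat(prop), group = cut), position = "dodge") +
  scale_y_continuous(labels = scales::percent)
print(p)

# Inspect the computed bar heights: one group per cut level, each summing to 1
bars <- layer_data(p)
group_sums <- tapply(bars$y, bars$group, sum)
print(group_sums)
```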

or

sjp.xtab(diamonds$clarity, 
         diamonds$cut, 
         showValueLabels = FALSE, 
         tableIndex = "col")

To verify the last example with dplyr, sum the percentages within each group:

library(dplyr)
mydf %>% group_by(Var2) %>% summarise(percsum = sum(Freq))

>        Var2 percsum
> 1      Fair       1
> 2      Good       1
> 3 Very Good       1
> 4   Premium       1
> 5     Ideal       1

(See this page for more plot options and examples of sjp.xtab...)
