Grouped bar plot in ggplot2

I'm trying to make a grouped bar plot from data in long format.

Here is the data:

pivot_sample <- structure(list(group = c("group1", "group2", "group3", "group1", 
"group2", "group1", "group1", "group1", "group4", "group1", "group4", 
"group4", "group1", "group4", "group1", "group1", "group2", "group1", 
"group4", "group2", "group4", "group2", "group3", "group3", "group1", 
"group1", "group3", "group3", "group1", "group1", "group3", "group1", 
"group4", "group3", "group3", "group1", "group2", "group1", "group4", 
"group1", "group3", "group3", "group3", "group2", "group2", "group4", 
"group3", "group3", "group3", "group2", "group3", "group2", "group1", 
"group1", "group3", "group1", "group1", "group2", "group4", "group1", 
"group4", "group1", "group1", "group4", "group1", "group3", "group4", 
"group1", "group4", "group2", "group4", "group1", "group2", "group4", 
"group1", "group4", "group1", "group2", "group1", "group1", "group1", 
"group1", "group2", "group1", "group3", "group1", "group1", "group1", 
"group3", "group4", "group1", "group3", "group1", "group3", "group4", 
"group1", "group2", "group1", "group3", "group1"), category = c("category4", 
"category5", "category2", "category4", "category3", "category6", 
"category3", "category1", "category4", "category2", "category6", 
"category6", "category5", "category5", "category4", "category4", 
"category1", "category6", "category1", "category4", "category6", 
"category6", "category2", "category6", "category3", "category2", 
"category6", "category3", "category6", "category1", "category6", 
"category2", "category2", "category2", "category5", "category1", 
"category1", "category4", "category3", "category4", "category4", 
"category5", "category1", "category3", "category5", "category2", 
"category2", "category5", "category5", "category2", "category6", 
"category6", "category5", "category1", "category4", "category3", 
"category6", "category1", "category6", "category3", "category2", 
"category2", "category3", "category2", "category2", "category5", 
"category4", "category4", "category4", "category4", "category1", 
"category5", "category6", "category5", "category4", "category5", 
"category1", "category2", "category3", "category5", "category3", 
"category2", "category4", "category6", "category4", "category6", 
"category1", "category4", "category4", "category3", "category4", 
"category5", "category5", "category6", "category4", "category3", 
"category5", "category3", "category3", "category1"), count = c(0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 
0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 
0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0)), row.names = c(NA, 
-100L), class = c("tbl_df", "tbl", "data.frame"))

When I run the following:

pivot_sample %>% 
  ggplot(aes(x=group,fill=category))+
  geom_bar()

The default stat_count() seems to work fine together with the default position="stack". However, when I switch to position="dodge" in the code below:

pivot_sample %>% 
  ggplot(aes(x=group,y=count,fill=category))+
  geom_bar(position = "dodge",stat = "identity")

it does not count up the count variable.

I'm sure I'm missing something basic and could use another perspective. Do I need to use a count function for the y= argument in aes()?

Any help would be appreciated!

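OP, the simple answer here is to add position="dodge" to your original plot code. The dodging then works according to the group aesthetic, which you did not specify, so the bar geom falls back to using the fill aesthetic as the grouping: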

pivot_sample %>%
  ggplot(aes(x=group, fill=category)) +
  geom_bar(position='dodge')

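The reason is that the default for the stat argument of geom_bar() is stat="count". This counts all the observations and plots that "count" along the y axis. You can access the computed value with the .. notation, ..count.., but for geom_bar() that is not necessary. The code below is just a longhand way of producing the same plot: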

pivot_sample %>%
  ggplot(aes(x=group, fill=category)) +
  geom_bar(position='dodge', aes(y=..count..), stat="count")
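As a side note, the ..count.. notation was deprecated in ggplot2 3.4.0 in favour of after_stat(). The sketch below shows the same idea with after_stat(count), and uses layer_data() to look at what stat_count() actually computes. The data frame toy is a small made-up example, not the pivot_sample from the question.

```r
library(ggplot2)

# Hypothetical toy data (illustrative only, not the OP's pivot_sample)
toy <- data.frame(
  group    = c("g1", "g1", "g1", "g2", "g2"),
  category = c("a",  "a",  "b",  "a",  "b")
)

# after_stat(count) refers to the value computed by stat_count(),
# not to any column of the input data
p <- ggplot(toy, aes(x = group, fill = category)) +
  geom_bar(aes(y = after_stat(count)), position = "dodge")

# One row per group/category pair, with the number of observations in `count`
computed <- layer_data(p)
computed[, c("x", "group", "count")]
```

Printing p draws the dodged bars; computed only exposes the numbers behind them.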

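Note that your data frame has a column named "count", but pivot_sample$count is not what you are specifying and accessing when you use ..count... That refers to the result produced after the stat="count" function has run.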

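What happened when you used stat="identity"? Well, the "identity" stat plots the actual values on the y axis. You specified y=count, which means the values of the pivot_sample$count column are plotted within each group and category. geom_bar(stat="identity") is the same as using geom_col() (which is what should be used in this case), and it requires both the x and y aesthetics to be defined. In this case, "identity" plots every value of the y aesthetic, that is pivot_sample$count, as its own bar segment: stacked, those segments add up to the sum of the values, whereas with position="dodge" segments that share a group and category are drawn on top of one another.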

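In the plot you made with stat="identity", the bar heights therefore come from the values of pivot_sample$count that fall in each bar, not from the number of rows. That column contains very few values equal to 1, which is why the plot looks the way it does.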
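If the goal is one dodged bar per group/category pair whose height is the total of the count column, one option is to aggregate first and then plot with geom_col(). This is only a sketch on a small made-up data frame; toy and totals are illustrative names that do not appear in the question.

```r
library(ggplot2)

# Hypothetical toy data using the same column names as the question
toy <- data.frame(
  group    = c("group1", "group1", "group1", "group2", "group2"),
  category = c("category1", "category1", "category2", "category1", "category2"),
  count    = c(1, 1, 0, 0, 1)
)

# Collapse to one row per group/category pair, summing `count`
totals <- aggregate(count ~ group + category, data = toy, FUN = sum)

# With exactly one row per bar, dodging cannot hide any values
p <- ggplot(totals, aes(x = group, y = count, fill = category)) +
  geom_col(position = "dodge")
```

Aggregating up front makes the bar height explicit, so the result does not depend on how the position adjustment handles several rows in the same bar.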

Note that geom_bar() with stat="count" counts observations, whereas stat="identity" plots the values themselves (which stack up to their total).
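The difference can be checked numerically on a tiny example. The sketch below assumes a made-up data frame toy and compares what the two stats hand to the bar geom, again via layer_data().

```r
library(ggplot2)

# Hypothetical toy data: three rows in g1, one row in g2
toy <- data.frame(
  group = c("g1", "g1", "g1", "g2"),
  count = c(1, 1, 0, 1)
)

# stat = "count": bar height is the number of rows per group
p_count <- ggplot(toy, aes(x = group)) + geom_bar()
n_rows <- layer_data(p_count)$count

# stat = "identity" (geom_col) with the default stacking:
# each row is a segment, and the top of the stack is the sum of `count`
p_identity <- ggplot(toy, aes(x = group, y = count)) + geom_col()
stacked <- layer_data(p_identity)
stack_top <- tapply(stacked$ymax, as.integer(stacked$x), max)
```

Here n_rows reflects how many observations each group has, while stack_top reflects the totals of the count column, so the two plots can differ whenever the column is not all ones.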