Why does the stacked bar chart not match values in table?

Question

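I have a dataset containing cities and the prices of particular items in those cities (e.g. taxi, drinks, dinner, etc.). The dataset can be found here: https://data.world/makeovermonday/2018w48

I calculated the total cost of a party night and a date night: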

CostNightPrepared <- CostNight %>%
  group_by(City, Category) %>%
  mutate(TotalCost = sum(Cost, na.rm = TRUE)) %>%
  arrange(desc(Category), TotalCost)

Then I plotted it:

Visual <- ggplot(CostNightPrepared, aes(TotalCost, fct_rev(fct_reorder(City, TotalCost)), fill=Category)) + 
  geom_col(position = "stack") +
  geom_text(aes(label = round(TotalCost, 1)), position = position_dodge(1))

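It gave me the following output:

As you may notice, for the last city, Zurich, for example, the value for "Party night" is 179, yet the bar reaches about 800 on the x-axis! The same goes for all the other bars: they don't match the "Date night" and "Party night" values. What is wrong here?

If I run the same code but use position = "dodge" for geom_col(), it works: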

Visual <- ggplot(CostNightPrepared, aes(TotalCost, fct_rev(fct_reorder(City, TotalCost)), fill=Category)) + 
  geom_col(position = "dodge") +
  geom_text(aes(label = round(TotalCost, 1)), position = position_dodge(1))

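This is the output:

As you can see, the values match the size (length) of their corresponding bars on the x-axis.

So why, when using position = "stack", do my bars not match the actual values in the dataset and instead show arbitrary values on the x-axis?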

Answer 1

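I think you want summarize rather than mutate. By using mutate, you get the City/Category total on every row, and each of those rows is then fed into ggplot2. What you really want is one row per City/Category combination, which is what summarize produces.

Reproducible example: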
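Applied to the code in the question, the change might look like the sketch below. The CostNight data frame here is a small made-up stand-in (the Item column and all the numbers are hypothetical, not taken from the real dataset); only the column names City, Category and Cost come from the question.

```r
library(dplyr)

# Hypothetical stand-in for the asker's CostNight data: one row per item.
CostNight <- tibble::tribble(
  ~City,    ~Category,     ~Item,    ~Cost,
  "Zurich", "Party night", "Taxi",   25,
  "Zurich", "Party night", "Drinks", 40,
  "Zurich", "Date night",  "Dinner", 90,
  "Zurich", "Date night",  "Taxi",   25,
  "Lisbon", "Party night", "Taxi",   8,
  "Lisbon", "Party night", "Drinks", 12,
  "Lisbon", "Date night",  "Dinner", 35
)

# summarize() collapses each City/Category group to a single row,
# whereas mutate() would keep all seven item rows.
CostNightPrepared <- CostNight %>%
  group_by(City, Category) %>%
  summarize(TotalCost = sum(Cost, na.rm = TRUE), .groups = "drop") %>%
  arrange(desc(Category), TotalCost)

CostNightPrepared
```

With one row per City/Category pair, geom_col(position = "stack") has exactly one segment to draw for each fill colour in each bar.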

library(tidyverse)  # provides dplyr, forcats and ggplot2

mtcars %>%
  head() %>%
  group_by(carb, gear) %>%
  mutate(total_wt = sum(wt)) %>%
  ungroup() -> mtcars_summary
    
#mtcars_summary
## A tibble: 6 x 12
#    mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb total_wt
#  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>    <dbl>
#1  21       6   160   110  3.9   2.62  16.5     0     1     4     4     5.50
#2  21       6   160   110  3.9   2.88  17.0     0     1     4     4     5.50
#3  22.8     4   108    93  3.85  2.32  18.6     1     1     4     1     2.32
#4  21.4     6   258   110  3.08  3.22  19.4     1     0     3     1     6.68
#5  18.7     8   360   175  3.15  3.44  17.0     0     0     3     2     3.44
#6  18.1     6   225   105  2.76  3.46  20.2     1     0     3     1     6.68

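Note that mutate above gives every row the total weight of its group. ggplot2::geom_col then stacks all the rows it receives, producing bars longer than you want. (Another hint is that the text looks "overplotted": each label is printed once for every row in the group, so you may have ten copies of the same text on top of one another, which gives a poor anti-aliased appearance.)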
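To make the arithmetic concrete, here is a rough sketch that counts how many rows each carb/gear group contributes and what the stack adds up to. The column names n_rows, group_total and stacked_total are made up for this illustration.

```r
library(dplyr)

stacked_lengths <- mtcars %>%
  head() %>%
  group_by(carb, gear) %>%
  mutate(total_wt = sum(wt)) %>%      # every row carries its group's total
  summarize(
    n_rows        = n(),              # rows fed to geom_col for this group
    group_total   = first(total_wt),  # the value shown in the label
    stacked_total = sum(total_wt),    # what position = "stack" adds up
    .groups = "drop"
  )

stacked_lengths
```

For the carb = 4, gear = 4 group, the two rows each carry 5.495, so the stacked segment spans 10.99 rather than 5.495. The same multiplication by the row count would explain a Zurich bar reaching about 800 when the label says 179. With position = "dodge" the duplicate rows are drawn on top of one another instead of end to end, which is presumably why that version looked correct.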

ggplot(mtcars_summary, aes(total_wt, 
                           carb %>% as_factor %>% fct_reorder(total_wt), 
                           fill = as.character(gear))) +
  geom_col(position = "stack") +
  geom_text(aes(label = round(total_wt, 1)), position = position_dodge(1))

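If we replace mutate with summarize, we get something closer to what you expect, where the inputs to the bars are not repeated for every element in the original data: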

#mtcars_summary
## A tibble: 4 x 3
#   carb  gear total_wt
#  <dbl> <dbl>    <dbl>
#1     1     3     6.68
#2     1     4     2.32
#3     2     3     3.44
#4     4     4     5.50
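The table above can be reproduced, and then plotted, with something along these lines. This is a sketch rather than the exact code behind the table. It makes two choices that are not in the code shown earlier: the labels use position_stack(vjust = 0.5) so that each one sits in the middle of its own segment, and the bars are ordered by their summed weight via .fun = sum.

```r
library(dplyr)
library(forcats)
library(ggplot2)

# One row per carb/gear combination.
mtcars_summary <- mtcars %>%
  head() %>%
  group_by(carb, gear) %>%
  summarize(total_wt = sum(wt), .groups = "drop")

p <- ggplot(mtcars_summary,
            aes(total_wt,
                fct_reorder(factor(carb), total_wt, .fun = sum),
                fill = as.character(gear))) +
  geom_col(position = "stack") +
  # Stack the labels the same way as the bars so they line up with the segments.
  geom_text(aes(label = round(total_wt, 1)),
            position = position_stack(vjust = 0.5))

p
```

Because each carb/gear pair now appears only once, the length of each coloured segment equals its label, and the full bar length is the sum of the segments.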

Tags: r, bar-chart, ggplot2, stacked-chart, tidyverse