How to make ggplot2 keep unused levels on data subset

Question

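My question is clearly not a new one, but I haven't been able to find an answer to my exact coding problem. I'm working with a subset of my data (available here) and have tried every possible combination of scale_x_discrete(drop=FALSE) and scale_fill_discrete(drop=FALSE) to get ggplot2 to leave a space for the missing chipmunk bar (n=0 for the event "CF"; n.b. this corresponds to the variable "forage" in the data).

The code I'm using is as follows: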

library(ggplot2)
library(ggthemes)

# Exclude MICRO from the plot
ggplot(data[data$sps == "MAMO" | data$sps == "TAST" | data$sps == "MUVI" |
              data$sps == "MUXX" | data$sps == "TAHU", ],
       aes(sps, fill = forage)) +
  geom_bar(position = "dodge") +
  labs(x = "Species", y = "Number of observations") +
  scale_x_discrete(labels = c("Marmot", "American Mink", "Weasel Spp.", "Red squirrel", "Chipmunk")) +
  theme_classic() +
  scale_fill_manual(values = c("#000000", "#666666", "#999999", "#CCCCCC"), name = "Event")

This gives me a plot like this:

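When I add scale_x_discrete(drop = FALSE) I get this: the code seems to bring back the MICRO data I had previously excluded (so everything is shifted over by one, while Marmots and Chipmunks still have only 3 bars).

When I try scale_fill_discrete(drop = FALSE), the resulting plot is identical to the first one. When I use scale_x_discrete(drop = FALSE) and scale_fill_discrete(drop = FALSE) together, the plot looks like the second one.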

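I suppose I could build a small table of the frequencies for each level (event) by hand, but I'd like to try coding this properly in R first.

Does anyone have suggestions for what I could add or change in my code to do this?

Update: I tried the code suggested below: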

df1 %>% 
  filter(sps != "MICRO") %>% 
  group_by(sps) %>% 
  count(forage) %>% 
  ungroup %>% 
  complete(sps, forage, fill = list(n = 0)) %>% 
ggplot(aes(sps, n)) + geom_col(aes(fill = forage), position = "dodge") +
  scale_x_discrete(labels=c("Marmot","American Mink", "Weasel Spp.", "Red squirrel", "Chipmunk")) + 
  theme_classic() + 
  scale_fill_manual(values=c("#000000", "#666666", "#999999","#CCCCCC"), name = "Event") + 
  labs(x = "Species", y = "Number of observations")

The resulting plot has the space (yes!) but there is still an empty slot where MICRO would be:

Answer 1

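The problem here is that no zero count is generated for sps = TAST, forage = CF. You can create that count with tidyr::complete. I have also added some dplyr functions to make the code cleaner. This assumes your data frame is named df1 (rather than data, which is the name of a base function and therefore not a good choice):

Update: stringsAsFactors = FALSE addresses the issue raised in the comments. If sps is read in as a factor, the MICRO level survives the filter, and complete() expands over every factor level, which is what leaves the empty MICRO slot. Reading sps as character avoids this.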
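To illustrate why the column type matters, here is a minimal base R sketch. The vector below is made-up toy data standing in for the sps column, not the asker's file; it only shows that subsetting a factor keeps its unused levels, while droplevels() or a character vector does not.

```r
# Toy factor standing in for the sps column (values are illustrative only).
sps <- factor(c("MAMO", "MICRO", "TAST", "TAST"))

# Subsetting removes the MICRO observations but not the MICRO level.
kept <- sps[sps != "MICRO"]
levels(kept)

# droplevels() discards levels that no longer occur in the data.
levels(droplevels(kept))

# A character vector carries no levels, so nothing is left behind.
chr <- as.character(sps)
unique(chr[chr != "MICRO"])
```

This is also why scale_x_discrete(drop = FALSE) brought MICRO back in the question: the level was still attached to the filtered factor. The full pipeline, with the column read as character, follows.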

library(dplyr)
library(tidyr)
library(ggplot2)

df1 <- read.table("data.txt", header = TRUE, stringsAsFactors = FALSE)
df1 %>% 
  filter(sps != "MICRO") %>% 
  group_by(sps) %>% 
  count(forage) %>% 
  ungroup %>% 
  complete(sps, forage, fill = list(n = 0)) %>% 
  ggplot(aes(sps, n)) + geom_col(aes(fill = forage), position = "dodge") +
    scale_x_discrete(labels=c("Marmot","American Mink", "Weasel Spp.", "Red squirrel", "Chipmunk")) + 
    theme_classic() + 
    scale_fill_manual(values=c("#000000", "#666666", "#999999","#CCCCCC"), name = "Event") + 
    labs(x = "Species", y = "Number of observations")
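To see what complete() contributes on its own, here is a small sketch on made-up data (the toy data frame and the object names counts_chr, counts_fct, and counts_fixed are assumptions for illustration, not part of the original data). It compares a character column with a factor column, and shows droplevels() as one possible alternative if the column has to stay a factor.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: TAST has no "CF" observation, MICRO is to be excluded.
toy <- data.frame(
  sps    = c("MAMO", "MAMO", "TAST", "MICRO"),
  forage = c("CF", "F", "F", "CF"),
  stringsAsFactors = FALSE
)

# Character column: complete() adds the missing TAST/CF row with n = 0.
counts_chr <- toy %>%
  filter(sps != "MICRO") %>%
  count(sps, forage) %>%
  complete(sps, forage, fill = list(n = 0))
nrow(counts_chr)  # 4 rows: 2 species x 2 events

# Factor column: the unused MICRO level is expanded as well.
counts_fct <- toy %>%
  mutate(sps = factor(sps)) %>%
  filter(sps != "MICRO") %>%
  count(sps, forage) %>%
  complete(sps, forage, fill = list(n = 0))
nrow(counts_fct)  # 6 rows: MICRO comes back with zero counts

# Dropping unused levels before complete() avoids the extra MICRO rows.
counts_fixed <- toy %>%
  mutate(sps = factor(sps)) %>%
  filter(sps != "MICRO") %>%
  mutate(sps = droplevels(sps)) %>%
  count(sps, forage) %>%
  complete(sps, forage, fill = list(n = 0))
nrow(counts_fixed)  # 4 rows again
```

Because the zero counts are now real rows, geom_col() with position = "dodge" reserves a slot for them, which geom_bar() on the raw observations cannot do.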

Result:
