Add factor levels that are not in use

Question

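I have a simple problem that I cannot solve: I want to plot a data.frame (one month) using factors, and sometimes levels are missing. R only assigns attributes to the levels that are present, so my plot looks different depending on whether one, two or more levels exist.

An example: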

    library(ggplot2)
    library(reshape2)

    f             <- factor(c("Free", "Work"))
    mon           <- as.data.frame(matrix(as.factor(rep(f[2], times = 8)), nrow = 4))
    colnames(mon) <- c("A", "B")

    mt    <- t(as.matrix(rev(data.frame(as.matrix(mon))))) #  change order of y
    m     <- melt(mt)

    col   <- c("azure",  "orange")

    ggplot(m, aes(x = Var2, y = Var1, fill = value)) +
      geom_tile(colour="grey10") +
      scale_fill_manual(values = col, labels = f, name = NULL) +
      theme(panel.background = element_rect(fill = "white"), axis.ticks = element_blank()) +
      theme(axis.title.x = element_blank(), axis.title.y = element_blank())

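As you can see, I assigned the second element of the two-level factor, "Work", to every element, but the plot shows "Free". The annoying part is that the factor in mon has only 1 level instead of the 2 possible levels. If I assign several levels to mon:

    mon   <- as.data.frame(matrix(as.factor(rep(c(f[1], f[2]), times = 4)), nrow = 4))

.. and rerun the plotting code above, I get a different plot. It is also impossible to assign another level, even one chosen from the original 2 levels:

    mon[1,1] <- f[1]

I have tried a lot with levels, relevel, order and so on, without success. Does anyone have an idea?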

Answer 1

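A matrix cannot contain factors. When you put a factor into a matrix, it is coerced to character and the unused levels are lost. as.data.frame(matrix(...)) is a bad habit for this reason (and because of other class conversions).

Here is a way to reproduce your data transformation as closely as possible without losing the factor levels: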
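The coercion can be seen in isolation with a small base R sketch. The variable names `mat` and `refactored` are only illustrative and do not come from the question:

```r
f <- factor(c("Free", "Work"))
x <- rep(f[2], 4)

levels(x)                   # "Free" "Work": the unused level is still attached

mat <- matrix(x, nrow = 2)  # the factor is coerced to character here
is.factor(mat)              # FALSE
class(mat[1, 1])            # "character"

# Turning the matrix contents back into a factor only sees the values present
refactored <- factor(mat)
levels(refactored)          # "Work"
```

Once the values have passed through the matrix, nothing remembers that "Free" was ever a possible level.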

    f <- factor(c("Free", "Work"))
    x = rep(f[2], 4)
    mon <- data.frame(A = x, B = x)
    str(mon)
    # 'data.frame': 4 obs. of  2 variables:
    #  $ A: Factor w/ 2 levels "Free","Work": 2 2 2 2
    #  $ B: Factor w/ 2 levels "Free","Work": 2 2 2 2
    ## looks good

    # What is y? What's the point?
    #mt    <- t(as.matrix(rev(data.frame(as.matrix(mon))))) #  change order of y

    mon$id = 1:nrow(mon)
    m     <- reshape2::melt(mon, id.vars = "id", factorsAsStrings = FALSE)

    levels(m$value)
    # [1] "Free" "Work"
    ## looks good

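Now, when we get to plotting, specify drop = FALSE in the scale to include unused levels in the legend. (If you do not want to show unused levels, use the default drop = TRUE.) Since the levels are already there, we do not need custom labels.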

    col   <- c("azure",  "orange")

    ggplot(m, aes(x = id, y = variable, fill = value)) +
      geom_tile(colour="grey10") +
      scale_fill_manual(values = col, name = NULL, drop = FALSE) +
      theme(panel.background = element_rect(fill = "white"), axis.ticks = element_blank()) +
      theme(axis.title.x = element_blank(), axis.title.y = element_blank())

If you want to be even safer with your color scale, you can add names to the values vector before putting it in the scale:

    names(col) = levels(f)
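A minimal base R sketch of why the names help: once the colors are named, looking them up by level no longer depends on the order of the vector. The name `col_reordered` is made up for this illustration:

```r
f   <- factor(c("Free", "Work"))
col <- c("azure", "orange")
names(col) <- levels(f)

# Same colors, stored in a different order
col_reordered <- col[c("Work", "Free")]

# Looking colors up by level name gives the same result for both vectors
col[levels(f)]
col_reordered[levels(f)]
identical(col[levels(f)], col_reordered[levels(f)])  # TRUE
```

With an unnamed vector the colors are matched to the levels by position, so a reordered vector would silently swap them.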

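Another way to get the data is to not worry about the levels during the transformation and to re-factor with the proper levels at the end: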

    # your original code:
    f             <- factor(c("Free", "Work"))
    mon           <- as.data.frame(matrix(as.factor(rep(f[2], times = 8)), nrow = 4))
    colnames(mon) <- c("A", "B")

    mt    <- t(as.matrix(rev(data.frame(as.matrix(mon))))) #  change order of y
    m     <- melt(mt)

    # add this at the end
    m$value = factor(m$value, levels = levels(f))

    # check that it looks good:
    str(m$value)
    # Factor w/ 2 levels "Free","Work": 2 2 2 2 2 2 2 2
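The re-factoring step can also be tried on its own, without reshape2. This is only a sketch; the character vector `value` stands in for the m$value column after the levels have been lost:

```r
f <- factor(c("Free", "Work"))

# Stand-in for a column whose levels were lost along the way
value <- rep("Work", 8)

# Re-create the factor with the full set of levels
value <- factor(value, levels = levels(f))

nlevels(value)              # 2
table(value)                # "Free" is counted 0 times, "Work" 8 times

# droplevels() does the opposite and removes the unused level again
levels(droplevels(value))   # "Work"
```

The unused level shows up with a count of zero, which is what lets drop = FALSE keep it in the legend.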

r

ggplot2

r-factor