Keep the order after formatting dates as a character vector

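I want to plot the frequency of dates in a data frame. The plot should be faceted by year, and the dates should be displayed in a format like "Apr 01".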

Here is my data:

x = as.POSIXct(c("2018-04-01", "2018-04-15", "2018-05-01", "2018-05-15",
      "2019-04-01", "2019-04-15", "2019-05-01", "2019-05-15"))

df = data.frame(date = sample(x,30, replace = TRUE))
df$year <-  format(df$date, "%Y")

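If I create the faceted plot with the original date variable, the two panels do not line up, because the whole date range is shown on the x-axis. However, I want the panels to match on day and month information.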

library(ggplot2)

ggplot(df, aes(x = as.Date(date), y = after_stat(count))) +
  geom_bar() +
  facet_grid(year ~ ., scales = "free_x") + 
  scale_x_date(date_breaks = "weeks" , date_labels = "%b-%d") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

Now I create a character vector that keeps only the day and month information. This works well, but the date format is not pretty.

df$date_working <- format(df$date, "%m-%d")

ggplot(df, aes(x = date_working, y = after_stat(count))) +
  geom_bar() +
  facet_grid(year ~ ., scales = "free_x") +
  labs(title="right order")

So I created another date variable. The problem, however, is that this variable does not keep the correct order.

df$date_appreciated <- format(df$date, "%d %b")

ggplot(df, aes(x = date_appreciated, y = after_stat(count))) +
  geom_bar() +
  facet_grid(year ~ ., scales = "free_x") +
  labs(title="wrong order")

Does anyone have a solution? I need to create the "date_appreciated" variable while keeping the order of the "date_working" variable.

You can achieve this easily by converting the date_working column to a factor using the {forcats} package (which is included in {tidyverse}).

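base::as.factor() creates the factor levels automatically from the alphabetical ordering of the underlying values. In contrast, forcats::as_factor() by default creates the levels in the order in which the values first appear in the data. This lets you produce "nicely formatted" date labels while keeping the correct sort order: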

# load required libraries
library(tidyverse)

# your original code
x = as.POSIXct(c("2018-04-01", "2018-04-15", "2018-05-01", "2018-05-15",
                 "2019-04-01", "2019-04-15", "2019-05-01", "2019-05-15"))

df = data.frame(date = sample(x,30, replace = TRUE))
df$year <-  format(df$date, "%Y")

# sort df by date using dplyr::arrange, then create a new column called
# date_working that holds the date with "nicer" formatting, and convert it
# to a factor with forcats::as_factor so that the levels follow the sorted
# row order instead of alphabetical order
df <- df %>% 
  arrange(date) %>% 
  mutate(date_working = format(date, "%d %b") %>% forcats::as_factor())

# generate the plot output as before, except now it should be ordered correctly
ggplot(df, aes(x = date_working, y = after_stat(count))) +
  geom_bar() +
  facet_grid(year ~ ., scales = "free_x") +
  labs(title="right order")
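
To see why the choice of function matters, here is a minimal sketch comparing the level order produced by each approach. The vector name labels is just an illustration, and the base R line with factor() is added only for comparison; it is not part of the solution above.

```r
library(forcats)

# day-month labels, already in calendar order
labels <- c("01 Apr", "15 Apr", "01 May", "15 May")

# base R: levels are sorted alphabetically, so "01 May" jumps ahead of "15 Apr"
levels(as.factor(labels))

# forcats: levels follow the order of first appearance in the data
levels(as_factor(labels))

# base R way to get order-of-appearance levels without forcats
levels(factor(labels, levels = unique(labels)))
```

Because the levels depend on the order of appearance, the data has to be sorted before as_factor() is called, which is what the arrange(date) step does.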

In fact, if you prefer, you can create this formatting "on the fly" inside the ggplot call. The following code block should produce the same plot as above:

df %>% 
  arrange(date) %>% 
  ggplot(aes(x = format(date, "%d %b") %>% forcats::as_factor(), y = after_stat(count))) +
  geom_bar() +
  facet_grid(year ~ ., scales = "free_x") +
  labs(title="right order")
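
One caveat: sorting by the full date only gives the right level order if every day-month label first appears in calendar order. If the earliest year happens to be missing one of the dates, the levels can come out wrong. Below is a minimal sketch of a variant that orders the levels by a sortable month-day key instead, using base R only. The toy data and the names key, label, and date_label are assumptions for illustration, not part of the original code.

```r
# toy data: 2018 has no 1 April, so sorting by full date would put
# "15 Apr" and "01 May" before "01 Apr"
x <- as.POSIXct(c("2018-04-15", "2018-05-01", "2019-04-01", "2019-05-15"),
                tz = "UTC")
df <- data.frame(date = x)

key   <- format(df$date, "%m-%d")  # sortable key, e.g. "04-15"
label <- format(df$date, "%d %b")  # display label (month name depends on locale)

# take the labels in key order and use their unique values as factor levels
df$date_label <- factor(label, levels = unique(label[order(key)]))

levels(df$date_label)
```

The resulting date_label column can be mapped to x in the same way as date_working above.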