Keep the order after formatting dates as a character vector
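I want to plot the frequency of dates in a data frame. The plot should be faceted by year, and the dates should be shown in the format "Apr 01".
Here is some example data: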
x = as.POSIXct(c("2018-04-01", "2018-04-15", "2018-05-01", "2018-05-15",
"2019-04-01", "2019-04-15", "2019-05-01", "2019-05-15"))
df = data.frame(date = sample(x,30, replace = TRUE))
df$year <- format(df$date, "%Y")
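If I create the faceted plot with the original date variable, the two panels do not line up, because the whole date range is shown on the x axis. I want the panels to match on day and month instead.
library(ggplot2)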
ggplot(df, aes(x=as.Date(date), y = ..count..)) +
geom_bar() +
facet_grid(year ~ ., scales = "free_x") +
scale_x_date(date_breaks = "weeks" , date_labels = "%b-%d") +
theme(axis.text.x = element_text(angle = 90, hjust = 1))
Next I create a character vector that keeps only the month and day. This works, but the dates are not nicely formatted.
df$date_working <- format(df$date, "%m-%d")
ggplot(df, aes(x=date_working, y = ..count..)) +
geom_bar() +
facet_grid(year ~ ., scales = "free_x") +
labs(title="right order")
So I created another date variable. The problem is that this variable does not keep the correct order.
df$date_appreciated <- format(df$date, "%d %b")
ggplot(df, aes(x=date_appreciated, y = ..count..)) +
geom_bar() +
facet_grid(year ~ ., scales = "free_x") +
labs(title="wrong order")
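A quick way to see why the two variables behave differently, assuming ggplot2 places the distinct values of a character variable on a discrete axis in sorted order, is to compare how the two formats sort. This minimal sketch uses four of the question's dates as `Date` values to sidestep time-zone effects; `%b` is locale-dependent, so the labels shown in the comments assume an English locale.

```r
# Four of the question's dates, in chronological order
x <- as.Date(c("2018-04-01", "2018-04-15", "2018-05-01", "2018-05-15"))

# "%m-%d": the zero-padded month comes first, so text order is chronological
working <- format(x, "%m-%d")

# "%d %b": the day comes first, so text order groups by day, not by month
appreciated <- format(x, "%d %b")

print(sort(working))      # "04-01" "04-15" "05-01" "05-15"
print(sort(appreciated))  # "01 Apr" "01 May" "15 Apr" "15 May" (English locale)
```

Sorting puts both "01" labels ahead of both "15" labels, which is exactly the interleaving seen in the "wrong order" plot.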
Does anyone have a solution? I need to create the "date_appreciated" variable while keeping the order of the "date_working" variable.
You can easily achieve this by converting the date_working column to a factor with the {forcats} package (which is included in the {tidyverse}).
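Unlike base::as.factor(), which builds the factor levels from the alphabetically sorted values of the underlying variable, forcats::as_factor() (for character input) creates the levels in the order in which the values first appear in the data. This lets you produce "nicely formatted" date labels while keeping the correct sort order.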
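The difference between the two level orderings can be reproduced in base R alone. This sketch assumes the labels are already sorted by date, as `arrange(date)` does in the answer's code. It uses `factor(x, levels = unique(x))` as a base-R stand-in that, for character input, yields the same first-appearance levels as `forcats::as_factor()`. The labels are hard-coded so the result does not depend on the locale.

```r
# Labels already sorted by date (as after arrange(date))
labels <- c("01 Apr", "01 Apr", "15 Apr", "01 May", "15 May", "15 May")

# base::as.factor(): levels are the sorted unique values, so months interleave
levels(as.factor(labels))
# "01 Apr" "01 May" "15 Apr" "15 May"

# First-appearance levels, the ordering forcats::as_factor() uses for
# character input
levels(factor(labels, levels = unique(labels)))
# "01 Apr" "15 Apr" "01 May" "15 May"
```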
# load required libraries
library(tidyverse)
# your original code
x = as.POSIXct(c("2018-04-01", "2018-04-15", "2018-05-01", "2018-05-15",
"2019-04-01", "2019-04-15", "2019-05-01", "2019-05-15"))
df = data.frame(date = sample(x,30, replace = TRUE))
df$year <- format(df$date, "%Y")
# sort df by date with dplyr::arrange(), then create a new column called
# date_working that holds the date in the "nicer" "%d %b" format, and convert
# it to a factor with forcats::as_factor() so its levels follow the sorted order
df <- df %>%
arrange(date) %>%
mutate(date_working = format(date, "%d %b") %>% forcats::as_factor())
# generate the plot output as before, except now it should be ordered correctly
ggplot(df, aes(x=date_working, y = ..count..)) +
geom_bar() +
facet_grid(year ~ ., scales = "free_x") +
labs(title="right order")
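One caveat with sorting by the full date: `arrange(date)` puts all of 2018 before 2019. As a result, a day-month missing from the earlier year first appears after that year's later dates, and its level lands out of calendar order. With `sample()` drawing 30 values from 8 dates this is unlikely but possible. The base-R sketch below shows the problem and one possible fix: sorting by a `"%m-%d"` key instead of the full date. With dplyr, that would be `arrange(format(date, "%m-%d"))`. The dates are hypothetical, and the labels in the comments assume an English locale.

```r
# Hypothetical dates: 2018 has no 15 April, 2019 does
x <- as.Date(c("2018-04-01", "2018-05-15", "2019-04-01", "2019-04-15", "2019-05-15"))
lab <- format(x, "%d %b")

# Levels by first appearance after sorting by the full date
by_date <- factor(lab, levels = unique(lab[order(x)]))
levels(by_date)  # "01 Apr" "15 May" "15 Apr"  (15 Apr lands last)

# Levels by first appearance after sorting by month and day only
key <- format(x, "%m-%d")
by_day <- factor(lab, levels = unique(lab[order(key)]))
levels(by_day)   # "01 Apr" "15 Apr" "15 May"
```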
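In fact, if you prefer, you can create this formatting "on the fly" inside the ggplot call. The following code block should produce the same plot as the previous ggplot call: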
df %>%
arrange(date) %>%
ggplot(aes(x = format(date, "%d %b") %>% forcats::as_factor(), y = ..count..)) +
geom_bar() +
facet_grid(year ~ ., scales = "free_x") +
labs(title="right order")
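Another option, which is not part of the answer above and does not depend on row order, is to keep the question's `"%m-%d"` strings on the axis, since they already sort correctly. You then change only the displayed text through the `labels` argument of `scale_x_discrete()`, which accepts a function of the breaks. The helper name `mmdd_to_label` is hypothetical. It parses each break against the arbitrary leap year 2000 so that `"02-29"` would also parse. The plotting part is guarded so the sketch still runs if ggplot2 is not installed.

```r
# Hypothetical helper: turn "%m-%d" breaks such as "04-01" into "01 Apr".
# Year 2000 is an arbitrary leap year, so "02-29" would also parse.
mmdd_to_label <- function(breaks) {
  format(as.Date(paste0("2000-", breaks)), "%d %b")
}

mmdd_to_label(c("04-01", "05-15"))  # "01 Apr" "15 May" (English locale)

# Usage with the question's data: the axis keeps the "%m-%d" sort order and
# only the tick labels change
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  x <- as.POSIXct(c("2018-04-01", "2018-04-15", "2018-05-01", "2018-05-15",
                    "2019-04-01", "2019-04-15", "2019-05-01", "2019-05-15"))
  df <- data.frame(date = sample(x, 30, replace = TRUE))
  df$year <- format(df$date, "%Y")
  df$date_working <- format(df$date, "%m-%d")
  p <- ggplot(df, aes(x = date_working)) +
    geom_bar() +
    facet_grid(year ~ ., scales = "free_x") +
    scale_x_discrete(labels = mmdd_to_label)
  print(p)
}
```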