Complete function in R
I want to complete a df in R that is missing dates for some months. For example, I have a year of data with one date per month, like this:
df = data.frame(Date = c("2020-01-01","2020-02-01",
"2020-03-02","2020-04-01","2020-09-01","2020-10-01",
"2020-11-01","2020-12-01"))
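When I use the complete function, I call it like this:
df = df %>%
  mutate(Date = as.Date(Date)) %>%
  # seq.Date() needs Date objects, not character strings
  complete(Date = seq.Date(as.Date("2020-01-01"), as.Date("2020-12-31"), by = "month"))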
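The problem is that my final df fills in all the missing dates, for example May, June, and July, which is fine, but it also adds a row for March, because March has no entry on the first day of the month; its only date is 2020-03-02.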
df = data.frame(Date = c("2020-01-01","2020-02-01",
"2020-03-01","2020-03-02","2020-04-01","2020-05-01",
"2020-06-01","2020-07-01","2020-08-01","2020-09-01",
"2020-10-01","2020-11-01","2020-12-01"))
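Do you know how to complete the df only for months that have no date at all?
In my case, I don't want March to be completed, because March already has a date.
Thank you very much.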
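A quick way to see why March gets an extra row, sketched with base R only: complete() looks for the exact Date values it is given, so 2020-03-01 counts as missing even though 2020-03-02 is present. The names month_starts, dates, missing_exact, and missing_months are illustrative and not part of the original post.

```r
# Month starts that complete() is asked to cover
month_starts <- seq(as.Date("2020-01-01"), as.Date("2020-12-01"), by = "month")

# Dates actually present in the data
dates <- as.Date(c("2020-01-01", "2020-02-01", "2020-03-02", "2020-04-01",
                   "2020-09-01", "2020-10-01", "2020-11-01", "2020-12-01"))

# Exact-date comparison: 2020-03-01 is reported as missing
missing_exact <- month_starts[!month_starts %in% dates]

# Month-level comparison: March is recognised as already present
missing_months <- month_starts[!format(month_starts, "%Y-%m") %in%
                                 format(dates, "%Y-%m")]

print(missing_exact)
print(missing_months)
```

Comparing at the month level, rather than on the full date, is the idea behind both answers.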
One possible solution is to complete only by yearmon from the zoo package, so the result does not depend on the actual day of the month.
library(dplyr)
library(zoo) # for as.yearmon
library(tidyr) # for complete
df <- data.frame(Date = c("2020-01-01","2020-02-01",
"2020-03-02","2020-04-01",
"2020-09-01","2020-10-01",
"2020-11-01","2020-12-01"),
id = 1:8)
df
#> Date id
#> 1 2020-01-01 1
#> 2 2020-02-01 2
#> 3 2020-03-02 3
#> 4 2020-04-01 4
#> 5 2020-09-01 5
#> 6 2020-10-01 6
#> 7 2020-11-01 7
#> 8 2020-12-01 8
df %>%
mutate(Date = as.Date(Date),
year_mon = as.yearmon(Date)) %>%
complete(
year_mon = seq.Date(as.Date("2020-01-01"),
as.Date("2020-12-31"),
by = "month") %>% as.yearmon()
)
#> # A tibble: 12 x 3
#> year_mon Date id
#> <yearmon> <date> <int>
#> 1 Jan 2020 2020-01-01 1
#> 2 Feb 2020 2020-02-01 2
#> 3 Mar 2020 2020-03-02 3
#> 4 Apr 2020 2020-04-01 4
#> 5 May 2020 NA NA
#> 6 Jun 2020 NA NA
#> 7 Jul 2020 NA NA
#> 8 Aug 2020 NA NA
#> 9 Sep 2020 2020-09-01 5
#> 10 Oct 2020 2020-10-01 6
#> 11 Nov 2020 2020-11-01 7
#> 12 Dec 2020 2020-12-01 8
Created on 2021-06-25 by the reprex package (v2.0.0)
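The yearmon approach leaves Date as NA for the added months. If a concrete date is wanted there, one option is to fall back to the first day of the month. This is a minimal sketch, assuming that filling with the first of the month is acceptable; the name filled is illustrative, and the coalesce() step is an addition to the answer above, not part of it.

```r
library(dplyr)
library(tidyr)
library(zoo)

df <- data.frame(Date = c("2020-01-01", "2020-02-01",
                          "2020-03-02", "2020-04-01",
                          "2020-09-01", "2020-10-01",
                          "2020-11-01", "2020-12-01"),
                 id = 1:8)

filled <- df %>%
  mutate(Date = as.Date(Date),
         year_mon = as.yearmon(Date)) %>%
  complete(
    year_mon = as.yearmon(seq(as.Date("2020-01-01"),
                              as.Date("2020-12-01"),
                              by = "month"))
  ) %>%
  # as.Date() on a yearmon gives the first day of that month by default,
  # so existing dates are kept and only the NA rows are filled
  mutate(Date = coalesce(Date, as.Date(year_mon)))

filled
```

The id column stays NA for the added months, which makes it easy to tell filled rows from original ones.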
You can extract the year and month values from the date and use complete on them.
library(dplyr)
library(lubridate)
library(tidyr)
df %>%
mutate(Date = as.Date(Date),
year = year(Date),
month = month(Date)) %>%
complete(year, month = 1:12) %>%
mutate(Date = if_else(is.na(Date),
as.Date(paste(year, month, 1, sep = '-')), Date)) %>%
select(Date)
# Date
# <date>
# 1 2020-01-01
# 2 2020-02-01
# 3 2020-03-02
# 4 2020-04-01
# 5 2020-05-01
# 6 2020-06-01
# 7 2020-07-01
# 8 2020-08-01
# 9 2020-09-01
#10 2020-10-01
#11 2020-11-01
#12 2020-12-01
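A variant of the same idea, in case the data spans a range that is not exactly January to December: key each row by the first day of its month and complete over that key. This is only a sketch, assuming lubridate's floor_date() is available; the column month_start and the object result are hypothetical names introduced for illustration.

```r
library(dplyr)
library(tidyr)
library(lubridate)

df <- data.frame(Date = as.Date(c("2020-01-01", "2020-02-01",
                                  "2020-03-02", "2020-04-01",
                                  "2020-09-01", "2020-10-01",
                                  "2020-11-01", "2020-12-01")))

result <- df %>%
  # Key each row by the first day of its month (2020-03-02 -> 2020-03-01)
  mutate(month_start = floor_date(Date, unit = "month")) %>%
  # Add a row for every month between the first and last month in the data
  complete(month_start = seq(min(month_start), max(month_start), by = "month")) %>%
  # Keep original dates; use the month start only where no date existed
  mutate(Date = coalesce(Date, month_start)) %>%
  select(Date)

result
```

Because the sequence runs from the earliest to the latest month in the data, months before the first or after the last observation are not added; a fixed start and end date can be passed to seq() instead if they are needed.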