What function can I use to complete and fill missing time series observations, avoiding completion before the series start date?
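I have a long-format time series data frame grouped by id. The series have different start dates, and some observations are missing. I want to fill in the missing observations by completing the date and id combinations and setting their value to 0.
What I want to avoid is completing missing observations at the beginning of a series, because a missing start only indicates that the series begins later (for example, products with different launch dates).
In my reprex I used complete() from tidyr. It does the opposite of what I want: instead of completing id "A1" with "2015-01-04", it completes id "B1" with "2015-01-01", which is not wanted here. Does complete() always create groups of equal size? Maybe it is the wrong function.
How can I achieve the opposite in the example below?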
library(tidyr)
data <- data.frame(id = as.character(c(rep("A1", 6), rep("B1", 5))),
                   value = c(seq(1, 9, length.out = 11)),
                   date = as.Date(c(c("2015-01-01", "2015-01-02", "2015-01-03",
                                      "2015-01-05", "2015-01-06", "2015-01-07"),
                                    c("2015-01-02", "2015-01-03", "2015-01-05",
                                      "2015-01-06", "2015-01-07")
                                    )
                                  )
                   )
data %>% complete(date, id, fill = list(value = 0))
Making the data rectangular is the easiest thing to express.
You can then reintroduce the missingness as follows:
data %>%
  tidyr::complete(date, id, fill = list(value = 0)) %>%
  dplyr::group_by(id) %>%
  dplyr::arrange(date) %>% # Ensure it's sorted by date
  dplyr::filter(!dplyr::cumall(value == 0)) %>% # Don't keep zeros that didn't have non-0 rows before
  dplyr::ungroup()
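To see why the filter above removes only the leading zeros, here is a minimal sketch on a toy vector. The vector is made up for illustration and stands in for one id's value column after complete(); it is not taken from the question's data.

```r
library(dplyr)

# Hypothetical values for a single id after padding:
# two filled-in leading zeros, then real observations with one filled gap.
value <- c(0, 0, 5.8, 0, 7.4)

# cumall() stays TRUE only while every element seen so far satisfies the condition,
# so it flags the leading run of zeros and nothing after the first non-zero value.
leading_zero <- cumall(value == 0)

# Negating the flag keeps the first real observation and everything after it,
# including the zero that fills a gap in the middle of the series.
kept <- value[!leading_zero]
```

One caveat with this approach: a series that genuinely starts with an observed value of 0 would lose those rows as well, since the filter cannot tell an observed zero from a filled one.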
You need to explicitly supply the dates you want filled in:
data %>%
  group_by(id) %>%
  complete(date = seq(min(date), max(date), by = 1), fill = list(value = 0))
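As a self-contained sketch of the grouped approach, the snippet below rebuilds the question's data and checks the result per id. The names filled and summary_by_id are introduced here for illustration only, and the trailing ungroup() is an addition that is not required for the completion itself.

```r
library(dplyr)
library(tidyr)

# Same data as in the question, written more compactly
data <- data.frame(
  id = rep(c("A1", "B1"), times = c(6, 5)),
  value = seq(1, 9, length.out = 11),
  date = as.Date(c(
    "2015-01-01", "2015-01-02", "2015-01-03", "2015-01-05", "2015-01-06", "2015-01-07",
    "2015-01-02", "2015-01-03", "2015-01-05", "2015-01-06", "2015-01-07"
  ))
)

# Because the data is grouped, min(date) and max(date) are evaluated per id,
# so each series is only padded between its own first and last observation.
filled <- data %>%
  group_by(id) %>%
  complete(date = seq(min(date), max(date), by = "day"), fill = list(value = 0)) %>%
  ungroup()

# Per-id summary: first date, number of rows, and number of zero-filled rows
summary_by_id <- filled %>%
  group_by(id) %>%
  summarise(first_date = min(date), n_rows = n(), n_filled = sum(value == 0))
```

In this sketch A1 gains a row for 2015-01-04, while B1 still starts on 2015-01-02 and also gains only the 2015-01-04 row.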
library(tidyverse)
data <- data.frame(
  id = as.character(c(rep("A1", 6), rep("B1", 5))),
  value = c(seq(1, 9, length.out = 11)),
  date = as.Date(c(
    c(
      "2015-01-01", "2015-01-02", "2015-01-03",
      "2015-01-05", "2015-01-06", "2015-01-07"
    ),
    c(
      "2015-01-02", "2015-01-03", "2015-01-05",
      "2015-01-06", "2015-01-07"
    )
  ))
)
all_dates <- seq(min(data$date), max(data$date), by = "day") %>% as.character()
# complete all dates for each id
data %>%
  as_tibble() %>%
  group_by(id) %>%
  mutate(date = date %>% as.character() %>% factor(levels = all_dates)) %>%
  complete(date, fill = list(value = 0)) %>%
  mutate(date = date %>% as.Date())
#> # A tibble: 14 × 3
#> # Groups:   id [2]
#>    id    date       value
#>    <chr> <date>     <dbl>
#>  1 A1    2015-01-01   1
#>  2 A1    2015-01-02   1.8
#>  3 A1    2015-01-03   2.6
#>  4 A1    2015-01-04   0
#>  5 A1    2015-01-05   3.4
#>  6 A1    2015-01-06   4.2
#>  7 A1    2015-01-07   5
#>  8 B1    2015-01-01   0
#>  9 B1    2015-01-02   5.8
#> 10 B1    2015-01-03   6.6
#> 11 B1    2015-01-04   0
#> 12 B1    2015-01-05   7.4
#> 13 B1    2015-01-06   8.2
#> 14 B1    2015-01-07   9
Created on 2022-04-01 by the reprex package (v2.0.0)
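Note that the output above still contains a padded row for B1 on 2015-01-01, which the question wanted to avoid. One possible way to trim it, sketched here under the assumption that each id's first observed date marks its true start, is to record those start dates before padding and filter on them afterwards. The names starts and trimmed are hypothetical.

```r
library(dplyr)
library(tidyr)

data <- data.frame(
  id = rep(c("A1", "B1"), times = c(6, 5)),
  value = seq(1, 9, length.out = 11),
  date = as.Date(c(
    "2015-01-01", "2015-01-02", "2015-01-03", "2015-01-05", "2015-01-06", "2015-01-07",
    "2015-01-02", "2015-01-03", "2015-01-05", "2015-01-06", "2015-01-07"
  ))
)

# First observed date per id, taken from the raw data before any padding
starts <- data %>%
  group_by(id) %>%
  summarise(start = min(date))

trimmed <- data %>%
  # Pad every id to the full calendar range, filling new rows with 0
  complete(date = seq(min(date), max(date), by = "day"), id, fill = list(value = 0)) %>%
  # Drop padded rows that fall before the id's own first observation
  left_join(starts, by = "id") %>%
  filter(date >= start) %>%
  select(-start) %>%
  arrange(id, date)
```

Unlike filtering on the value column, this keeps a series that genuinely starts with an observed 0, because the cut-off comes from the dates rather than from the values.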
This isn't very elegant, but it works.
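# Assumed definitions (not shown in the original answer): every calendar day and every id
dates <- seq(min(data$date), max(data$date), by = "day")
ids <- unique(data$id)
data.frame(date = rep(dates, length(ids)),
           id = rep(ids, each = length(dates))) |>
  full_join(data, by = c("date", "id")) |>
  arrange(id, date) |>
  group_by(id) |>
  filter(cumany(!is.na(value))) |>  # drop rows before each id's first observation
  mutate(value = replace_na(value, 0)) |>
  ungroup()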