dplyr::complete/fill a time sequence, but only for limited stretches of time
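I'm trying to use tidyr's complete() and fill() to fill gaps in a time series of animal weights (animals are weighed roughly weekly most of the time), but I only want to fill within a limited window.
In the example dataset below, several dates are missing: a single weighing on January 29, 2020, and a run of 4 missing weeks in March/April. We're fine with a single missed weekly weighing (such as 1/29) and are willing to fill the last recorded weight down for up to two weeks, but no further. For the second gap, the weight should be carried forward only 13 more days, and the rest of the gap should be NA for Wt_g.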
library(tidyverse)
library(lubridate)
animalwts <- tibble::tribble(
  ~Animal, ~WtDate, ~Wt_g,
  "A", "1/1/2020", 20L,
  "A", "1/8/2020", 21L,
  "A", "1/15/2020", 21L,
  "A", "1/22/2020", 23L,
  "A", "2/5/2020", 25L,
  "A", "2/12/2020", 23L,
  "A", "2/19/2020", 24L,
  "A", "2/26/2020", 23L,
  "A", "3/4/2020", 22L,
  "A", "4/8/2020", 24L
) %>%
  mutate(WtDate = mdy(WtDate))
The following code completes the date series and fills in all of the missing data:
animalwts %>%
  group_by(Animal) %>%
  complete(WtDate = seq.Date(min(WtDate), max(WtDate), by = "day")) %>%
  fill(Wt_g)
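What I'm trying to figure out is how to complete all dates but fill weights for at most two weeks after any given weighing, leaving NA for any missing data beyond that.
If possible, I'd like to stay "in the pipe".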
Something like this?
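library(tidyverse)
library(lubridate)

animalwts %>%
  group_by(Animal) %>%
  # Record the gap since the previous weighing and each real weighing date
  mutate(NA_lag = WtDate - lag(WtDate),
         last_measurement_date = WtDate) %>%
  # Add one row per calendar day between the first and last weighing
  complete(WtDate = seq.Date(min(WtDate), max(WtDate), by = "day")) %>%
  # Carry the last weight and its weighing date forward into the new rows
  fill(Wt_g) %>%
  fill(last_measurement_date) %>%
  # NA_lag is NA on rows added by complete(), which separates the filled days
  # from the weighing row (except the first weighing, whose NA_lag is also NA);
  # .add = TRUE keeps the Animal grouping so animals are never mixed
  group_by(last_measurement_date, NA_lag, .add = TRUE) %>%
  mutate(days_missing = row_number()) %>%
  # Blank out weights that were carried forward for more than 14 days
  mutate(Wt_g = if_else(days_missing > 14, NA_integer_, Wt_g))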
Data
animalwts <- tibble::tribble(
  ~Animal, ~WtDate, ~Wt_g,
  "A", "1/1/2020", 20L,
  "A", "1/8/2020", 21L,
  "A", "1/15/2020", 21L,
  "A", "1/22/2020", 23L,
  "A", "2/5/2020", 25L,
  "A", "2/12/2020", 23L,
  "A", "2/19/2020", 24L,
  "A", "2/26/2020", 23L,
  "A", "3/4/2020", 22L,
  "A", "4/8/2020", 24L
) %>%
  mutate(WtDate = mdy(WtDate))