How to fill in dates using dplyr?
I have data like this, and I need to fill in the missing (NA) values of DATE.
ID DAY TIME DATE
<dbl> <dbl> <dbl> <date>
1 1 1 1 NA
2 1 1 2 NA
3 1 1 3 NA
4 1 1 4 NA
5 1 1 5 NA
6 1 2 1 2021-09-25
7 1 2 2 2021-09-25
8 1 2 3 2021-09-25
9 1 2 4 2021-09-25
10 1 2 5 2021-09-25
11 1 3 1 NA
12 1 3 2 NA
13 1 3 3 NA
14 1 3 4 NA
15 1 3 5 NA
16 2 1 1 2022-02-26
17 2 1 2 2022-02-26
18 2 1 3 2022-02-26
19 2 1 4 2022-02-26
20 2 1 5 2022-02-26
21 2 2 1 NA
22 2 2 2 2022-02-27
23 2 2 3 2022-02-27
24 2 2 4 2022-02-27
25 2 2 5 2022-02-27
The DATE that corresponds to a given DAY is different for each ID. The final dataset should look like this:
# A tibble: 25 × 4
ID DAY TIME DATE
<dbl> <dbl> <dbl> <chr>
1 1 1 1 2021-09-24
2 1 1 2 2021-09-24
3 1 1 3 2021-09-24
4 1 1 4 2021-09-24
5 1 1 5 2021-09-24
6 1 2 1 2021-09-25
7 1 2 2 2021-09-25
8 1 2 3 2021-09-25
9 1 2 4 2021-09-25
10 1 2 5 2021-09-25
11 1 3 1 2021-09-26
12 1 3 2 2021-09-26
13 1 3 3 2021-09-26
14 1 3 4 2021-09-26
15 1 3 5 2021-09-26
16 2 1 1 2022-02-26
17 2 1 2 2022-02-26
18 2 1 3 2022-02-26
19 2 1 4 2022-02-26
20 2 1 5 2022-02-26
21 2 2 1 2022-02-27
22 2 2 2 2022-02-27
23 2 2 3 2022-02-27
24 2 2 4 2022-02-27
25 2 2 5 2022-02-27
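One approach is to derive a "zero" date from the known dates (DATE minus DAY) and use fill within each ID, so that every row of an ID shares the same reference date. The final DATE is then the reference date plus DAY.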
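library(tidyverse)

df %>%
  # DATE is stored as character in the sample data, so convert it first
  mutate(DATE = as.Date(DATE),
         # "zero" date: the date that DAY 0 would fall on (NA where DATE is NA)
         DATE0 = DATE - DAY) %>%
  group_by(ID) %>%
  # copy the known zero date to every row of the same ID
  fill(DATE0, .direction = "updown") %>%
  mutate(DATE = DATE0 + DAY) %>%
  ungroup() %>%
  select(-DATE0)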
Output
ID DAY TIME DATE
<int> <int> <int> <date>
1 1 1 1 2021-09-24
2 1 1 2 2021-09-24
3 1 1 3 2021-09-24
4 1 1 4 2021-09-24
5 1 1 5 2021-09-24
6 1 2 1 2021-09-25
7 1 2 2 2021-09-25
8 1 2 3 2021-09-25
9 1 2 4 2021-09-25
10 1 2 5 2021-09-25
11 1 3 1 2021-09-26
12 1 3 2 2021-09-26
13 1 3 3 2021-09-26
14 1 3 4 2021-09-26
15 1 3 5 2021-09-26
16 2 1 1 2022-02-26
17 2 1 2 2022-02-26
18 2 1 3 2022-02-26
19 2 1 4 2022-02-26
20 2 1 5 2022-02-26
21 2 2 1 2022-02-27
22 2 2 2 2022-02-27
23 2 2 3 2022-02-27
24 2 2 4 2022-02-27
25 2 2 5 2022-02-27
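For anyone who wants to check the arithmetic without the tidyverse, the same zero-date idea can be sketched in base R. This is only an illustration under stated assumptions: df_small is a made-up, shortened stand-in for the data (one row per ID and DAY), and zero and DATE_filled are names invented here.

```r
# Hypothetical, shortened stand-in for the question's data
df_small <- data.frame(
  ID   = c(1L, 1L, 1L, 2L, 2L),
  DAY  = c(1L, 2L, 3L, 1L, 2L),
  DATE = as.Date(c(NA, "2021-09-25", NA, "2022-02-26", NA))
)

# Zero date as days since 1970-01-01; NA wherever DATE is unknown
zero <- as.numeric(df_small$DATE) - df_small$DAY

# Within each ID, copy the first known zero date to every row
zero <- ave(zero, df_small$ID, FUN = function(z) z[!is.na(z)][1])

# Rebuild the date as zero date + DAY
df_small$DATE_filled <- as.Date(zero + df_small$DAY, origin = "1970-01-01")

df_small
```

In this sketch, an ID with no known DATE at all keeps NA in every row, since there is nothing to anchor its zero date to.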
Data
df <- structure(list(ID = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L),
DAY = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L,
3L, 3L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L), TIME = c(1L,
2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L,
2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L), DATE = c(NA, NA, NA,
NA, NA, "2021-09-25", "2021-09-25", "2021-09-25", "2021-09-25",
"2021-09-25", NA, NA, NA, NA, NA, "2022-02-26", "2022-02-26",
"2022-02-26", "2022-02-26", "2022-02-26", NA, "2022-02-27",
"2022-02-27", "2022-02-27", "2022-02-27")), class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13",
"14", "15", "16", "17", "18", "19", "20", "21", "22", "23", "24",
"25"))
There is also a bold approach using lm. It assumes that dat holds the same data with DATE already of class Date:
dat$dateFilled <- as.Date(predict(lm(DATE~DAY+ID, dat), newdata = dat), origin = "1970-01-01")
> dat
ID DAY TIME DATE dateFilled
1 1 1 1 <NA> 2021-09-24
2 1 1 2 <NA> 2021-09-24
3 1 1 3 <NA> 2021-09-24
4 1 1 4 <NA> 2021-09-24
5 1 1 5 <NA> 2021-09-24
6 1 2 1 2021-09-25 2021-09-25
7 1 2 2 2021-09-25 2021-09-25
8 1 2 3 2021-09-25 2021-09-25
9 1 2 4 2021-09-25 2021-09-25
10 1 2 5 2021-09-25 2021-09-25
11 1 3 1 <NA> 2021-09-26
12 1 3 2 <NA> 2021-09-26
13 1 3 3 <NA> 2021-09-26
14 1 3 4 <NA> 2021-09-26
15 1 3 5 <NA> 2021-09-26
16 2 1 1 2022-02-26 2022-02-26
17 2 1 2 2022-02-26 2022-02-26
18 2 1 3 2022-02-26 2022-02-26
19 2 1 4 2022-02-26 2022-02-26
20 2 1 5 2022-02-26 2022-02-26
21 2 2 1 <NA> 2022-02-27
22 2 2 2 2022-02-27 2022-02-27
23 2 2 3 2022-02-27 2022-02-27
24 2 2 4 2022-02-27 2022-02-27
25 2 2 5 2022-02-27 2022-02-27
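Why this can work: stored as a number, a Date is a count of days, and within one ID that count grows by exactly one per DAY, so a linear model in DAY and ID can reproduce the known dates and extend them to the missing rows. A slightly more defensive sketch of the same idea follows. The explicit as.numeric(), factor(ID) and round() are additions made here, not part of the answer above, and dat_small is a made-up, shortened stand-in for the data.

```r
# Hypothetical, shortened stand-in for the question's data
dat_small <- data.frame(
  ID   = c(1L, 1L, 1L, 2L, 2L),
  DAY  = c(1L, 2L, 3L, 1L, 2L),
  DATE = as.Date(c(NA, "2021-09-25", NA, "2022-02-26", "2022-02-27"))
)

# Fit on the rows with a known DATE (lm drops NA rows by default).
# factor(ID) gives each ID its own offset instead of a linear trend in ID.
fit <- lm(as.numeric(DATE) ~ DAY + factor(ID), data = dat_small)

# Predict for every row; round() guards against tiny floating-point error
pred <- round(predict(fit, newdata = dat_small))
dat_small$dateFilled <- as.Date(pred, origin = "1970-01-01")

dat_small
```

The slope for DAY is estimated from the data, so at least one ID needs known dates on two different DAYs. Here ID 2 provides that, as it does in the full data.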