Use known information to sequentially perform calculations and populate future values by group in R
I have data like this:
library(tibble)

# Monthly periods for two ids in each of two groups; the trailing counts are unknown (NA)
intial <- tibble(
  start_date = rep(seq.Date(as.Date("2021-06-01"), as.Date("2021-10-01"), by = "months"), 4),
  end_date = rep(seq.Date(as.Date("2021-07-01"), as.Date("2021-11-01"), by = "months"), 4),
  id = rep(c(rep(1, 5), rep(2, 5)), 2),
  group = c(rep("a", 10), rep("b", 10)),
  increase = c(4:8, 5:9, 6:10, 7:11),
  decrease = c(1:5, 2:6, 3:7, 4:8),
  start_count = c(10, 13, 16, 19, NA, 15, 18, 21, 24, NA, 20, 23, 26, 29, NA, 20, 23, 26, 29, NA),
  end_count = c(13, 16, 19, NA, NA, 18, 21, 24, NA, NA, 23, 26, 29, NA, NA, 23, 26, 29, NA, NA)
)
print(intial)
start_date end_date id group increase decrease start_count end_count
<date> <date> <dbl> <chr> <int> <int> <dbl> <dbl>
1 2021-06-01 2021-07-01 1 a 4 1 10 13
2 2021-07-01 2021-08-01 1 a 5 2 13 16
3 2021-08-01 2021-09-01 1 a 6 3 16 19
4 2021-09-01 2021-10-01 1 a 7 4 19 NA
5 2021-10-01 2021-11-01 1 a 8 5 NA NA
6 2021-06-01 2021-07-01 2 a 5 2 15 18
7 2021-07-01 2021-08-01 2 a 6 3 18 21
8 2021-08-01 2021-09-01 2 a 7 4 21 24
9 2021-09-01 2021-10-01 2 a 8 5 24 NA
10 2021-10-01 2021-11-01 2 a 9 6 NA NA
11 2021-06-01 2021-07-01 1 b 6 3 20 23
12 2021-07-01 2021-08-01 1 b 7 4 23 26
13 2021-08-01 2021-09-01 1 b 8 5 26 29
14 2021-09-01 2021-10-01 1 b 9 6 29 NA
15 2021-10-01 2021-11-01 1 b 10 7 NA NA
16 2021-06-01 2021-07-01 2 b 7 4 20 23
17 2021-07-01 2021-08-01 2 b 8 5 23 26
18 2021-08-01 2021-09-01 2 b 9 6 26 29
19 2021-09-01 2021-10-01 2 b 10 7 29 NA
20 2021-10-01 2021-11-01 2 b 11 8 NA NA
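Here, each unique combination of id and group contains a date index. I need to use the values in the increase and decrease fields to calculate and sequentially populate start_count and end_count for each date within each id and group combination.
end_count is calculated as start_count + increase - decrease, and start_count is the end_count of the previous date.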
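To make the recurrence concrete, here is a minimal base R sketch for a single id and group series. The helper name roll_counts is hypothetical (it is not part of the original post), and the sketch assumes the rows are already in date order and that the first start_count is known.

```r
# Hypothetical helper: roll the counts forward for one id/group series,
# given the first known start_count and the per-period changes.
roll_counts <- function(first_start, increase, decrease) {
  net <- increase - decrease
  # end_count of row i = first start_count + all net changes up to and including row i
  end_count <- first_start + cumsum(net)
  # start_count of row i = end_count of row i - 1 (the first row keeps first_start)
  start_count <- c(first_start, head(end_count, -1))
  data.frame(start_count = start_count, end_count = end_count)
}

# id 1 in group "a" from the data above
roll_counts(10, increase = 4:8, decrease = 1:5)
```

Because every end_count is the first start_count plus a running total of increase - decrease, cumsum() can express the row-by-row dependency without an explicit loop.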
Basically, I would like to get something like the following example:
final <- tibble(
  start_date = rep(seq.Date(as.Date("2021-06-01"), as.Date("2021-10-01"), by = "months"), 4),
  end_date = rep(seq.Date(as.Date("2021-07-01"), as.Date("2021-11-01"), by = "months"), 4),
  id = rep(c(rep(1, 5), rep(2, 5)), 2),
  group = c(rep("a", 10), rep("b", 10)),
  increase = c(4:8, 5:9, 6:10, 7:11),
  decrease = c(1:5, 2:6, 3:7, 4:8),
  start_count = c(10, 13, 16, 19, 21, 15, 18, 21, 24, 27, 20, 23, 26, 29, 32, 20, 23, 26, 29, 31),
  end_count = c(13, 16, 19, 21, 24, 18, 21, 24, 27, 30, 23, 26, 29, 32, 35, 23, 26, 29, 31, 34)
)
print(final)
start_date end_date id group increase decrease start_count end_count
<date> <date> <dbl> <chr> <int> <int> <dbl> <dbl>
1 2021-06-01 2021-07-01 1 a 4 1 10 13
2 2021-07-01 2021-08-01 1 a 5 2 13 16
3 2021-08-01 2021-09-01 1 a 6 3 16 19
4 2021-09-01 2021-10-01 1 a 7 4 19 21
5 2021-10-01 2021-11-01 1 a 8 5 21 24
6 2021-06-01 2021-07-01 2 a 5 2 15 18
7 2021-07-01 2021-08-01 2 a 6 3 18 21
8 2021-08-01 2021-09-01 2 a 7 4 21 24
9 2021-09-01 2021-10-01 2 a 8 5 24 27
10 2021-10-01 2021-11-01 2 a 9 6 27 30
11 2021-06-01 2021-07-01 1 b 6 3 20 23
12 2021-07-01 2021-08-01 1 b 7 4 23 26
13 2021-08-01 2021-09-01 1 b 8 5 26 29
14 2021-09-01 2021-10-01 1 b 9 6 29 32
15 2021-10-01 2021-11-01 1 b 10 7 32 35
16 2021-06-01 2021-07-01 2 b 7 4 20 23
17 2021-07-01 2021-08-01 2 b 8 5 23 26
18 2021-08-01 2021-09-01 2 b 9 6 26 29
19 2021-09-01 2021-10-01 2 b 10 7 29 31
20 2021-10-01 2021-11-01 2 b 11 8 31 34
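We can use coalesce() from dplyr together with na.locf0() from zoo:
library(dplyr)
library(zoo)

intial %>%
  group_by(id, group) %>%
  mutate(
    # Keep known start_count values; where one is missing, carry the last known
    # value forward and add this row's increase - decrease
    start_count_new = coalesce(start_count,
                               zoo::na.locf0(start_count) + increase - decrease),
    # end_count is the next row's start_count; for the last row of each series,
    # add this row's increase - decrease to its own start_count_new
    end_count_new = coalesce(lead(start_count_new),
                             lead(start_count_new, default = last(start_count_new)) + increase - decrease)
  )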
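The pipeline above fills a missing start_count from the carried-forward value plus the current row's net change. That matches the stated recurrence for this data because only one start_count is missing per series and increase - decrease is 3 in every row. As a hedged alternative for series with several missing periods or varying net changes, the sketch below rebuilds both columns from the first known start_count with cumsum(). The toy data frame is a cut-down stand-in for the question's data, and the approach assumes rows are in date order within each id and group and that the first start_count of each series is known.

```r
library(dplyr)
library(tibble)

# Stand-in for the question's data: the two id series of group "a"
toy <- tibble(
  id = rep(c(1, 2), each = 5),
  group = "a",
  increase = c(4:8, 5:9),
  decrease = c(1:5, 2:6),
  start_count = c(10, 13, 16, 19, NA, 15, 18, 21, 24, NA),
  end_count = c(13, 16, 19, NA, NA, 18, 21, 24, NA, NA)
)

filled <- toy %>%
  group_by(id, group) %>%
  mutate(
    net = increase - decrease,
    # end_count of row i = first start_count + net changes accumulated through row i
    end_count_new = first(start_count) + cumsum(net),
    # start_count of row i = end_count of the previous row
    start_count_new = lag(end_count_new, default = first(start_count))
  ) %>%
  ungroup() %>%
  select(-net)

print(filled)
```

Applied strictly, the formula gives 22 and 25 for the last two rows of id 1 in group a (19 + 7 - 4 = 22, then 22 + 8 - 5 = 25), whereas the desired output above lists 21 and 24, so the recomputed columns will differ from that table in those rows.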