How to create a cumulative sum that is grouped by 2 different columns
Suppose I have the following data frame, initialized as df:
ID date value
A 01/2012 1
A 03/2012 2
A 05/2012 4
A 07/2012 3
A 09/2012 7
A 11/2012 1
A 01/2013 2
A 03/2013 8
A 05/2013 13
A 07/2013 2
A 09/2013 5
A 11/2013 2
B 01/2012 3
B 03/2012 9
B 05/2012 1
B 07/2012 0
B 09/2012 12
B 11/2012 3
B 01/2013 1
B 03/2013 4
B 05/2013 3
B 07/2013 3
B 09/2013 1
B 11/2013 1
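where the date variable is in mm/yyyy format. I am looking for a way to add a column to this data frame that gives the cumulative sum of the value column, grouped by ID and year. For example, this is the output I want: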
ID date value cumsum
A 01/2012 1 1
A 03/2012 2 3
A 05/2012 4 7
A 07/2012 3 10
A 09/2012 7 17
A 11/2012 1 18
A 01/2013 2 2
A 03/2013 8 10
A 05/2013 13 23
A 07/2013 2 25
A 09/2013 5 30
A 11/2013 2 32
B 01/2012 3 3
B 03/2012 9 12
B 05/2012 1 13
B 07/2012 0 13
B 09/2012 12 25
B 11/2012 3 28
B 01/2013 1 1
B 03/2013 4 5
B 05/2013 3 8
B 07/2013 3 11
B 09/2013 1 12
B 11/2013 1 13
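As you can see, the sum resets every year and also resets for each ID. Essentially, I'm not sure how to create a cumulative sum grouped by 2 columns instead of 1. Any help would be much appreciated.
You can do this: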
library(dplyr)
df %>%
  group_by(ID, year = substr(date, 4, 7)) %>%  # characters 4-7 of "mm/yyyy" are the year
  mutate(cumsum = cumsum(value))
# A tibble: 24 x 5
# Groups: ID, year [4]
ID date value year cumsum
<chr> <chr> <int> <chr> <int>
1 A 01/2012 1 2012 1
2 A 03/2012 2 2012 3
3 A 05/2012 4 2012 7
4 A 07/2012 3 2012 10
5 A 09/2012 7 2012 17
6 A 11/2012 1 2012 18
7 A 01/2013 2 2013 2
8 A 03/2013 8 2013 10
9 A 05/2013 13 2013 23
10 A 07/2013 2 2013 25
We convert the date to the Date class and extract the year part to create the grouping column, then take the cumulative sum:
library(dplyr)
library(lubridate)
df1 %>%
group_by(ID, year = year(my(date))) %>%
mutate(cumsum = cumsum(value)) %>%
ungroup %>%
select(-year)
# A tibble: 24 x 4
ID date value cumsum
<chr> <chr> <int> <int>
1 A 01/2012 1 1
2 A 03/2012 2 3
3 A 05/2012 4 7
4 A 07/2012 3 10
5 A 09/2012 7 17
6 A 11/2012 1 18
7 A 01/2013 2 2
8 A 03/2013 8 10
9 A 05/2013 13 23
10 A 07/2013 2 25
# … with 14 more rows
Data
df1 <- structure(list(ID = c("A", "A", "A", "A", "A", "A", "A", "A",
"A", "A", "A", "A", "B", "B", "B", "B", "B", "B", "B", "B", "B",
"B", "B", "B"), date = c("01/2012", "03/2012", "05/2012", "07/2012",
"09/2012", "11/2012", "01/2013", "03/2013", "05/2013", "07/2013",
"09/2013", "11/2013", "01/2012", "03/2012", "05/2012", "07/2012",
"09/2012", "11/2012", "01/2013", "03/2013", "05/2013", "07/2013",
"09/2013", "11/2013"), value = c(1L, 2L, 4L, 3L, 7L, 1L, 2L,
8L, 13L, 2L, 5L, 2L, 3L, 9L, 1L, 0L, 12L, 3L, 1L, 4L, 3L, 3L,
1L, 1L)), class = "data.frame", row.names = c(NA, -24L))
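One thing worth keeping in mind: cumsum() follows the row order inside each group, so the answers above rely on the rows already being in chronological order. Below is a minimal sketch, assuming the input might arrive shuffled; df_shuffled and the helper column parsed are made-up names for illustration and are not part of the original data.

```r
library(dplyr)
library(lubridate)

# Hypothetical input with rows out of chronological order
df_shuffled <- data.frame(
  ID    = c("A", "A", "A", "A"),
  date  = c("03/2012", "01/2013", "01/2012", "03/2013"),
  value = c(2L, 2L, 1L, 8L)
)

result <- df_shuffled %>%
  mutate(parsed = my(date)) %>%          # parse "mm/yyyy" into a Date
  arrange(ID, parsed) %>%                # sort before accumulating
  group_by(ID, year = year(parsed)) %>%  # one group per ID and year
  mutate(cumsum = cumsum(value)) %>%
  ungroup() %>%
  select(-parsed, -year)                 # drop the helper columns
```

Sorting on the parsed Date rather than on the raw string matters here, because "01/2013" would sort before "03/2012" as text.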
An alternative:
library(data.table)
setDT(df)[, cumsum := cumsum(value), by = .(ID, substr(date, 4, 7))][]
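Note that setDT() converts df to a data.table in place. If you would rather leave the original data frame untouched, something like the following sketch should work; df_small is a made-up subset for illustration, and as.data.table() is used here instead of setDT() so that a copy is made.

```r
library(data.table)

# Hypothetical small input, not the full data from the question
df_small <- data.frame(
  ID    = c("A", "A", "A", "B", "B"),
  date  = c("01/2012", "03/2012", "01/2013", "01/2012", "03/2012"),
  value = c(1L, 2L, 2L, 3L, 9L)
)

dt <- as.data.table(df_small)  # copy; df_small stays a plain data.frame

# := adds the column by reference to dt; the grouping keys in `by`
# are used only for grouping and are not added as columns
dt[, cumsum := cumsum(value), by = .(ID, year = substr(date, 4, 7))]
```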
For completeness, here is a base R option:
transform(df, cumsum = ave(value, ID, sub('.*/', '', date), FUN = cumsum))
# ID date value cumsum
#1 A 01/2012 1 1
#2 A 03/2012 2 3
#3 A 05/2012 4 7
#4 A 07/2012 3 10
#5 A 09/2012 7 17
#6 A 11/2012 1 18
#7 A 01/2013 2 2
#8 A 03/2013 8 10
#9 A 05/2013 13 23
#10 A 07/2013 2 25
#11 A 09/2013 5 30
#12 A 11/2013 2 32
#13 B 01/2012 3 3
#14 B 03/2012 9 12
#15 B 05/2012 1 13
#16 B 07/2012 0 13
#17 B 09/2012 12 25
#18 B 11/2012 3 28
#19 B 01/2013 1 1
#20 B 03/2013 4 5
#21 B 05/2013 3 8
#22 B 07/2013 3 11
#23 B 09/2013 1 12
#24 B 11/2013 1 13
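On the underlying question of grouping by 2 columns instead of 1: it may help to see that two grouping variables behave like one combined key. The sketch below, on a made-up subset (df_small and the variable names are for illustration only), compares ave() with two grouping vectors against ave() with a single pasted key.

```r
# Hypothetical small input, not the full data from the question
df_small <- data.frame(
  ID    = c("A", "A", "A", "B", "B", "B"),
  date  = c("01/2012", "03/2012", "01/2013", "01/2012", "01/2013", "03/2013"),
  value = c(1L, 2L, 2L, 3L, 1L, 4L)
)

# Year part of "mm/yyyy": everything after the slash
yr <- sub(".*/", "", df_small$date)

# Two grouping vectors passed to ave()
two_keys <- ave(df_small$value, df_small$ID, yr, FUN = cumsum)

# One combined key built by pasting the two together
one_key <- ave(df_small$value, paste(df_small$ID, yr), FUN = cumsum)
```

Both vectors should come out the same, which is why every answer here only needs to add the year as an extra grouping variable next to ID.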