R: lagged "cumulative" difference between two values
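I have a data.frame df with many groups (series), where area data is reported for each year. I am trying to create a new column, diff, that is the difference between the area in row 1 and the area in row 2. But I then need to keep subtracting each following area from that "new" difference. This needs to be done for each series, in descending order of year.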
df<-
structure(list(series = c("A218t23", "A218t23", "A218t23", "A218t23",
"A218t23", "A218t23", "A218t23", "A218t23", "A218t23"), year = 2018:2010,
area = c(16409.3632611811, 274.5866082, 293.8540619, 323.0603775,
544.7366938, 108.0737561, 134.8579038, 143.14125, 167.8244576
)), row.names = c(NA, -9L), groups = structure(list(series = "A218t23",
.rows = structure(list(1:9), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), row.names = c(NA, -1L), class = c("tbl_df",
"tbl", "data.frame"), .drop = TRUE), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"))
The output I want works like this: 16409 - 275 = 16135, then 16135 - 294 = 15841, and so on.
The code I have been using:
df_diffs <- df %>%
dplyr::group_by(series) %>%
dplyr::mutate(diff = area - dplyr::lag(area, default=0, order_by = desc(year)))
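However, this only returns the lagged difference between consecutive rows of the area column. The result I am looking for is a "cumulative" or running difference. I checked RcppRoll and a few other SO posts, but had no luck. Ideally I could keep all of this in a single pipe, since I have other functions in the chain. Bonus points if there is a way to replace the NA in the first row with the corresponding area value for that year.
Thank you very much for any suggestions!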
Adapting another answer, you could do:
library(dplyr)
df %>%
dplyr::group_by(series) %>%
dplyr::mutate(diff = c(area[1L], area[1L] - cumsum(area[-1L])))
#> # A tibble: 9 × 4
#> # Groups: series [1]
#> series year area diff
#> <chr> <int> <dbl> <dbl>
#> 1 A218t23 2018 16409. 16409.
#> 2 A218t23 2017 275. 16135.
#> 3 A218t23 2016 294. 15841.
#> 4 A218t23 2015 323. 15518.
#> 5 A218t23 2014 545. 14973.
#> 6 A218t23 2013 108. 14865.
#> 7 A218t23 2012 135. 14730.
#> 8 A218t23 2011 143. 14587.
#> 9 A218t23 2010 168. 14419.
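The result above relies on the rows already being in descending year order within each series, as they are in the sample data. A minimal sketch, assuming the real data may arrive unsorted, is to sort first. The names `df_small` and `df_sorted` and the rounded values are introduced here for illustration only.

```r
library(dplyr)

# Small stand-in for the question's data, deliberately shuffled (rounded values).
df_small <- tibble(
  series = c("A218t23", "A218t23", "A218t23"),
  year   = c(2016L, 2018L, 2017L),
  area   = c(294, 16409, 275)
)

df_sorted <- df_small %>%
  arrange(series, desc(year)) %>%   # newest year first within each series
  group_by(series) %>%
  mutate(diff = c(area[1L], area[1L] - cumsum(area[-1L]))) %>%
  ungroup()

df_sorted$diff
```

Because `cumsum(numeric(0))` is an empty vector, the same expression should also cope with a series that has a single row, returning just its area.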
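You can achieve this by tweaking a cumulative sum.
You start from the first value of each group and then subtract every value that follows. If you treat every value after the first as negative, the cumulative sum is your expected output.
Here is the code: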
library(tidyverse)
df = df %>%
mutate(series="A") %>%
bind_rows(df)
df %>%
group_by(series) %>%
mutate(
x = ifelse(row_number()==1, area, -area),
diff = cumsum(x)
)
#> # A tibble: 18 x 5
#> # Groups: series [2]
#> series year area x diff
#> <chr> <int> <dbl> <dbl> <dbl>
#> 1 A 2018 16409. 16409. 16409.
#> 2 A 2017 275. -275. 16135.
#> 3 A 2016 294. -294. 15841.
#> 4 A 2015 323. -323. 15518.
#> 5 A 2014 545. -545. 14973.
#> 6 A 2013 108. -108. 14865.
#> 7 A 2012 135. -135. 14730.
#> 8 A 2011 143. -143. 14587.
#> 9 A 2010 168. -168. 14419.
#> 10 A218t23 2018 16409. 16409. 16409.
#> 11 A218t23 2017 275. -275. 16135.
#> 12 A218t23 2016 294. -294. 15841.
#> 13 A218t23 2015 323. -323. 15518.
#> 14 A218t23 2014 545. -545. 14973.
#> 15 A218t23 2013 108. -108. 14865.
#> 16 A218t23 2012 135. -135. 14730.
#> 17 A218t23 2011 143. -143. 14587.
#> 18 A218t23 2010 168. -168. 14419.
Created on 2021-11-09 by the reprex package (v2.0.1)
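To see why the sign flip works, here is a minimal base R check on a plain vector. It uses rounded values from the question, so the results differ by one from the question's figures, which come from the unrounded areas. The names `signed` and `running_diff` are illustrative.

```r
area <- c(16409, 275, 294, 323)

# Keep the first value positive and negate the rest.
signed <- c(area[1], -area[-1])

# The running sum of the signed values is the running difference.
running_diff <- cumsum(signed)
running_diff
#> [1] 16409 16134 15840 15517
```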
Another option, using Reduce():
df %>%
group_by(series) %>%
mutate(diff = Reduce("-", area, accumulate = TRUE))
# A tibble: 9 × 4
# Groups: series [1]
series year area diff
<chr> <int> <dbl> <dbl>
1 A218t23 2018 16409. 16409.
2 A218t23 2017 275. 16135.
3 A218t23 2016 294. 15841.
4 A218t23 2015 323. 15518.
5 A218t23 2014 545. 14973.
6 A218t23 2013 108. 14865.
7 A218t23 2012 135. 14730.
8 A218t23 2011 143. 14587.
9 A218t23 2010 168. 14419.
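Reduce() with accumulate = TRUE performs a left fold and keeps every intermediate result, which is why it produces the running difference. A small sketch on rounded values, comparing it with the cumsum formulation (the names `folded` and `via_cumsum` are illustrative):

```r
area <- c(16409, 275, 294, 323)

# Left fold: ((16409 - 275) - 294) - 323, keeping each intermediate value.
folded <- Reduce(`-`, area, accumulate = TRUE)

# Same values computed as first value minus the running sum of the rest.
via_cumsum <- c(area[1], area[1] - cumsum(area[-1]))

all.equal(folded, via_cumsum)
```

On long vectors the cumsum version is likely faster, since it is vectorised, while Reduce() calls the function once per element.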
If you are working in the tidyverse, you can use purrr::accumulate:
library(purrr)
library(dplyr)
df %>%
group_by(series) %>%
mutate(diff = accumulate(area, ~ .x - .y))
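In the function passed to accumulate, .x is the value accumulated so far and .y is the next element of the vector.
As in the Reduce answer, you can also pass the arithmetic operator `-` directly: accumulate(area, `-`).
Output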
# A tibble: 9 x 4
# Groups: series [1]
series year area diff
<chr> <int> <dbl> <dbl>
1 A218t23 2018 16409. 16409.
2 A218t23 2017 275. 16135.
3 A218t23 2016 294. 15841.
4 A218t23 2015 323. 15518.
5 A218t23 2014 545. 14973.
6 A218t23 2013 108. 14865.
7 A218t23 2012 135. 14730.
8 A218t23 2011 143. 14587.
9 A218t23 2010 168. 14419.
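As a quick check of which argument plays which role, here is a sketch that spells the lambda out as an ordinary function. The argument names `acc` and `nxt` are made up for illustration, and the values are rounded.

```r
library(purrr)

area <- c(16409, 275, 294, 323)

# First argument: the result accumulated so far.
# Second argument: the next element of the vector.
result <- accumulate(area, function(acc, nxt) acc - nxt)

# Passing the operator itself gives the same values.
same <- accumulate(area, `-`)

all.equal(result, same)
```

The first element is returned unchanged because no initial value is supplied, which is what fills the first row with that year's area instead of NA.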