R - 对不同数据框中同一列的一组行求和
R - sum a set number of rows from same column in a different data frame
我有以下数据框:
df <- data.frame( year = c(1985,1986,1987,1988,1989,1990,
1991,1992,1993,1994,1995,1996,1997,1998,1999,2000,
2001,2002,2003,2004,2005,2006,2007,2008,2009, 2010,
2011,2012, 2013,2014,2015,2016,2017,2018,2019,2020),
value = c(0,5,10,2,6,7,3,4,5,9,10,6,8,7,3,5,2,10,9,6,5,10,4,7,8,10,
4,6,8,9,2,3,7,6,2,1))
我想创建第二个数据框 (df2),它由前一个数据框的 20 年间隔组成,即
df2 <- data.frame(year=c("1985-2005", "1986-2006","1987-2007", "1988-2008","1989-2009",
"1990-2010", "1991-2011","1992-2002", "1993-2003","1994-2004",
"1995-2005", "1996-2006","1997-2007", "1998-2008", "1999-2009",
"2000-2020"))
现在 df2
的 value
应该是 df
的 value
的总和,间隔 20 年
(即,对于 df2
中的“1985-2005”年,该值是 df
中从 1985 年到 2005 年的值的总和 - Excel 附有最终值的快照)
如何执行此计算?还有任何可能的自动化来定义 df2
中的 year
间隔而无需键入它?
可能的解决方案:
library(tidyverse)
df <- data.frame( year = c(1985,1986,1987,1988,1989,1990,
1991,1992,1993,1994,1995,1996,1997,1998,1999,2000,
2001,2002,2003,2004,2005,2006,2007,2008,2009, 2010,
2011,2012, 2013,2014,2015,2016,2017,2018,2019,2020),
value = c(0,5,10,2,6,7,3,4,5,9,10,6,8,7,3,5,2,10,9,6,5,10,4,7,8,10,
4,6,8,9,2,3,7,6,2,1))
df2 <- data.frame(year=c("1985-2005", "1986-2006","1987-2007", "1988-2008","1989-2009",
"1990-2010", "1991-2011","1992-2002", "1993-2003","1994-2004",
"1995-2005", "1996-2006","1997-2007", "1998-2008", "1999-2009",
"2000-2020"))
df2 %>%
separate(year, into = c("y1", "y2"), sep="-", convert = T, remove = F) %>%
rowwise %>%
mutate(value = sum(df$value[df$year >= y1 & df$year <= y2])) %>%
select(-y1, -y2) %>% ungroup
#> # A tibble: 16 × 2
#> year value
#> <chr> <dbl>
#> 1 1985-2005 122
#> 2 1986-2006 132
#> 3 1987-2007 131
#> 4 1988-2008 128
#> 5 1989-2009 134
#> 6 1990-2010 138
#> 7 1991-2011 135
#> 8 1992-2002 69
#> 9 1993-2003 74
#> 10 1994-2004 75
#> 11 1995-2005 71
#> 12 1996-2006 71
#> 13 1997-2007 69
#> 14 1998-2008 68
#> 15 1999-2009 69
#> 16 2000-2020 124
我有以下数据框:
df <- data.frame( year = c(1985,1986,1987,1988,1989,1990,
1991,1992,1993,1994,1995,1996,1997,1998,1999,2000,
2001,2002,2003,2004,2005,2006,2007,2008,2009, 2010,
2011,2012, 2013,2014,2015,2016,2017,2018,2019,2020),
value = c(0,5,10,2,6,7,3,4,5,9,10,6,8,7,3,5,2,10,9,6,5,10,4,7,8,10,
4,6,8,9,2,3,7,6,2,1))
我想创建第二个数据框 (df2),它由前一个数据框的 20 年间隔组成,即
df2 <- data.frame(year=c("1985-2005", "1986-2006","1987-2007", "1988-2008","1989-2009",
"1990-2010", "1991-2011","1992-2002", "1993-2003","1994-2004",
"1995-2005", "1996-2006","1997-2007", "1998-2008", "1999-2009",
"2000-2020"))
现在 df2
的 value
应该是 df
的 value
的总和,间隔 20 年
(即,对于 df2
中的“1985-2005”年,该值是 df
中从 1985 年到 2005 年的值的总和 - Excel 附有最终值的快照)
如何执行此计算?还有任何可能的自动化来定义 df2
中的 year
间隔而无需键入它?
可能的解决方案:
library(tidyverse)
df <- data.frame( year = c(1985,1986,1987,1988,1989,1990,
1991,1992,1993,1994,1995,1996,1997,1998,1999,2000,
2001,2002,2003,2004,2005,2006,2007,2008,2009, 2010,
2011,2012, 2013,2014,2015,2016,2017,2018,2019,2020),
value = c(0,5,10,2,6,7,3,4,5,9,10,6,8,7,3,5,2,10,9,6,5,10,4,7,8,10,
4,6,8,9,2,3,7,6,2,1))
df2 <- data.frame(year=c("1985-2005", "1986-2006","1987-2007", "1988-2008","1989-2009",
"1990-2010", "1991-2011","1992-2002", "1993-2003","1994-2004",
"1995-2005", "1996-2006","1997-2007", "1998-2008", "1999-2009",
"2000-2020"))
df2 %>%
separate(year, into = c("y1", "y2"), sep="-", convert = T, remove = F) %>%
rowwise %>%
mutate(value = sum(df$value[df$year >= y1 & df$year <= y2])) %>%
select(-y1, -y2) %>% ungroup
#> # A tibble: 16 × 2
#> year value
#> <chr> <dbl>
#> 1 1985-2005 122
#> 2 1986-2006 132
#> 3 1987-2007 131
#> 4 1988-2008 128
#> 5 1989-2009 134
#> 6 1990-2010 138
#> 7 1991-2011 135
#> 8 1992-2002 69
#> 9 1993-2003 74
#> 10 1994-2004 75
#> 11 1995-2005 71
#> 12 1996-2006 71
#> 13 1997-2007 69
#> 14 1998-2008 68
#> 15 1999-2009 69
#> 16 2000-2020 124