R: How can I calculate averages for each nth interval in a data frame?
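I'm trying to use tidyverse functions (i.e., dplyr and/or tidyr) to find the average of a set of columns for every 5-year interval.
For example, using the existing gapminder data in R, how would I calculate the average life expectancy per continent for every 5-year interval?
I can try something like the following, but it doesn't give exactly what I want, because I'm not sure how to include the 5-year interval in the code: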
library(gapminder)
library(dplyr)  # provides %>%, group_by() and summarize()

# Averages over all years per continent; no 5-year grouping yet
gapminder.avglife <- gapminder %>%
  group_by(continent) %>%
  summarize(lifeavg = mean(lifeExp))
Create another column in group_by that maps each year to its 5-year interval, then compute the mean of lifeExp.
library(gapminder)
library(dplyr)
gapminder %>%
  group_by(continent, year = ceiling(year/5) * 5) %>%
  summarize(year = paste(first(year) - 5, first(year), sep = '-'),
            lifeavg = mean(lifeExp)) %>%
  ungroup()
# continent year lifeavg
# <fct> <chr> <dbl>
# 1 Africa 1950-1955 39.1
# 2 Africa 1955-1960 41.3
# 3 Africa 1960-1965 43.3
# 4 Africa 1965-1970 45.3
# 5 Africa 1970-1975 47.5
# 6 Africa 1975-1980 49.6
# 7 Africa 1980-1985 51.6
# 8 Africa 1985-1990 53.3
# 9 Africa 1990-1995 53.6
#10 Africa 1995-2000 53.6
# … with 50 more rows
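Note that gapminder only has observations every five years (1952, 1957, ..., 2007), so each bin above holds a single observation year per country. To see the same ceiling-based grouping actually average across several years, here is a minimal sketch on made-up annual data. The `toy` data frame and the `bin_end`, `label`, `avg` and `n` column names are hypothetical, not part of the original answer.

```r
library(dplyr)

# Hypothetical toy data: one value per year, 2001 through 2010
toy <- data.frame(year = 2001:2010, value = 1:10)

binned <- toy %>%
  # ceiling(year / 5) * 5 maps 2001..2005 to 2005 and 2006..2010 to 2010
  group_by(bin_end = ceiling(year / 5) * 5) %>%
  summarise(avg = mean(value), n = n()) %>%
  # Build a non-overlapping label such as "2001-2005"
  mutate(label = paste(bin_end - 4, bin_end, sep = "-"))

print(binned)
```

With annual data each bin should contain five rows, so `n` is a quick check that the grouping does what you expect.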
Here is my answer:
gapminder %>%
  group_by(continent) %>%
  mutate(FiveYrInterval = ((year - min(year)) %/% 5) + 1) %>%
  group_by(continent, FiveYrInterval) %>%
  summarise(mean(lifeExp))
# A tibble: 60 x 3
# Groups: continent [5]
continent FiveYrInterval `mean(lifeExp)`
<fct> <dbl> <dbl>
1 Africa 1 39.1
2 Africa 2 41.3
3 Africa 3 43.3
4 Africa 4 45.3
5 Africa 5 47.5
6 Africa 6 49.6
7 Africa 7 51.6
8 Africa 8 53.3
9 Africa 9 53.6
10 Africa 10 53.6
# ... with 50 more rows
Indeed, Ronak's is much better.
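If the bare interval index is hard to read, it can be turned back into a year range. This is only a sketch on made-up data; `toy`, `idx`, `start` and `label` are hypothetical names, and the "start to start + 4" labelling is one possible convention.

```r
library(dplyr)

# Hypothetical toy data: two observations per year, years spaced 5 apart
toy <- data.frame(
  year  = rep(seq(1952, 1967, by = 5), each = 2),
  value = c(1, 3, 5, 7, 9, 11, 13, 15)
)

labelled <- toy %>%
  mutate(
    idx   = (year - min(year)) %/% 5 + 1,       # 1, 2, 3, ... as in the answer
    start = min(year) + (idx - 1) * 5,          # first year of the interval
    label = paste(start, start + 4, sep = "-")  # e.g. "1952-1956"
  ) %>%
  group_by(label) %>%
  summarise(avg = mean(value))

print(labelled)
```

Because the intervals are anchored at `min(year)`, they start at the first observed year rather than at a multiple of 5.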
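You can try cut_interval from ggplot2 to get 5-year intervals for each continent:
library(ggplot2)  # provides cut_interval()

gapminder %>%
  # n = 11 equal-width bins; the first bin, [1952,1957], is closed on both ends
  mutate(interval = cut_interval(year,
                                 n = (max(year) - min(year)) / 5)) %>%
  group_by(continent, interval) %>%
  summarise(avg = mean(lifeExp))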
# A tibble: 55 x 3
# Groups: continent [5]
continent interval avg
<fct> <fct> <dbl>
1 Africa [1952,1957] 40.2
2 Africa (1957,1962] 43.3
3 Africa (1962,1967] 45.3
4 Africa (1967,1972] 47.5
5 Africa (1972,1977] 49.6
6 Africa (1977,1982] 51.6
7 Africa (1982,1987] 53.3
8 Africa (1987,1992] 53.6
9 Africa (1992,1997] 53.6
10 Africa (1997,2002] 53.3
# ... with 45 more rows
Try cut2 from the Hmisc package:
library(Hmisc)  # note: Hmisc masks dplyr::summarize(); summarise() is unaffected

gapminder %>%
  mutate(interval = cut2(year, seq(1952, 2007, 5))) %>%
  group_by(continent, interval) %>%
  summarise(avg = mean(lifeExp))
# A tibble: 55 x 3
# Groups: continent [5]
continent interval avg
<fct> <fct> <dbl>
1 Africa 1952 39.1
2 Africa 1957 41.3
3 Africa 1962 43.3
4 Africa 1967 45.3
5 Africa 1972 47.5
6 Africa 1977 49.6
7 Africa 1982 51.6
8 Africa 1987 53.3
9 Africa 1992 53.6
10 Africa 1997 53.6
# ... with 45 more rows
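Both cut-based outputs above have 55 rows rather than 60, which suggests that two observation years end up sharing one bin (the cut_interval output shows a closed first interval, [1952,1957], with a different Africa average of 40.2). If you want every observation year in its own bin, one option is base R's `cut()` with explicit breaks. This is a sketch, not what either answer used; `years`, `breaks` and `bins` are hypothetical names.

```r
# Observation years as in gapminder: 1952, 1957, ..., 2007
years <- seq(1952, 2007, by = 5)

# Breaks every 5 years, extended one step past the last year so that
# 2007 gets a bin of its own
breaks <- seq(min(years), max(years) + 5, by = 5)

# right = FALSE makes each bin left-closed: [1952,1957), [1957,1962), ...
bins <- cut(years, breaks = breaks, right = FALSE, dig.lab = 4)

# One observation year per bin, 12 bins in total
print(table(bins))
```

The resulting factor can be used as the `interval` column in the same `mutate()` / `group_by()` / `summarise()` pipeline as above.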