R: How can I calculate averages for each nth interval in a data frame?

I am trying to find the average of a set of columns over every 5-year interval using tidyverse functions (i.e. dplyr and/or tidyr).

For example, using the gapminder data available in R, how would I calculate the average life expectancy for each continent over every 5-year interval?

I can try something like this, but it doesn't give me exactly what I want, because I'm not sure how to include the 5-year interval in the code:

library(gapminder)
library(dplyr)

# One mean per continent, pooled over all years (no 5-year grouping yet)
gapminder.avglife <- gapminder %>% group_by(continent) %>% 
  summarize(lifeavg = mean(lifeExp))

Create another column inside group_by that bins the years into 5-year intervals, then calculate the mean of lifeExp:

library(gapminder)
library(dplyr)

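gapminder %>% 
  # round each year up to the next multiple of 5 and group on that value
  group_by(continent, year = ceiling(year/5) * 5) %>% 
  # turn the upper edge into a "lower-upper" label and average lifeExp
  summarize(year = paste(first(year) - 5, first(year), sep = '-'),
            lifeavg = mean(lifeExp)) %>%
  ungroup()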

#  continent year      lifeavg
#   <fct>     <chr>       <dbl>
# 1 Africa    1950-1955    39.1
# 2 Africa    1955-1960    41.3
# 3 Africa    1960-1965    43.3
# 4 Africa    1965-1970    45.3
# 5 Africa    1970-1975    47.5
# 6 Africa    1975-1980    49.6
# 7 Africa    1980-1985    51.6
# 8 Africa    1985-1990    53.3
# 9 Africa    1990-1995    53.6
#10 Africa    1995-2000    53.6
# … with 50 more rows
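
To see what the grouping expression does on its own, here is a minimal base R sketch of the same arithmetic. It assumes the twelve gapminder survey years (1952, 1957, ..., 2007); the vector names `years`, `upper` and `labels` are only illustrative.

years <- seq(1952, 2007, by = 5)          # the survey years in gapminder

# Upper edge of the 5-year bin: round year/5 up, then scale back
upper <- ceiling(years / 5) * 5           # 1952 -> 1955, 1957 -> 1960, ...

# Label built the same way as in the summarize() call above
labels <- paste(upper - 5, upper, sep = "-")

data.frame(year = years, upper = upper, label = labels)

Because of ceiling(), a year that is an exact multiple of 5 (say 1955) stays at the top of its bin, so the bins behave like (lower, upper]. With the gapminder years this never matters, since none of them is a multiple of 5.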

My answer would be this:

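library(gapminder)
library(dplyr)

gapminder %>% group_by(continent) %>% 
  # number the 5-year intervals 1, 2, 3, ... from each continent's first year
  mutate(FiveYrInterval = ((year - min(year)) %/% 5)+1) %>%
  group_by(continent, FiveYrInterval) %>%
  summarise(mean(lifeExp))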

# A tibble: 60 x 3
# Groups:   continent [5]
   continent FiveYrInterval `mean(lifeExp)`
   <fct>              <dbl>           <dbl>
 1 Africa                 1            39.1
 2 Africa                 2            41.3
 3 Africa                 3            43.3
 4 Africa                 4            45.3
 5 Africa                 5            47.5
 6 Africa                 6            49.6
 7 Africa                 7            51.6
 8 Africa                 8            53.3
 9 Africa                 9            53.6
10 Africa                10            53.6
# ... with 50 more rows

Admittedly, Ronak's answer is much better.
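
For anyone unsure about the %/% step, this is a small base R sketch of the interval index, again assuming the gapminder survey years. The `start` vector is a hypothetical addition that maps each index back to the first year of its interval, which may be easier to read than 1 to 12.

years <- seq(1952, 2007, by = 5)

# Integer division counts how many whole 5-year steps lie between
# the first year and each year; +1 makes the index start at 1
interval <- (years - min(years)) %/% 5 + 1

# Hypothetical extra step: recover the first year of each interval
start <- min(years) + (interval - 1) * 5

data.frame(year = years, interval = interval, start = start)

Note that in the answer min(year) is evaluated per continent because of group_by(continent), so the numbering restarts from each continent's own first year.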

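You could try cut_interval from ggplot2 to get 5-year intervals for each continent. Note that this gives 11 intervals rather than 12: the first one, [1952,1957], is closed on both ends, so 1952 and 1957 are averaged together.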

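library(gapminder)
library(dplyr)
library(ggplot2)

gapminder %>% 
  # split the year range into equal-width bins of 5 years each
  mutate(interval = cut_interval(year, 
                                 n = (max(year)-min(year))/5)) %>% 
  group_by(continent, interval) %>% 
  summarise(avg = mean(lifeExp))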

# A tibble: 55 x 3
# Groups:   continent [5]
   continent interval      avg
   <fct>     <fct>       <dbl>
 1 Africa    [1952,1957]  40.2
 2 Africa    (1957,1962]  43.3
 3 Africa    (1962,1967]  45.3
 4 Africa    (1967,1972]  47.5
 5 Africa    (1972,1977]  49.6
 6 Africa    (1977,1982]  51.6
 7 Africa    (1982,1987]  53.3
 8 Africa    (1987,1992]  53.6
 9 Africa    (1992,1997]  53.6
10 Africa    (1997,2002]  53.3
# ... with 45 more rows
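
If you would rather have every survey year in its own bin, one possible variation is base R's cut() with explicit, left-closed breaks. This is only a sketch and not part of the answer above; `breaks` is a hypothetical vector that extends one step past the last year so that 2007 gets an interval of its own.

years <- seq(1952, 2007, by = 5)

# One break more than there are survey years: 1952, 1957, ..., 2012
breaks <- seq(1952, 2012, by = 5)

# right = FALSE makes the bins [lower, upper), so no year sits on a shared edge
interval <- cut(years, breaks = breaks, right = FALSE, dig.lab = 4)

table(interval)

The same cut() call can be dropped into mutate() in place of cut_interval(); with these breaks there are 12 levels, one per survey year.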

Try cut2 from the Hmisc package:

library(gapminder)
library(dplyr)
library(Hmisc)

gapminder %>% 
  mutate(interval = cut2(year, seq(1952,2007,5))) %>% 
  group_by(continent, interval) %>% 
  summarise(avg = mean(lifeExp))

# A tibble: 55 x 3
# Groups:   continent [5]
   continent interval   avg
   <fct>     <fct>    <dbl>
 1 Africa    1952      39.1
 2 Africa    1957      41.3
 3 Africa    1962      43.3
 4 Africa    1967      45.3
 5 Africa    1972      47.5
 6 Africa    1977      49.6
 7 Africa    1982      51.6
 8 Africa    1987      53.3
 9 Africa    1992      53.6
10 Africa    1997      53.6
# ... with 45 more rows
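
The result has 55 rows, i.e. 11 intervals per continent, so two of the twelve survey years must share a bin. A quick way to check which ones is to count the distinct years per interval. The sketch below uses base R's cut() as a stand-in for cut2(), assuming cut2 treats the final break as a closed upper bound (which the row count suggests, but I have not verified it against the Hmisc source).

years <- seq(1952, 2007, by = 5)

# Stand-in for cut2(year, seq(1952, 2007, 5)):
# left-closed bins, with the last bin closed on both ends
interval <- cut(years, breaks = seq(1952, 2007, by = 5),
                right = FALSE, include.lowest = TRUE, dig.lab = 4)

# How many distinct survey years end up in each bin?
years_per_bin <- tapply(years, interval, function(y) length(unique(y)))
years_per_bin

Under that assumption the last bin, [2002,2007], holds both 2002 and 2007, so its average pools two survey years. Extending the breaks by one step, e.g. seq(1952, 2012, 5), should avoid that.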