Create new groups based on age span and other categories

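How can I divide a population into age groups that each cover a specific age span?

More specifically, I want to create 5 age groups within each group: 15-20, 21-26, 27-32, and so on. I also want to keep the categories marriage_status and gender. I tried, but I'm a bit stuck.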

# data (assigned to pop_clean, the name used in the code below)
pop_clean <- tibble::tribble(
     ~region, ~marriage_status, ~age,   ~gender, ~population, ~year,
     "Riket",         "ogifta",   15,     "män",       56031,  1968,
     "Riket",         "ogifta",   15, "kvinnor",       52959,  1968,
     "Riket",         "ogifta",   16,     "män",       55917,  1968,
     "Riket",         "ogifta",   16, "kvinnor",       52979,  1968,
     "Riket",         "ogifta",   17,     "män",       55922,  1968,
     "Riket",         "ogifta",   17, "kvinnor",       52050,  1968,
     "Riket",         "ogifta",   18,     "män",       58681,  1968,
     "Riket",         "ogifta",   18, "kvinnor",       51862,  1968,
     "Riket",         "ogifta",   19,     "män",       60387,  1968,
     "Riket",         "ogifta",   19, "kvinnor",       49750,  1968,
     "Riket",         "ogifta",   20,     "män",       62487,  1968,
     "Riket",         "ogifta",   20, "kvinnor",       50089,  1968,
     "Riket",         "ogifta",   21,     "män",       60714,  1968,
     "Riket",         "ogifta",   21, "kvinnor",       43413,  1968,
     "Riket",         "ogifta",   22,     "män",       56801,  1968,
     "Riket",         "ogifta",   22, "kvinnor",       36301,  1968,
     "Riket",         "ogifta",   23,     "män",       49862,  1968,
     "Riket",         "ogifta",   23, "kvinnor",       29227,  1968,
     "Riket",         "ogifta",   24,     "män",       42143,  1968,
     "Riket",         "ogifta",   24, "kvinnor",       23155,  1968
     )

# Create groups
pop_clean %>%
  group_by(gender, marriage_status) %>% 
  group_by(grp = cut(age, seq(15, 74, by = 5)))

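The output is close to what I want, but it gives some NAs, and the groups do not match the spans I am after.

Any help is much appreciated!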

 region marriage_status   age gender  population  year grp    
   <chr>  <chr>           <dbl> <chr>        <dbl> <dbl> <fct>  
 1 Riket  ogifta             15 män          56031  1968 NA     
 2 Riket  ogifta             15 kvinnor      52959  1968 NA     
 3 Riket  ogifta             16 män          55917  1968 (15,20]
 4 Riket  ogifta             16 kvinnor      52979  1968 (15,20]
 5 Riket  ogifta             17 män          55922  1968 (15,20]

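By default, cut builds intervals that are open on the left, such as (15,20], so the lowest break (15) belongs to no interval and comes out as NA. Pass include.lowest = TRUE to include that left limit. To follow the intervals in your question (15-20, 21-26, 27-32, and so on), use breaks that are 6 apart together with right = FALSE, and I suggest adding labels to the cut call.

If you only want to assign every age to an interval, you don't need group_by; mutate is enough.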
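To see what these arguments do, here is a small base R sketch on a handful of ages. The vector ages and the variable names are made up for illustration; the first call uses the breaks from the question.

```r
# Hypothetical example ages, not the full data set
ages <- c(15, 16, 20, 21)

# Breaks from the question: intervals are (15,20], (20,25], ... so 15 falls outside
default_grp <- cut(ages, seq(15, 74, by = 5))

# include.lowest = TRUE closes the first interval on the left: [15,20]
with_lowest <- cut(ages, seq(15, 74, by = 5), include.lowest = TRUE)

# Breaks 6 apart with right = FALSE give [15,21), [21,27), ... i.e. 15-20, 21-26, ...
six_wide <- cut(ages, seq(15, 75, by = 6), right = FALSE, include.lowest = TRUE)

print(data.frame(ages, default_grp, with_lowest, six_wide))
```

The first column of groups has NA for age 15, the second puts 15 in [15,20], and the third puts 15 to 20 together and 21 in the next group.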

library(dplyr)

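# Breaks 6 apart with right = FALSE give [15,21), [21,27), ..., labelled 15-20, 21-26, ...
pop_clean %>% mutate(grp = cut(age, 
                               breaks = seq(15, 75, by = 6), 
                               labels = paste0(seq(15, 70, by = 6), "-", seq(20, 75, by = 6)),
                               include.lowest = TRUE,  # with right = FALSE this keeps age 75 in the last group
                               right = FALSE))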

# A tibble: 20 × 7
   region marriage_status   age gender  population  year grp  
   <chr>  <chr>           <dbl> <chr>        <dbl> <dbl> <fct>
 1 Riket  ogifta             15 män          56031  1968 15-20
 2 Riket  ogifta             15 kvinnor      52959  1968 15-20
 3 Riket  ogifta             16 män          55917  1968 15-20
 4 Riket  ogifta             16 kvinnor      52979  1968 15-20
 5 Riket  ogifta             17 män          55922  1968 15-20
 6 Riket  ogifta             17 kvinnor      52050  1968 15-20
 7 Riket  ogifta             18 män          58681  1968 15-20
 8 Riket  ogifta             18 kvinnor      51862  1968 15-20
 9 Riket  ogifta             19 män          60387  1968 15-20
10 Riket  ogifta             19 kvinnor      49750  1968 15-20
11 Riket  ogifta             20 män          62487  1968 15-20
12 Riket  ogifta             20 kvinnor      50089  1968 15-20
13 Riket  ogifta             21 män          60714  1968 21-26
14 Riket  ogifta             21 kvinnor      43413  1968 21-26
15 Riket  ogifta             22 män          56801  1968 21-26
16 Riket  ogifta             22 kvinnor      36301  1968 21-26
17 Riket  ogifta             23 män          49862  1968 21-26
18 Riket  ogifta             23 kvinnor      29227  1968 21-26
19 Riket  ogifta             24 män          42143  1968 21-26
20 Riket  ogifta             24 kvinnor      23155  1968 21-26
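If the goal is one population total per age group while keeping gender and marriage_status, a possible next step is to group by all three and sum. This is only a sketch under that assumption; the question does not say whether totals are wanted. pop_small is a made-up subset of the rows in the question, and pop_by_group is a name chosen here for illustration. Note that in the original attempt the second group_by call replaces the first grouping (unless .add = TRUE is given), so all grouping columns are listed in a single call.

```r
library(dplyr)

# Hypothetical subset of the question's data, kept small for illustration
pop_small <- tibble::tribble(
  ~marriage_status, ~age,   ~gender, ~population,
          "ogifta",   15,     "män",       56031,
          "ogifta",   15, "kvinnor",       52959,
          "ogifta",   20,     "män",       62487,
          "ogifta",   21,     "män",       60714,
          "ogifta",   21, "kvinnor",       43413
)

pop_by_group <- pop_small %>%
  # Same cut call as in the answer above
  mutate(grp = cut(age,
                   breaks = seq(15, 75, by = 6),
                   labels = paste0(seq(15, 70, by = 6), "-", seq(20, 75, by = 6)),
                   include.lowest = TRUE,
                   right = FALSE)) %>%
  # One call with all grouping columns, then sum within each combination
  group_by(gender, marriage_status, grp) %>%
  summarise(population = sum(population), .groups = "drop")

print(pop_by_group)
```

This returns one row per combination of gender, marriage_status and age group that occurs in the data; age groups with no rows are dropped.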