Create new groups based on age span and other categories
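How can I divide a population into age groups covering specific age spans?
More specifically, I want to create 5-year age groups within each group: 15-20, 21-26, 27-32, and so on. I also want to keep the categories marriage_status and gender. I have tried, but I am a bit stuck.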
# data
pop_clean <- tibble::tribble(
~region, ~marriage_status, ~age, ~gender, ~population, ~year,
"Riket", "ogifta", 15, "män", 56031, 1968,
"Riket", "ogifta", 15, "kvinnor", 52959, 1968,
"Riket", "ogifta", 16, "män", 55917, 1968,
"Riket", "ogifta", 16, "kvinnor", 52979, 1968,
"Riket", "ogifta", 17, "män", 55922, 1968,
"Riket", "ogifta", 17, "kvinnor", 52050, 1968,
"Riket", "ogifta", 18, "män", 58681, 1968,
"Riket", "ogifta", 18, "kvinnor", 51862, 1968,
"Riket", "ogifta", 19, "män", 60387, 1968,
"Riket", "ogifta", 19, "kvinnor", 49750, 1968,
"Riket", "ogifta", 20, "män", 62487, 1968,
"Riket", "ogifta", 20, "kvinnor", 50089, 1968,
"Riket", "ogifta", 21, "män", 60714, 1968,
"Riket", "ogifta", 21, "kvinnor", 43413, 1968,
"Riket", "ogifta", 22, "män", 56801, 1968,
"Riket", "ogifta", 22, "kvinnor", 36301, 1968,
"Riket", "ogifta", 23, "män", 49862, 1968,
"Riket", "ogifta", 23, "kvinnor", 29227, 1968,
"Riket", "ogifta", 24, "män", 42143, 1968,
"Riket", "ogifta", 24, "kvinnor", 23155, 1968
)
# Create groups
pop_clean %>%
group_by(gender, marriage_status) %>%
group_by(grp = cut(age, seq(15, 74, by = 5)))
The output is close to what I want, but it contains some NAs and the groups overlap.
Any help is much appreciated!
region marriage_status age gender population year grp
<chr> <chr> <dbl> <chr> <dbl> <dbl> <fct>
1 Riket ogifta 15 män 56031 1968 NA
2 Riket ogifta 15 kvinnor 52959 1968 NA
3 Riket ogifta 16 män 55917 1968 (15,20]
4 Riket ogifta 16 kvinnor 52979 1968 (15,20]
5 Riket ogifta 17 män 55922 1968 (15,20]
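By default cut() builds intervals that are open on the left and closed on the right, such as (15,20], so age 15 falls outside the first interval and becomes NA. You need to pass include.lowest = TRUE to include the left limit (or use right = FALSE so that every interval is closed on the left). To follow the intervals in your question (i.e. 15-20, 21-26, 27-32, etc.), I suggest also passing labels to cut().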
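To make the NA behaviour concrete, here is a minimal base R sketch. The vector ages is a made-up example, not part of the question's data; the breaks are the ones from the question's attempt.

```r
# Hypothetical ages chosen to sit on and around the break points
ages <- c(15, 16, 20, 21)

# Default: right-closed intervals (15,20], (20,25], ...
# 15 is not inside any interval, so it becomes NA
default_cut <- cut(ages, breaks = seq(15, 74, by = 5))
print(default_cut)

# include.lowest = TRUE closes the first interval on the left: [15,20]
closed_cut <- cut(ages, breaks = seq(15, 74, by = 5), include.lowest = TRUE)
print(closed_cut)
```

Note that 20 and 21 land in different intervals here because the breaks are 5 apart; the groups in the question (15-20, 21-26, ...) each cover 6 ages, so the breaks need a step of 6.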
If you only want to assign every age to an interval, you do not need group_by(); mutate() is enough.
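library(dplyr)

pop_clean %>%
  mutate(grp = cut(age,
                   breaks = seq(15, 75, by = 6),  # 15, 21, 27, ..., 75
                   labels = paste0(seq(15, 70, by = 6), "-", seq(20, 75, by = 6)),
                   include.lowest = TRUE,         # with right = FALSE, keeps age 75 in the last group
                   right = FALSE))                # intervals are [15,21), [21,27), ...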
# A tibble: 20 × 7
region marriage_status age gender population year grp
<chr> <chr> <dbl> <chr> <dbl> <dbl> <fct>
1 Riket ogifta 15 män 56031 1968 15-20
2 Riket ogifta 15 kvinnor 52959 1968 15-20
3 Riket ogifta 16 män 55917 1968 15-20
4 Riket ogifta 16 kvinnor 52979 1968 15-20
5 Riket ogifta 17 män 55922 1968 15-20
6 Riket ogifta 17 kvinnor 52050 1968 15-20
7 Riket ogifta 18 män 58681 1968 15-20
8 Riket ogifta 18 kvinnor 51862 1968 15-20
9 Riket ogifta 19 män 60387 1968 15-20
10 Riket ogifta 19 kvinnor 49750 1968 15-20
11 Riket ogifta 20 män 62487 1968 15-20
12 Riket ogifta 20 kvinnor 50089 1968 15-20
13 Riket ogifta 21 män 60714 1968 21-26
14 Riket ogifta 21 kvinnor 43413 1968 21-26
15 Riket ogifta 22 män 56801 1968 21-26
16 Riket ogifta 22 kvinnor 36301 1968 21-26
17 Riket ogifta 23 män 49862 1968 21-26
18 Riket ogifta 23 kvinnor 29227 1968 21-26
19 Riket ogifta 24 män 42143 1968 21-26
20 Riket ogifta 24 kvinnor 23155 1968 21-26
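If the goal is one row per gender, marriage_status and age group, the grp column can then be used in group_by(). The following is only a sketch: it assumes you want to sum population within each group, and pop_small, lower and pop_grouped are illustrative names. The rows are copied from the question's data to keep the example self-contained.

```r
library(dplyr)

# A few rows copied from the question's data
pop_small <- tibble::tribble(
  ~marriage_status, ~age, ~gender,   ~population,
  "ogifta",         19,   "män",     60387,
  "ogifta",         19,   "kvinnor", 49750,
  "ogifta",         20,   "män",     62487,
  "ogifta",         20,   "kvinnor", 50089,
  "ogifta",         21,   "män",     60714,
  "ogifta",         21,   "kvinnor", 43413
)

# Lower bound of each 6-year group: 15, 21, ..., 69
lower <- seq(15, 69, by = 6)

pop_grouped <- pop_small %>%
  mutate(grp = cut(age,
                   breaks = c(lower, 75),
                   labels = paste0(lower, "-", lower + 5),
                   right = FALSE)) %>%          # [15,21), [21,27), ...
  group_by(gender, marriage_status, grp) %>%    # keep both categories
  summarise(population = sum(population), .groups = "drop")

print(pop_grouped)
```

Building the labels from the same lower vector as the breaks keeps the two in sync, so changing the step or the starting age only needs to be done in one place.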