Pivot longer for only selected columns
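I have a wide dataset with 23 columns. I want to select certain columns and pivot them into rows (long format), but only those selected columns. Here is a sample of my dataset: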
# A tibble: 2 x 23
year popu dd popmale ddmale popfemale ddfemale pop40 dd40 pop41_50 dd41_50 pop51_60 dd51_60 pop61_70 dd61_70 pop71_80 dd71_80 pop81_90
1 2011 197548 2167 98145 1302 99403 1302 56822 52 27614 88 33368 384 30477 683 25418 630 14961
2 2012 200724 2250 99783 1354 100941 896 58646 54 28256 91 34111 400 30919 705 25718 655 14862
# ... with 5 more variables: dd81_90 <dbl>, pop91_100 <dbl>, dd91_100 <dbl>, pop100 <dbl>, dd100 <dbl>
df<-structure(list(year = c(2011, 2012), popu = c(197548, 200724),
dd = c(2167, 2250), popmale = c(98145, 99783), ddmale = c(1302,
1354), popfemale = c(99403, 100941), ddfemale = c(1302, 896
), pop40 = c(56822, 58646), dd40 = c(52, 54), pop41_50 = c(27614,
28256), dd41_50 = c(88, 91), pop51_60 = c(33368, 34111),
dd51_60 = c(384, 400), pop61_70 = c(30477, 30919), dd61_70 = c(683,
705), pop71_80 = c(25418, 25718), dd71_80 = c(630, 655),
pop81_90 = c(14961, 14862), dd81_90 = c(288, 288), pop91_100 = c(7210,
6746), dd91_100 = c(54, 55), pop100 = c(1678, 1466), dd100 = c(1,
2)), row.names = 1:2, class = "data.frame")
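In the data frame above, each age category has its own population column (e.g. pop41_50) and its own event column (e.g. dd41_50).
I would like to create a longer data frame that puts the age category into one column as values, with the population and the number of events each in their own value column, like this: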
year popu dd popmale ddmale popfemale ddfemale age_cate pop_age event_age
1 2011 197548 2167 98145 1302 99403 1302 40 56822 52
2 2011 197548 2167 98145 1302 99403 1302 41_50 27614 88
3 2011 197548 2167 98145 1302 99403 1302 51_60 33368 384
4 2011 197548 2167 98145 1302 99403 1302 61_70 30477 683
5 2011 197548 2167 98145 1302 99403 1302 71_80 25418 630
etc.
I tried the following script, but it puts everything into a single column, which is not the output I want.
pivot_longer(df, -c(year, popu, dd), values_to = "number", names_to = "category")
Thanks a lot!
One option is to rename the columns first and then split the names at the second underscore.
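library(tidyverse)

df %>%
  # prefix the age-specific event columns, e.g. dd41_50 -> event_age_41_50
  rename_with(~ str_replace(., "dd", "event_age_"), dd40:dd100) %>%
  # prefix the age-specific population columns, e.g. pop41_50 -> pop_age_41_50
  rename_with(~ str_replace(., "pop", "pop_age_"), pop40:pop100) %>%
  # pivot only the renamed columns: the text before the second underscore
  # names the value column, the rest goes into age_cate
  pivot_longer(
    cols = -c("year", "popu", "dd", "popmale", "ddmale", "popfemale", "ddfemale"),
    names_to = c(".value", "age_cate"),
    names_pattern = "^([^_]*_[^_]*)_(.*)"
  )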
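To see what "split at the second underscore" means for the names_pattern above, the same regular expression can be applied to a few renamed column names. This is only an illustration: str_match() from stringr stands in for the matching that pivot_longer() does internally, and the vector renamed and the object parts are hypothetical names chosen for this sketch.

```r
library(stringr)

# a few column names as they look after the two rename_with() steps
renamed <- c("pop_age_40", "event_age_41_50", "pop_age_91_100")

# group 1: everything up to (not including) the second underscore
# group 2: everything after the second underscore
parts <- str_match(renamed, "^([^_]*_[^_]*)_(.*)")

parts[, 2]  # becomes the value column names (.value)
parts[, 3]  # becomes the entries of age_cate
```

The first capture group yields pop_age or event_age, and the second keeps the full age range, including any underscore inside it such as 41_50.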
Output
year popu dd popmale ddmale popfemale ddfemale age_cate pop_age event_age
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <dbl> <dbl>
1 2011 197548 2167 98145 1302 99403 1302 40 56822 52
2 2011 197548 2167 98145 1302 99403 1302 41_50 27614 88
3 2011 197548 2167 98145 1302 99403 1302 51_60 33368 384
4 2011 197548 2167 98145 1302 99403 1302 61_70 30477 683
5 2011 197548 2167 98145 1302 99403 1302 71_80 25418 630
6 2011 197548 2167 98145 1302 99403 1302 81_90 14961 288
7 2011 197548 2167 98145 1302 99403 1302 91_100 7210 54
8 2011 197548 2167 98145 1302 99403 1302 100 1678 1
9 2012 200724 2250 99783 1354 100941 896 40 58646 54
10 2012 200724 2250 99783 1354 100941 896 41_50 28256 91
11 2012 200724 2250 99783 1354 100941 896 51_60 34111 400
12 2012 200724 2250 99783 1354 100941 896 61_70 30919 705
13 2012 200724 2250 99783 1354 100941 896 71_80 25718 655
14 2012 200724 2250 99783 1354 100941 896 81_90 14862 288
15 2012 200724 2250 99783 1354 100941 896 91_100 6746 55
16 2012 200724 2250 99783 1354 100941 896 100 1466 2
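A possible variation on the answer above, offered as a sketch rather than as the answer's own method: the columns to rename and pivot can be picked with matches() instead of positional ranges and a list of excluded names. It assumes that every age-specific column starts with pop or dd followed directly by a digit. The data frame below is a trimmed subset of the question's data, and the name long is chosen only for illustration.

```r
library(dplyr)
library(tidyr)
library(stringr)

# trimmed subset of the question's data (assumption: same naming scheme)
df <- data.frame(
  year = c(2011, 2012), popu = c(197548, 200724), dd = c(2167, 2250),
  pop40 = c(56822, 58646), dd40 = c(52, 54),
  pop41_50 = c(27614, 28256), dd41_50 = c(88, 91)
)

long <- df %>%
  # rename only columns where a digit follows the prefix,
  # so popu and dd themselves are left untouched
  rename_with(~ str_replace(.x, "^pop", "pop_age_"), matches("^pop[0-9]")) %>%
  rename_with(~ str_replace(.x, "^dd", "event_age_"), matches("^dd[0-9]")) %>%
  # pivot only the renamed columns; all other columns stay as identifiers
  pivot_longer(
    cols = matches("^(pop|event)_age_"),
    names_to = c(".value", "age_cate"),
    names_pattern = "^(pop_age|event_age)_(.*)$"
  )

long
```

Renaming before pivoting still matters here: pivoting straight from the original names with ".value" would try to create a value column called dd, which clashes with the existing total dd column.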