如何在 r 中创建多个 pivot_longer() 列?
How to create multiple pivot_longer() columns in r?
我正在处理具有 multiple cumulative field cols
和 multiple daily field cols
的数据集,并使用 pivot_longer
希望将它们转换成:
- single/long
cumulative field col
和
- single/long
daily field col
数据快照
"Updated.On","State","First.Dose.Administered","Second.Dose.Administered","Daily_First_dose","Daily_Second_dose"
2021-07-09,"India",297184419,69894633,2071121,1141754
2021-07-09,"Andaman and Nicobar Islands",164079,51646,795,6336
2021-07-09,"Andhra Pradesh",13372990,3379660,25571,11168
2021-07-09,"Arunachal Pradesh",583021,108393,8423,3430
2021-07-09,"Assam",6462249,1328646,52913,7681
2021-07-09,"Bihar",15741778,2638686,262299,55007
2021-07-09,"Chandigarh",483886,110610,4351,3491
2021-07-09,"Chhattisgarh",7438953,1819709,64193,42342
2021-07-09,"Dadra and Nagar Haveli and Daman and Diu",458057,57874,7122,1745
2021-07-09,"Delhi",6761634,2049318,95352,35848
cols 转换为:
- 俱乐部
First.Dose.Administered, Second.Dose.Administered
列
names_to = "Dose_type"
、values_to = "Dose_Administered"
- 俱乐部
Daily_First_dose, Daily_Second_dose
列到 names_to = "Dose_type_daily"
、values_to = "Dose_Administered_daily"
所需的数据框示例:
Updated.On State Dose_type Dose_Administered Dose_type_daily Dose_Administered_daily
2021-07-09 India First.Dose.Administered 297184419 Daily_First_dose 2071121
2021-07-09 India Second.Dose.Administered 69894633 Daily_Second_dose 1141754
df 可从以下网址下载:
library(tidyverse)
library(lubridate)
file_url1 <- "https://raw.githubusercontent.com/johnsnow09/covid19-df_stack-code/main/df_vaccination_june.csv"
df_vaccination <- read.csv(url(file_url1))
df_vaccination <- df_vaccination %>%
mutate(Updated.On = as.Date(Updated.On))
尝试: 我试过下面的代码但不确定如何创建两个不同的 values_to 列?
df_vaccination %>%
pivot_longer(cols = c(First.Dose.Administered:Second.Dose.Administered,
Daily_First_dose:Daily_Second_dose),
names_to = c("Dose_type","Dose_type_daily"),
values_to = c("Dose_Administered","Dose_Administered_daily"))
我认为你需要 pivot_longer
两次并使用适当的 names_pattern
regex
参数。
library(tidyverse)
df_vaccination %>%
pivot_longer(cols = 3:4, names_to = c('dose_adminstered','.value'),
names_pattern = '(.*)\.(Administered)') %>%
pivot_longer(cols = 3:4, names_to = c('dose_type_daily', '.value'),
names_pattern = '(.*)_(.*)$')
#> # A tibble: 3,552 x 6
#> Updated.On State dose_adminstered Administered dose_type_daily dose
#> <date> <chr> <chr> <int> <chr> <int>
#> 1 2021-07-09 India First.Dose 297184419 Daily_First 2.07e6
#> 2 2021-07-09 India First.Dose 297184419 Daily_Second 1.14e6
#> 3 2021-07-09 India Second.Dose 69894633 Daily_First 2.07e6
#> 4 2021-07-09 India Second.Dose 69894633 Daily_Second 1.14e6
#> 5 2021-07-09 Andaman and ~ First.Dose 164079 Daily_First 7.95e2
#> 6 2021-07-09 Andaman and ~ First.Dose 164079 Daily_Second 6.34e3
#> 7 2021-07-09 Andaman and ~ Second.Dose 51646 Daily_First 7.95e2
#> 8 2021-07-09 Andaman and ~ Second.Dose 51646 Daily_Second 6.34e3
#> 9 2021-07-09 Andhra Prade~ First.Dose 13372990 Daily_First 2.56e4
#> 10 2021-07-09 Andhra Prade~ First.Dose 13372990 Daily_Second 1.12e4
#> # ... with 3,542 more rows
每个州 return 只有 2 行,您可以使用此解决方法
df_vaccination[1:4] %>%
pivot_longer(cols = 3:4, names_to = c('dose_adminstered','.value'),
names_pattern = '(.*)\.(Administered)') %>%
cbind(df_vaccination[5:6] %>% pivot_longer(everything(), names_to = c('dose_type_daily', '.value'),
names_pattern = '(.*)_(.*)$'))
Updated.On State dose_adminstered Administered dose_type_daily dose
1 2021-07-09 India First.Dose 297184419 Daily_First 2071121
2 2021-07-09 India Second.Dose 69894633 Daily_Second 1141754
3 2021-07-09 Andaman and Nicobar Islands First.Dose 164079 Daily_First 795
4 2021-07-09 Andaman and Nicobar Islands Second.Dose 51646 Daily_Second 6336
5 2021-07-09 Andhra Pradesh First.Dose 13372990 Daily_First 25571
6 2021-07-09 Andhra Pradesh Second.Dose 3379660 Daily_Second 11168
7 2021-07-09 Arunachal Pradesh First.Dose 583021 Daily_First 8423
8 2021-07-09 Arunachal Pradesh Second.Dose 108393 Daily_Second 3430
.
.
.
更新:2行解决方案
这是获得两行解决方案的另一种方法:
创建一个长度格式为 First.Dose.Administered
和 Second.Dose.Administered
的数据框
mutate
到 Daily_First_dose
和 Daily_Second_dose
的长格式
library(dplyr)
# create a dataframe with First.Dose.Administered, Second.Dose.Administered
df_vaccination_1 <- df_vaccination %>%
select(First.Dose.Administered, Second.Dose.Administered) %>%
pivot_longer(
cols = c(First.Dose.Administered, Second.Dose.Administered),
names_to = "Dose_type",
values_to = "Dose_Administered"
)
# resulting dataframe using columns of df_vaccination_1
df_vaccination_result <- df_vaccination %>%
select(-First.Dose.Administered, -Second.Dose.Administered) %>%
pivot_longer(
cols = c(Daily_First_dose, Daily_Second_dose),
names_to = "Dose_type_daily",
values_to = "Dose_Administered_daily"
) %>%
mutate(First.Dose.Administered = df_vaccination_1$Dose_type,
Second.Dose.Administered = df_vaccination_1$Dose_Administered)
输出:
A tibble: 1,776 x 6
Updated.On State Dose_type_daily Dose_Administere~ First.Dose.Admin~ Second.Dose.Admi~
<date> <chr> <chr> <int> <chr> <int>
1 2021-07-09 India Daily_First_do~ 2071121 First.Dose.Admin~ 297184419
2 2021-07-09 India Daily_Second_d~ 1141754 Second.Dose.Admi~ 69894633
3 2021-07-09 Andaman ~ Daily_First_do~ 795 First.Dose.Admin~ 164079
4 2021-07-09 Andaman ~ Daily_Second_d~ 6336 Second.Dose.Admi~ 51646
5 2021-07-09 Andhra P~ Daily_First_do~ 25571 First.Dose.Admin~ 13372990
6 2021-07-09 Andhra P~ Daily_Second_d~ 11168 Second.Dose.Admi~ 3379660
7 2021-07-09 Arunacha~ Daily_First_do~ 8423 First.Dose.Admin~ 583021
8 2021-07-09 Arunacha~ Daily_Second_d~ 3430 Second.Dose.Admi~ 108393
9 2021-07-09 Assam Daily_First_do~ 52913 First.Dose.Admin~ 6462249
10 2021-07-09 Assam Daily_Second_d~ 7681 Second.Dose.Admi~ 1328646
# ... with 1,766 more rows
第一个回答:
这是您要找的吗?
df_vaccination %>%
pivot_longer(
cols = c(First.Dose.Administered, Second.Dose.Administered),
names_to = "Dose_type",
values_to = "Dose_Administered"
) %>%
pivot_longer(
cols = c(Daily_First_dose, Daily_Second_dose),
names_to = "Dose_type_daily",
values_to = "Dose_Administered_daily"
)
输出:
Updated.On State Dose_type Dose_Administer~ Dose_type_daily Dose_Administered~
<date> <chr> <chr> <int> <chr> <int>
1 2021-07-09 India First.Dose.Ad~ 297184419 Daily_First_do~ 2071121
2 2021-07-09 India First.Dose.Ad~ 297184419 Daily_Second_d~ 1141754
3 2021-07-09 India Second.Dose.A~ 69894633 Daily_First_do~ 2071121
4 2021-07-09 India Second.Dose.A~ 69894633 Daily_Second_d~ 1141754
5 2021-07-09 Andaman and~ First.Dose.Ad~ 164079 Daily_First_do~ 795
6 2021-07-09 Andaman and~ First.Dose.Ad~ 164079 Daily_Second_d~ 6336
7 2021-07-09 Andaman and~ Second.Dose.A~ 51646 Daily_First_do~ 795
8 2021-07-09 Andaman and~ Second.Dose.A~ 51646 Daily_Second_d~ 6336
9 2021-07-09 Andhra Prad~ First.Dose.Ad~ 13372990 Daily_First_do~ 25571
10 2021-07-09 Andhra Prad~ First.Dose.Ad~ 13372990 Daily_Second_d~ 11168
# ... with 3,542 more rows
我认为你的最终输出应该有 nrow(df_vaccination) * 2
行。
这是一个不可扩展的解决方案,但应该适用于此数据。
df <- df_vaccination
res <- data.frame(Updated.on = rep(df$Updated.On, 2),
State = rep(df$State, 2),
Dose_type = rep(names(df)[3:4], each = nrow(df)),
Dose_Administered = unlist(df[3:4]),
Dose_type_daily = rep(names(df)[5:6], each = nrow(df)),
Dose_Administered_daily = unlist(df[5:6]), row.names = NULL)
res
# Updated.on State Dose_type Dose_Administered
#1 2021-07-09 India First.Dose.Administered 297184419
#2 2021-07-09 Andaman and Nicobar Islands First.Dose.Administered 164079
#3 2021-07-09 Andhra Pradesh First.Dose.Administered 13372990
#4 2021-07-09 Arunachal Pradesh First.Dose.Administered 583021
#5 2021-07-09 Assam First.Dose.Administered 6462249
#6 2021-07-09 Bihar First.Dose.Administered 15741778
# Dose_type_daily Dose_Administered_daily
#1 Daily_First_dose 2071121
#2 Daily_First_dose 795
#3 Daily_First_dose 25571
#4 Daily_First_dose 8423
#5 Daily_First_dose 52913
#6 Daily_First_dose 262299
我正在处理具有 multiple cumulative field cols
和 multiple daily field cols
的数据集,并使用 pivot_longer
希望将它们转换成:
- single/long
cumulative field col
和 - single/long
daily field col
数据快照
"Updated.On","State","First.Dose.Administered","Second.Dose.Administered","Daily_First_dose","Daily_Second_dose"
2021-07-09,"India",297184419,69894633,2071121,1141754
2021-07-09,"Andaman and Nicobar Islands",164079,51646,795,6336
2021-07-09,"Andhra Pradesh",13372990,3379660,25571,11168
2021-07-09,"Arunachal Pradesh",583021,108393,8423,3430
2021-07-09,"Assam",6462249,1328646,52913,7681
2021-07-09,"Bihar",15741778,2638686,262299,55007
2021-07-09,"Chandigarh",483886,110610,4351,3491
2021-07-09,"Chhattisgarh",7438953,1819709,64193,42342
2021-07-09,"Dadra and Nagar Haveli and Daman and Diu",458057,57874,7122,1745
2021-07-09,"Delhi",6761634,2049318,95352,35848
cols 转换为:
- 俱乐部
First.Dose.Administered, Second.Dose.Administered
列names_to = "Dose_type"
、values_to = "Dose_Administered"
- 俱乐部
Daily_First_dose, Daily_Second_dose
列到names_to = "Dose_type_daily"
、values_to = "Dose_Administered_daily"
所需的数据框示例:
Updated.On State Dose_type Dose_Administered Dose_type_daily Dose_Administered_daily
2021-07-09 India First.Dose.Administered 297184419 Daily_First_dose 2071121
2021-07-09 India Second.Dose.Administered 69894633 Daily_Second_dose 1141754
df 可从以下网址下载:
library(tidyverse)
library(lubridate)
file_url1 <- "https://raw.githubusercontent.com/johnsnow09/covid19-df_stack-code/main/df_vaccination_june.csv"
df_vaccination <- read.csv(url(file_url1))
df_vaccination <- df_vaccination %>%
mutate(Updated.On = as.Date(Updated.On))
尝试: 我试过下面的代码但不确定如何创建两个不同的 values_to 列?
df_vaccination %>%
pivot_longer(cols = c(First.Dose.Administered:Second.Dose.Administered,
Daily_First_dose:Daily_Second_dose),
names_to = c("Dose_type","Dose_type_daily"),
values_to = c("Dose_Administered","Dose_Administered_daily"))
我认为你需要 pivot_longer
两次并使用适当的 names_pattern
regex
参数。
library(tidyverse)
df_vaccination %>%
pivot_longer(cols = 3:4, names_to = c('dose_adminstered','.value'),
names_pattern = '(.*)\.(Administered)') %>%
pivot_longer(cols = 3:4, names_to = c('dose_type_daily', '.value'),
names_pattern = '(.*)_(.*)$')
#> # A tibble: 3,552 x 6
#> Updated.On State dose_adminstered Administered dose_type_daily dose
#> <date> <chr> <chr> <int> <chr> <int>
#> 1 2021-07-09 India First.Dose 297184419 Daily_First 2.07e6
#> 2 2021-07-09 India First.Dose 297184419 Daily_Second 1.14e6
#> 3 2021-07-09 India Second.Dose 69894633 Daily_First 2.07e6
#> 4 2021-07-09 India Second.Dose 69894633 Daily_Second 1.14e6
#> 5 2021-07-09 Andaman and ~ First.Dose 164079 Daily_First 7.95e2
#> 6 2021-07-09 Andaman and ~ First.Dose 164079 Daily_Second 6.34e3
#> 7 2021-07-09 Andaman and ~ Second.Dose 51646 Daily_First 7.95e2
#> 8 2021-07-09 Andaman and ~ Second.Dose 51646 Daily_Second 6.34e3
#> 9 2021-07-09 Andhra Prade~ First.Dose 13372990 Daily_First 2.56e4
#> 10 2021-07-09 Andhra Prade~ First.Dose 13372990 Daily_Second 1.12e4
#> # ... with 3,542 more rows
每个州 return 只有 2 行,您可以使用此解决方法
df_vaccination[1:4] %>%
pivot_longer(cols = 3:4, names_to = c('dose_adminstered','.value'),
names_pattern = '(.*)\.(Administered)') %>%
cbind(df_vaccination[5:6] %>% pivot_longer(everything(), names_to = c('dose_type_daily', '.value'),
names_pattern = '(.*)_(.*)$'))
Updated.On State dose_adminstered Administered dose_type_daily dose
1 2021-07-09 India First.Dose 297184419 Daily_First 2071121
2 2021-07-09 India Second.Dose 69894633 Daily_Second 1141754
3 2021-07-09 Andaman and Nicobar Islands First.Dose 164079 Daily_First 795
4 2021-07-09 Andaman and Nicobar Islands Second.Dose 51646 Daily_Second 6336
5 2021-07-09 Andhra Pradesh First.Dose 13372990 Daily_First 25571
6 2021-07-09 Andhra Pradesh Second.Dose 3379660 Daily_Second 11168
7 2021-07-09 Arunachal Pradesh First.Dose 583021 Daily_First 8423
8 2021-07-09 Arunachal Pradesh Second.Dose 108393 Daily_Second 3430
.
.
.
更新:2行解决方案 这是获得两行解决方案的另一种方法:
创建一个长度格式为
的数据框First.Dose.Administered
和Second.Dose.Administered
的长格式mutate
到Daily_First_dose
和Daily_Second_dose
library(dplyr)
# create a dataframe with First.Dose.Administered, Second.Dose.Administered
df_vaccination_1 <- df_vaccination %>%
select(First.Dose.Administered, Second.Dose.Administered) %>%
pivot_longer(
cols = c(First.Dose.Administered, Second.Dose.Administered),
names_to = "Dose_type",
values_to = "Dose_Administered"
)
# resulting dataframe using columns of df_vaccination_1
df_vaccination_result <- df_vaccination %>%
select(-First.Dose.Administered, -Second.Dose.Administered) %>%
pivot_longer(
cols = c(Daily_First_dose, Daily_Second_dose),
names_to = "Dose_type_daily",
values_to = "Dose_Administered_daily"
) %>%
mutate(First.Dose.Administered = df_vaccination_1$Dose_type,
Second.Dose.Administered = df_vaccination_1$Dose_Administered)
输出:
A tibble: 1,776 x 6
Updated.On State Dose_type_daily Dose_Administere~ First.Dose.Admin~ Second.Dose.Admi~
<date> <chr> <chr> <int> <chr> <int>
1 2021-07-09 India Daily_First_do~ 2071121 First.Dose.Admin~ 297184419
2 2021-07-09 India Daily_Second_d~ 1141754 Second.Dose.Admi~ 69894633
3 2021-07-09 Andaman ~ Daily_First_do~ 795 First.Dose.Admin~ 164079
4 2021-07-09 Andaman ~ Daily_Second_d~ 6336 Second.Dose.Admi~ 51646
5 2021-07-09 Andhra P~ Daily_First_do~ 25571 First.Dose.Admin~ 13372990
6 2021-07-09 Andhra P~ Daily_Second_d~ 11168 Second.Dose.Admi~ 3379660
7 2021-07-09 Arunacha~ Daily_First_do~ 8423 First.Dose.Admin~ 583021
8 2021-07-09 Arunacha~ Daily_Second_d~ 3430 Second.Dose.Admi~ 108393
9 2021-07-09 Assam Daily_First_do~ 52913 First.Dose.Admin~ 6462249
10 2021-07-09 Assam Daily_Second_d~ 7681 Second.Dose.Admi~ 1328646
# ... with 1,766 more rows
第一个回答:
这是您要找的吗?
df_vaccination %>%
pivot_longer(
cols = c(First.Dose.Administered, Second.Dose.Administered),
names_to = "Dose_type",
values_to = "Dose_Administered"
) %>%
pivot_longer(
cols = c(Daily_First_dose, Daily_Second_dose),
names_to = "Dose_type_daily",
values_to = "Dose_Administered_daily"
)
输出:
Updated.On State Dose_type Dose_Administer~ Dose_type_daily Dose_Administered~
<date> <chr> <chr> <int> <chr> <int>
1 2021-07-09 India First.Dose.Ad~ 297184419 Daily_First_do~ 2071121
2 2021-07-09 India First.Dose.Ad~ 297184419 Daily_Second_d~ 1141754
3 2021-07-09 India Second.Dose.A~ 69894633 Daily_First_do~ 2071121
4 2021-07-09 India Second.Dose.A~ 69894633 Daily_Second_d~ 1141754
5 2021-07-09 Andaman and~ First.Dose.Ad~ 164079 Daily_First_do~ 795
6 2021-07-09 Andaman and~ First.Dose.Ad~ 164079 Daily_Second_d~ 6336
7 2021-07-09 Andaman and~ Second.Dose.A~ 51646 Daily_First_do~ 795
8 2021-07-09 Andaman and~ Second.Dose.A~ 51646 Daily_Second_d~ 6336
9 2021-07-09 Andhra Prad~ First.Dose.Ad~ 13372990 Daily_First_do~ 25571
10 2021-07-09 Andhra Prad~ First.Dose.Ad~ 13372990 Daily_Second_d~ 11168
# ... with 3,542 more rows
我认为你的最终输出应该有 nrow(df_vaccination) * 2
行。
这是一个不可扩展的解决方案,但应该适用于此数据。
df <- df_vaccination
res <- data.frame(Updated.on = rep(df$Updated.On, 2),
State = rep(df$State, 2),
Dose_type = rep(names(df)[3:4], each = nrow(df)),
Dose_Administered = unlist(df[3:4]),
Dose_type_daily = rep(names(df)[5:6], each = nrow(df)),
Dose_Administered_daily = unlist(df[5:6]), row.names = NULL)
res
# Updated.on State Dose_type Dose_Administered
#1 2021-07-09 India First.Dose.Administered 297184419
#2 2021-07-09 Andaman and Nicobar Islands First.Dose.Administered 164079
#3 2021-07-09 Andhra Pradesh First.Dose.Administered 13372990
#4 2021-07-09 Arunachal Pradesh First.Dose.Administered 583021
#5 2021-07-09 Assam First.Dose.Administered 6462249
#6 2021-07-09 Bihar First.Dose.Administered 15741778
# Dose_type_daily Dose_Administered_daily
#1 Daily_First_dose 2071121
#2 Daily_First_dose 795
#3 Daily_First_dose 25571
#4 Daily_First_dose 8423
#5 Daily_First_dose 52913
#6 Daily_First_dose 262299