将具有多列的宽数据集转换为长格式
Convert wide dataset with many columns to long format
我有下面的数据框
DF<-structure(list(Channel_title = c("Blippi Toys", "PewDiePie"),
Kids_13_and_under = c("TRUE", "FALSE"), `Teens_13-17` = c("FALSE",
"TRUE"), Adults_17_and_older = c("FALSE", "FALSE"), `2018-03` = c(101163664,
222435612), `2018-04` = c(138215454, 217690060), `2018-05` = c(151697886,
254479194), `2018-06` = c(142169001, 297190999)), row.names = c(NA,
-2L), class = c("tbl_df", "tbl", "data.frame"))
我想将其转换成如下所示:
Channel_title Month Views_for_that_month Kids_13_and_under Teens_13_17 Adults_17_and_older
1 Blippi Toys 2018-03 101163664 TRUE FALSE FALSE
2 Blippi Toys 2018-04 138215454 TRUE FALSE FALSE
3 Blippi Toys 2018-05 151697886 TRUE FALSE FALSE
4 Blippi Toys 2018-06 142169001 TRUE FALSE FALSE
5 PewDiePie 2018-03 222435612 FALSE TRUE FALSE
6 PewDiePie 2018-04 217690060 FALSE TRUE FALSE
7 PewDiePie 2018-05 254479194 FALSE TRUE FALSE
8 PewDiePie 2018-06 297190999 FALSE TRUE FALSE
类似
long <- melt(setDT(DF), id.vars = colnames(DF), variable.name = "Channel_title")
我的实际数据有数百个月的列
library(tidyr)
cols <- grep("\d{4}-\d{2}", names(DF), value = TRUE)
pivot_longer(DF, cols = cols, names_to = "Month", values_to = "Views_for_that_month")
结果:
Channel_title Kids_13_and_under `Teens_13-17` Adults_17_and_older Month Views_for_that_month
<chr> <chr> <chr> <chr> <chr> <dbl>
1 Blippi Toys TRUE FALSE FALSE 2018-03 101163664
2 Blippi Toys TRUE FALSE FALSE 2018-04 138215454
3 Blippi Toys TRUE FALSE FALSE 2018-05 151697886
4 Blippi Toys TRUE FALSE FALSE 2018-06 142169001
5 PewDiePie FALSE TRUE FALSE 2018-03 222435612
6 PewDiePie FALSE TRUE FALSE 2018-04 217690060
7 PewDiePie FALSE TRUE FALSE 2018-05 254479194
8 PewDiePie FALSE TRUE FALSE 2018-06 297190999
我有下面的数据框
DF<-structure(list(Channel_title = c("Blippi Toys", "PewDiePie"),
Kids_13_and_under = c("TRUE", "FALSE"), `Teens_13-17` = c("FALSE",
"TRUE"), Adults_17_and_older = c("FALSE", "FALSE"), `2018-03` = c(101163664,
222435612), `2018-04` = c(138215454, 217690060), `2018-05` = c(151697886,
254479194), `2018-06` = c(142169001, 297190999)), row.names = c(NA,
-2L), class = c("tbl_df", "tbl", "data.frame"))
我想将其转换成如下所示:
Channel_title Month Views_for_that_month Kids_13_and_under Teens_13_17 Adults_17_and_older
1 Blippi Toys 2018-03 101163664 TRUE FALSE FALSE
2 Blippi Toys 2018-04 138215454 TRUE FALSE FALSE
3 Blippi Toys 2018-05 151697886 TRUE FALSE FALSE
4 Blippi Toys 2018-06 142169001 TRUE FALSE FALSE
5 PewDiePie 2018-03 222435612 FALSE TRUE FALSE
6 PewDiePie 2018-04 217690060 FALSE TRUE FALSE
7 PewDiePie 2018-05 254479194 FALSE TRUE FALSE
8 PewDiePie 2018-06 297190999 FALSE TRUE FALSE
类似
long <- melt(setDT(DF), id.vars = colnames(DF), variable.name = "Channel_title")
我的实际数据有数百个月的列
library(tidyr)
cols <- grep("\d{4}-\d{2}", names(DF), value = TRUE)
pivot_longer(DF, cols = cols, names_to = "Month", values_to = "Views_for_that_month")
结果:
Channel_title Kids_13_and_under `Teens_13-17` Adults_17_and_older Month Views_for_that_month
<chr> <chr> <chr> <chr> <chr> <dbl>
1 Blippi Toys TRUE FALSE FALSE 2018-03 101163664
2 Blippi Toys TRUE FALSE FALSE 2018-04 138215454
3 Blippi Toys TRUE FALSE FALSE 2018-05 151697886
4 Blippi Toys TRUE FALSE FALSE 2018-06 142169001
5 PewDiePie FALSE TRUE FALSE 2018-03 222435612
6 PewDiePie FALSE TRUE FALSE 2018-04 217690060
7 PewDiePie FALSE TRUE FALSE 2018-05 254479194
8 PewDiePie FALSE TRUE FALSE 2018-06 297190999