How to pivot wide to long format for columns with different names?
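I have a wide df that I want to convert to long format based on multiple columns. For example, all columns containing "Type" should end up in a single column, and all columns containing "Finding" should end up in a single column. I believe dplyr is the best approach, but I have had no luck with pivot_longer.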
Initial df
structure(list(Date = structure(c(1648512000, 1648598400, 1648166400, 1648166400),
class = c("POSIXct", "POSIXt"), tzone = "UTC"),
Site = c("A", "B", "C", "D"),
`Finding?` = c("Yes", "Yes", "Yes", "Yes"),
`Topic Area` = c("A", "B", "C", "D"),
`Type` = c("1", "2", "3", "4"),
`Risk Ranking` = c("Medium", "Low", "Medium", "Medium"),
`Additional Finding?` = c("Yes", "Yes", "Yes", "Yes"),
`Topic Area2` = c("A", "B", "C", "D"),
`Type2` = c("1", "2", "2", "3"),
`Risk Ranking2` = c("Medium", "Medium", "Low", "Medium")),
row.names = c(NA, -4L),
class = c("tbl_df", "tbl", "data.frame"))
Desired output
data.frame(Date = structure(c(1648512000, 1648598400, 1648166400, 1648166400, 1648512000, 1648598400, 1648166400, 1648166400),
class = c("POSIXct", "POSIXt"), tzone = "UTC"),
Site = c("A", "B", "C", "D", "A", "B", "C", "D"),
"Finding?" = c("Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes"),
"Topic Area" = c("A", "B", "C", "D", "A", "B", "C", "D"),
`Type` = c("1", "2", "3", "4", "1", "2", "2", "3"),
"Risk Ranking" = c("Medium", "Low", "Medium", "Medium", "Medium", "Medium", "Low", "Medium"))
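One of the column sets, `Finding?` and `Additional Finding?`, is named differently from the others: the remaining groups mark their second column with a trailing 2. So we only rename the 'Finding' columns so that they follow the same stem-plus-suffix convention, and then use pivot_longer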
library(dplyr)
library(tidyr)
library(stringr)
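# df1 holds the data frame from the question
df1 %>%
  # rename the two "Finding" columns to Finding1 and Finding2
  rename_with(~ str_c("Finding", seq_along(.x)), contains("Finding")) %>%
  # ".value" turns the captured name stem into the output column name
  pivot_longer(cols = -c(Date, Site), names_to = c(".value"),
    names_pattern = "(\\D+)\\d*$", values_drop_na = TRUE)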
-output
# A tibble: 8 × 6
Date Site Finding `Topic Area` Type `Risk Ranking`
<dttm> <chr> <chr> <chr> <chr> <chr>
1 2022-03-29 00:00:00 A Yes A 1 Medium
2 2022-03-29 00:00:00 A Yes A 1 Medium
3 2022-03-30 00:00:00 B Yes B 2 Low
4 2022-03-30 00:00:00 B Yes B 2 Medium
5 2022-03-25 00:00:00 C Yes C 3 Medium
6 2022-03-25 00:00:00 C Yes C 2 Low
7 2022-03-25 00:00:00 D Yes D 4 Medium
8 2022-03-25 00:00:00 D Yes D 3 Medium
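To see why the rename plus the regex lines the columns up, here is a minimal sketch in base R that applies the same two steps to the column names only. It is an illustration, not part of the answer's pipeline: the vector `nms` is typed by hand from the question's data, and `grepl`/`sub` stand in for `contains()` and `names_pattern`.

```r
# Column names from the example data, typed by hand for illustration
nms <- c("Date", "Site", "Finding?", "Topic Area", "Type", "Risk Ranking",
         "Additional Finding?", "Topic Area2", "Type2", "Risk Ranking2")

# Step 1: mimic rename_with(): number every column whose name contains "Finding"
is_finding <- grepl("Finding", nms, fixed = TRUE)
nms[is_finding] <- paste0("Finding", seq_len(sum(is_finding)))

# Step 2: mimic names_pattern on the pivoted columns:
# keep the leading run of non-digits and drop any trailing digits
pivoted <- setdiff(nms, c("Date", "Site"))
value_cols <- sub("(\\D+)\\d*$", "\\1", pivoted, perl = TRUE)

print(nms[is_finding])     # "Finding1" "Finding2"
print(unique(value_cols))  # "Finding" "Topic Area" "Type" "Risk Ranking"
```

Each of the eight pivoted columns collapses to one of four stems, which is why the result has four value columns and two rows per original row.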