pivot_wider a dataframe with complex names in R
So I have a data frame that looks like this:
datInput <- tibble(id = 1:2,
c.0.opt = c("a,b", "c,d"),
c.0.optI = c("1,2", "3,4"),
c.0.sel = c("a", "c"),
c.1.opt = c("e,f", "g,h"),
c.1.optI = c("5,6", "7,8"),
c.1.sel = c("e", "g"))
datInput
# id c.0.opt c.0.optI c.0.sel c.1.opt c.1.optI c.1.sel
#1 1 a,b 1,2 a e,f 5,6 e
#2 2 c,d 3,4 c g,h 7,8 g
I need it to look like this:
datOutput <- tibble(id = c(1,1,2,2),
c_opt = c("a,b", "e,f", "c,d", "g,h"),
c_optI = c("1,2", "5,6", "3,4", "7,8"),
c_sel = c("a", "e", "c", "g"))
# id c_opt c_optI c_sel
#1 1 a,b 1,2 a
#2 1 e,f 5,6 e
#3 2 c,d 3,4 c
#4 2 g,h 7,8 g
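I normally use tidyr::pivot_longer for this kind of task, but I can't figure out how to handle these complex column names, where the row identifier sits in the middle. Is there a way to do this?
Thanks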
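library(dplyr)
library(tidyr)
datInput %>%
  # gather() is superseded by pivot_longer() but still works
  gather(colname, val, -1) %>%
  # drop the ".<digit>." in the middle of each name: "c.0.opt" -> "c_opt"
  mutate(colname = gsub("\\.\\d\\.", "_", colname)) %>%
  # names now repeat within each id, so collect the values into list-columns
  pivot_wider(id_cols = id, names_from = colname, values_from = val, values_fn = list) %>%
  unnest(cols = c(colnames(.)))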
# A tibble: 4 x 4
id c_opt c_optI c_sel
<int> <chr> <chr> <chr>
1 1 a,b 1,2 a
2 1 e,f 5,6 e
3 2 c,d 3,4 c
4 2 g,h 7,8 g
We can also use pivot_longer with names_sep given as a regex lookbehind that matches the . following the digit in the column names
library(dplyr)
library(tidyr)
library(stringr)
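# "c.0.opt" is split into grp = "c.0" and .value = "opt"
pivot_longer(datInput, cols = -id, names_to = c("grp", ".value"),
    names_sep = "(?<=\\d)\\.") %>%
  select(-grp) %>%
  # put the "c_" prefix back on every column except id
  rename_with(~ str_c('c_', .), -id)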
# A tibble: 4 x 4
# id c_opt c_optI c_sel
# <int> <chr> <chr> <chr>
#1 1 a,b 1,2 a
#2 1 e,f 5,6 e
#3 2 c,d 3,4 c
#4 2 g,h 7,8 g
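To see where that lookbehind actually cuts the names, here is a small base R sketch. It only illustrates the regex on a few of the question's column names; `strsplit()` with `perl = TRUE` is my stand-in for the demonstration and is not claimed to be what `pivot_longer()` runs internally. The variable names `nms`, `parts`, `grp` and `value` are made up for the example.

```r
# Column names taken from the question's datInput
nms <- c("c.0.opt", "c.0.optI", "c.1.sel")

# Split on a literal "." only when the character before it is a digit.
# perl = TRUE is needed for the lookbehind (?<=...) in base R.
parts <- strsplit(nms, "(?<=\\d)\\.", perl = TRUE)

# First piece corresponds to "grp", second piece to ".value"
grp   <- vapply(parts, `[`, character(1), 1)
value <- vapply(parts, `[`, character(1), 2)

print(grp)    # "c.0" "c.0" "c.1"
print(value)  # "opt" "optI" "sel"
```

The first "." in each name is skipped because it follows a letter, so each name is split exactly once.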
I modified Akrun's answer using zimia's comment, as follows:
datOutput <- datInput %>%
  pivot_longer(-id, names_to = "colname", values_to = "val") %>%
  # drop the ".<digit>." in the middle of each name: "c.0.opt" -> "c_opt"
  mutate(colname = gsub("\\.\\d\\.", "_", colname)) %>%
  pivot_wider(id_cols = id, names_from = colname, values_from = val, values_fn = list) %>%
  unnest(cols = c(colnames(.)))
It works well. Thank you both.
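If the index in the middle of the names is worth keeping, one possible variation (a sketch, not something posted in this thread) is to move the index to the end of each name first and then let `.value` do the reshaping in a single `pivot_longer()` call. It assumes every non-id column has the form prefix.index.suffix; the `grp` column name is borrowed from Akrun's answer, and `datLong` is just a name I picked.

```r
library(dplyr)
library(tidyr)

datInput <- tibble(id = 1:2,
                   c.0.opt  = c("a,b", "c,d"),
                   c.0.optI = c("1,2", "3,4"),
                   c.0.sel  = c("a", "c"),
                   c.1.opt  = c("e,f", "g,h"),
                   c.1.optI = c("5,6", "7,8"),
                   c.1.sel  = c("e", "g"))

datLong <- datInput %>%
  # "c.0.opt" -> "c_opt.0": the index moves to the end of the name
  rename_with(~ sub("^(\\w+)\\.(\\d+)\\.(\\w+)$", "\\1_\\3.\\2", .x), -id) %>%
  # split on the remaining "."; the left part becomes the output column name
  pivot_longer(-id, names_to = c(".value", "grp"), names_sep = "\\.")

datLong
```

This should give the same four rows as above plus a character `grp` column holding "0" or "1", and it avoids the list-columns and the unnest step.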