Reshape wide to long with a key specifying the levels of the grouping variable

I have a dataset:

df1 <- structure(list(group = c("A", "B", "A", "B", "B", "A", "A"), 
    technique = c("attack", "defenese", "attack ", "defense ", 
    "defense ", "attack", "defense "), outcome1.part1 = c(24L, 
    1234L, 15L, 234L, 23L, 3L, 3L), outcome1.part.2 = c(52L, 
    321L, 23L, 234L, 234L, 145L, 145L), outcome1.part.3 = c(14L, 
    23L, 3L, 2L, 234L, 234L, 234L), outcome2.part.1 = c(14L, 
    234L, 145L, 4L, 234L, 145L, 145L), outcome2.part.2 = c(234L, 
    234L, 234L, 234L, 234L, 234L, 234L), outcome2.part.3 = c(234L, 
    234L, 234L, 234L, 145L, 145L, 145L)), class = "data.frame", row.names = c(NA, 
-7L))

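The dataset needs to be converted to long format, but I want to define a vector (a key) whose phrases are looked up in the column names, and which then decides which level of the grouping variable each column is assigned to.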

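So for a column such as outcome1.part.2, say, I want to convert it to long format and create a column called strata that takes whichever value of the key is found in the column name. The key would be c("part.1", "part.2", "part.3"), and it would convert a row like this

to this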

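I would prefer not to use a regex solution, because I want the flexibility to change the values in the key without having to work out a new regex for each level of the grouping variable.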

We can use pivot_longer, building the names_pattern from the key vector so that only the key has to change:

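library(dplyr)
library(tidyr)
library(stringr)

# Key: the strata labels to look for in the column names
v1 <- c("part.1", "part.2", "part.3")
# Build the pattern from the key (the backslash must be doubled in an R string)
pat <- sprintf("^(outcome\\d*).*(%s).*$", str_c(v1, collapse = "|"))
df1 %>% 
  pivot_longer(cols = starts_with("outcome"), 
               names_to = c(".value", "strata"), 
               names_pattern = pat)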

-output

# A tibble: 21 × 5
   group technique  strata outcome1 outcome2
   <chr> <chr>      <chr>     <int>    <int>
 1 A     "attack"   part.1       24       14
 2 A     "attack"   part.2       52      234
 3 A     "attack"   part.3       14      234
 4 B     "defenese" part.1     1234      234
 5 B     "defenese" part.2      321      234
 6 B     "defenese" part.3       23      234
 7 A     "attack "  part.1       15      145
 8 A     "attack "  part.2       23      234
 9 A     "attack "  part.3        3      234
10 B     "defense " part.1      234        4
# … with 11 more rows
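
To see how the two capture groups in the pattern line up with `.value` and `strata`, the pattern can be tried on a single column name with base R. This is only an illustrative sketch; `parts` is a name introduced here, and `perl = TRUE` is passed so that `\d` is interpreted as a digit class.

```r
v1 <- c("part.1", "part.2", "part.3")
pat <- sprintf("^(outcome\\d*).*(%s).*$", paste(v1, collapse = "|"))

# regexec() records the full match and each capture group;
# regmatches() extracts them as a character vector
nm <- "outcome1.part.2"
parts <- regmatches(nm, regexec(pat, nm, perl = TRUE))[[1]]
print(parts)
# element 1: the whole name, element 2: group 1 (becomes the value column),
# element 3: group 2 (becomes strata)
```

The first group supplies the name of the output column (outcome1 or outcome2), and the second group supplies the entry in strata.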

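Note: there is a typo in the column names. The third column is named outcome1.part1 and should be renamed so that it contains the key value part.1: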

names(df1)[3] <- 'outcome1.part.1'
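
A quick way to catch this kind of mismatch before pivoting is to test every column name against the pattern. The sketch below uses the column names from the question as a plain character vector; `cols` and `unmatched` are names introduced here for illustration.

```r
v1 <- c("part.1", "part.2", "part.3")
pat <- sprintf("^(outcome\\d*).*(%s).*$", paste(v1, collapse = "|"))

# Column names as they appear in the question's data (before the rename)
cols <- c("outcome1.part1", "outcome1.part.2", "outcome1.part.3",
          "outcome2.part.1", "outcome2.part.2", "outcome2.part.3")

# Names that contain none of the key values do not match the pattern
unmatched <- cols[!grepl(pat, cols, perl = TRUE)]
print(unmatched)
```

Any name listed in `unmatched` would not be split into an outcome and a strata part as intended, so it should be fixed (or the key extended) first.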

A solution that does not require writing a regular expression, using the extra = "merge" argument of separate:

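library(dplyr)
library(tidyr)

# df1 is the question's data with the third column renamed to outcome1.part.1
df1 %>% 
  mutate(id = row_number()) %>%   # row id so pivot_wider can regroup the rows
  pivot_longer(
    cols = -c(id, group, technique)
  ) %>% 
  # split at the first separator only; the rest ("part.1") stays together
  separate(name, into = c('outcome', 'strata'), extra = "merge") %>% 
  pivot_wider(
    names_from = outcome,
    values_from = value
  ) %>% 
  select(-id)
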
   group technique  strata outcome1 outcome2
   <chr> <chr>      <chr>     <int>    <int>
 1 A     "attack"   part.1       24       14
 2 A     "attack"   part.2       52      234
 3 A     "attack"   part.3       14      234
 4 B     "defenese" part.1     1234      234
 5 B     "defenese" part.2      321      234
 6 B     "defenese" part.3       23      234
 7 A     "attack "  part.1       15      145
 8 A     "attack "  part.2       23      234
 9 A     "attack "  part.3        3      234
10 B     "defense " part.1      234        4
# ... with 11 more rows
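
If the key itself should drive the assignment, as the question asks, one possible variation is to look each key value up in the column names as a literal string. The following is a minimal sketch under stated assumptions, not part of either answer: `find_key` is a hypothetical helper written here, `df_small` is a two-row subset of the question's data with the third column already renamed, and the outcome label is assumed to be everything before the first "." in the column name.

```r
library(dplyr)
library(tidyr)

# Two-row subset of the question's data (third column already renamed)
df_small <- data.frame(
  group = c("A", "B"),
  technique = c("attack", "defenese"),
  outcome1.part.1 = c(24L, 1234L),
  outcome1.part.2 = c(52L, 321L),
  outcome1.part.3 = c(14L, 23L),
  outcome2.part.1 = c(14L, 234L),
  outcome2.part.2 = c(234L, 234L),
  outcome2.part.3 = c(234L, 234L)
)

key <- c("part.1", "part.2", "part.3")

# Hypothetical helper: for each name, return the first key value that occurs
# in it as a literal substring (fixed = TRUE, so "." is not a wildcard)
find_key <- function(x, key) {
  out <- rep(NA_character_, length(x))
  for (k in key) {
    hit <- is.na(out) & grepl(k, x, fixed = TRUE)
    out[hit] <- k
  }
  out
}

res <- df_small %>%
  mutate(id = row_number()) %>%
  pivot_longer(cols = starts_with("outcome")) %>%
  mutate(
    strata = find_key(name, key),
    # Assumption: the outcome label is the text before the first "."
    outcome = vapply(strsplit(name, ".", fixed = TRUE), `[`, character(1), 1)
  ) %>%
  select(-name) %>%
  pivot_wider(names_from = outcome, values_from = value) %>%
  select(-id)

print(res)
```

With this approach, changing the grouping levels only means editing `key`. A column name that contains none of the key values (for example the unrenamed outcome1.part1) gets NA in strata, which makes such mismatches easy to spot.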