Extract variable from numerical code based on digit location with R/Regex

Question

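I have been struggling to find a robust and concise way to recode a variable that is a 4-digit code representing a combination of other variables, which for now we can treat as binary. These variables are:

Location: 1 = North, 2 = South
Sex: 1 = male, 2 = female
Job: 1 = driver, 2 = construction
Income: 1 = high, 2 = low

For example, a value coded 1111 means: North,male,driver,high

The R code for the data is as follows: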

    library(tidyselect)
    library(tidyverse)
    library(dplyr)
    
    location <- c("North", "South")
    sex <- c("male", "female")
    job <- c("driver", "construction")
    income <- c("high", "low")
    
    dt <- tibble(data= c(1112,1212,1122,1221))

    # A tibble: 4 × 1
       data
      <dbl>
    1  1112
    2  1212
    3  1122
    4  1221

I would like to recode this column to get the following final output:

    # A tibble: 4 × 1
      data                          
      <chr>                         
    1 North,male,driver,low         
    2 North,female,driver,low       
    3 North,male,construction,low   
    4 North,female,construction,high

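I have tried various combinations of str_extract, hoping to apply a regex to each digit position, followed by ifelse or case_when, but it either does not work or becomes bulky and redundant for the real dataset (which has 4-digit codes with up to 9 possible values at each digit position).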

Answer 1

We can create a list of named vectors, one per digit position, and then match each digit against it:

    library(dplyr)
    library(tidyr)

    # one named lookup vector per digit position, keyed by the digit
    lst1 <- list(location = c(`1` = 'North', `2` = 'South'),
                 sex = c(`1` = 'male', `2` = 'female'),
                 job = c(`1` = 'driver', `2` = 'construction'),
                 income = c(`1` = 'high', `2` = 'low'))

    dt %>% 
      # split between every pair of adjacent digits
      separate(data, into = c('location', 'sex', 'job', 'income'),
               sep = "(?<=\\d)(?=\\d)") %>%
      # look each digit up in the vector named after its column
      mutate(across(everything(), ~ lst1[[cur_column()]][.x])) %>% 
      unite(data, everything(), sep = ",")

-output

    # A tibble: 4 × 1
      data                          
      <chr>                         
    1 North,male,driver,low         
    2 North,female,driver,low       
    3 North,male,construction,low   
    4 North,female,construction,high
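
For comparison, the same lookup-by-position idea can be sketched in base R without any regex. This is only an illustration, not part of the original answer: `decode_codes()` and `lookup` are hypothetical names, and the sketch assumes every code has exactly one digit per lookup table.

    # one named lookup vector per digit position, keyed by the digit
    lookup <- list(
      location = c(`1` = "North", `2` = "South"),
      sex      = c(`1` = "male", `2` = "female"),
      job      = c(`1` = "driver", `2` = "construction"),
      income   = c(`1` = "high", `2` = "low")
    )

    # decode_codes() is a hypothetical helper, not a function from any package
    decode_codes <- function(codes, lookup, sep = ",") {
      codes <- as.character(codes)
      # assumption: each code has one digit per lookup table
      stopifnot(all(nchar(codes) == length(lookup)))
      # for position i, take the i-th digit of every code and index table i with it
      parts <- lapply(seq_along(lookup), function(i) {
        unname(lookup[[i]][substr(codes, i, i)])
      })
      # paste the decoded pieces together element-wise
      do.call(paste, c(parts, sep = sep))
    }

    decode_codes(c(1112, 1212, 1122, 1221), lookup)

Extending this to up to 9 values per position only means adding more named entries to each vector. A digit with no entry in its table is looked up as NA, which paste() then writes out as the text "NA", so it may be worth checking for that on real data.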
