Extract variable from numerical code based on digit location with R/Regex

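I have been struggling to find a robust and concise way to recode a variable that is a 4-digit numeric code representing a combination of other variables, which we can treat as binary for now. The variables are:

location: 1 = North, 2 = South
sex: 1 = male, 2 = female
job: 1 = driver, 2 = construction
income: 1 = high, 2 = low

For example, a value coded 1111 means: North, male, driver, high

The R code for the data is as follows: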

    library(tidyselect)
    library(tidyverse)
    library(dplyr)
    
    location <- c("North", "South")
    sex <- c("male", "female")
    job <- c("driver", "construction")
    income <- c("high", "low")
    
    dt <- tibble(data= c(1112,1212,1122,1221))

    # A tibble: 4 × 1
       data
      <dbl>
    1  1112
    2  1212
    3  1122
    4  1221

I would like to recode this column to get the final output

    # A tibble: 4 × 1
      data
      <chr>
    1 North,male,driver,low
    2 North,female,driver,low
    3 North,male,construction,low
    4 North,female,construction,high

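I have tried various combinations of str_extract, hoping to use a regex on the digit position, followed by ifelse/case_when, but it either does not work or becomes bulky and redundant on the real dataset (4-digit codes, with up to 9 possible values at each digit position).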

We can create a list of named vectors and then match each digit against it

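    library(dplyr)
    library(tidyr)

    # One named vector per digit position; the names are the digit codes
    lst1 <- list(location = c(`1` = 'North', `2` = 'South'),
                 sex = c(`1` = 'male', `2` = 'female'),
                 job = c(`1` = 'driver', `2` = 'construction'),
                 income = c(`1` = 'high', `2` = 'low'))

    dt %>%
      # split between every pair of adjacent digits (backslashes must be doubled in R strings)
      separate(data, into = c('location', 'sex', 'job', 'income'),
               sep = "(?<=\\d)(?=\\d)") %>%
      # look up each digit in the vector named after its column
      mutate(across(everything(), ~ lst1[[cur_column()]][.x])) %>%
      # paste the four labels back into a single column
      unite(data, everything(), sep = ",")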

-output

    # A tibble: 4 × 1
      data
      <chr>
    1 North,male,driver,low
    2 North,female,driver,low
    3 North,male,construction,low
    4 North,female,construction,high
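
For comparison, the same position-by-position lookup can be sketched in base R without any regex. This is only an illustration under a few assumptions: `decode_codes()` and `lookup` are hypothetical names introduced here, every code is assumed to have exactly one digit per lookup table, and the labels are the ones listed in the question.

```r
# Lookup tables, one per digit position (labels taken from the question).
lookup <- list(
  location = c(`1` = "North", `2` = "South"),
  sex      = c(`1` = "male", `2` = "female"),
  job      = c(`1` = "driver", `2` = "construction"),
  income   = c(`1` = "high", `2` = "low")
)

# decode_codes() is a hypothetical helper, not part of any package.
decode_codes <- function(codes, lookup) {
  # Split each code into single-character digits, e.g. 1112 -> "1" "1" "1" "2".
  digits <- strsplit(as.character(codes), "")
  vapply(digits, function(d) {
    # Assumption: one digit per lookup table; stop early otherwise.
    stopifnot(length(d) == length(lookup))
    # Pair the i-th digit with the i-th table and index that table by name.
    # `[[` raises an error for a digit that has no entry in the table.
    labels <- mapply(function(tbl, key) tbl[[key]], lookup, d)
    paste(labels, collapse = ",")
  }, character(1))
}

decode_codes(c(1112, 1212, 1122, 1221), lookup)
```

Because each table is indexed by the digit's name rather than by a chain of `ifelse`/`case_when` conditions, extending a position to up to 9 values should only require adding entries to the corresponding named vector.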