对于每次 id1 在 id3 中的字符串中,将 id2 部分放入新列

For every time id1 is within the string in id3, put the id2 part in the new column

我想为此数据框生成一个附加列,其中包含一些附加信息:每次 id1 在 id3 中的字符串中时,将此部分替换为 id2 中的对应部分:

 library(tidyverse)



    df1 <- tibble(
         id1 = c("119930", "124659", "114679", "119934", "126821", "124679", "119842", "134863", "133678", "133675"),
         id2 = c("AB1", "AB2", "AB3", "AB5", "AB4", "AB0", "DF1", "FR3", "GTA5", "BA1"),
         id3 = c("", "119934;126821;124679", "", "", "", "119842",  "", "", "", "133678;119930"))

这是我的:

    > df1
# A tibble: 10 x 3
   id1    id2   id3                   
   <chr>  <chr> <chr>                 
 1 119930 AB1   ""                    
 2 124659 AB2   "119934;126821;124679"
 3 114679 AB3   ""                    
 4 119934 AB5   ""                    
 5 126821 AB4   ""                    
 6 124679 AB0   "119842"       
 7 119842 DF1   ""                    
 8 134863 FR3   ""                    
 9 133678 GTA5  ""                    
10 133675 BA1   "133678;119930" 

我需要的结果:

    > df1
# A tibble: 10 x 4
   id1    id2   id3                    id4  
   <chr>  <chr> <chr>                  <chr>
 1 119930 AB1   ""                     ""   
 2 124659 AB2   "119934;126821;124679" "AB5;AB4;AB0"   
 3 114679 AB3   ""                     ""   
 4 119934 AB5   ""                     ""   
 5 126821 AB4   ""                     ""   
 6 124679 AB0   ""                     ""   
 7 119842 DF1   ""                     ""   
 8 134863 FR3   "119842"              "DF1"   
 9 133678 GTA5  ""                     ""   
10 133675 BA1   "133678;119930"               "GTA5;AB1"  

涉及 stringr 的一个选项可能是:

with(df1, str_replace_all(id3, setNames(id2, id1)))

 [1] ""            "AB5;AB4;AB0" ""            ""            ""            "DF1"        
 [7] ""            ""            ""            "GTA5;AB1"   

这是使用 strsplit() + match()

的基础 R 解决方案
df1$id4 <- apply(df1, 1, function(v) {
  paste0(na.omit(df1$id2[match(unlist(strsplit(v["id3"],split = ";")),df1$id1)]),collapse = ";")
})

这样

> df1
# A tibble: 10 x 4
   id1    id2   id3                  id4         
   <chr>  <chr> <chr>                <chr>       
 1 119930 AB1   ""                   ""          
 2 124659 AB2   "119934;126821;1246~ "AB5;AB4;AB~
 3 114679 AB3   ""                   ""          
 4 119934 AB5   ""                   ""          
 5 126821 AB4   ""                   ""          
 6 124679 AB0   "119842"             "DF1"       
 7 119842 DF1   ""                   ""          
 8 134863 FR3   ""                   ""          
 9 133678 GTA5  ""                   ""          
10 133675 BA1   "133678;119930"      "GTA5;AB1" 

基于@tmfmnk 的回答,完整的解决方案如下所示:

    > df1 %>% mutate(id4 = str_replace_all(id3, setNames(id2, id1)))
# A tibble: 10 x 4
   id1    id2   id3                    id4          
   <chr>  <chr> <chr>                  <chr>        
 1 119930 AB1   ""                     ""           
 2 124659 AB2   "119934;126821;124679" "AB5;AB4;AB0"
 3 114679 AB3   ""                     ""           
 4 119934 AB5   ""                     ""           
 5 126821 AB4   ""                     ""           
 6 124679 AB0   "119842;134863"        "DF1;FR3"    
 7 119842 DF1   ""                     ""           
 8 134863 FR3   ""                     ""           
 9 133678 GTA5  ""                     ""           
10 133675 BA1   "133678"               "GTA5"