Replace character value using another dataframe with multiple matching

test.vector <- c("jdoe","John Doe","jodoe","Sarah Scarlet","sscarlet","scarlet")
test.df <- data.frame("Full.Name" = c("John Doe","Sarah Scarlet"),
                      "alias1" = c("jdoe","sscarlet"),
                      "alias2" = c("jodoe","scarlet"))
want.vector <- c("John Doe","John Doe","John Doe","Sarah Scarlet","Sarah Scarlet","Sarah Scarlet")

> test.vector
[1] "jdoe"          "John Doe"      "jodoe"         "Sarah Scarlet" "sscarlet"      "scarlet" 

> test.df
      Full.Name   alias1  alias2
1      John Doe     jdoe   jodoe
2 Sarah Scarlet sscarlet scarlet     

> want.vector
[1] "John Doe"      "John Doe"      "John Doe"      "Sarah Scarlet" "Sarah Scarlet" "Sarah Scarlet"

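All the search results I found for problems like this have exactly one match per value and use merge() or join(). In this case, though, there are several possible matches for each name, and I'm not sure how to handle that. A few things I've tried (with butchered syntax):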

  1. str_replace(test.vector,test.df[,-1],test.df[,1])
  2. recode(test.vector,test.df)
  3. join() after converting test.vector to a df, with by = c(test.df[,-1], test.vector)

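One thing to note is that the real test.df in my project has many alias columns that are very sparse (because each alias is tied to a specific location/position). I don't know whether that makes much difference compared to the example above.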

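You can create an array with the same dimensions as your data frame, filled by recycling the first column, so that every cell holds the full name for its row. Then loop over the test vector with sapply and, for each value, subset the array with the logical matrix you get from comparing that value against the data frame.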

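# Matrix shaped like test.df in which every column is a recycled copy of Full.Name
test.a <- array(test.df[, 1], dim=dim(test.df))
# For each x, pick the cells of test.a at the positions where test.df equals x
sapply(test.vector, function(x) test.a[x == test.df], USE.NAMES=FALSE)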
# [1] "John Doe"      "John Doe"      "John Doe"      "Sarah Scarlet" "Sarah Scarlet"
# [6] "Sarah Scarlet"
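
Regarding the sparse columns: with NA cells, x == test.df returns NA at those positions, so the subset above would pick up extra NA entries, and a value with no match at all would come back as character(0), which makes sapply return a list. A possible workaround, as a minimal sketch only, is a named lookup vector. The names sparse.df, keys, values, keep, lookup and result are made up for this illustration, and the NA in alias2 is an assumption about what your real data looks like.

```r
test.vector <- c("jdoe", "John Doe", "jodoe", "Sarah Scarlet", "sscarlet", "scarlet")

# Hypothetical sparse variant of test.df: the NA in alias2 is an assumption
sparse.df <- data.frame(Full.Name = c("John Doe", "Sarah Scarlet"),
                        alias1 = c("jdoe", "sscarlet"),
                        alias2 = c("jodoe", NA),
                        stringsAsFactors = FALSE)

# Flatten all columns (Full.Name included) into one vector of keys, column by
# column, and repeat Full.Name once per column so the two vectors line up
keys <- unlist(sparse.df, use.names = FALSE)
values <- rep(sparse.df$Full.Name, times = ncol(sparse.df))

# Drop the NA keys, then build a named lookup vector: alias -> full name
keep <- !is.na(keys)
lookup <- setNames(values[keep], keys[keep])

# Index by name; a value without any match comes back as NA
result <- unname(lookup[test.vector])
result
# [1] "John Doe"      "John Doe"      "John Doe"      "Sarah Scarlet" "Sarah Scarlet"
# [6] NA
```

Here "scarlet" maps to NA because it was removed from the sparse example, so unmatched inputs are easy to spot with is.na(result). If the same alias appears in more than one row, name indexing returns the first match.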