Replace character value using another dataframe with multiple matching

Question

test.vector <- c("jdoe","John Doe","jodoe","Sarah Scarlet","sscarlet","scarlet")
test.df <- data.frame("Full.Name" = c("John Doe","Sarah Scarlet"),
                      "alias1" = c("jdoe","sscarlet"),
                      "alias2" = c("jodoe","scarlet"))
want.vector <- c("John Doe","John Doe","John Doe","Sarah Scarlet","Sarah Scarlet","Sarah Scarlet")

> test.vector
[1] "jdoe"          "John Doe"      "jodoe"         "Sarah Scarlet" "sscarlet"      "scarlet" 

> test.df
      Full.Name   alias1  alias2
1      John Doe     jdoe   jodoe
2 Sarah Scarlet sscarlet scarlet     

> want.vector
[1] "John Doe"      "John Doe"      "John Doe"      "Sarah Scarlet" "Sarah Scarlet" "Sarah Scarlet"

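All the search results I found for problems like this have exactly one match per value and use merge() or join(). In this case, however, each name can match in several columns, and I'm not sure how to handle that. A few things I've tried (with butchered syntax):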

str_replace(test.vector,test.df[,-1],test.df[,1])
recode(test.vector,test.df)
and, after changing test.vector to a data frame:

by = c(test.df[,-1], test.vector)

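One thing to note is that the actual test.df in my project has several very sparse columns (because each alias is tied to a specific location/position). I don't know whether that makes much difference compared with the example above.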

Answer 1

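You can create an array with the same dimensions as your data frame, letting the first column recycle to fill it, and then iterate over the test vector inside sapply, subsetting the array by comparing each value against the data frame.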

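# Array shaped like test.df; Full.Name is recycled into every column
test.a <- array(test.df[, 1], dim=dim(test.df))
# For each value, take the full name at the cell of test.df that it matches
sapply(test.vector, function(x) test.a[x == test.df], USE.NAMES=FALSE)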
# [1] "John Doe"      "John Doe"      "John Doe"      "Sarah Scarlet" "Sarah Scarlet"
# [6] "Sarah Scarlet"
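
Since the question mentions that the real test.df has very sparse columns, here is a minimal alternative sketch, not part of the answer above, that flattens the data frame into a named lookup vector and drops NA cells before matching. With the array approach, NA cells would likely add NA entries to the result, because comparing against NA gives NA. The names keys, vals, keep, lookup and result are made up for illustration, and the sketch assumes every alias maps to exactly one full name.

```r
test.vector <- c("jdoe", "John Doe", "jodoe", "Sarah Scarlet", "sscarlet", "scarlet")
test.df <- data.frame(Full.Name = c("John Doe", "Sarah Scarlet"),
                      alias1 = c("jdoe", "sscarlet"),
                      alias2 = c("jodoe", "scarlet"),
                      stringsAsFactors = FALSE)

# Flatten all columns (column by column) into one vector of possible keys
keys <- unlist(test.df, use.names = FALSE)
# Repeat the full names once per column so they line up with keys
vals <- rep(test.df$Full.Name, times = ncol(test.df))

# Drop empty cells so sparse (NA) columns do not end up in the lookup
keep <- !is.na(keys)
lookup <- setNames(vals[keep], keys[keep])

# Index by name; values with no match in test.df come back as NA
result <- unname(lookup[test.vector])
result
# [1] "John Doe"      "John Doe"      "John Doe"      "Sarah Scarlet" "Sarah Scarlet"
# [6] "Sarah Scarlet"
```

Values of test.vector that appear nowhere in test.df become NA here, which makes unmatched names easy to spot with is.na(result).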
