Replace character values using another data frame with multiple matches
test.vector <- c("jdoe","John Doe","jodoe","Sarah Scarlet","sscarlet","scarlet")
test.df <- data.frame("Full.Name" = c("John Doe","Sarah Scarlet"),
                      "alias1" = c("jdoe","sscarlet"),
                      "alias2" = c("jodoe","scarlet"))
want.vector <- c("John Doe","John Doe","John Doe","Sarah Scarlet","Sarah Scarlet","Sarah Scarlet")
> test.vector
[1] "jdoe" "John Doe" "jodoe" "Sarah Scarlet" "sscarlet" "scarlet"
> test.df
      Full.Name   alias1  alias2
1      John Doe     jdoe   jodoe
2 Sarah Scarlet sscarlet scarlet
> want.vector
[1] "John Doe" "John Doe" "John Doe" "Sarah Scarlet" "Sarah Scarlet" "Sarah Scarlet"
All the search results I found for problems like this have exactly one match and use merge() or join().
In this case, however, there are several possible matches, and I'm not sure how to handle that.
A few things I tried (in butchered syntax):
- str_replace(test.vector,test.df[,-1],test.df[,1])
- recode(test.vector,test.df)
- a join after converting test.vector to a data frame, with by = c(test.df[,-1], test.vector)
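The join attempt above probably fails because `by` expects column names rather than vectors of values. A minimal base-R sketch of the same join idea: reshape test.df into a long alias/full-name table, then look each element up with `match()`. It assumes every cell, including Full.Name itself, counts as an alias, and it adds `stringsAsFactors = FALSE` (not in the original) so it behaves the same on R versions before 4.0.

```r
# Data from the question; stringsAsFactors = FALSE is an added assumption
test.vector <- c("jdoe","John Doe","jodoe","Sarah Scarlet","sscarlet","scarlet")
test.df <- data.frame("Full.Name" = c("John Doe","Sarah Scarlet"),
                      "alias1" = c("jdoe","sscarlet"),
                      "alias2" = c("jodoe","scarlet"),
                      stringsAsFactors = FALSE)

# Long lookup table: one row per (alias, Full.Name) pair. unlist() stacks the
# columns in order, so repeating Full.Name once per column keeps rows aligned.
lookup <- data.frame(
  alias     = unlist(test.df, use.names = FALSE),
  Full.Name = rep(test.df$Full.Name, times = ncol(test.df)),
  stringsAsFactors = FALSE
)

# match() gives, for each element, the first lookup row with that alias
result <- lookup$Full.Name[match(test.vector, lookup$alias)]
result
```

With this version, strings that have no alias entry come back as NA.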
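One thing to note: the actual test.df in my project has many very sparse columns (because each alias is tied to a specific location/position). I don't know whether that makes a big difference compared with the example above.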
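On the sparse-columns question: assuming the empty cells are NA or empty strings (the real data isn't shown), the number of columns should not change the approach much, because every non-empty cell simply becomes one more alias-to-name pair. A minimal sketch with a hypothetical `sparse.df` and a hypothetical input `x`, using a named character vector as the lookup table; keeping unmatched strings unchanged is also an assumption, not something the question specifies.

```r
# Hypothetical sparse alias table: NA or "" marks a missing alias
sparse.df <- data.frame(
  Full.Name = c("John Doe", "Sarah Scarlet"),
  alias1    = c("jdoe", NA),
  alias2    = c("", "scarlet"),
  alias3    = c(NA, "sscarlet"),
  stringsAsFactors = FALSE
)

# Every cell is a candidate alias for the Full.Name in its row
aliases <- unlist(sparse.df, use.names = FALSE)
owners  <- rep(sparse.df$Full.Name, times = ncol(sparse.df))

# Drop the empty cells (NA or "") so they can never match
keep   <- !is.na(aliases) & aliases != ""
lookup <- setNames(owners[keep], aliases[keep])

x <- c("jdoe", "scarlet", "sscarlet", "John Doe", "unknown")
result <- unname(lookup[x])
# Assumption: strings with no alias entry keep their original value
result[is.na(result)] <- x[is.na(result)]
result
```

If the same alias can belong to two different people (say, in different locations), indexing by name silently keeps the first one; `anyDuplicated(names(lookup))` would flag that case before the lookup.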
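You can create an array with the same dimensions as your data frame, with the first column recycled across it, then loop over the test vector with sapply, subsetting the array by comparing each value with the data frame.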
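# Array with the same shape as test.df, every column filled with Full.Name
test.a <- array(test.df[, 1], dim = dim(test.df))
# For each value, return the full name from the cell(s) where it occurs in test.df
sapply(test.vector, function(x) test.a[x == test.df], USE.NAMES = FALSE)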
# [1] "John Doe" "John Doe" "John Doe" "Sarah Scarlet" "Sarah Scarlet"
# [6] "Sarah Scarlet"
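One caveat for sparse columns: if the empty cells are NA, `x == test.df` yields NA there, and logical subsetting with NA returns NA elements, so `sapply()` can return a list instead of a plain character vector. A sketch of a variant, assuming empty cells are NA and using a hypothetical `alias3` column: `which()` drops the NA comparisons, `hit[1]` keeps only the first match if a value occurs more than once, and values with no match are returned unchanged (an assumption).

```r
test.vector <- c("jdoe","John Doe","jodoe","Sarah Scarlet","sscarlet","scarlet")
# Hypothetical sparse version of test.df: NA marks a missing alias
test.df <- data.frame("Full.Name" = c("John Doe","Sarah Scarlet"),
                      "alias1" = c("jdoe", NA),
                      "alias2" = c("jodoe","scarlet"),
                      "alias3" = c(NA, "sscarlet"),
                      stringsAsFactors = FALSE)

# Same shape as test.df, every column filled with Full.Name
test.a <- array(test.df[, 1], dim = dim(test.df))

# which() returns linear indices of TRUE cells and ignores NA;
# vapply() guarantees a character vector of the same length as the input
res <- vapply(test.vector, function(x) {
  hit <- which(x == test.df)
  if (length(hit) == 0) x else test.a[hit[1]]
}, character(1), USE.NAMES = FALSE)
res
```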