How not to alter duplicate names with sapply?

Question

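I have a character vector containing the names of registered drugs, and another containing the names of new drugs. I want to know whether the new drugs look like existing ones.

For example, if supercure is a drug that can be produced by firm1 or firm2, and supercure firm1 1000mg and supercure firm2 500mg are already registered, then supercure firm1 500mg should be linked to both of them.

agrep allows this kind of approximate matching in R, and sapply applies it to every drug in the new list: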

new<-c("supercure firm1 500mg","randomcure firm2 1000mg","unknowncure firm2 100mg")
registered<-c("supercure firm1 1000mg","supercure firm2 500mg","randomcure firm1 1000mg")
res<-unlist(sapply(new,agrep,x=registered))
res

As expected, supercure gets two matches, randomcure one match, and unknowncure none (which is exactly what I want). However, sapply seems to alter the names so that there are no duplicates: supercure firm1 500mg becomes supercure firm1 500mg1 and supercure firm1 500mg2:

supercure firm1 500mg1   supercure firm1 500mg2 randomcure firm2 1000mg 
                    1                       2                       3

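This is a problem because it prevents me from selecting the drugs in the new list that have a match:

new[new %in% names(res)] only captures randomcure (because the supercure names have been changed).

I can think of ways to work around this with fairly crude text processing, but is there a smarter way to get the list of new drugs for which a match was found?

The ideal output would be: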

supercure firm1 500mg   supercure firm1 500mg randomcure firm2 1000mg 
                    1                       2                       3

Answer 1

You can try converting the result to a data frame, reshaping it with stack, and using setNames to turn it into a named vector, i.e.

d1 <- unique(stack(data.frame(Filter(length, sapply(new,agrep,x=registered)))))
#  values                     ind
#1      1   supercure.firm1.500mg
#2      2   supercure.firm1.500mg
#3      3 randomcure.firm2.1000mg

setNames(d1$values, d1$ind)
#  supercure.firm1.500mg   supercure.firm1.500mg randomcure.firm2.1000mg 
#                      1                       2                       3
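The names in the output above contain dots instead of spaces, because data.frame() sanitises column names by default. As a possible variation (a sketch, not part of the original answer), stack() can be applied to the named list directly, which skips the data frame step and leaves the names untouched. The helper names matches, d2 and res2 are only illustrative.

```r
new <- c("supercure firm1 500mg", "randomcure firm2 1000mg", "unknowncure firm2 100mg")
registered <- c("supercure firm1 1000mg", "supercure firm2 500mg", "randomcure firm1 1000mg")

# lapply() always returns a list; setNames() labels each element with its query string.
matches <- lapply(setNames(new, new), agrep, x = registered)

# stack() accepts a named list: one row per match, with `ind` holding
# the unaltered name of the list element the match came from.
d2 <- stack(matches)

# Build the named vector; `ind` is a factor, so convert it to character first.
res2 <- setNames(d2$values, as.character(d2$ind))
res2
```

Queries without any match contribute no rows, so no Filter(length, ...) step is needed here.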

Answer 2

sapply does not change the names; unlist does. This gives the desired output:

x <- sapply(new,agrep,x=registered)
setNames(unlist(x),rep(names(x),lengths(x)))
#  supercure firm1 500mg   supercure firm1 500mg randomcure firm2 1000mg 
#                      1                       2                       3
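To illustrate where the renaming comes from, and to address the original goal of selecting the new drugs that have a match, here is a small sketch. The toy list and the names flat, matches and matched_new are assumptions made for illustration; lapply() is used instead of sapply() so the result is a list even if every query happens to return the same number of matches.

```r
# unlist() creates the names of the flattened vector: an element with more
# than one value gets its name suffixed with 1, 2, ...; a length-one element
# keeps its name unchanged.
toy <- list(a = 1:2, b = 3L)
flat <- unlist(toy)
names(flat)
# "a1" "a2" "b"

new <- c("supercure firm1 500mg", "randomcure firm2 1000mg", "unknowncure firm2 100mg")
registered <- c("supercure firm1 1000mg", "supercure firm2 500mg", "randomcure firm1 1000mg")

# One element per new drug, holding the indices of its approximate matches.
matches <- lapply(new, agrep, x = registered)

# Keep the new drugs with at least one match; no names are involved,
# so the renaming done by unlist() does not matter.
matched_new <- new[lengths(matches) > 0]
matched_new
```

Counting matches with lengths() avoids comparing against names(res) altogether, which is the step that failed in the question.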
