Match strings with partial matching allowed, but only when there is a unique match
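I have a vector names and another vector v that I need to match against names. I want to receive the indices of names where v matches. Partial matching should be allowed, but only if the partial match is unique.
The following examples cover all relevant cases: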
names <- c("a", "b", "c", "ab", "def", "defg", "hij")
v1 <- c("a", "b")
v2 <- c("a", "ab")
v3 <- c("d")
v4 <- c("h")
v5 <- c("a", "b", "a")
I would expect the following output:
match_names(v1, names)
# c(1, 2)
match_names(v2, names)
# c(1, 4)
match_names(v3, names)
# error
match_names(v4, names)
# 7
match_names(v5, names)
# c(1, 2, 1)
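How can I write such a function? I have considered (a combination of) which and grep, but have not found anything useful so far.
What I have tried
(Before I knew about the partial matching requirement..)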
match_names1 <- function(v, names) {
sapply(v, function(i) which(i == names))
}
This works for examples v1, v2 and v5.
After learning about the partial matching requirement
match_names2 <- function(v, names) {
sapply(v, function(i) grep(i, names))
}
..which of course only works for v4.
To catch v3, I use the following extension of match_names1:
match_names3 <- function(v, names) {
exact <- match_names1(v, names)
assertthat::assert_that(class(exact) != "list")
return(exact)
}
So this covers v1, v2, v3 and v5, but not v4.
Thanks in advance for any hints.
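I went for lists because it is unclear from your question what should happen if a string from v matches exactly more than once in names, and the only way to keep all exact matches is to return a list. If you don't like the lists, you can simply unlist() the result.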
match_names <- function(v, names) {
  # check exact matches:
  resList <- lapply(v, function(elt) which(names == elt))
  notMatched <- which(lengths(resList) == 0)
  if (length(notMatched) == 0) return(resList)
  # partial matching for the elements without an exact match;
  # a partial match is kept only when it is unique
  resNotMatched <- lapply(v[notMatched], grep, x = names)
  matchedOnce <- which(lengths(resNotMatched) == 1)
  resList[notMatched[matchedOnce]] <- resNotMatched[matchedOnce]
  return(resList)
}
> match_names(v1, names)
[[1]]
[1] 1
[[2]]
[1] 2
> # c(1, 2)
> match_names(v2, names)
[[1]]
[1] 1
[[2]]
[1] 4
> # c(1, 4)
> match_names(v3, names)
[[1]]
integer(0)
> # error
> match_names(v4, names)
[[1]]
[1] 7
> # 7
> match_names(v5, names)
[[1]]
[1] 1
[[2]]
[1] 2
[[3]]
[1] 1
>
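The list-based function above returns integer(0) for v3 rather than raising an error. If a plain vector and a hard failure are wanted, one possible sketch is shown below. The helper name match_names_strict is hypothetical, and it assumes that a pattern should be treated as a literal substring (fixed = TRUE) and that anything without exactly one match, including duplicated exact matches, should be an error.

```r
names <- c("a", "b", "c", "ab", "def", "defg", "hij")

# Hypothetical helper: exact matches first, then partial matches,
# and an error for every element that does not end up with exactly one hit.
match_names_strict <- function(v, names) {
  res <- lapply(v, function(elt) {
    hits <- which(names == elt)                 # exact matches
    if (length(hits) == 0) {
      hits <- grep(elt, names, fixed = TRUE)    # literal substring matches
    }
    hits
  })
  bad <- lengths(res) != 1
  if (any(bad)) {
    stop("no unique match for: ", paste(v[bad], collapse = ", "))
  }
  unlist(res)
}

match_names_strict(c("a", "ab"), names)
match_names_strict("h", names)
```

Calling it with "d" stops with an error, because both "def" and "defg" contain that string.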
[[ can be used to partially match one name at a time:
f = function(v){
sapply(v, function(x) setNames(seq_along(names), names)[[x, exact=FALSE]])
}
# try it on the example
vs = list(v1,v2,v3,v4,v5)
for (i in seq_along(vs)){
cat("\nv", i, ":\n", sep="")
print(try( f(vs[[i]]) ))
}
which produces
v1:
a b
1 2
v2:
a ab
1 4
v3:
Error in setNames(seq_along(names), names)[[x, exact = FALSE]] :
subscript out of bounds
[1] "Error in setNames(seq_along(names), names)[[x, exact = FALSE]] : \n subscript out of bounds\n"
attr(,"class")
[1] "try-error"
attr(,"condition")
<simpleError in setNames(seq_along(names), names)[[x, exact = FALSE]]: subscript out of bounds>
v4:
h
7
v5:
a b a
1 2 1
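One thing worth keeping in mind when comparing the answers: partial matching with [[ only matches at the start of a name, while grep() treats its pattern as a regular expression and matches anywhere in the string. A small illustration of the difference, using the example vector from the question and base R's startsWith() to stand in for prefix matching:

```r
names <- c("a", "b", "c", "ab", "def", "defg", "hij")

# grep() matches the pattern anywhere in each string,
# so "b" is found in both "b" and "ab".
anywhere <- grep("b", names)

# startsWith() only looks at the beginning of each string,
# so "b" is found in "b" alone.
prefix_only <- which(startsWith(names, "b"))

print(anywhere)
print(prefix_only)
```

For the five example vectors in the question both interpretations give the same answer, so which one is intended is an assumption to settle before choosing an approach.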
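The match function should work for all cases except the partial match in v4.
To cater for partial matching, you could define a function like the following: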
match_names <- function(v, names) {
  ind <- match(v, names)
  # For elements without an exact match, try partial matching one element
  # at a time (grepl() only uses the first element of a vector pattern).
  for (i in which(is.na(ind))) {
    hits <- which(grepl(v[i], names))
    # Accept the partial match only if it is unique; otherwise keep NA.
    if (length(hits) == 1) ind[i] <- hits
  }
  return(ind)
}
> match_names(v1, names)
[1] 1 2
> match_names(v2, names)
[1] 1 4
> match_names(v3, names)
[1] NA
> match_names(v4, names)
[1] 7
> match_names(v5, names)
[1] 1 2 1
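As a possible alternative, base R's pmatch() appears to cover the same rule: exact matches are used first, and a partial (prefix) match is used only when it is unique. A minimal sketch follows; the wrapper name match_names_pmatch is hypothetical, and turning NA into an error is an assumption based on the expected output for v3 in the question. duplicates.ok = TRUE lets the same entry of names be matched more than once, which v5 needs.

```r
names <- c("a", "b", "c", "ab", "def", "defg", "hij")

# Hypothetical wrapper around pmatch(): exact matches win, otherwise a
# prefix match is accepted only if it is unique; anything else is an error.
match_names_pmatch <- function(v, names) {
  idx <- pmatch(v, names, duplicates.ok = TRUE)
  if (anyNA(idx)) {
    stop("no unique match for: ", paste(v[is.na(idx)], collapse = ", "))
  }
  idx
}

match_names_pmatch(c("a", "b", "a"), names)  # exact matches, "a" reused
match_names_pmatch("h", names)               # unique partial match
res <- try(match_names_pmatch("d", names), silent = TRUE)  # ambiguous prefix
```

Note that pmatch() matches prefixes only, unlike the grepl()-based version above, which matches anywhere in the string.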