How to calculate the longest common substring anywhere in two strings
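I'm trying to compute, in R, the longest exact common substring (with no gaps) between a string and a vector of strings. How can I modify stringdist so that it finds a common substring located anywhere in the two compared strings and returns its length as the distance?
Reproducible data: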
string1 <- "whereiam"
vec1 <- c("firstiam","twoiswhereiaminthisvec","thisisthree","fouriamhere","fivewherehere")
I tried the stringdist function (it does not work for my purpose):
library(stringdist)
stringdistvec <- stringdist(string1,vec1,method="lcs")
[1] 8 14 13 11 11 #not calculating the lcs type I want
Desired result, with an explanation of the matches:
#desired to work to get this result:
desired_stringdistvec <- c(3,8,1,3,5)
[1] 3 8 1 3 5
#match 1: iam (3 common substr)
#match 2: whereiam (8 common substr)
#match 3: i (one letter only)
#match 4: iam (3 common substr)
#match 5: where (5 common substr)
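One approach might be to look at the transformation sequences generated by adist() and count the number of characters in the longest run of consecutive matches: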
trafos <- attr(adist(string1, vec1, counts = TRUE), "trafos")
sapply(gregexpr("M+", trafos), function(x) max(0, attr(x, "match.length")))
[1] 3 8 1 3 5
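Note that the adist() approach measures the longest run of matches within a single edit alignment, which is not guaranteed to coincide with the longest common substring in the strict sense. For instance, "here" occurs in both "whereiam" and "fouriamhere", so the strict answer for the fourth element is 4 rather than 3. If the strict definition is what you need, a small dynamic-programming helper is one option. This is only a minimal sketch: lcsubstr_len is a hypothetical name of my own, not a function from stringdist or base R, and it compares single characters as returned by strsplit().

```r
# Hypothetical helper (not from stringdist or base R): length of the longest
# contiguous substring shared by strings a and b, by dynamic programming.
lcsubstr_len <- function(a, b) {
  if (!nzchar(a) || !nzchar(b)) return(0L)
  x <- strsplit(a, "")[[1]]
  y <- strsplit(b, "")[[1]]
  # prev[j + 1] holds the length of the common run ending at x[i - 1], y[j]
  prev <- integer(length(y) + 1)
  best <- 0L
  for (i in seq_along(x)) {
    cur <- integer(length(y) + 1)
    hits <- which(y == x[i])          # positions in y matching the current char
    cur[hits + 1] <- prev[hits] + 1L  # extend the run ending one step earlier
    best <- max(best, cur)
    prev <- cur
  }
  best
}

string1 <- "whereiam"
vec1 <- c("firstiam", "twoiswhereiaminthisvec", "thisisthree",
          "fouriamhere", "fivewherehere")

lens <- vapply(vec1, lcsubstr_len, integer(1), a = string1, USE.NAMES = FALSE)
lens
# [1] 3 8 2 4 5
```

The third and fourth values differ from the desired vector in the question because "re" and "here" are shared substrings of length 2 and 4 respectively. The loop costs on the order of nchar(a) * nchar(b) per pair, which should be fine for short strings like these.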