Index of a sentence in a paragraph
I have two strings, a and b, and I can use a.index(b) to find the index of string b within string a.
a = """
Hello! This is a string which I am using to present a quesion to StackOverflow because I ran into a problem.
How do I solve this?
If anyone knows how to do this, please help!
"""
b = "How do I solve this"
idx = a.index(b)
However, this does not work when string b is not an exact substring of string a. For example, when string b is:
b = "How fo I solve rhis"
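To make the failure concrete, here is a minimal sketch. It assumes a shortened stand-in for a (the variable name short_a is only for illustration) and shows that str.index raises ValueError for an inexact substring, while str.find reports the same condition with -1.

```python
# Shortened stand-in for the paragraph; an assumption made to keep the example small.
short_a = "Intro.\nHow do I solve this?\nPlease help!\n"
b = "How fo I solve rhis"

# str.index raises ValueError when b is not an exact substring.
try:
    idx = short_a.index(b)
except ValueError:
    idx = None

# str.find signals the same condition by returning -1 instead of raising.
found = short_a.find(b)
print(idx, found)
```

Neither call tolerates even a single differing character, which is why an approximate search is needed.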
I would like a way to find the index of b in a when the number of "mismatched" characters is at most 5.
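The straightforward approach is to iterate over the possible indices, count the mismatches between b and the substring of a starting at each index, and return the first index where the number of mismatches does not exceed the threshold: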
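def fuzzy_index(a, b, max_mismatches=5):
    """Return the first index in a where b matches with at most max_mismatches differing characters, else None."""
    n_overall = len(a)
    n_to_match = len(b)
    if n_overall < n_to_match:
        return None  # b cannot fit inside a
    if n_to_match <= max_mismatches:
        return 0  # every alignment is already within the threshold
    for i in range(n_overall - n_to_match + 1):
        # Count the positions where the current window of a differs from b.
        mismatches = sum(c_a != c_b for c_a, c_b in zip(a[i : i + n_to_match], b))
        if mismatches <= max_mismatches:
            return i
    return None  # no window is close enough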
a = """
Hello! This is a string which I am using to present a quesion to StackOverflow because I ran into a problem.
How do I solve this?
If anyone knows how to do this, please help!
"""
b = "How fo I solve rhis"
print(fuzzy_index(a, b)) # -> 110
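The function above returns the first index that falls within the threshold, which is not necessarily the closest alignment. If the best alignment is wanted instead, one possible variation is sketched below. The name best_fuzzy_index and the short text string are assumptions made for illustration, not part of the original answer.

```python
def best_fuzzy_index(a, b):
    """Return (index, mismatches) for the window of a closest to b, or None."""
    n = len(b)
    if n == 0 or len(a) < n:
        return None
    best = None
    for i in range(len(a) - n + 1):
        # Same per-character mismatch count as in the threshold version.
        mismatches = sum(c_a != c_b for c_a, c_b in zip(a[i : i + n], b))
        if best is None or mismatches < best[1]:
            best = (i, mismatches)
            if mismatches == 0:
                break  # an exact match cannot be improved
    return best


text = "xx How do I solve this? yy"
print(best_fuzzy_index(text, "How fo I solve rhis"))  # -> (3, 2)
```

The caller can then compare the returned mismatch count against whatever threshold suits the data.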
There are also fuzzy string matching packages you may want to use, such as fuzzywuzzy.
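As a rough illustration of similarity-based matching without a third-party dependency, the sketch below uses difflib.SequenceMatcher from the standard library. It is a stand-in and not fuzzywuzzy's API; the function name best_window and the sample text are assumptions. Unlike a per-position mismatch count, the similarity ratio also gives some credit when characters are inserted or deleted.

```python
from difflib import SequenceMatcher


def best_window(a, b):
    """Return (index, ratio) of the len(b)-sized window of a most similar to b."""
    n = len(b)
    if n == 0 or len(a) < n:
        return None
    best_i, best_ratio = 0, -1.0
    for i in range(len(a) - n + 1):
        # ratio() is 2 * matched characters / total characters, in [0, 1].
        ratio = SequenceMatcher(None, a[i : i + n], b).ratio()
        if ratio > best_ratio:
            best_i, best_ratio = i, ratio
    return best_i, best_ratio


text = "xx How do I solve this? yy"
idx, ratio = best_window(text, "How fo I solve rhis")
print(idx, round(ratio, 3))
```

This scans every window and builds a matcher for each one, so it is considerably slower than the simple mismatch count on long texts.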