Index of a sentence in a paragraph

I have two strings, a and b. I can find the index of string b in string a using a.index(b):

a = """
Hello! This is a string which I am using to present a question to StackOverflow because I ran into a problem.
How do I solve this?
If anyone knows how to do this, please help!
"""
b = "How do I solve this"

idx = a.index(b)

But when b is not an exact substring of a, this no longer works (a.index(b) raises a ValueError). For example, when b is:

b = "How fo I solve rhis"
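To make the failure concrete, here is a minimal check on a shortened version of the string (the shorter text is only for illustration). str.index raises ValueError when there is no exact occurrence, while str.find returns -1:

```python
a = """
Hello!
How do I solve this?
"""
b = "How fo I solve rhis"

# str.find reports "not found" as -1 instead of raising
print(a.find(b))  # -1

# str.index raises ValueError when b is not an exact substring of a
try:
    a.index(b)
except ValueError as err:
    print("ValueError:", err)
```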

I want a way to find the index of b in a as long as the number of "mismatched" characters is at most 5.
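One way to read "mismatched characters" is as the number of positions at which two equally long strings differ (the Hamming distance). That reading is an assumption: it counts substituted characters but not inserted or dropped ones. A hypothetical helper, count_mismatches, makes it concrete:

```python
def count_mismatches(s: str, t: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(s) != len(t):
        raise ValueError("strings must have the same length")
    return sum(c_s != c_t for c_s, c_t in zip(s, t))


# 'd' -> 'f' and 't' -> 'r' are the only differences
print(count_mismatches("How do I solve this", "How fo I solve rhis"))  # 2
```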

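The straightforward approach is to iterate over the possible start indices, count the mismatches between b and the substring of a that starts at each index, and return the first index where the count is at most the threshold: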

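def fuzzy_index(a, b, max_mismatches=5):
    """Return the first index i such that a[i:i + len(b)] differs from b in
    at most max_mismatches positions, or None if there is no such index.
    Only substitutions are counted; insertions and deletions are not."""
    n_overall = len(a)
    n_to_match = len(b)
    if n_overall < n_to_match:
        return None
    if n_to_match <= max_mismatches:
        return 0  # every window is within the threshold, so the first one matches

    for i in range(n_overall - n_to_match + 1):
        window = a[i:i + n_to_match]
        mismatches = sum(c_a != c_b for c_a, c_b in zip(window, b))
        if mismatches <= max_mismatches:
            return i
    return None
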
a = """
Hello! This is a string which I am using to present a question to StackOverflow because I ran into a problem.
How do I solve this?
If anyone knows how to do this, please help!
"""
b = "How fo I solve rhis"

print(fuzzy_index(a, b))  # -> 111
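fuzzy_index returns the first window within the threshold, which is not necessarily the closest one. If the closest window is what you want, a variant can scan every window and keep the one with the fewest mismatches. The name best_fuzzy_index and its (index, mismatches) return value are hypothetical, chosen for this sketch:

```python
def best_fuzzy_index(a: str, b: str, max_mismatches: int = 5):
    """Return (index, mismatches) for the window of a closest to b,
    or None if no window has at most max_mismatches substitutions."""
    n = len(b)
    best = None
    for i in range(len(a) - n + 1):  # empty range if a is shorter than b
        mismatches = sum(c_a != c_b for c_a, c_b in zip(a[i:i + n], b))
        if mismatches <= max_mismatches and (best is None or mismatches < best[1]):
            best = (i, mismatches)
            if mismatches == 0:
                break  # exact match: nothing can beat it
    return best


# The first window "abcx" is within the threshold, but "abcd" at 5 is exact
print(best_fuzzy_index("abcx abcd", "abcd", max_mismatches=1))  # (5, 0)
```

The trade-off is that it always scans all windows (unless an exact match turns up), whereas fuzzy_index stops at the first acceptable one.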

There are also fuzzy string matching packages you might want to use, such as fuzzywuzzy.
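fuzzy_index only counts substitutions, so a single missing or extra character (for example "How do I sove this") shifts the rest of the window and inflates the count. If that matters, a standard alternative is approximate substring matching by edit distance (Sellers' algorithm), sketched below with only the standard library. The function name fuzzy_find_edit and its (start, end, edits) return value are assumptions for illustration, not part of fuzzywuzzy or any other package:

```python
def fuzzy_find_edit(text: str, pattern: str, max_edits: int = 5):
    """Sellers' algorithm: find the occurrence of pattern in text with the
    fewest insertions, deletions and substitutions.

    Returns (start, end, edits) with text[start:end] the matched span,
    or None if every occurrence needs more than max_edits edits.
    Runs in O(len(text) * len(pattern)) time.
    """
    m = len(pattern)
    # prev[i] = (edits, start) of the cheapest alignment of pattern[:i]
    # ending at the previous text position; before any text, only deletions
    prev = [(i, 0) for i in range(m + 1)]
    best = (0, 0, m) if m <= max_edits else None
    for j, ch in enumerate(text, start=1):
        curr = [(0, j)]  # empty pattern prefix: an occurrence may start at j
        for i in range(1, m + 1):
            sub_cost, sub_start = prev[i - 1]
            ins_cost, ins_start = prev[i]
            del_cost, del_start = curr[i - 1]
            curr.append(min(
                (sub_cost + (pattern[i - 1] != ch), sub_start),  # match / substitute
                (ins_cost + 1, ins_start),  # extra character in text
                (del_cost + 1, del_start),  # pattern character missing from text
            ))
        edits, start = curr[m]
        if edits <= max_edits and (best is None or edits < best[2]):
            best = (start, j, edits)
        prev = curr
    return best


# "hllo" is "hello" with one character dropped
print(fuzzy_find_edit("hllo world", "hello", max_edits=1))  # (0, 4, 1)
```

Ties in cost are broken toward the earliest start, and among equally cheap occurrences the one that ends first is kept.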