Get regex group with fuzziness
I have a very large list of words (about 200k):
["cat", "the dog", "elephant", "the angry tiger"]
I built this regex with fuzzy matching:
regex = "(cat){e<3}|(the dog){e<3}|(elephant){e<3}|(the angry tiger){e<3}"
My input sentences are:
sentence1 = "The doog is running in the field"
sentence2 = "The elephent and the kat"
...
What I would like to get is:
res1 = ["the dog"]
res2 = ["elephant", "cat"]
I tried this, for example:
re.findall(regex, sentence2, flags=re.IGNORECASE|re.UNICODE)
But this gives me:
["elephent", "kat"]
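Any idea how to get the corrected words instead? What I want is the regex capture group for each match, but I am struggling to get it.
Maybe I am going about this the wrong way and regex is not a good fit, but an `if item in list` check inside a for loop takes too long to run.
This can be done by constructing the regex manually and naming the groups: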
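import regex as re  # third-party "regex" module, needed for fuzzy matching

a = ["cat", "the dog", "elephant", "the angry tiger"]
# Map each group name (g0, g1, ...) back to the word it was built from.
a_dict = {'g%d' % i: item for i, item in enumerate(a)}
# One named group per word; re.escape guards against regex metacharacters in the list.
regex = "|".join([r"\b(?<g%d>(%s){e<3})\b" % (i, re.escape(item)) for i, item in enumerate(a)])
sentence1 = "The doog is running in the field"
sentence2 = "The elephent and the kat"
for match in re.finditer(regex, sentence2, flags=re.IGNORECASE|re.UNICODE):
    # Only the group belonging to the alternative that matched is not None.
    for key, value in match.groupdict().items():
        if value is not None:
            print("%s: %s" % (a_dict.get(key), value))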
This prints:
elephant: elephent
cat: kat
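For reuse, the same idea can be split into two small helpers. This is only a sketch under stated assumptions: the names `build_fuzzy_pattern`, `canonical_word` and the `max_errors` parameter are hypothetical and not part of the `regex` module. It uses the standard Python spelling `(?P<name>...)` for named groups, writes the error limit as `{e<=2}` (equivalent to `{e<3}`), and only needs the standard library to build the pattern.

```python
import re  # standard library; used here only for re.escape


def build_fuzzy_pattern(words, max_errors=2):
    """Return (pattern, name_to_word) for use with the third-party regex module.

    Hypothetical helper: each word gets its own named group g0, g1, ...
    """
    name_to_word = {}
    parts = []
    for i, word in enumerate(words):
        name = "g%d" % i
        name_to_word[name] = word
        # {e<=N} is the regex module's fuzzy quantifier: at most N errors.
        parts.append(
            r"\b(?P<%s>(?:%s){e<=%d})\b" % (name, re.escape(word), max_errors)
        )
    return "|".join(parts), name_to_word


def canonical_word(groupdict, name_to_word):
    """Return the list entry whose named group took part in the match."""
    for name, value in groupdict.items():
        if value is not None:
            return name_to_word[name]
    return None


pattern, name_to_word = build_fuzzy_pattern(["cat", "the dog"])
# A groupdict shaped like the one a match on "doog" would produce.
print(canonical_word({"g0": None, "g1": "The doog"}, name_to_word))
```

With the `regex` package installed, the pattern would be passed to `regex.finditer(pattern, sentence, flags=regex.IGNORECASE)` and each match's `groupdict()` handed to `canonical_word`. How well a single alternation of about 200k groups performs is not something this sketch addresses.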