Find complicated substring with wildcards in Python
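I am trying to locate the position of an expression in a long string. The expression works as follows: it is any element of list1, followed by a wildcard of 1 to 5 words (separated by spaces), followed by any element of list2. For example: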
list1=["a","b"], list2=["c","d"]
text = "bla a tx fg hg gfgf tzt zt blaa a bli blubb d muh meh muh d"
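This should return "37", because that is where the expression ("a bli blubb d") is located. I have looked into regex wildcards, but I am struggling to combine them with the different list elements and the variable length of the wildcard.
Thanks for any suggestions!
You can construct a regular expression: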
import re
pref=["a","b"]
suff=["c","d"]
# the pattern is dynamically constructed from your pref and suff lists.
patt = r"(?:\W|^)((?:" + '|'.join(pref) + r")(?: +[^ ]+){1,5} +(?:" + '|'.join(suff) + r"))(?:\W|$)"
text = "bla a tx fg hg gfgf tzt zt blaa a bli blubb d muh meh muh d"
print(patt)
for k in re.findall(patt,text):
    print(k, "\n", text.index(k))
Output:
(?:\W|^)((?:a|b)(?: +[^ ]+){1,5} +(?:c|d))(?:\W|$) # pattern
a bli blubb d # found text
32 # position, 0-based, for the text exactly as shown above (your 37 is wrong btw.)
Fair warning: this is not a very robust approach.
The regex reads roughly as follows:
Either the start of the string or a non-word character (not captured), followed by
one of your prefs, followed by 1 to 5 non-space runs that are each preceded by
one or more spaces, followed by one or more spaces and something from suff, followed
by a non-word character or the end of the string (not captured).
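The same breakdown can be written next to the pattern itself. The following is only a sketch of the pattern above in verbose mode (re.VERBOSE), where each piece carries its own comment; the names first, last and m are introduced here for illustration and are not part of the original answer.

```python
import re

pref = ["a", "b"]
suff = ["c", "d"]
first = "|".join(pref)  # alternation of prefixes: a|b
last = "|".join(suff)   # alternation of suffixes: c|d

# Same pattern as in the answer, spelled out piece by piece.
# In verbose mode a literal space must sit inside a character class.
patt = re.compile(
    rf"""
    (?:\W|^)                # non-word character or start of string, not captured
    (                       # group 1: the text to report
      (?:{first})           #   one of the prefixes
      (?:[ ]+[^ ]+){{1,5}}  #   1 to 5 words, each preceded by one or more spaces
      [ ]+                  #   one or more spaces before the suffix
      (?:{last})            #   one of the suffixes
    )
    (?:\W|$)                # non-word character or end of string, not captured
    """,
    re.VERBOSE,
)

text = "bla a tx fg hg gfgf tzt zt blaa a bli blubb d muh meh muh d"
m = patt.search(text)
print(m.group(1), m.start(1))
```

Here m.start(1) gives the offset of the captured group directly, so no second search through the text is needed.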
For a demo and a more complete explanation of the assembled regex, see https://regex101.com/r/WHZfr9/1
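Two things make the loop above fragile: text.index(k) returns the first place the matched string occurs anywhere in the text, which is not necessarily where the regex matched, and list items containing regex metacharacters (such as "c++") would change the pattern. A possible variant, offered as a sketch rather than a definitive fix, uses re.finditer for the offsets and re.escape for the list items. The function name find_matches and its max_words parameter are made up for this example, and the lookarounds treat any whitespace as a word boundary, which is a slightly different assumption than the \W used above.

```python
import re

def find_matches(text, pref, suff, max_words=5):
    """Return (start, matched_text) for every prefix ... suffix hit."""
    # re.escape keeps list items such as "c++" or "a.b" literal.
    first = "|".join(map(re.escape, pref))
    last = "|".join(map(re.escape, suff))
    # Lookarounds check for whitespace boundaries without consuming them.
    patt = re.compile(
        rf"(?<!\S)(?:{first})(?:\s+\S+){{1,{max_words}}}\s+(?:{last})(?!\S)"
    )
    # m.start() is the offset of this match, not of the first equal substring.
    return [(m.start(), m.group()) for m in patt.finditer(text)]

text = "bla a tx fg hg gfgf tzt zt blaa a bli blubb d muh meh muh d"
print(find_matches(text, ["a", "b"], ["c", "d"]))
# [(32, 'a bli blubb d')]
```

The quantifier is greedy, as in the original pattern, so when several suffixes are within reach the longest span wins; writing the quantifier as {1,5}? instead would prefer the nearest suffix.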