Python regex between variable start and end strings, with a check on the content in between
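I would like to find all strings that appear between elements of the lists start_signs and end_signs. When an element of end_signs is missing, or only appears later in the text, the candidate should not be matched.
One solution would be to get all matches between start_signs and end_signs and then check whether each match contains only words from a third list, allowed_words_between.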
import re
allowed_words_between = ["and","with","a","very","beautiful"]
start_signs = ["$","$$"]
end_signs = ["Ferrari","BMW","Lamborghini","ship"]
teststring = """
I would like to be a $-millionaire with a Ferrari. -> Match: $-millionaire with a Ferrari
I would like to be a $$-millionair with a Lamborghini. -> Match: $$-millionair with a Lamborghini
I would like to be a $$-millionair with a rotten Lamborghini. -> No Match because of the word "rotten"
I would like to be a $$-millionair with a Lamborghini and a Ferrari. -> Match: $$-millionair with a Lamborghini and a Ferrari
I would like to be a $-millionaire with a very, very beautiful ship! -> Match: $-millionaire with a very, very beautiful ship
I would like to be a $-millionaire with a very, very beautiful but a bit dirty ship. -> No Match because of the word dirty
I would like to be a $-millionaire with a dog, a cat, two children and a cowboy hat. That would be great. -> No Match
"""
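Another solution would be to take the string beginning at one of the start_signs and cut it off as soon as a string appears that is not in the allowed list: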
allowed_list = allowed_words_between + start_signs + end_signs
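What I have tried so far:
I used the solution from this post and tried to build a variable regex string from the start and end lists:
regexString = "("+"|".join(start_signs) + ")" + ".*?" + "(" +"|".join(end_signs)+")"
That did not work. I also do not know how the content check could work.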
matches = re.findall(regexString,teststring)
substituted_text = re.sub(regexString, "[[Found It]]", teststring, count=0)
You can repeat all the allowed_words_between, each optionally followed by a comma and separated by whitespace characters, until you reach one of the end_signs.
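You can turn the capture groups into non-capturing groups (?:...), as otherwise re.findall will return the capture group values instead of the whole match.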
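To illustrate the difference, here is a minimal sketch on a shortened, made-up sample string. The simplified patterns below are only for demonstration and are not the final pattern.

```python
import re

text = "a $-millionaire with a Ferrari"

# With capturing groups, findall returns a tuple of the group values per match.
capturing = re.findall(r"(\$)\S*\s+with\s+a\s+(Ferrari|BMW)", text)
print(capturing)  # [('$', 'Ferrari')]

# With non-capturing groups, findall returns the whole matched text.
non_capturing = re.findall(r"(?:\$)\S*\s+with\s+a\s+(?:Ferrari|BMW)", text)
print(non_capturing)  # ['$-millionaire with a Ferrari']
```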
Note that the $ has to be escaped to match it literally.
The pattern will look like
(?:\$|\$\$)\S*(?:(?:\s+(?:and|with|a|very|beautiful),?)*\s+(?:Ferrari|BMW|Lamborghini|ship))+
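The pattern matches
(?:\$|\$\$)\S*
Match any of the start_signs, followed by optional non-whitespace characters (\S can also match a dollar sign, but you can make it more specific, for example -\w+)
(?:
Outer non-capturing group
(?:
Inner non-capturing group
\s+(?:and|with|a|very|beautiful),?
Match 1+ whitespace characters and any of the allowed_words_between, followed by an optional comma
)*\s+
Close the inner non-capturing group and repeat it 0+ times, followed by 1+ whitespace characters
(?:Ferrari|BMW|Lamborghini|ship)
Match any of the end_signs
)+
Close the outer non-capturing group and repeat it 1+ times, so that the string with a Lamborghini and a Ferrari is matched as well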
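The same breakdown can be kept next to the pattern itself by compiling it with re.VERBOSE, which ignores whitespace and allows comments inside the pattern. This is only a sketch of one way to write it; the two sample sentences are taken from the question.

```python
import re

pattern = re.compile(
    r"""
    (?:\$|\$\$)\S*                               # a start sign plus optional non-whitespace chars
    (?:                                          # outer non-capturing group
        (?:\s+(?:and|with|a|very|beautiful),?)*  # allowed words, each with an optional comma
        \s+(?:Ferrari|BMW|Lamborghini|ship)      # whitespace, then one of the end signs
    )+                                           # repeat the outer group 1+ times
    """,
    re.VERBOSE,
)

line = "I would like to be a $$-millionair with a Lamborghini and a Ferrari."
match = pattern.search(line)
print(match.group())  # $$-millionair with a Lamborghini and a Ferrari

# "rotten" is not an allowed word, so nothing is found here.
rejected = pattern.search("I would like to be a $$-millionair with a rotten Lamborghini.")
print(rejected)  # None
```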
import re
allowed_words_between = ["and", "with", "a", "very", "beautiful"]
start_signs = [r"\$", r"\$\$"]  # escaped so that $ is matched literally
end_signs = ["Ferrari", "BMW", "Lamborghini", "ship"]
teststring = """
I would like to be a $-millionaire with a Ferrari.
I would like to be a $$-millionair with a Lamborghini.
I would like to be a $$-millionair with a rotten Lamborghini.
I would like to be a $$-millionair with a Lamborghini and a Ferrari.
I would like to be a $-millionaire with a very, very beautiful ship!
I would like to be a $-millionaire with a very, very beautiful but a bit dirty ship.
I would like to be a $-millionaire with a dog, a cat, two children and a cowboy hat. That would be great.
"""
# Raw strings keep \S and \s intact for the regex engine
regexString = r"(?:" + "|".join(start_signs) + r")\S*(?:(?:\s+(?:" + "|".join(allowed_words_between) + r"),?)*\s+(?:" + "|".join(end_signs) + r"))+"
for s in re.findall(regexString, teststring):
    print(s)
Output
$-millionaire with a Ferrari
$$-millionair with a Lamborghini
$$-millionair with a Lamborghini and a Ferrari
$-millionaire with a very, very beautiful ship
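If the lists should stay unescaped, as in the question, one possible variation is to escape every item with re.escape while assembling the pattern. The helper name build_pattern below is hypothetical and not part of the original answer; sorting the alternatives by length is an extra assumption, made so that a longer sign such as $$ is tried before $.

```python
import re

def build_pattern(start_signs, allowed_words, end_signs):
    """Assemble the pattern from plain, unescaped lists (hypothetical helper)."""
    def alternation(items):
        # Escape regex metacharacters and try longer alternatives first.
        ordered = sorted(items, key=len, reverse=True)
        return "|".join(re.escape(item) for item in ordered)

    return re.compile(
        r"(?:{start})\S*(?:(?:\s+(?:{allowed}),?)*\s+(?:{end}))+".format(
            start=alternation(start_signs),
            allowed=alternation(allowed_words),
            end=alternation(end_signs),
        )
    )

pattern = build_pattern(
    ["$", "$$"],
    ["and", "with", "a", "very", "beautiful"],
    ["Ferrari", "BMW", "Lamborghini", "ship"],
)

sample = (
    "I would like to be a $-millionaire with a Ferrari.\n"
    "I would like to be a $$-millionair with a rotten Lamborghini.\n"
)
result = pattern.findall(sample)
print(result)  # ['$-millionaire with a Ferrari']
```

This way the escaping happens in one place, and new start or end signs containing regex metacharacters can be added to the lists without further changes.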