Python regex to print the text between two patterns, but only if a specific string appears in between
So I have a file like this:
<html>
<div>
<h1>HOiihilasdl</h1>
</div>
<script src=https://example.com/file.js></script>
<script>
blabla
blabla
blabla
blabla
blabla
</script>
<script src=https://example.com/file.js></script>
<script>
blabla
blabla
cow
blabla
blabla
</script>
</html>
I want to print everything from <script> to </script>, but only if the word cow appears in between (I want to do this with a Python regex).
The output would look like this:
<script>
blabla
blabla
cow
blabla
blabla
</script>
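I searched through many answers but could not find one that solves my problem.
I would also like to know whether it is possible to simply return "script" if the word "cow" appears between <script> and </script>.
I am using Python 3.10.4.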
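I'm not entirely sure what you are trying to do here. If you only want to solve the scenario you explicitly described in the question, a solution could look like the following: iterate over each line of the file and keep track of the opening/closing tags. Whenever you meet an opening tag, you start storing lines. If a pattern such as "cow" is not found before the next closing tag, the stored lines are discarded and the search restarts at the next opening tag.
Note: the solution below does not work for nested tags, but it can easily be adapted.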
def find_pattern(file, pattern):
    lines = []
    start = False
    found_pattern = False
    with open(file, 'r') as f:
        # Iterate through the lines in the file
        for line in f:
            # Remove the trailing newline character
            line = line.rstrip("\n")
            # Remove the leading whitespace
            stripped_line = line.lstrip()
            # When we meet an opening tag such as <script>, we keep track of
            # the lines until we meet a closing tag. Lines that also contain
            # "</" (a tag opened and closed on one line) do not start a block.
            if not start and stripped_line.startswith("<") and "</" not in line:
                start = True
            if start:
                # We only append lines while we are keeping track
                lines.append(line)
                # If we find the pattern inside the block, we set a flag
                if pattern in line:
                    found_pattern = True
            # When we meet a closing tag, there are two possibilities:
            # if we found the pattern we stop; otherwise we keep searching.
            if stripped_line.startswith("</"):
                if found_pattern:
                    break
                lines = []
                start = False
    # Print the stored lines only if the pattern was actually found
    if found_pattern:
        for line in lines:
            print(line)

find_pattern(file="t.txt", pattern="cow")
Output:
<script>
blabla
blabla
cow
blabla
blabla
</script>
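Since the question explicitly asks for a regular expression, here is a minimal sketch of that route, not a definitive implementation. It assumes the blocks of interest start with a bare <script> tag (no attributes) and that the search word begins and ends with a word character, so that \b applies. The names find_script_blocks and SAMPLE are made up for this example, and SAMPLE is a trimmed copy of the file from the question. A plain `<script>.*?cow.*?</script>` would start at the first <script> and run across the first </script> until it reaches "cow", so the sketch uses a negative lookahead to stop a match from crossing a closing tag.

```python
import re

# Trimmed copy of the file from the question, inlined so the sketch runs on its own.
SAMPLE = """<html>
<script src=https://example.com/file.js></script>
<script>
blabla
blabla
</script>
<script src=https://example.com/file.js></script>
<script>
blabla
blabla
cow
blabla
blabla
</script>
</html>
"""


def find_script_blocks(text, word):
    """Return every bare <script>...</script> block that contains `word`."""
    # (?:(?!</script>).)*? consumes one character at a time and refuses to
    # step over a closing tag, so a match can never span two blocks.
    inner = r"(?:(?!</script>).)*?"
    pattern = re.compile(
        rf"<script>{inner}\b{re.escape(word)}\b{inner}</script>",
        re.DOTALL,  # let "." match newlines as well
    )
    return pattern.findall(text)


blocks = find_script_blocks(SAMPLE, "cow")
for block in blocks:
    print(block)

# Second part of the question: report just "script" when such a block exists.
if blocks:
    print("script")
```

On the sample this prints the second block followed by "script". Regular expressions are fragile on real-world HTML (attributes, different casing, comments), so for anything beyond a file shaped like the one in the question an HTML parser is probably the safer choice.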