Greedy and lazy quantifiers. Testing with HTML tags

Question

The input is:

<p>
The very <em>first</em> task is to find the beginning of a paragraph.
</p>
<p>
Then you have to find the end of the paragraph
</p>

The expected first output (since I am using a greedy quantifier) is:

<p>
The very <em>first</em> task is to find the beginning of a paragraph.
</p>
<p>
Then you have to find the end of the paragraph
</p>

The code used for the greedy version is:

import re

text = '''
<p>
The very <em>first</em> task is to find the beginning of a paragraph.
</p>
<p>
Then you have to find the end of the paragraph
</p>
'''
pattern=re.compile(r'\<p\>.*\<\/p\>')
data1=pattern.match(text,re.MULTILINE)
print('data1:- ',data1,'\n')

The expected second output (since I am using a lazy quantifier) is:

<p>
The very <em>first</em> task is to find the beginning of a paragraph.
</p>

The code used for the lazy version is:

import re

text = '''
<p>
The very <em>first</em> task is to find the beginning of a paragraph.
</p>
<p>
Then you have to find the end of the paragraph
</p>
'''
#pattern=re.compile(r'\<p\>.*?\<\/p\>')
pattern=re.compile(r'<p>.*?</p>')
data1=pattern.match(text,re.MULTILINE)
print('data1:- ',data1,'\n')

In both cases the actual output I get is None.

Answer 1

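You have a few problems. First, when using Pattern.match, the second and third parameters are the positional arguments pos and endpos, not flags; the flags need to be passed to re.compile. Since re.MULTILINE is just the integer 8, pattern.match(text, re.MULTILINE) actually starts matching at index 8 of the string. Second, you should be using re.DOTALL to make . match newlines, not re.MULTILINE. Finally, match only succeeds if the pattern matches at the starting position (here the beginning of the string, which in your case is a newline character), so it won't match. You probably want Pattern.search instead. This will work for your sample input: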

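import re

# Flags go to re.compile; re.DOTALL lets . match newlines too
pattern = re.compile(r'<p>.*</p>', re.DOTALL)
data1 = pattern.search(text)  # search scans the whole string, unlike match
print('data1:- ', data1.group(0), '\n')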

Output:

data1:-  <p>
The very <em>first</em> task is to find the beginning of a paragraph.
</p>
<p>
Then you have to find the end of the paragraph
</p>
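The three problems described above can be reproduced directly. The sketch below re-declares the question's sample text so it runs on its own, and adds a small hypothetical string `sample` (not part of the original question) to contrast what re.MULTILINE and re.DOTALL each change:

```python
import re

text = '''
<p>
The very <em>first</em> task is to find the beginning of a paragraph.
</p>
<p>
Then you have to find the end of the paragraph
</p>
'''

# 1. Flags are integers, so match(text, re.MULTILINE) means match(text, pos=8).
print(int(re.MULTILINE))  # 8

# 2. Without DOTALL, '.' does not match '\n', so '.*' cannot get from
#    '<p>' to '</p>' and even search() finds nothing.
no_dotall = re.compile(r'<p>.*</p>')
print(no_dotall.search(text))  # None

# 3. match() is anchored at the start position; text[0] is '\n', not '<'.
dotall = re.compile(r'<p>.*</p>', re.DOTALL)
print(dotall.match(text))               # None
print(dotall.search(text) is not None)  # True

# For contrast (hypothetical sample string): MULTILINE only changes '^' and '$'
# so they also match at line boundaries; it does not let '.' cross newlines.
sample = 'first line\nsecond line'
print(re.findall(r'^\w+', sample))                # ['first']
print(re.findall(r'^\w+', sample, re.MULTILINE))  # ['first', 'second']
print(repr(re.search(r'first.*line', sample).group(0)))             # 'first line'
print(repr(re.search(r'first.*line', sample, re.DOTALL).group(0)))  # 'first line\nsecond line'
```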

Single match (lazy quantifier):

# The lazy .*? stops at the first </p> it can reach
pattern = re.compile(r'<p>.*?</p>', re.DOTALL)
data1 = pattern.search(text)
print('data1:- ', data1.group(0), '\n')

Output:

data1:-  <p>
The very <em>first</em> task is to find the beginning of a paragraph.
</p>
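If the goal is to collect every paragraph rather than only the first, the same lazy pattern can be combined with findall. This is a minimal sketch assuming the question's sample text; the variable names greedy, lazy and paragraphs are illustrative only:

```python
import re

text = '''
<p>
The very <em>first</em> task is to find the beginning of a paragraph.
</p>
<p>
Then you have to find the end of the paragraph
</p>
'''

greedy = re.compile(r'<p>.*</p>', re.DOTALL)
lazy = re.compile(r'<p>.*?</p>', re.DOTALL)

# The greedy pattern yields a single match spanning both paragraphs.
print(len(greedy.findall(text)))  # 1

# The lazy pattern stops at the nearest '</p>' each time, so findall
# returns each paragraph as a separate match.
paragraphs = lazy.findall(text)
print(len(paragraphs))  # 2
for p in paragraphs:
    print(repr(p))
```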

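Note also that /, < and > have no special meaning in regular expressions and do not need to be escaped. I have removed the escapes in the code above.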
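A quick check that the escaped pattern from the question and the unescaped one behave identically, using a hypothetical one-line input string:

```python
import re

text = '<p>one</p><p>two</p>'  # hypothetical input, not from the question

escaped = re.compile(r'\<p\>.*?\<\/p\>')  # style used in the question
plain = re.compile(r'<p>.*?</p>')         # style used in the answer

# Escaping '<', '>' and '/' is allowed but unnecessary: a backslash before
# non-alphanumeric punctuation just matches that character literally.
print(plain.findall(text))                         # ['<p>one</p>', '<p>two</p>']
print(escaped.findall(text) == plain.findall(text))  # True
```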

Tags: html, regex, python-3.x, regex-greedy