Greedy and lazy quantifiers: testing with HTML tags
The input is:
<p>
The very <em>first</em> task is to find the beginning of a paragraph.
</p>
<p>
Then you have to find the end of the paragraph
</p>
The expected first output is (because I am using a greedy quantifier):
<p>
The very <em>first</em> task is to find the beginning of a paragraph.
</p>
<p>
Then you have to find the end of the paragraph
</p>
The code used for the greedy case is as follows:
import re

text = '''
<p>
The very <em>first</em> task is to find the beginning of a paragraph.
</p>
<p>
Then you have to find the end of the paragraph
</p>
'''
pattern=re.compile(r'\<p\>.*\<\/p\>')
data1=pattern.match(text,re.MULTILINE)
print('data1:- ',data1,'\n')
The expected second output is (because I am using a lazy quantifier):
<p>
The very <em>first</em> task is to find the beginning of a paragraph.
</p>
The code used for the lazy case is as follows:
text = '''
<p>
The very <em>first</em> task is to find the beginning of a paragraph.
</p>
<p>
Then you have to find the end of the paragraph
</p>
'''
#pattern=re.compile(r'\<p\>.*?\<\/p\>')
pattern=re.compile(r'<p>.*?</p>')
data1=pattern.match(text,re.MULTILINE)
print('data1:- ',data1,'\n')
In both cases the actual output I get is None.
You have a few problems. First, when using Pattern.match, the second and third parameters are the positions pos and endpos, not flags, so re.MULTILINE (integer value 8) is taken as the index at which to start matching. The flags need to be specified to re.compile. Secondly, you should be using re.DOTALL to make . match newline, not re.MULTILINE. Finally, match insists that the match occurs at the beginning of the string (which in your case is a newline character), so it won't match. You might want to use Pattern.search instead. This will work for your sample input:
pattern=re.compile(r'<p>.*</p>', re.DOTALL)
data1=pattern.search(text)
print('data1:- ',data1.group(0),'\n')
Output:
data1:- <p>
The very <em>first</em> task is to find the beginning of a paragraph.
</p>
<p>
Then you have to find the end of the paragraph
</p>
For a single match:
pattern=re.compile(r'<p>.*?</p>', re.DOTALL)
data1=pattern.search(text)
print('data1:- ',data1.group(0),'\n')
Output:
data1:- <p>
The very <em>first</em> task is to find the beginning of a paragraph.
</p>
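To see each of the three problems in isolation, something like the following sketch may help. It reuses the sample text from the question; the variable names (wrong_call, no_dotall, anchored, first, greedy) are made up for illustration and are not from the original code.

```python
import re

text = '''
<p>
The very <em>first</em> task is to find the beginning of a paragraph.
</p>
<p>
Then you have to find the end of the paragraph
</p>
'''

# Problem 1: Pattern.match(string, pos) treats its second argument as a
# start index. re.MULTILINE has the integer value 8, so this call only
# tries to match at index 8 of the string.
lazy_no_flags = re.compile(r'<p>.*?</p>')
wrong_call = lazy_no_flags.match(text, re.MULTILINE)  # same as match(text, 8)

# Problem 2: without re.DOTALL, "." does not match a newline, so even
# search() finds nothing, because <p> and </p> are on different lines.
no_dotall = lazy_no_flags.search(text)

# Problem 3: with re.DOTALL, match() still fails because the string
# starts with a newline rather than with <p>.
lazy = re.compile(r'<p>.*?</p>', re.DOTALL)
anchored = lazy.match(text)

# search() scans forward through the string, so both variants succeed.
first = lazy.search(text).group(0)                                   # lazy: first paragraph only
greedy = re.compile(r'<p>.*</p>', re.DOTALL).search(text).group(0)   # greedy: up to the last </p>

print(wrong_call, no_dotall, anchored)
print(first)
print(greedy)
```

The first print shows None three times, one for each problem; the other two show the lazy and the greedy result.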
Also note that /, < and > have no special meaning in regex and do not need to be escaped. I have removed the escapes in the code above.
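If you want to convince yourself that the backslashes make no difference, a quick check along these lines should do it. The short sample string and the variable names are invented for this illustration, and the re.escape comparison assumes Python 3.7 or later, where re.escape stopped escaping these characters.

```python
import re

# Hypothetical shortened input with two paragraphs.
sample = '\n<p>\nfirst\n</p>\n<p>\nsecond\n</p>\n'

# The escaped pattern from the question and the unescaped one from the answer.
escaped = re.compile(r'\<p\>.*?\<\/p\>', re.DOTALL)
plain = re.compile(r'<p>.*?</p>', re.DOTALL)

# findall() returns every non-overlapping match; with the lazy quantifier
# that is one entry per paragraph.
escaped_hits = escaped.findall(sample)
plain_hits = plain.findall(sample)

print(escaped_hits == plain_hits)
print(plain_hits)

# On Python 3.7+, re.escape leaves these characters alone, which is another
# hint that the regex engine does not treat them as special.
print(re.escape('</p>') == '</p>')
```

Both patterns produce the same list of matches, so the escapes are harmless but unnecessary.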