Python Regular Expressions not returning what I am looking for

I am scraping a website and want to get the content inside a specific tag. The tag whose content I want is: <pre class="js-tab-content"></pre>

Here is my code:

import re
import urllib.request

request = urllib.request.Request(url=url)
response = urllib.request.urlopen(request)
content = response.read().decode()

tab = re.findall(r'<pre class="js-tab-content">(.*?)</pre>', content)

print(tab)

When I print tab, I get an empty list: []

This is the content I am searching:

.... <pre class="js-tab-content"><i></i><span>Em</span>              <span>D</span>              <span>Em</span>             <span>D</span>

Lift Mac Cahir Og your face, brooding o'er the old disgrace 

     <span>Em</span>                  <span>D</span>                       <span>G</span>-<span>D</span>-<span>Em</span>     

That black Fitzwilliam stormed your place and drove you to the Fern.

<span>Em</span>              <span>D</span>           <span>Em</span>                         <span>D</span>

Gray said victory was sure, soon the firebrand he'd secure

<span>Em</span>                <span>D</span>          <span>G</span>-<span>D</span>-<span>Em</span>

Until he met at Glenmalure, Feach Mac Hugh O'Byrne 



Chorus:

<span>G</span>                                <span>D</span>

Curse and swear, Lord Kildare, Feach will do what Feach will dare

<span>G</span>                               <span>G</span>-<span>D</span>-<span>Em</span>

Now Fitzwilliam have a care, fallen is your star low

<span>G</span>                                       <span>D</span> 

Up with halbert, out with sword, on we go for by the Lord

<span>G</span>                               <span>G</span>-<span>D</span>-<span>Em</span>

Feach Mac Hugh has given his word: Follow me up to Carlow 



From Tassagart ____to Clonmore flows a stream of Saxon Gore

Great is Rory Og O'More at sending loons to Hades.

White is sick and Lane is fled, now for black Fitzwilliams head

We'll send it over, dripping red, to Liza and her ladies



See the swords of Glen Imayle flashing o'er the English Pale

See all the children of the Gael, beneath O'Byrne's banners

Rooster of the fighting stock, would you let an Saxon cock

Crow out upon an Irish rock, fly up and teach him manners

</pre> ....

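I don't understand why this returns an empty list instead of a list containing the content as a string.

I searched the internet for about half an hour but couldn't find anything that helped.

Sorry if I look dumb here and this is obvious!

Anyway, thanks in advance!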

OK, to add to the comments, here is how you can use the BeautifulSoup HTML parser to extract the text inside the pre in this case:

from bs4 import BeautifulSoup

soup = BeautifulSoup(content, "html.parser")
print(soup.find("pre", class_="js-tab-content").get_text())
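If installing the third-party BeautifulSoup package is not an option, the standard library's html.parser module can do a similar job. This is a swapped-in technique, not what the answer above uses, and only a minimal sketch: the class name PreTextExtractor and the sample string are hypothetical, and it assumes the target <pre> elements are not nested inside each other.

```python
from html.parser import HTMLParser


class PreTextExtractor(HTMLParser):
    """Hypothetical helper: collect the text inside each <pre class="js-tab-content">."""

    def __init__(self):
        super().__init__()
        self.inside = False  # True while we are between the matching <pre> and </pre>
        self.chunks = []     # text pieces of the <pre> currently being read
        self.results = []    # one joined string per matching <pre>

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        classes = (dict(attrs).get("class") or "").split()
        if tag == "pre" and "js-tab-content" in classes:
            self.inside = True
            self.chunks = []

    def handle_endtag(self, tag):
        if tag == "pre" and self.inside:
            self.inside = False
            self.results.append("".join(self.chunks))

    def handle_data(self, data):
        # Text inside <span>, <i>, etc. is collected too; the tags themselves are dropped.
        if self.inside:
            self.chunks.append(data)


# Hypothetical, shortened stand-in for the downloaded page.
sample = (
    '<div><pre class="js-tab-content"><i></i><span>Em</span>  <span>D</span>\n'
    "Lift Mac Cahir Og your face\n"
    "</pre></div>"
)

parser = PreTextExtractor()
parser.feed(sample)
parser.close()
print(parser.results[0].splitlines())  # ['Em  D', 'Lift Mac Cahir Og your face']
```

With the real page you would call parser.feed(content) instead. Because the parser works on tags rather than on characters, newlines inside the <pre> need no special flag.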
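
Alternatively, if you want to keep the regex, pass the re.S flag:

tab = re.findall(r'<pre class="js-tab-content">(.*?)</pre>', content, re.S)

re.S (also spelled re.DOTALL) is required so that . also matches newline characters. Without it, .*? cannot cross the line breaks inside the <pre> element, so the pattern never reaches </pre> and findall returns [].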
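A minimal, self-contained check of the re.S explanation. The sample string is a hypothetical, shortened stand-in for the real page, and the tag-stripping step at the end is an extra illustration (a crude regex, adequate only for simple markup like these <span> tags), not part of the answer.

```python
import re

# Hypothetical stand-in for the downloaded page: the <pre> body spans several lines.
sample = (
    '<div><pre class="js-tab-content"><i></i><span>Em</span>  <span>D</span>\n'
    "Lift Mac Cahir Og your face\n"
    "</pre></div>"
)

pattern = r'<pre class="js-tab-content">(.*?)</pre>'

# Without re.S, "." stops at "\n", so the lazy group can never reach "</pre>".
without_flag = re.findall(pattern, sample)
print(without_flag)  # []

# With re.S (re.DOTALL), "." also matches "\n" and the whole block is captured.
tab = re.findall(pattern, sample, re.S)
print(len(tab))  # 1

# The capture is still raw HTML; removing anything that looks like a tag leaves the text.
text = re.sub(r"<[^>]+>", "", tab[0])
print(text.splitlines()[1])  # Lift Mac Cahir Og your face
```

Writing the inline flag (?s) at the start of the pattern has the same effect as passing re.S.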