Retrieve all names from html tags using BeautifulSoup
I have set up Beautiful Soup and found the tags I need. How do I extract all of the names from those tags?
tags = soup.find_all("a")
print(tags)
Running the code above gives the following output:
[<a href="/wiki/Alfred_the_Great" title="Alfred the Great">Alfred the Great</a>, <a class="mw-redirect" href="/wiki/Elizabeth_I_of_England" title="Elizabeth I of England">Queen Elizabeth I</a>, <a href="/wiki/Family_tree_of_Scottish_monarchs" title="Family tree of Scottish monarchs">Family tree of Scottish monarchs</a>, <a href="/wiki/Kenneth_MacAlpin" title="Kenneth MacAlpin">Kenneth MacAlpin</a>]
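How can I retrieve the names, such as Alfred the Great, Queen Elizabeth I, and Kenneth MacAlpin? Do I need to use a regular expression? Using .string gives me an error.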
You can iterate over the tags and use tag.get('title') to get the title value.
Some other ways of doing the same thing: https://www.crummy.com/software/BeautifulSoup/bs4/doc/#attributes
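As a minimal sketch of that suggestion, the snippet below collects the title attribute of every a tag into a list and, for comparison, the visible link text. The shortened html string and the variable names titles and texts are made up for illustration, and the stdlib "html.parser" backend is used here only so that nothing beyond bs4 needs to be installed.

```python
from bs4 import BeautifulSoup

# Shortened sample markup (two of the links from the question)
html = (
    '<a href="/wiki/Alfred_the_Great" title="Alfred the Great">Alfred the Great</a>, '
    '<a class="mw-redirect" href="/wiki/Elizabeth_I_of_England" '
    'title="Elizabeth I of England">Queen Elizabeth I</a>'
)

soup = BeautifulSoup(html, "html.parser")

# tag.get('title') returns the attribute value, or None if the attribute is missing
titles = [tag.get("title") for tag in soup.find_all("a")]

# get_text(strip=True) returns the visible link text instead
texts = [tag.get_text(strip=True) for tag in soup.find_all("a")]

print(titles)
print(texts)
```

Note that the two lists can differ: for the second link the title attribute is "Elizabeth I of England" while the link text is "Queen Elizabeth I", so pick whichever one matches the names you want.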
There is no need to use re. You can get all the names by iterating over the a tags and, for each one, reading the title attribute or calling get_text() or .find(text=True).
html='''
<html>
<body>
<a href="/wiki/Alfred_the_Great" title="Alfred the Great">
Alfred the Great
</a>
,
<a class="mw-redirect" href="/wiki/Elizabeth_I_of_England" title="Elizabeth I of England">
Queen Elizabeth I
</a>
,
<a href="/wiki/Family_tree_of_Scottish_monarchs" title="Family tree of Scottish monarchs">
Family tree of Scottish monarchs
</a>
,
<a href="/wiki/Kenneth_MacAlpin" title="Kenneth MacAlpin">
Kenneth MacAlpin
</a>
</body>
</html>
'''
from bs4 import BeautifulSoup
soup = BeautifulSoup(html,'lxml')
#print(soup.prettify())
for name in soup.find_all('a'):
    # Read the title attribute of each link
    txt = name.get('title')
    # OR use the visible link text instead:
    # txt = name.get_text(strip=True)
    print(txt)
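Output (as printed by the get_text(strip=True) variant; with get('title') the second line is Elizabeth I of England):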
Alfred the Great
Queen Elizabeth I
Family tree of Scottish monarchs
Kenneth MacAlpin
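On the .string error mentioned in the question: one likely cause, assuming .string was called directly on the result of find_all, is that find_all returns a list-like ResultSet rather than a single tag, and the list itself has no .string. The sketch below reproduces that case and then reads .string per tag; the short html string and the names failed and names are hypothetical.

```python
from bs4 import BeautifulSoup

# Shortened sample markup for illustration
html = (
    '<a title="Alfred the Great">Alfred the Great</a>, '
    '<a title="Kenneth MacAlpin">Kenneth MacAlpin</a>'
)

soup = BeautifulSoup(html, "html.parser")
tags = soup.find_all("a")  # a list-like collection of tags

# Accessing .string on the whole collection raises AttributeError
try:
    tags.string
    failed = False
except AttributeError:
    failed = True

# Accessing .string on each individual tag works when the tag has a single text child
names = [str(tag.string) for tag in tags]

print(failed)
print(names)
```

When the link text is surrounded by newlines, as in the multi-line html above, .string keeps that whitespace, which is one reason get_text(strip=True) is often more convenient.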