Beautiful Soup: parse multiple tags with different attributes
sentences.find_all(['p','h2'],attrs={['class':None,'class':Not None]})
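This is invalid syntax, but is there an alternative way to do this? I want p tags that match one attribute condition and h2 tags that match another, and I need them in document order rather than as two separate result sets, i.e. I don't want to do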
sentences.find_all('p', attrs={'class': None})
sentences.find_all('h2', attrs={'class': True})
You can use select() with a CSS selector list, where "," acts as OR between the selectors (CSS reference):
from bs4 import BeautifulSoup
html_doc = """
<p class="cls1">Select this</p>
<p class="cls2">Don't select this</p>
<h2 class="cls3">Select this</h2>
<h2 class="cls4">Don't select this</h2>
"""
soup = BeautifulSoup(html_doc, "html.parser")
for tag in soup.select("p.cls1, h2.cls3"):
    print(tag)
Prints:
<p class="cls1">Select this</p>
<h2 class="cls3">Select this</h2>
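If the condition is closer to the one in the question (p without a class, h2 with a class), one option is to pass a function to find_all(). This is only a sketch: the helper name wanted and the sample HTML are made up for illustration, and it assumes that "has a class attribute" is the condition you care about.

```python
from bs4 import BeautifulSoup

html_doc = """
<p>Select this</p>
<p class="cls2">Don't select this</p>
<h2 class="cls3">Select this</h2>
<h2>Don't select this</h2>
"""

soup = BeautifulSoup(html_doc, "html.parser")


def wanted(tag):
    # Hypothetical helper: keep <p> tags without a class attribute
    # and <h2> tags that have one; reject everything else.
    if tag.name == "p":
        return not tag.has_attr("class")
    if tag.name == "h2":
        return tag.has_attr("class")
    return False


# find_all() calls the function once per tag and returns the
# matches in document order, so p and h2 results stay interleaved.
matches = soup.find_all(wanted)
for tag in matches:
    print(tag)
```

Because everything is decided in a single pass over the tree, the results do not need to be merged or re-sorted afterwards.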
EDIT: To select multiple tags where one of them must have no attributes:
from bs4 import BeautifulSoup
html_doc = """
<p>Select this</p>
<p class="cls2">Don't select this</p>
<h2 class="cls3">Select this</h2>
<h2 class="cls4">Don't select this</h2>
"""
soup = BeautifulSoup(html_doc, "html.parser")
for tag in soup.select("p, h2.cls3"):
    # Skip <p> tags that carry any attribute at all
    if tag.name == "p" and len(tag.attrs) != 0:
        continue
    print(tag)
Prints:
<p>Select this</p>
<h2 class="cls3">Select this</h2>
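If only the class attribute matters, the filtering can probably be moved into the selector itself with :not([class]). Note that this is not quite the same check as the loop above: it only excludes p tags that have a class attribute, not p tags with any attribute. The variable name selected is just for illustration.

```python
from bs4 import BeautifulSoup

html_doc = """
<p>Select this</p>
<p class="cls2">Don't select this</p>
<h2 class="cls3">Select this</h2>
<h2 class="cls4">Don't select this</h2>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# "p:not([class])" matches <p> tags with no class attribute;
# "h2.cls3" matches <h2> tags whose class list contains cls3.
# The comma combines both selectors, and results come back in document order.
selected = soup.select("p:not([class]), h2.cls3")
for tag in selected:
    print(tag)
```

This keeps the whole condition in one place, at the cost of being less flexible than a Python-side check when the rule involves more than one attribute.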