How to find the direct children (not the children of children) of a div in HTML using BeautifulSoup?
Markup:
<div class = "parent-div">
<div class = "child-1">
<div class = "child-1.1">
</div>
</div>
<div class = "child-2">
<div class = "child-2.1">
</div>
</div>
</div>
I want to get the list of direct children of div[parent-div], i.e. the list should be:
[<div class = "child-1">
<div class = "child-1.1">
</div>
</div>,<div class = "child-2">
<div class = "child-2.1">
</div>
</div>]
I am using the BeautifulSoup code below:
page_soup = soup(page_html,"html.parser")
main_cont = page_soup.find('div',{'class':'parent-div'}).findAll('div')
This code gives me a list of all the divs, including the nested ones:
[<div class = "child-1">
<div class = "child-1.1">
</div>
</div>,<div class = "child-1.1">
</div>,<div class = "child-2">
<div class = "child-2.1">
</div>
</div>,<div class = "child-2.1">
</div>]
How can I get a list of only the direct children of the parent div?
You can use the findChildren() method with recursive=False to get only the direct child tags.
main_cont = soup.find('div',{'class':'parent-div'}).findChildren('div',recursive=False)
Output:
[<div class="child-1"><div class="child-1.1"></div></div>, <div class="child-2"><div class="child-2.1"> </div></div>]
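For reference, here is a minimal self-contained sketch of the same idea. It assumes Beautiful Soup 4, where findChildren() is kept as an older alias of find_all(), so the snippet uses find_all() instead. The names parent, direct_children and child_classes are illustrative only and do not come from the question.

```python
from bs4 import BeautifulSoup

html = """
<div class="parent-div">
  <div class="child-1"><div class="child-1.1"></div></div>
  <div class="child-2"><div class="child-2.1"></div></div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Locate the parent first, then search only its direct children.
parent = soup.find("div", class_="parent-div")

# recursive=False limits the search to direct children of `parent`;
# without it, the nested child-1.1 and child-2.1 divs are matched too.
direct_children = parent.find_all("div", recursive=False)

# The class attribute is multi-valued, so each value is a list of class names.
child_classes = [child["class"][0] for child in direct_children]
print(child_classes)  # ['child-1', 'child-2']
```

Because the search is not recursive, the nested divs are still present inside each returned tag but are not returned as separate list items.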
You can do this easily with CSS selectors. Note: this requires Beautiful Soup 4.7+. Specifically, use the child combinator: https://developer.mozilla.org/en-US/docs/Web/CSS/Child_combinator
from bs4 import BeautifulSoup
html = """
<div class = "parent-div">
<div class = "child-1">
<div class = "child-1.1">
</div>
</div>
<div class = "child-2">
<div class = "child-2.1">
</div>
</div>
</div>
"""
soup = BeautifulSoup(html, 'html.parser')
print(soup.select('div.parent-div > *'))
Output:
[<div class="child-1">\n<div class="child-1.1">\n</div>\n</div>, <div class="child-2">\n<div class="child-2.1">\n</div>\n</div>]
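The selector above uses * and so matches any direct child element. If only div children are wanted, the child combinator can be paired with a tag name. The sketch below also shows the :scope pseudo-class, which, as far as I know, is supported by the selector engine bundled with Beautiful Soup 4.7+ and refers to the tag that select() is called on. The variable names are illustrative assumptions, not part of the answer above.

```python
from bs4 import BeautifulSoup

html = """
<div class="parent-div">
  <div class="child-1"><div class="child-1.1"></div></div>
  <div class="child-2"><div class="child-2.1"></div></div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Child combinator restricted to <div> children of the parent.
div_children = soup.select("div.parent-div > div")

# Same query, starting from a parent tag that is already in hand:
# :scope stands for the element select() was called on.
parent = soup.select_one("div.parent-div")
scoped_children = parent.select(":scope > div")

selected_classes = [tag["class"][0] for tag in div_children]
scoped_classes = [tag["class"][0] for tag in scoped_children]
print(selected_classes)  # ['child-1', 'child-2']
print(scoped_classes)    # ['child-1', 'child-2']
```

The :scope form is handy when the parent was found by some other means (for example with find()), since the selector does not need to describe the parent again.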