How to find the direct children (not the children of children) of a div in HTML using BeautifulSoup?

Markup:

<div class = "parent-div">
    <div class = "child-1">
        <div class = "child-1.1">
        </div>
    </div>
    <div class = "child-2">
        <div class = "child-2.1">
        </div>
    </div>
</div>

I want to get the list of direct children of div[parent-div], that is, the list should be:

[<div class = "child-1">
        <div class = "child-1.1">
        </div>
    </div>,<div class = "child-2">
        <div class = "child-2.1">
        </div>
    </div>]

I am using the following BeautifulSoup code:

from bs4 import BeautifulSoup as soup

page_soup = soup(page_html, "html.parser")
main_cont = page_soup.find('div', {'class': 'parent-div'}).findAll('div')

This code gives me a list of all the divs, including the nested ones:

[<div class = "child-1">
        <div class = "child-1.1">
        </div>
    </div>,<div class = "child-1.1">
        </div>,<div class = "child-2">
        <div class = "child-2.1">
        </div>
    </div>,<div class = "child-2.1">
        </div>]

How do I get a list of only the direct children of the parent div?

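You can use the findChildren() method to get the child tags. Pass recursive=False so that only the direct children are returned rather than every descendant.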

main_cont = page_soup.find('div', {'class': 'parent-div'}).findChildren('div', recursive=False)

Output:

[<div class="child-1"><div class="child-1.1"></div></div>, <div class="child-2"><div class="child-2.1"> </div></div>]
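To make the effect of recursive=False easier to see, here is a minimal, self-contained sketch that compares it with the default behaviour on the markup from the question. It uses find_all(), of which findChildren() is a legacy alias in Beautiful Soup 4. The helper class_names() is a hypothetical name introduced only for this illustration.

```python
from bs4 import BeautifulSoup

html = """
<div class="parent-div">
    <div class="child-1">
        <div class="child-1.1"></div>
    </div>
    <div class="child-2">
        <div class="child-2.1"></div>
    </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
parent = soup.find("div", {"class": "parent-div"})

# Default (recursive=True): every descendant div is returned.
all_divs = parent.find_all("div")

# recursive=False: only divs that are immediate children of parent.
direct_divs = parent.find_all("div", recursive=False)


def class_names(tags):
    """Hypothetical helper: first class name of each tag, for compact printing."""
    return [tag["class"][0] for tag in tags]


print(class_names(all_divs))     # ['child-1', 'child-1.1', 'child-2', 'child-2.1']
print(class_names(direct_divs))  # ['child-1', 'child-2']
```

Each element of direct_divs is still a full Tag, so its nested divs remain reachable through it; they are simply not listed as separate results.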

You can do this easily with CSS selectors. Note: this requires Beautiful Soup 4.7+. Specifically, use the child combinator: https://developer.mozilla.org/en-US/docs/Web/CSS/Child_combinator

from bs4 import BeautifulSoup

html = """
<div class = "parent-div">
    <div class = "child-1">
        <div class = "child-1.1">
        </div>
    </div>
    <div class = "child-2">
        <div class = "child-2.1">
        </div>
    </div>
</div>
"""

soup = BeautifulSoup(html, 'html.parser')

print(soup.select('div.parent-div > *'))

Output:

[<div class="child-1">\n<div class="child-1.1">\n</div>\n</div>, <div class="child-2">\n<div class="child-2.1">\n</div>\n</div>]
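Note that `> *` matches every direct child element, whatever its tag name. If only direct div children are wanted, the selector can be narrowed. The sketch below assumes a slightly modified version of the markup (the extra `<p class="note">` is hypothetical, added only to show the difference) and also shows `:scope`, which should let you start from a Tag you already hold.

```python
from bs4 import BeautifulSoup

# Hypothetical variation of the question's markup with one non-div child.
html = """
<div class="parent-div">
    <p class="note"></p>
    <div class="child-1">
        <div class="child-1.1"></div>
    </div>
    <div class="child-2"></div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# '> *' returns every direct child element, including the <p>.
any_children = soup.select("div.parent-div > *")

# '> div' restricts the result to direct children that are divs.
div_children = soup.select("div.parent-div > div")

# ':scope' refers to the tag that select() is called on.
parent = soup.select_one("div.parent-div")
scoped_children = parent.select(":scope > div")

print([tag.name for tag in any_children])             # ['p', 'div', 'div']
print([tag["class"][0] for tag in div_children])      # ['child-1', 'child-2']
print([tag["class"][0] for tag in scoped_children])   # ['child-1', 'child-2']
```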