How to get the children of a list with no attributes with BeautifulSoup?

Question

Situation

I am trying to scrape the nested unordered list of the 3 "Market drivers" items from this HTML:

   <li>Drivers, Challenges, and Trends
    <ul>
     <li>Market drivers
      <ul>
       <li>Improvement in girth gear manufacturing technologies</li>
       <li>Expansion and installation of new cement plants</li>
       <li>Augmented demand from APAC</li>
      </ul>
     </li>
   <li>Market challenges
    <ul>
     <li>Increased demand for refurbished girth gear segments</li>

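Problem #1:

The "Market drivers" list I am looking for has no attributes such as a class name or id, so it can only be located by the text / string inside it. All the tutorials show how to find elements by class, id, and so on.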

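Problem #2:

The children, i.e. the 3 list items, happen to number exactly 3 on this page, but on other similar pages there may be 0, 4, 7 or some other number. So I want to get all the children, however many there are (or none). I found something about using recursive=False to get the children, but also some other notes about not using findChildren after BS2.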

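Problem #3:

I tried using find_all_next, but the tutorials do not show how to collect the following elements only up to a defined point; they always just fetch everything that comes next. If there were a way to stop at a particular tag, I might be able to use find_all_next until that element is reached.

The following code shows my attempt (but it does not work):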

import requests
from bs4 import BeautifulSoup

url = 'https://www.marketresearch.com/Infiniti-Research-Limited-v2680/Global-DNA-Microarray-30162580/'

page = requests.get(url)
soup = BeautifulSoup(page.text, 'html.parser')

toc = soup.find("div", id="toc")
drivers = toc.find(string="Market drivers").findAll("li", recursive=False).text

print(drivers)

Answer 1

There is no example of the expected output, but I would suggest the following approach. Beautiful Soup 4.7.0 or later is required.

How to select?

To select an element by its own text and extract the text of all the <li> elements nested inside it, you can use CSS selectors and a list comprehension:

[x.get_text(strip=True) for x in toc.select('li:-soup-contains-own("Market drivers") li')]

Or in a for loop:

data = []

for x in toc.select('li:-soup-contains-own("Market drivers") li'):
    data.append(x.get_text(strip=True))  

print(data)

Output:

['Improvement in girth gear manufacturing technologies', 'Expansion and installation of new cement plants', 'Augmented demand from APAC']
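For experimenting without fetching the page, the selector can be tried on an inline copy of the HTML. This is only a minimal sketch under a few assumptions: the HTML string is a hypothetical, properly closed version of the snippet in the question (the "Market trends" item is invented to show the zero-children case), the helper names child_items and child_items_no_css are made up for illustration, and the headings passed in contain no double quotes. The first helper narrows the selector from the answer with the child combinator (>) so that only the direct list items are returned. The second is an alternative that is not the approach shown above: it matches the text node, steps up to its parent <li>, and uses recursive=False as mentioned in the question.

```python
from bs4 import BeautifulSoup

# Hypothetical inline HTML modelled on the question's snippet.
# "Market trends" is invented here to show a heading with no sub-list.
HTML = """
<div id="toc">
 <ul>
  <li>Drivers, Challenges, and Trends
   <ul>
    <li>Market drivers
     <ul>
      <li>Improvement in girth gear manufacturing technologies</li>
      <li>Expansion and installation of new cement plants</li>
      <li>Augmented demand from APAC</li>
     </ul>
    </li>
    <li>Market challenges
     <ul>
      <li>Increased demand for refurbished girth gear segments</li>
     </ul>
    </li>
    <li>Market trends</li>
   </ul>
  </li>
 </ul>
</div>
"""


def child_items(toc, heading):
    """Text of the <li> items directly below the <li> whose own text contains heading."""
    # Assumes heading contains no double quotes, since it is pasted into the selector.
    selector = f'li:-soup-contains-own("{heading}") > ul > li'
    return [li.get_text(strip=True) for li in toc.select(selector)]


def child_items_no_css(toc, heading):
    """Alternative without CSS selectors, using find / find_all with recursive=False."""
    # Find the first text node that contains the heading.
    text_node = toc.find(string=lambda s: s and heading in s)
    if text_node is None:
        return []
    # The text node's parent is the <li>; look only at its direct <ul> child.
    sublist = text_node.parent.find("ul", recursive=False)
    if sublist is None:
        return []
    return [li.get_text(strip=True) for li in sublist.find_all("li", recursive=False)]


toc = BeautifulSoup(HTML, "html.parser").find("div", id="toc")

drivers = child_items(toc, "Market drivers")
challenges = child_items(toc, "Market challenges")
trends = child_items(toc, "Market trends")      # heading exists, no sub-list
missing = child_items(toc, "No such heading")   # heading does not exist

print(drivers)
print(challenges)
print(trends, missing)
```

Both helpers return an empty list when the heading is absent or has no nested list, so the calling code does not need to know in advance how many items a page contains.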
