Print the content of the "p" tag after a header in HTML

Question

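I am trying to finish a data scraper assignment. Everything works except the last part, where I need to print the description of each cybersecurity vulnerability reported to a website, based on the user's search criteria.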

for index in range(2): 
    response = requests.get(url_values[index])
    content = response.content
    soup = BeautifulSoup(content,"lxml")
    #find the table content
    for header in soup.find_all("h3", string = "Description"):
        text = find_next.("p")  # fails: SyntaxError, and find_next is not called on header
        print (text)

This is what the HTML looks like in the area I am trying to pull the information from:

 ...<section class="content-band">
        <div class="content">

            <h3>Risk</h3>
            <div><p>Low</p></div>

            <h3>Date Discovered</h3>
            <p>February 12, 2019</p>

            <h3>Description</h3>
            <p>Microsoft Windows is prone to a local information-disclosure
             vulnerability.

            Local attackers can exploit this issue to obtain sensitive
            information that may lead to further attacks.</p>

            <h3>Technologies Affected</h3>...

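I want the content of the p element that follows the "Description" header (the h3 element). I have tried "find_next_sibling" and similar methods, but I can't get them to work.

Any suggestions would be appreciated.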

Answer 1

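You can chain two calls on the same soup object: .find() to locate the "h3" element, then .find_next() to get the first "p" element that follows it. A plain .find("p") on the header would only search inside the "h3" itself, which contains no "p", so it would return None.

text = soup.find("h3", string="Description").find_next("p").text

You don't need .find_all() because only one "h3" element has the text "Description".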
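
To make the header lookup easier to try out, here is a minimal sketch run against a trimmed copy of the markup from the question. The helper name `description_after_header`, the `SAMPLE_HTML` constant, and the closing tags added to the snippet are assumptions for illustration. It uses the built-in "html.parser" instead of "lxml" only so that it runs without an extra dependency.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the question's markup (assumption: the real page has more content).
SAMPLE_HTML = """
<section class="content-band">
  <div class="content">
    <h3>Risk</h3>
    <div><p>Low</p></div>
    <h3>Date Discovered</h3>
    <p>February 12, 2019</p>
    <h3>Description</h3>
    <p>Microsoft Windows is prone to a local information-disclosure
     vulnerability.

    Local attackers can exploit this issue to obtain sensitive
    information that may lead to further attacks.</p>
    <h3>Technologies Affected</h3>
  </div>
</section>
"""


def description_after_header(html, header_text="Description"):
    """Return the text of the first <p> after the matching <h3>, or None."""
    soup = BeautifulSoup(html, "html.parser")
    header = soup.find("h3", string=header_text)
    if header is None:
        # The page has no header with this exact text.
        return None
    paragraph = header.find_next("p")
    if paragraph is None:
        return None
    # Collapse the line breaks and indentation inside the paragraph.
    return " ".join(paragraph.get_text().split())


print(description_after_header(SAMPLE_HTML))
```

Checking for None before reading the text avoids an AttributeError on pages where the header is missing, which can happen when the loop in the question visits more than one URL.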

Answer 2

You can get the text from the h3's next sibling element like this:

print(soup.find("h3", string="Description").find_next_sibling().text)
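
The sibling lookup depends on how the page nests its elements, which may be why "find_next_sibling" seemed unreliable. The sketch below uses the "Risk" header from the question's markup, where the "p" is wrapped in a "div". The variable names are illustrative, and passing `True` is just an explicit way to ask for a sibling tag of any kind.

```python
from bs4 import BeautifulSoup

# Two headers from the question's markup; the first <p> is wrapped in a <div>.
html = """
<div class="content">
  <h3>Risk</h3>
  <div><p>Low</p></div>
  <h3>Date Discovered</h3>
  <p>February 12, 2019</p>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
risk = soup.find("h3", string="Risk")

# True matches any tag: the next sibling tag is the <div> wrapper.
any_sibling = risk.find_next_sibling(True).get_text(strip=True)

# Filtering siblings by "p" skips the <div> and lands on the next header's paragraph.
sibling_p = risk.find_next_sibling("p").get_text(strip=True)

# find_next looks at everything after the header, including nested tags.
next_p = risk.find_next("p").get_text(strip=True)

print(any_sibling)  # Low
print(sibling_p)    # February 12, 2019
print(next_p)       # Low
```

For the "Description" header both approaches return the same paragraph, because its "p" is a direct sibling of the "h3".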

html

python

scraper