Finding a specific string in a page with BeautifulSoup

Question

I'm using bs4 and want to return the description of a specific Python built-in function from the documentation, for example abs() from this page:

https://docs.python.org/2/library/functions.html

It should return this:

abs (x)

Return the absolute value of a number. The argument may be a plain or long integer or a floating point number. If the argument is a complex number, its magnitude is returned.

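I've been struggling to work out what I should be searching for other than <p> elements, and how to get only the relevant <p> element and the text inside it. I know I can do a findAll search, but I'd like to do this without relying on text from the page (i.e. as if the user didn't know the text beforehand):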

import requests, bs4, re

res = requests.get('https://docs.python.org/2/library/functions.html')
res.raise_for_status()
abs_soup = bs4.BeautifulSoup(res.text)
abs_elems = abs_soup.body.findAll(text=re.compile('^abs$'))
print abs_elems
abs_desc = abs_soup.select   # this is the part Im stuck on
print abs_desc

Answer 1

I would do it like this:

>>> func = abs_soup.select('dl.function')
>>> for i in func:
    if i.select('dt#abs'):
        print 'abs\n'
        print i.select('dd')[0].text


abs

Return the absolute value of a number.  The argument may be a plain or long
integer or a floating point number.  If the argument is a complex number, its
magnitude is returned.

>>>

Or,

replace the last two lines of my code with these:

    print i.find('dt').text
    print i.find('dd').text
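
If the function name isn't known in advance, the same loop can collect every entry into a dictionary instead of printing one. This is only a sketch in Python 3: SAMPLE_HTML is a simplified, made-up stand-in for the docs page markup so the snippet runs without a network call, and index_functions is a hypothetical helper name, not part of bs4.

```python
from bs4 import BeautifulSoup

# Assumption: a simplified stand-in for the markup on the docs page.
SAMPLE_HTML = """
<dl class="function">
<dt id="abs">abs(x)</dt>
<dd><p>Return the absolute value of a number.</p></dd>
</dl>
<dl class="function">
<dt id="len">len(s)</dt>
<dd><p>Return the length of an object.</p></dd>
</dl>
"""


def index_functions(soup):
    """Map each <dt> id (or its text, if it has no id) to the <dd> text."""
    index = {}
    for entry in soup.select('dl.function'):
        dt = entry.find('dt')
        dd = entry.find('dd')
        if dt is None or dd is None:
            continue  # skip entries that lack a signature or a description
        key = dt.get('id', dt.get_text(strip=True))
        index[key] = dd.get_text(' ', strip=True)
    return index


soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')
descriptions = index_functions(soup)
print(descriptions['abs'])
```

Against the real page you would build the soup from res.text instead of SAMPLE_HTML and then look up whichever name the user asks for.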

Answer 2

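Well, the Python documentation puts every function inside a <dl class="function"> element, and the function's name is the id of the <dt> inside it: <dt id="name_of_the_function">.

So I'd suggest simply using: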

import requests
from bs4 import BeautifulSoup

res = requests.get('https://docs.python.org/2/library/functions.html')
abs_soup = BeautifulSoup(res.text, "html.parser")

print(abs_soup.find('dt', {'id': 'abs'}).find_next('dd').text)

Output:

Return the absolute value of a number. The argument may be a plain or long integer or a floating point number. If the argument is a complex number, its magnitude is returned.

First, abs_soup.find('dt', {'id': 'abs'}) finds the <dt> tag whose id is abs; then .find_next('dd') gets the next <dd> tag after that <dt> tag.

Finally, .text gets the text of the <dd> tag. You could also use .find_next('p').text instead; the output is the same.
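
The find / find_next steps can also be wrapped in a small function that returns None when the id isn't on the page, rather than raising AttributeError. This is a sketch only: describe_builtin is a hypothetical helper name, and SAMPLE_HTML is a cut-down, made-up imitation of the docs markup so the snippet runs offline.

```python
from bs4 import BeautifulSoup

# Assumption: a cut-down imitation of the docs page markup.
SAMPLE_HTML = """
<dl class="function">
<dt id="abs"><code class="descname">abs</code><span class="sig-paren">(</span><em>x</em><span class="sig-paren">)</span></dt>
<dd><p>Return the absolute value of a number.</p></dd>
</dl>
"""


def describe_builtin(soup, name):
    """Return (signature, description) for the <dt> with id=name, or None."""
    dt = soup.find('dt', id=name)
    if dt is None:
        return None  # no entry with that id on the page
    dd = dt.find_next('dd')
    if dd is None:
        return None  # entry has no description block
    return dt.get_text().strip(), dd.get_text(' ', strip=True)


soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')
print(describe_builtin(soup, 'abs'))
print(describe_builtin(soup, 'no_such_function'))
```

With the real page, pass the soup built from res.text; the function name is then the only input needed.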
