How to obtain class contents in HTML using BeautifulSoup?

Question

This is the HTML I want to work with:

<section id='price'>

<div class="row">
    <h4 class='col-sm-4'>Market Cap: <b><i class="fa fa-inr"></i> 10.64 Crores</b></h4>
    <h4 class='col-sm-4'>Current Price: <b><i class="fa fa-inr"></i> 35.35</b></h4>
    <h4 class='col-sm-4'>Book Value: <b><i class="fa fa-inr"></i> 53.52</b></h4>
</div>

My question is how to get the Market Cap, Current Price, and Book Value from the elements with class='col-sm-4'.

Because if I try:

print soup.row.col-sm-4.fa.fa-inr

it doesn't work. I'm fairly new to Python and web scraping, so please bear with me and walk me through the whole process. Thanks in advance.

Answer 1

You can find each label by its text and then take its next_element, which is the <b> tag holding the value:

from bs4 import BeautifulSoup

data = """
<div class="row">
        <h4 class='col-sm-4'>Market Cap: <b><i class="fa fa-inr"></i> 10.64 Crores</b></h4>
        <h4 class='col-sm-4'>Current Price: <b><i class="fa fa-inr"></i> 35.35</b></h4>
        <h4 class='col-sm-4'>Book Value: <b><i class="fa fa-inr"></i> 53.52</b></h4>
    </div>
"""
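# Name the parser explicitly so the result does not depend on what is installed
soup = BeautifulSoup(data, "html.parser")

titles = ['Market Cap', 'Current Price', 'Book Value']
for title in titles:
    # string= is the current name of the older text= argument
    print(soup.find(string=lambda x: x.startswith(title)).next_element.text.strip())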

This prints:

10.64 Crores
35.35
53.52

To get a float value, you can simply split the value text on whitespace and take the first element:

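# Split the text of the <b> tag (not the label itself) so the first token is the number
price = soup.find(string=lambda x: x.startswith(title)).next_element.text.split()[0]
print(float(price))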
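The two snippets above can be combined into one runnable sketch that collects every value as a float. The helper name read_values and the dictionary layout are illustrative assumptions, not part of the original answer, and the code assumes Python 3 with a bs4 version that accepts the string= argument.

```python
from bs4 import BeautifulSoup

html = """
<div class="row">
    <h4 class='col-sm-4'>Market Cap: <b><i class="fa fa-inr"></i> 10.64 Crores</b></h4>
    <h4 class='col-sm-4'>Current Price: <b><i class="fa fa-inr"></i> 35.35</b></h4>
    <h4 class='col-sm-4'>Book Value: <b><i class="fa fa-inr"></i> 53.52</b></h4>
</div>
"""
soup = BeautifulSoup(html, "html.parser")


def read_values(soup, titles):
    """Hypothetical helper: map each title to the number that follows it."""
    values = {}
    for title in titles:
        # Find the text node that starts with the title, e.g. "Market Cap: "
        label = soup.find(string=lambda s: s.startswith(title))
        if label is None:
            continue  # title not present on this page
        # The element right after the label is the <b> tag with the value
        values[title] = float(label.next_element.get_text().split()[0])
    return values


values = read_values(soup, ["Market Cap", "Current Price", "Book Value"])
print(values)
```

Note that the unit ("Crores") is dropped here; keep the raw text as well if the unit matters.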

You can also get them with a CSS selector:

# Run this on the full page: the selector needs the <section id='price'> wrapper
for item in soup.select('section#price div.row h4.col-sm-4 b'):
    print(item.text)
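As a rough sketch of the selector approach on the question's full markup, the example below pairs each label with its value. The closing </section> tag is an assumption, since the question's snippet is cut off before it, and the variable names are illustrative.

```python
from bs4 import BeautifulSoup

# The question's HTML, with an assumed closing </section> tag
html = """
<section id='price'>
<div class="row">
    <h4 class='col-sm-4'>Market Cap: <b><i class="fa fa-inr"></i> 10.64 Crores</b></h4>
    <h4 class='col-sm-4'>Current Price: <b><i class="fa fa-inr"></i> 35.35</b></h4>
    <h4 class='col-sm-4'>Book Value: <b><i class="fa fa-inr"></i> 53.52</b></h4>
</div>
</section>
"""
soup = BeautifulSoup(html, "html.parser")

rows = []
for h4 in soup.select("section#price h4.col-sm-4"):
    # The first text node inside the <h4> is the label, e.g. "Market Cap: "
    label = h4.find(string=True).strip().rstrip(":")
    # The <b> child holds the value; strip=True trims surrounding whitespace
    value = h4.b.get_text(strip=True)
    rows.append((label, value))

print(rows)
```

Selecting the h4 elements rather than the b elements keeps the label and the value together, at the cost of one extra lookup per row.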

Answer 2

Try it like this:

>>> for x in soup.find_all("div", "row"):
...     print(x.text)
... 

Market Cap:  10.64 Crores
Current Price:  35.35
Book Value:  53.52
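For context on why the attempt in the question fails: soup.row looks for a tag named row, not a class, and col-sm-4 is parsed by Python as a subtraction. A minimal sketch of matching by class instead, using the class_ keyword of find_all; the pairs dictionary is an illustrative assumption.

```python
from bs4 import BeautifulSoup

html = """
<div class="row">
    <h4 class='col-sm-4'>Market Cap: <b><i class="fa fa-inr"></i> 10.64 Crores</b></h4>
    <h4 class='col-sm-4'>Current Price: <b><i class="fa fa-inr"></i> 35.35</b></h4>
    <h4 class='col-sm-4'>Book Value: <b><i class="fa fa-inr"></i> 53.52</b></h4>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

pairs = {}
# class_ (with the trailing underscore) filters on the CSS class attribute
for h4 in soup.find_all("h4", class_="col-sm-4"):
    # get_text() gives e.g. "Market Cap:  10.64 Crores"; split at the first colon
    label, _, value = h4.get_text().partition(":")
    pairs[label.strip()] = value.strip()

print(pairs)
```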
