How to obtain class contents in HTML using BeautifulSoup?

This is the HTML code I want to work with:

<section id='price'>

<div class="row">
    <h4 class='col-sm-4'>Market Cap: <b><i class="fa fa-inr"></i> 10.64 Crores</b></h4>
    <h4 class='col-sm-4'>Current Price: <b><i class="fa fa-inr"></i> 35.35</b></h4>
    <h4 class='col-sm-4'>Book Value: <b><i class="fa fa-inr"></i> 53.52</b></h4>
</div>

My question is how to get the Market Cap, Current Price, and Book Value from "class='col-sm-4'".

Because if I try:

print soup.row.col-sm-4.fa.fa-inr

it does not work. I am fairly new to Python and web scraping, so please bear with me and walk me through the whole process. Thanks in advance.

You can find each label by its text and get its next_element:

from bs4 import BeautifulSoup

data = """
<div class="row">
        <h4 class='col-sm-4'>Market Cap: <b><i class="fa fa-inr"></i> 10.64 Crores</b></h4>
        <h4 class='col-sm-4'>Current Price: <b><i class="fa fa-inr"></i> 35.35</b></h4>
        <h4 class='col-sm-4'>Book Value: <b><i class="fa fa-inr"></i> 53.52</b></h4>
    </div>
"""
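soup = BeautifulSoup(data, "html.parser")  # name the parser explicitly

titles = ['Market Cap', 'Current Price', 'Book Value']
for title in titles:
    # string= is the current name of the older text= argument
    label = soup.find(string=lambda x: x.startswith(title))
    # the element right after the label text is the <b> tag holding the value
    print(label.next_element.text.strip())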

Prints:

10.64 Crores
35.35
53.52

To get the float values, you can simply split the value text on whitespace and take the first element:

price = soup.find(string=lambda x: x.startswith(title)).next_element.text.split()[0]
print(float(price))
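Putting the two snippets together, here is a minimal self-contained sketch that collects all three numbers into a dict. The function name `extract_values` and the variable `html` are made up for illustration, and it assumes every value starts with a plain number (so "10.64 Crores" becomes 10.64 and the unit is dropped).

```python
from bs4 import BeautifulSoup

html = """
<section id='price'>
<div class="row">
    <h4 class='col-sm-4'>Market Cap: <b><i class="fa fa-inr"></i> 10.64 Crores</b></h4>
    <h4 class='col-sm-4'>Current Price: <b><i class="fa fa-inr"></i> 35.35</b></h4>
    <h4 class='col-sm-4'>Book Value: <b><i class="fa fa-inr"></i> 53.52</b></h4>
</div>
</section>
"""


def extract_values(markup, titles):
    """Map each title to the leading number found in the <b> tag after it."""
    soup = BeautifulSoup(markup, "html.parser")
    values = {}
    for title in titles:
        # Locate the text node that starts with the label, e.g. "Market Cap: ".
        label = soup.find(string=lambda s: s.startswith(title))
        if label is None:
            continue  # label not present in this page, skip it
        # The next element after the label text is the <b> tag with the value.
        value_text = label.next_element.get_text()
        # Assumption: the first whitespace-separated token is the number.
        values[title] = float(value_text.split()[0])
    return values


result = extract_values(html, ["Market Cap", "Current Price", "Book Value"])
print(result)
```

If a page formats numbers with thousands separators or puts the currency first, the `split()[0]` step would need adjusting.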

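You can also get them with a CSS selector. Note that the selector starts at section#price, so it has to run against the full HTML from the question, which includes that wrapper:

for item in soup.select('section#price div.row h4.col-sm-4 b'):
    print(item.text)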
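If you want each value together with its label, one possible variation is to select the h4 tags instead of the b tags and split the text at the colon. This is only a sketch: `label_value_pairs` and `html` are hypothetical names, and the closing `</section>` tag is added here on the assumption that the real page closes it.

```python
from bs4 import BeautifulSoup

html = """
<section id='price'>
<div class="row">
    <h4 class='col-sm-4'>Market Cap: <b><i class="fa fa-inr"></i> 10.64 Crores</b></h4>
    <h4 class='col-sm-4'>Current Price: <b><i class="fa fa-inr"></i> 35.35</b></h4>
    <h4 class='col-sm-4'>Book Value: <b><i class="fa fa-inr"></i> 53.52</b></h4>
</div>
</section>
"""


def label_value_pairs(markup):
    """Return (label, value) tuples for every h4.col-sm-4 under section#price."""
    soup = BeautifulSoup(markup, "html.parser")
    pairs = []
    for h4 in soup.select("section#price div.row h4.col-sm-4"):
        # Join the text pieces with a single space, then split at the first colon.
        label, _, value = h4.get_text(" ", strip=True).partition(":")
        pairs.append((label.strip(), value.strip()))
    return pairs


pairs = label_value_pairs(html)
for label, value in pairs:
    print(label, "->", value)
```

Selecting the h4 keeps the label and value in one place, so the code does not need a hard-coded list of titles.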

Try it this way:

>>> for x in soup.find_all("div","row"):
...     print(x.text)
... 

Market Cap:  10.64 Crores
Current Price:  35.35
Book Value:  53.52
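As for why the attempt in the question fails: attribute access such as soup.row navigates by tag name, not by class, so it looks for a <row> tag, and Python reads col-sm-4 as a subtraction rather than a name. Filtering on the class with class_ avoids both problems. A short sketch, with `html` and `lines` as illustrative names:

```python
from bs4 import BeautifulSoup

html = """
<div class="row">
    <h4 class='col-sm-4'>Market Cap: <b><i class="fa fa-inr"></i> 10.64 Crores</b></h4>
    <h4 class='col-sm-4'>Current Price: <b><i class="fa fa-inr"></i> 35.35</b></h4>
    <h4 class='col-sm-4'>Book Value: <b><i class="fa fa-inr"></i> 53.52</b></h4>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# soup.row searches for a tag named <row>; the document has none, so this is None.
print(soup.row)

# class_ (with the trailing underscore) matches on the CSS class instead.
headings = soup.find_all("h4", class_="col-sm-4")

# Collapse runs of whitespace so each heading reads as "Label: value".
lines = [" ".join(h4.get_text().split()) for h4 in headings]
for line in lines:
    print(line)
```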