Beautifulsoup text inside tag

Question

I am trying to scrape a page with the following HTML structure:

<li class="bookie-offer first" data-bookie-code="BB" data-customer-type="existing" data-sport-type="2">

Is there a way to extract data from the li tag? Specifically, I want to extract data-customer-type and data-sport-type.

Answer 1

From the docs:

A tag may have any number of attributes. The tag <b class="boldest"> has an attribute “class” whose value is “boldest”. You can access a tag’s attributes by treating the tag like a dictionary:

tag['class']

u'boldest'

You can access that dictionary directly as .attrs:

tag.attrs

{u'class': u'boldest'}

In your case:

>>> soup.find(class_='bookie-offer').attrs

{'class': ['bookie-offer', 'first'],
 'data-bookie-code': 'BB',
 'data-customer-type': 'existing',
 'data-sport-type': '2'}

>>> soup.find(class_='bookie-offer').attrs['data-customer-type']
'existing'
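
For completeness, here is a minimal self-contained sketch that pulls both attributes asked about in the question. It assumes the bs4 package is installed and uses the built-in "html.parser"; the html string is just the li tag from the question with a closing tag added, and data-missing is a made-up attribute name used only to show what .get() does when an attribute is absent.

```python
from bs4 import BeautifulSoup

# The <li> from the question, closed so it forms a complete fragment.
html = (
    '<li class="bookie-offer first" data-bookie-code="BB" '
    'data-customer-type="existing" data-sport-type="2"></li>'
)

soup = BeautifulSoup(html, "html.parser")

# Matching on a single class also matches tags that carry extra classes.
li = soup.find("li", class_="bookie-offer")

# Dictionary-style access raises KeyError if the attribute is absent.
customer_type = li["data-customer-type"]

# .get() returns None instead of raising when the attribute is absent.
sport_type = li.get("data-sport-type")
missing = li.get("data-missing")  # hypothetical attribute, not in the HTML

print(customer_type, sport_type)
```

Note that attribute values come back as strings, so data-sport-type is '2' rather than the integer 2; wrap it in int() if you need a number.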

python

beautifulsoup

web-scraping