BeautifulSoup text inside tag
I am trying to scrape a page with the following HTML structure:
<li class="bookie-offer first" data-bookie-code="BB" data-customer-type="existing" data-sport-type="2">
Is there a way to extract data from the li tag? Specifically, I want to extract data-customer-type and data-sport-type.
From the docs:
A tag may have any number of attributes. The tag <b class="boldest">
has an attribute “class” whose value is “boldest”. You can access a
tag’s attributes by treating the tag like a dictionary:
tag['class']
u'boldest'
You can access that dictionary directly as .attrs:
tag.attrs
{u'class': u'boldest'}
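The quoted output above comes from an older version of the docs (note the Python 2 u'' prefix). As a minimal sketch, assuming a current bs4 install and the built-in "html.parser" backend, the same example looks like this; the markup string is only an illustration. Note that class is a multi-valued attribute, so its value comes back as a list:

```python
from bs4 import BeautifulSoup

# Illustrative markup matching the <b class="boldest"> tag from the docs quote.
soup = BeautifulSoup('<b class="boldest">Extremely bold</b>', "html.parser")
tag = soup.b

# Dictionary-style access; class is multi-valued, so this is a list.
class_value = tag["class"]

# The full attribute dictionary.
attrs = tag.attrs

# .get() returns None instead of raising KeyError when the attribute is absent.
missing = tag.get("id")

print(class_value)
print(attrs)
print(missing)
```

Using tag.get(...) rather than tag[...] is probably the safer choice when scraping pages where an attribute may not be present on every element.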
In your case...
>>> soup.find(class_='bookie-offer').attrs
{'class': ['bookie-offer', 'first'],
'data-bookie-code': 'BB',
'data-customer-type': 'existing',
'data-sport-type': '2'}
>>> soup.find(class_='bookie-offer').attrs['data-customer-type']
'existing'
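Putting it together, here is a small self-contained sketch that pulls both requested attributes from every matching li. The surrounding ul, the "Offer" text, and the helper name extract_offer_data are assumptions made for illustration; only the li tag itself comes from the question.

```python
from bs4 import BeautifulSoup

# The <li> is taken from the question; the <ul> wrapper and text are assumed.
html = """
<ul>
  <li class="bookie-offer first" data-bookie-code="BB"
      data-customer-type="existing" data-sport-type="2">Offer</li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")


def extract_offer_data(soup):
    """Hypothetical helper: collect the two data-* attributes from each offer."""
    results = []
    # class_ matches any tag that has "bookie-offer" among its classes.
    for li in soup.find_all("li", class_="bookie-offer"):
        results.append({
            # .get() yields None if an attribute is missing on some <li>.
            "customer_type": li.get("data-customer-type"),
            "sport_type": li.get("data-sport-type"),
        })
    return results


offers = extract_offer_data(soup)
print(offers)
```

Attribute values are returned as strings, so data-sport-type arrives as '2'; convert it with int(...) if a number is needed.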