Beautiful Soup and Splinter - get href and src attributes
Here is the code:
url = 'https://www.premierleague.com/players'
# Initiate a splinter instance of the URL
browser.visit(url)
browser.find_by_tag('div[class="table playerIndex"]')
soup = BeautifulSoup(browser.html, 'html.parser')
for el in soup:
    td = el.findAll('td')
    for each_td in td:
        url = each_td.find('a', href=True)
        print(url)
It finds the target elements, but each one is followed by None:
<a class="playerName" href="/players/19970/Max-Aarons/overview"><img alt="" class="img" data-player="p232980" data-script="pl_player-image" data-size="40x40" data-widget="player-image" src="//platform-static-files.s3.amazonaws.com/premierleague/photos/players/40x40/Photo-Missing.png"/>Max Aarons</a>
None
None
<a class="playerName" href="/players/13279/Abdul-Rahman-Baba/overview"><img alt="" class="img" data-player="p118335" data-script="pl_player-image" data-size="40x40" data-widget="player-image" src="//platform-static-files.s3.amazonaws.com/premierleague/photos/players/40x40/Photo-Missing.png"/>Abdul Rahman Baba</a>
None
None
How can I get the href and src values?
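The None lines come from cells that contain no <a> tag: find() returns None when nothing matches. To read the values, access the tag's attributes as you would dictionary keys, and check that the tag was found first.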
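# find_all() searches the whole tree, so the cells can be collected directly.
for each_td in soup.find_all('td'):
    # find() returns None for cells without a link, so check before indexing.
    link = each_td.find('a', href=True)
    if link:
        print(link['href'])
    image = each_td.find('img')
    if image:
        print(image['src'])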
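The href and src values in the output above are relative and protocol-relative, so they may need to be resolved before use. Below is a minimal sketch of one way to do that, assuming a static HTML string in place of browser.html. The names SAMPLE_HTML, BASE_URL and extract_links are hypothetical and only serve the illustration; the markup is a trimmed copy of the output shown in the question, with two extra cells added as an assumption about the row layout.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical base URL and sample markup, modelled on the output in the question.
BASE_URL = "https://www.premierleague.com/players"
SAMPLE_HTML = """
<table>
  <tr>
    <td><a class="playerName" href="/players/19970/Max-Aarons/overview"><img class="img" src="//platform-static-files.s3.amazonaws.com/premierleague/photos/players/40x40/Photo-Missing.png"/>Max Aarons</a></td>
    <td>Defender</td>
    <td>England</td>
  </tr>
</table>
"""


def extract_links(html, base_url=BASE_URL):
    """Return (name, href, src) tuples, one per cell that contains a link."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for td in soup.find_all("td"):
        link = td.find("a", href=True)
        if link is None:
            # Cells without a link are skipped instead of yielding None.
            continue
        image = link.find("img")
        # .get() returns None when the attribute is missing; ['src'] would raise KeyError.
        src = image.get("src") if image is not None else None
        rows.append((
            link.get_text(strip=True),
            urljoin(base_url, link["href"]),
            urljoin(base_url, src) if src else None,
        ))
    return rows


for name, href, src in extract_links(SAMPLE_HTML):
    print(name, href, src)
```

With Splinter, the same function could be called as extract_links(browser.html) once the page has loaded. Using tag.get('src') rather than tag['src'] is a design choice that tolerates tags lacking the attribute.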