BeautifulSoup passing div
I want to parse this page:
http://animedigitalnetwork.fr/video/naruto-shippuden
I tried this:
import urllib2
from BeautifulSoup import BeautifulSoup
page = urllib2.urlopen('http://animedigitalnetwork.fr/video/naruto-shippuden')
soup = BeautifulSoup(page)
first_div = soup.find('div',{"class" : "adn-video"})
The result is not Naruto!?
<div class="adn-video"> <div class="adn-video_screenshot">
<img src="http://image.animedigitalnetwork.fr/license/claymore/tv/web/eps1_328x184.jpg" alt="Claymore 1" /><span class="adn_video_play-button"></span> </div><div class="adn-video_text"><div class="adn-video_title">
<h4>Claymore</h4><span>Épisode 1</span><div class="adn-rating mobile-hide" itemprop="aggregateRating" itemscope="itemscope" itemtype="http://schema.org/AggregateRating"><meta itemprop="ratingValue" content="4.6667" /><meta itemprop="ratingCount" content="10" /><div id="adn-rating"><ul class="adn-rating_empty"><li></li><li></li><li></li><li></li><li></li></ul><ul class="adn-rating_rating"><li></li><li></li><li></li><li></li><li></li></ul></div><p class="adn-rating-message"></p></div></div><div class="adn-video_link">
<a title="Claymore 1" href="/video/claymore/1849-episode-1-la-claymore">Voir la vidéo</a>
</div></div></div>
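There are multiple issues here:
- Forming the list of videos involves an additional asynchronous call:
  - first, you need to extract the playlist parameter value from the initial page
  - then, make a POST request with that value to the playlist URL (the one used in the code below) to get the videos
- You are using an outdated and unmaintained version of BeautifulSoup. Switch to bs4: pip install beautifulsoup4
- Use requests instead of urllib2, maintain a session, and pass request headers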
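The first point can be reproduced offline. The sketch below is written for Python 3 with bs4 and runs against a made-up HTML fragment (the markup and the playlist id "1234" are assumptions for illustration, not the site's real page). It shows that find() simply returns the first matching div in the static page, and how the data-playlist attribute can be read from a link:

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the initial page: one unrelated
# video block plus a link that carries the playlist id.
SAMPLE_PAGE = """
<div class="adn-video">
  <a title="Claymore 1" href="/video/claymore/1849-episode-1-la-claymore">Voir la vidéo</a>
</div>
<a href="#" data-playlist="1234">Episodes</a>
"""

soup = BeautifulSoup(SAMPLE_PAGE, "html.parser")

# find() returns the first match in document order, whichever show it belongs to.
first_div = soup.find("div", {"class": "adn-video"})
first_title = first_div.a.get("title")

# Passing True as the attribute value matches any tag that has the attribute.
playlist = soup.find("a", {"data-playlist": True})["data-playlist"]

print(first_title)  # Claymore 1
print(playlist)     # 1234
```

On the real site the episode list is not part of this static markup, which is why the second request is needed.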
Complete working code:
from bs4 import BeautifulSoup
import requests

headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}

# initialize session
session = requests.Session()

# getting playlist
response = session.get('http://animedigitalnetwork.fr/video/naruto-shippuden', headers=headers)
soup = BeautifulSoup(response.content, 'html.parser')
playlist = soup.find('a', {'data-playlist': True})['data-playlist']

# getting list of videos
url = 'http://animedigitalnetwork.fr/index.php?option=com_vodvideo&view=playlist&format=raw'
response = session.post(url, data={
    'playlist': playlist,
    'season': '',
    'order': 'DESC'
}, headers=headers)
soup = BeautifulSoup(response.content, 'html.parser')

# each video block contains a link whose title is the episode name
for video in soup.select('div.adn-video'):
    print(video.a.get('title'))
Prints (the list of video titles):
Naruto Shippuden 391
Naruto Shippuden 390
Naruto Shippuden 389
...
Naruto Shippuden 3
Naruto Shippuden 2
Naruto Shippuden 1
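To see what the POST step sends, the form fields can be encoded with the standard library without touching the network. This is only a sketch for Python 3: build_playlist_form is a hypothetical helper introduced here, and "1234" is a placeholder id rather than a real playlist value. requests form-encodes a dict passed as data= in the same way.

```python
from urllib.parse import urlencode


def build_playlist_form(playlist, season="", order="DESC"):
    """Return the form fields used for the playlist request (hypothetical helper)."""
    return {"playlist": playlist, "season": season, "order": order}


form = build_playlist_form("1234")  # "1234" is a placeholder id
body = urlencode(form)              # application/x-www-form-urlencoded body

print(body)  # playlist=1234&season=&order=DESC
```

Going by the parameter names, the 'order' and 'season' fields presumably control the sort order and the season filter, but the answer above only shows the values 'DESC' and ''.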