BeautifulSoup finds no img
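I am trying to write a script in Python 2.7 with bs4 that scrapes an image, saves it on my server under a new file name so it can be displayed in a low-bandwidth-friendly way, and runs as a cron job every 3 hours, overwriting the existing image.
The problem with my code is that nothing happens, not even an error.
Here is the actual code: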
import requests
import random
from bs4 import BeautifulSoup

def download_web_image(url):
    name = random.randrange(1, 1000)
    full_name = str(name) + "psdata.gif"
    urllib.request.urlretrieve(url, full_name)

timecapture = (0, 24, 48, 72)
for time in timecapture:
    url = 'http://www.weatheronline.co.uk/marine/weather?LEVEL=4&LANG=en&TIME=' + str(time) + '&CEL=C&SI=mph&MN=gfs&MODELLTYP=pslv&WIND=g205'
    source_code = requests.get(url)
    plain_text = source_code.text
    soup = BeautifulSoup(plain_text)
    for link in soup.find('img', src=True):
        href = 'http://www.weatheronline.co.uk' + link.get('href')
        download_web_image(href)
The data I want on the page sits inside this tag:
<div class="zent">
<img usemap="#karte" class="eMap" id="pictureid" src="/daten/sailcharts/gfs/2015/03/11/pslv_poly_06-2015031018.gif" border="0" alt="We 11.03.2015 06 UTC" width="634" height="490">
</div>
Get the image by its id. To join the parts of the URL, use urlparse.urljoin():
from urlparse import urljoin  # Python 2.7; on Python 3 it lives in urllib.parse
base_url = 'http://www.weatheronline.co.uk'
print urljoin(base_url, soup.find('img', id='pictureid')['src'])
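To illustrate why the loop in the question stays silent and how the id lookup avoids that, here is a small sketch that runs against the markup from the question rather than the live page. It assumes bs4 is installed; the explicit 'html.parser' argument and the Python 2/3 import fallback are my additions, not part of the answer above.

```python
from __future__ import print_function  # keeps print() usable on Python 2.7

from bs4 import BeautifulSoup

try:
    from urllib.parse import urljoin  # Python 3
except ImportError:
    from urlparse import urljoin      # Python 2.7

# Sample markup copied from the question (not fetched from the site).
html = '''
<div class="zent">
<img usemap="#karte" class="eMap" id="pictureid" src="/daten/sailcharts/gfs/2015/03/11/pslv_poly_06-2015031018.gif" border="0" alt="We 11.03.2015 06 UTC" width="634" height="490">
</div>
'''

base_url = 'http://www.weatheronline.co.uk'
soup = BeautifulSoup(html, 'html.parser')  # parser choice is an assumption

# find() returns a single Tag (or None), not a list of tags.
img = soup.find('img', src=True)

# Iterating over a Tag walks its children. An <img> has none, so the body
# of "for link in soup.find('img', src=True)" never runs and nothing fails.
children = list(img)

# The image path is stored in 'src'; <img> carries no 'href' attribute.
missing = img.get('href')

# Look the tag up by id and join its relative path onto the site root.
image_url = urljoin(base_url, soup.find('img', id='pictureid')['src'])

print(children, missing)
print(image_url)
```

So there seem to be two separate issues in the original loop: it iterates over a single tag instead of a list of tags, and it reads 'href' where the path is actually in 'src'.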
The following gives you the link for every 'img' tag that has a 'src':
samplehtml="""
<div class="zent">
<img usemap="#karte" class="eMap" id="pictureid" src="/daten/sailcharts/gfs/2015/03/11/pslv_poly_06-2015031018.gif" border="0" alt="We 11.03.2015 06 UTC" width="634" height="490">
<img usemap="#karte" class="eMap" id="pictureid" src="/daten/sailcharts/gfs/2015/03/11/pslv_poly_06-2015031018.gif" border="0" alt="We 11.03.2015 06 UTC" width="634" height="490">
</div>
"""
baseurl = 'http://www.weatheronline.co.uk'
from bs4 import BeautifulSoup
imglinks = [baseurl+x['src'] for x in BeautifulSoup(samplehtml).find_all('img',src=True)]
print imglinks
Output:
[u'http://www.weatheronline.co.uk/daten/sailcharts/gfs/2015/03/11/pslv_poly_06-2015031018.gif',
u'http://www.weatheronline.co.uk/daten/sailcharts/gfs/2015/03/11/pslv_poly_06-2015031018.gif']
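To connect this with the cron job described in the question, something along these lines might work. It is only a sketch and I have not run it against the live site. The function names, the file naming scheme (psdata_00.gif, psdata_24.gif, and so on), and the use of urlopen/urlretrieve from the standard library in place of requests are all my assumptions. It uses find_all(), which returns a list that may be empty, so it is safe to check and index.

```python
from bs4 import BeautifulSoup

try:                                   # Python 3
    from urllib.parse import urljoin
    from urllib.request import urlopen, urlretrieve
except ImportError:                    # Python 2.7
    from urlparse import urljoin
    from urllib import urlopen, urlretrieve

BASE_URL = 'http://www.weatheronline.co.uk'
# Same query string as in the question, with TIME left as a placeholder.
PAGE_URL = (BASE_URL + '/marine/weather?LEVEL=4&LANG=en&TIME=%d'
            '&CEL=C&SI=mph&MN=gfs&MODELLTYP=pslv&WIND=g205')


def chart_urls(html):
    """Return absolute URLs of every <img id="pictureid"> that has a src."""
    soup = BeautifulSoup(html, 'html.parser')
    tags = soup.find_all('img', id='pictureid', src=True)
    return [urljoin(BASE_URL, tag['src']) for tag in tags]


def target_name(hours):
    """Hypothetical naming scheme: one fixed file per forecast offset."""
    return 'psdata_%02d.gif' % hours


def update_charts(offsets=(0, 24, 48, 72)):
    """Fetch each forecast page and overwrite the matching local file."""
    for hours in offsets:
        html = urlopen(PAGE_URL % hours).read()
        urls = chart_urls(html)
        if not urls:
            # Report the miss instead of failing silently.
            print('no chart image found for TIME=%d' % hours)
            continue
        urlretrieve(urls[0], target_name(hours))
```

Calling update_charts() from a script that cron starts every 3 hours would overwrite the same four files on each run, because the names depend only on the forecast offset and not on random.randrange().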