Scraping images that were loaded locally
I'm learning Beautiful Soup, but I've run into a problem trying to scrape images that were loaded from a local directory. The error I'm seeing is:
ValueError: unknown url type: 'images/ixa2.png'
My assumption is that the images are loaded from a local directory rather than hosted at a URL. This is what the element I want to scrape looks like when I inspect it:
<img width="200" align="left" hspace="0" src="ixa/cards/axisofmortality.jpg">
I'm curious whether these images can be scraped at all, and if so, how.
Here is the code I'm using:
from urllib import request
import urllib.request
from bs4 import BeautifulSoup as soup
def make_soup(url):
    result = request.urlopen(url)
    page = result.read()
    parsed_page = soup(page, "html.parser")
    result.close()
    return parsed_page
def get_images(url):
    soup = make_soup(url)
    images = [img for img in soup.findAll('img')]
    print(str(len(images)) + " images found.")
    print('Downloading images to current working directory.')
    # compile our unicode list of image links
    image_links = [each.get('src') for each in images]
    for each in image_links:
        filename = each.split('/')[-1]
        urllib.request.urlretrieve(each, filename)
    return image_links
get_images('http://mythicspoiler.com/')
You are trying to download the images from an incomplete URL: the `src` attributes are relative paths, not full URLs.
My suggestion is this:
def get_images(url):
    soup = make_soup(url)
    images = [img for img in soup.findAll('img')]
    print(str(len(images)) + " images found.")
    print('Downloading images to current working directory.')
    # compile our unicode list of image links
    image_links = [each.get('src') for each in images]
    for each in image_links:
        filename = each.split('/')[-1]
        urllib.request.urlretrieve('http://mythicspoiler.com/' + each, filename)  # <---
    return image_links
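A slightly more robust variant of the idea above (a sketch, not part of the original answer): the standard library's `urllib.parse.urljoin` resolves a relative `src` against the page's URL, and it also leaves an already-absolute `src` untouched, which the hard-coded prefix would mangle. The helper name `resolve_image_url` is my own:

from urllib.parse import urljoin

def resolve_image_url(page_url, src):
    # urljoin resolves a relative src against the page it came from,
    # and passes an absolute src through unchanged.
    return urljoin(page_url, src)

# Relative src, as in the question:
print(resolve_image_url('http://mythicspoiler.com/', 'ixa/cards/axisofmortality.jpg'))
# → http://mythicspoiler.com/ixa/cards/axisofmortality.jpg

Inside `get_images` you would then call `urllib.request.urlretrieve(resolve_image_url(url, each), filename)`, so the function works regardless of how each page writes its `src` attributes.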