Scrapy ImagesPipeline WARNING: File (unknown-error): Error downloading image from <GET
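I am learning Python and Scrapy, and I am currently working out how to download images with it. I am stuck and cannot tell what the actual problem is.
When I run the spider I get this error message:
<None>: Unsupported URL scheme '': no handler available for that scheme
and
[imageflip] WARNING: File (unknown-error): Error downloading image from <GET
Here is my pipelines.py: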
import scrapy
from scrapy.contrib.pipeline.images import ImagesPipeline
from scrapy.exceptions import DropItem


class PriceoflipkartPipeline(object):
    def process_item(self, item, spider):
        return item


class MyImagesPipeline(ImagesPipeline):
    def get_media_requests(self, item, info):
        for image_url in item['image_urls']:
            yield scrapy.Request(image_url)

    def item_completed(self, results, item, info):
        image_paths = [x['path'] for ok, x in results if ok]
        if not image_paths:
            raise DropItem("Item contains no images")
        item['image_paths'] = image_paths
        return item
Here is my settings.py:
SPIDER_MODULES = ['PriceoFlipkart.spiders']
NEWSPIDER_MODULE = 'PriceoFlipkart.spiders'
ITEM_PIPELINES = ['scrapy.contrib.pipeline.images.ImagesPipeline']
IMAGES_STORE = 'D:\PriceoFlipkart\Images'
IMAGES_EXPIRES = 90
Here is my spider:
import scrapy
from PriceoFlipkart.items import PriceoflipkartItem


class FlipkartSpider(scrapy.Spider):
    name = "imageflip"
    allowed_domains = ["flipkart.com"]
    start_urls = [
        "http://www.flipkart.com/moto-g-2nd-gen/p/itme5z8n9mt77ajr?pid=MOBDYGZ6SHNB7RFC&srno=b_1&ref=06f4e48c-9548-45fa-b3ac-fa5fdf0e0d22"
    ]

    def parse(self, response):
        for sel in response.xpath('//body'):
            item = PriceoflipkartItem()
            item['image_urls'] = sel.select('//img[@class="productImage current"]').extract()
            yield item
And I added the following code to my item.py:
image_urls = scrapy.Field()
images = scrapy.Field()
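Please tell me how to configure this correctly so that the images download. I am on a Windows 8 machine. Thanks in advance.
The XPath used to extract the image URLs is incorrect. As written, it returns the whole <img> element rather than its URL, so it should end with /@src
to extract only the image's URL. Make it like this: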
item['image_urls'] = sel.select(
'//img[@class="productImage current"]/@src').extract()
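To see why the missing /@src plausibly produces the "Unsupported URL scheme ''" message, here is a minimal standard-library sketch. It assumes Scrapy chooses a download handler from the URL's scheme, which is what the error text suggests. The markup, the image URL, and the helper name url_scheme are made up for illustration and are not taken from the Flipkart page or from Scrapy.

```python
from urllib.parse import urlparse

# Hypothetical values: a serialized <img> element (what the original XPath
# extracts) and the bare src attribute (what the XPath ending in /@src extracts).
IMG_ELEMENT = '<img class="productImage current" src="http://img.example.com/moto-g.jpg">'
IMG_SRC = "http://img.example.com/moto-g.jpg"


def url_scheme(value):
    """Return the URL scheme urlparse finds in value, or '' if there is none."""
    return urlparse(value).scheme


# The element string does not start with a valid scheme, so none is found.
print(repr(url_scheme(IMG_ELEMENT)))
# The bare attribute value is a normal URL with a scheme.
print(repr(url_scheme(IMG_SRC)))
```

If the src values on the page turn out to be relative, they would still lack a scheme and would need to be joined with the page URL before being stored in image_urls.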