Scrapy Spider XPath Image URL

Question

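I have a Scrapy spider that takes a keyword as input and then generates a search-results URL. It then crawls that URL and scrapes the desired values for each car result into an 'item'. I'm trying to add to each yielded item the URL of the full-size image link that accompanies each car in the vehicle results list.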

When I enter the keyword "honda", the specific URL being scraped is: Honda search results example
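The spider builds that URL by %-formatting the keyword into the `filter` query parameter, which leaves spaces and characters such as `&` unescaped. A minimal sketch, assuming Python 3 and reusing the query parameters from the spider's hard-coded URL, that builds the same URL with `urllib.parse.urlencode`; `build_search_url` is a hypothetical helper name, not part of the original spider:

```python
from urllib.parse import urlencode

# Endpoint copied from the spider's hard-coded URL.
BASE = ("http://www.lkqpickyourpart.com/DesktopModules/pyp_vehicleInventory/"
        "getVehicleInventory.aspx")


def build_search_url(keyword):
    """Hypothetical helper: return the inventory search URL for `keyword`."""
    # Same parameters and order as the original URL; urlencode escapes
    # each value (spaces become '+', '&' becomes '%26', and so on).
    params = [
        ("store", "224"),
        ("page", "0"),
        ("filter", keyword),
        ("sp", ""),
        ("cl", ""),
        ("carbuyYardCode", "1224"),
        ("pageSize", "1000"),
        ("language", "en-US"),
    ]
    return BASE + "?" + urlencode(params)
```

For a plain keyword such as "honda" this produces exactly the original URL; in the spider, `start_urls = [build_search_url(keyword)]` could replace the %-formatted string.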

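I haven't been able to figure out how to write the correct XPath and then include whatever image URLs I get in the spider's 'item', which I yield in the last part of my code. Right now, when I run the lkq.py spider with the command "scrapy crawl lkq -o items.csv -t csv" to save the items to a .csv file, the picture column of items.csv contains all zeros instead of image URLs.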

# -*- coding: utf-8 -*-
from __future__ import unicode_literals
import scrapy
from scrapy.shell import inspect_response
from scrapy.utils.response import open_in_browser

keyword = raw_input('Keyword: ')
url = 'http://www.lkqpickyourpart.com/DesktopModules/pyp_vehicleInventory/getVehicleInventory.aspx?store=224&page=0&filter=%s&sp=&cl=&carbuyYardCode=1224&pageSize=1000&language=en-US' % (keyword,)


class Cars(scrapy.Item):
    Make = scrapy.Field()
    Model = scrapy.Field()
    Year = scrapy.Field()
    Entered_Yard = scrapy.Field()
    Section = scrapy.Field()
    Color = scrapy.Field()
    Picture = scrapy.Field()


class LkqSpider(scrapy.Spider):
    name = "lkq"
    allowed_domains = ["lkqpickyourpart.com"]
    start_urls = (
        url,
    )

    def parse(self, response):
        picture = response.xpath(
            '//href=/text()').extract()
        section_color = response.xpath(
            '//div[@class="pypvi_notes"]/p/text()').extract()
        info = response.xpath('//td["pypvi_make"]/text()').extract()
        for element in range(0, len(info), 4):
            item = Cars()
            item["Make"] = info[element]
            item["Model"] = info[element + 1]
            item["Year"] = info[element + 2]
            item["Entered_Yard"] = info[element + 3]
            item["Section"] = section_color.pop(
                0).replace("Section:", "").strip()
            item["Color"] = section_color.pop(0).replace("Color:", "").strip()
            item["Picture"] = picture.pop(0).strip()
            yield item
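The spider above reads each field from a separate page-wide list and pairs values by position, so a single missing image or note can shift every later value. In addition, the predicate in `//td["pypvi_make"]` is just a non-empty string literal, which is always true, so it matches every `td` rather than only cells with that class (that test is written `//td[@class="pypvi_make"]`). A minimal sketch of the per-row alternative, using the standard library's limited XPath support in `xml.etree.ElementTree` rather than Scrapy; the markup, the `pypvi_resultRow`, `pypvi_model`, `pypvi_year` and `pypvi_image` class names, and the `parse_rows` helper are all hypothetical, since the real page structure is not shown in the question:

```python
import xml.etree.ElementTree as ET

# HYPOTHETICAL markup: each vehicle is assumed to be one <tr> that holds
# its own cells and its own full-size image link.
SAMPLE = """<table>
  <tr class="pypvi_resultRow">
    <td class="pypvi_image"><a href="/img/civic-large.jpg"><img src="/img/civic-thumb.jpg"/></a></td>
    <td class="pypvi_make">HONDA</td>
    <td class="pypvi_model">CIVIC</td>
    <td class="pypvi_year">2001</td>
  </tr>
  <tr class="pypvi_resultRow">
    <td class="pypvi_image"></td>
    <td class="pypvi_make">HONDA</td>
    <td class="pypvi_model">ACCORD</td>
    <td class="pypvi_year">1998</td>
  </tr>
</table>"""


def parse_rows(markup):
    """Yield one dict per vehicle row, reading every field relative to that row."""
    root = ET.fromstring(markup)
    for row in root.iterfind(".//tr[@class='pypvi_resultRow']"):
        link = row.find(".//td[@class='pypvi_image']/a")
        yield {
            "Make": row.findtext("td[@class='pypvi_make']", "").strip(),
            "Model": row.findtext("td[@class='pypvi_model']", "").strip(),
            "Year": row.findtext("td[@class='pypvi_year']", "").strip(),
            # A row without an image gets None instead of shifting later rows.
            "Picture": link.get("href") if link is not None else None,
        }


for car in parse_rows(SAMPLE):
    print(car)
```

In the Scrapy spider itself, the same idea would be a loop over row selectors such as `for row in response.xpath('//tr[...]')`, with relative expressions like `row.xpath('./td[@class="pypvi_make"]/text()').extract_first()`, yielding one `Cars` item per row.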

Answer 1

I really don't understand why you are using an XPath like '//href=/text()'. I'd suggest reading some XPath tutorials first; here is a good one.
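XPath parses `//href=/text()` as a comparison between two node-sets (`href` elements anywhere in the page, and text nodes directly under the document root), so it yields a single boolean rather than URLs. Scrapy's selectors render a false result as the string `'0'`, which is probably why the picture column came out as zeros. `href` is an attribute and is selected with `@href` (as in `//a/@href`), while `text()` selects an element's text. A minimal sketch using `xml.etree.ElementTree` from the standard library (not Scrapy's selectors) on a made-up snippet contrasts the two:

```python
import xml.etree.ElementTree as ET

# Made-up snippet for illustration only.
SNIPPET = ('<div>'
           '<a href="/photos/car1.jpg">View photo</a>'
           '<a href="/photos/car2.jpg">View photo</a>'
           '</div>')

root = ET.fromstring(SNIPPET)
links = root.findall(".//a[@href]")     # <a> elements that have an href attribute
hrefs = [a.get("href") for a in links]  # attribute values, like //a/@href
texts = [a.text for a in links]         # element text, like //a/text()
print(hrefs)
print(texts)
```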

If you want to get all the image URLs, I think this is what you want:

pictures = response.xpath('//img/@src').extract()

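Right now, picture.pop(0).strip() only gives you the first remaining URL in the list and strips it. Remember that .extract() returns a list, so pictures now contains all the image links; just pick the ones you need from it.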
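Because `.extract()` returns a plain Python list of strings, ordinary list operations apply to it. A minimal sketch with a made-up list standing in for the output of `response.xpath('//img/@src').extract()` (the URLs and the `/photos/` prefix used as a filter are hypothetical) shows that `pop(0)` removes and returns the first element, and one way to keep only the vehicle photos, since `//img/@src` matches every image on the page:

```python
# Made-up stand-in for response.xpath('//img/@src').extract().
pictures = [
    "/images/logo.png",
    "/photos/civic.jpg",
    "/photos/accord.jpg",
]

# Keep only vehicle photos; the "/photos/" prefix is a hypothetical convention.
car_pictures = [src for src in pictures if src.startswith("/photos/")]

first = car_pictures.pop(0)  # removes and returns the FIRST remaining element
print(first)         # /photos/civic.jpg
print(car_pictures)  # ['/photos/accord.jpg']
print(pictures[-1])  # /photos/accord.jpg (index -1 reads the last element without removing it)
```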


Tags: python, csv, xpath, scrapy, scrapy-spider