The spider isn't finding any pages

Question

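I'm trying to run my first spider, but I'm struggling. I can get it to run, but it doesn't find any pages. If anyone has any ideas, they would be much appreciated. My code is: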

from scrapy.spiders import Spider
from scrapy.selector import Selector

from second_hotel.items import Website


class secondhotelSpider(Spider):
    name = "second_hotel_spider.py"
    allowed_domains = ["uk.hotels.com"]
    start_urls = [
        "https://uk.hotels.com/hotel/details.html?FPQ=6&WOE=1&q-localised-check-out=10/04/2017&WOD=1&q-room-0-children=0&pa=1&tab=description&JHR=9&q-localised-check-in=03/04/2017&hotel-id=128604&q-room-0-adults=2&YGF=14&MGT=7&ZSX=0&SYE=3",
        "https://uk.hotels.com/hotel/details.html?FPQ=6&WOE=1&q-localised-check-out=04/04/2016&WOD=7&q-room-0-children=0&pa=1&tab=description&JHR=8&q-localised-check-in=03/04/2016&hotel-id=424807&q-room-0-adults=2&YGF=2&MGT=1&ZSX=0&SYE=3",
    ]

    def parse(self, response):

        sel = Selector(response)
        sites = sel.xpath('//ul[@class="directory-url"]/li')
        items = []

        for site in sites:
            item = Website()
            item['name'] = site.xpath('a/text()').extract()
            item['link'] = site.xpath('a/@href').extract()
            item['description'] = site.xpath('text()').re('-\s[^\n]*\r')
            items.append(item)

        print items
        return items

Thanks in advance.

Answer 1

Your XPath expression is wrong, so the sites variable is empty.

You can check your XPath with scrapy shell:

scrapy shell 'https://uk.hotels.com/hotel/details.html?FPQ=6&WOE=1&q-localised-check-out=04/04/2016&WOD=7&q-room-0-children=0&pa=1&tab=description&JHR=8&q-localised-check-in=03/04/2016&hotel-id=424807&q-room-0-adults=2&YGF=2&MGT=1&ZSX=0&SYE=3'
In [4]: response.xpath('//ul[@class="directory-url"]/li')
Out[4]: []

Or call inspect_response(response, self) inside the parse method:

from scrapy.shell import inspect_response
inspect_response(response, self)

The start_urls pages do not contain any element matching [@class="directory-url"].

Tags: python, scrapy, scrapy-spider