The spider isn't finding any pages
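I'm trying to run my first spider, but I'm struggling. I can get it to run, but it doesn't find any pages. If anyone has any ideas, they would be much appreciated.
My code is: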
from scrapy.spiders import Spider
from scrapy.selector import Selector
from second_hotel.items import Website
class secondhotelSpider(Spider):
    name = "second_hotel_spider.py"
    allowed_domains = ["uk.hotels.com"]
    start_urls = [
        "https://uk.hotels.com/hotel/details.html?FPQ=6&WOE=1&q-localised-check-out=10/04/2017&WOD=1&q-room-0-children=0&pa=1&tab=description&JHR=9&q-localised-check-in=03/04/2017&hotel-id=128604&q-room-0-adults=2&YGF=14&MGT=7&ZSX=0&SYE=3",
        "https://uk.hotels.com/hotel/details.html?FPQ=6&WOE=1&q-localised-check-out=04/04/2016&WOD=7&q-room-0-children=0&pa=1&tab=description&JHR=8&q-localised-check-in=03/04/2016&hotel-id=424807&q-room-0-adults=2&YGF=2&MGT=1&ZSX=0&SYE=3",
    ]

    def parse(self, response):
        sel = Selector(response)
        sites = sel.xpath('//ul[@class="directory-url"]/li')
        items = []
        for site in sites:
            item = Website()
            item['name'] = site.xpath('a/text()').extract()
            item['link'] = site.xpath('a/@href').extract()
            item['description'] = site.xpath('text()').re('-\s[^\n]*\r')
            items.append(item)
        print items
        return items
Thanks in advance.
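Your XPath expression does not match anything on these pages, so the sites variable is empty and the loop in parse never runs.
You can check your XPath with scrapy shell: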
scrapy shell 'https://uk.hotels.com/hotel/details.html?FPQ=6&WOE=1&q-localised-check-out=04/04/2016&WOD=7&q-room-0-children=0&pa=1&tab=description&JHR=8&q-localised-check-in=03/04/2016&hotel-id=424807&q-room-0-adults=2&YGF=2&MGT=1&ZSX=0&SYE=3'
In [4]: response.xpath('//ul[@class="directory-url"]/li')
Out[4]: []
Or call inspect_response(response, self) inside the parse method:
from scrapy.shell import inspect_response
inspect_response(response, self)
The start_urls pages do not contain any element with [@class="directory-url"].
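To see why an unmatched XPath yields no items at all, here is a minimal sketch of the same loop using only the standard library. It assumes a small hypothetical page (the `amenities` class and its links are invented for illustration and are not the real hotels.com markup), and it uses `xml.etree.ElementTree` as a stand-in for Scrapy's selector, so it only supports a limited XPath subset.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for a downloaded page.
# The class name "amenities" is an assumption made for this example.
PAGE = """
<html><body>
  <ul class="amenities">
    <li><a href="/wifi">Free WiFi</a></li>
    <li><a href="/pool">Pool</a></li>
  </ul>
</body></html>
""".strip()


def extract(page, xpath):
    """Return one dict per node matched by xpath, mirroring the loop in parse()."""
    root = ET.fromstring(page)
    items = []
    for site in root.findall(xpath):
        link = site.find("a")
        items.append({"name": link.text, "link": link.get("href")})
    return items


# The class used in the question does not exist on the page: nothing matches,
# the loop body never runs, and the result is an empty list.
missing = extract(PAGE, ".//ul[@class='directory-url']/li")

# A class that does exist on the page produces one item per <li>.
present = extract(PAGE, ".//ul[@class='amenities']/li")

print(missing)
print(present)
```

The same check applies to the real spider: find a class or id that actually appears in the response (for example in scrapy shell) and build the XPath from that.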