Scrapy request, shell Fetch() in spider

Question

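I am trying to reach a specific page; let's call it http://example.com/puppers. This page cannot be reached when connecting to it directly, either with scrapy shell or with a standard scrapy.Request (the result is an HTTP <405>).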

However, when I first run scrapy shell 'http://example.com/kittens' and then call fetch('http://example.com/puppers'), it works and I get a <200> OK HTTP code. I can then extract data using scrapy shell.

I tried to implement this in my script by changing the referer (using url #1), the user-agent, and a few other values when connecting to the puppers (url #2) page. I still get a <405> code.

Any help is appreciated. Thank you.

Answer 1

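import scrapy


class PuppersSpider(scrapy.Spider):
    # Class and spider names are placeholders, not from the original answer.
    name = "puppers"
    # Start at the page that responds normally.
    start_urls = ['http://example.com/kittens']

    def parse(self, response):
        # Request the second page from within the same crawl.
        yield scrapy.Request(
            url="http://example.com/puppers",
            callback=self.parse_puppers
        )

    def parse_puppers(self, response):
        # process your puppers
        ...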
