How to identify a request's crucial information that needs to be sent?

Question

I want to scrape some fares from this website, which uses autocomplete requests.

Here is my code:

import scrapy
from scrapy.http import Request, FormRequest
import urllib

class CabforceSpider(scrapy.Spider):
    name = 'cabforce'
    start_urls = ['https://www.cabforce.com']
    complete_url = 'https://www.cabforce.com/v1/geo/autocomplete'

    def parse(self, response):
        payload = {
            'chnl': 'cforce',
            'complete': 'Barcelona Airport',
            'destination': 'Barcelona'
        }
        return Request(
            self.complete_url,
            self.print_json,
            method='POST',
            body=urllib.urlencode(payload),
            headers={'X-Requested-With': 'XMLHttpRequest'})

    def print_json(self, response):
        print response.body

Unfortunately, this is the response I get:

{"status":"ArgumentError","reason":"Cannot validate input","description":null,"reasonType":2000,"details":[]}

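How can I find out which information is missing and needs to be sent with the request? I thought about the JSESSIONID and the version, but I don't know how to do that. Thanks for any hints, and have a nice day!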

Answer 1

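There may be hidden data in the form that you are not submitting. Use a FormRequest object (built with FormRequest.from_response) instead of a plain Request. It pre-populates the form fields found in the response, and you only override the ones you want to change.

Take a look at the documentation.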
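To check whether a page's form carries hidden fields at all, the inputs can be listed before building any request. The following is a minimal sketch using only the standard library; the HTML snippet, the field names, and the helper `find_hidden_fields` are invented for illustration and are not taken from cabforce.com.

```python
from html.parser import HTMLParser


class HiddenInputCollector(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.hidden_fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        input_type = (attributes.get("type") or "").lower()
        name = attributes.get("name")
        if input_type == "hidden" and name:
            self.hidden_fields[name] = attributes.get("value") or ""


# Hypothetical form markup, made up for this example.
SAMPLE_HTML = """
<form action="/search" method="post">
  <input type="hidden" name="csrf_token" value="abc123">
  <input type="hidden" name="version" value="2">
  <input type="text" name="query">
</form>
"""


def find_hidden_fields(html):
    """Return a dict of hidden input names and values found in html."""
    collector = HiddenInputCollector()
    collector.feed(html)
    return collector.hidden_fields


if __name__ == "__main__":
    print(find_hidden_fields(SAMPLE_HTML))
```

If this returns an empty dict for the real page, hidden form fields are probably not the cause, and the request body or headers are the next thing to compare. In Scrapy, FormRequest.from_response performs a similar collection from the response it is given.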

Answer 2

You don't even need to send cookies with the request. The problem is

body=urllib.urlencode(payload),

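This URL-encodes the body, but if you look at the body of the request your browser sends, you will see that the body is JSON.

So the solution is to import json and change the line mentioned above to this one: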

body=json.dumps(payload),
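The difference between the two encodings can be seen without running the spider. This is a small sketch for Python 3, where `urllib.urlencode` lives at `urllib.parse.urlencode`; the payload is the one from the question, and the variable names `form_body` and `json_body` are only illustrative.

```python
import json
from urllib.parse import urlencode  # Python 3 location of urllib.urlencode

payload = {
    'chnl': 'cforce',
    'complete': 'Barcelona Airport',
    'destination': 'Barcelona',
}

# Form encoding: key=value pairs joined by "&", spaces become "+".
form_body = urlencode(payload)

# JSON encoding: the format the browser sends to this endpoint.
json_body = json.dumps(payload)

print(form_body)
# chnl=cforce&complete=Barcelona+Airport&destination=Barcelona
print(json_body)
# {"chnl": "cforce", "complete": "Barcelona Airport", "destination": "Barcelona"}
```

Comparing these strings with the request body shown in the browser's network tab is usually the quickest way to spot this kind of mismatch. Some JSON endpoints additionally require a `Content-Type: application/json` header; whether this one does is not established here, since the change to the body alone was enough.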

With that change, I get the following result from your spider:

{"status":"Ok","result":{"autocomplete":{"elements":[{"type":16,"description":"(BCN) - Barcelona Airport, Barcelona, Spain","location":{"lat":41.289545,"lng":2.072639},"raw":{"name":"(BCN) - Barcelona Airport","city":"Barcelona","country":"Spain"}},{"location":{"lat":41.3181887517739,"lng":2.07441323388724},"description":"Barcelona Airport Hotel, Plaza Volatería, 3, El Prat de Llobregat, Spain","raw":{"name":"Barcelona Airport Hotel","city":"El Prat de Llobregat","country":"Spain"},"type":4},{"location":{"lat":41.3176275,"lng":2.0249774},"description":"Airport Barcelona Apartments, Rafael Casanova, 37, Viladecans, Spain","raw":{"name":"Airport Barcelona Apartments","city":"Viladecans","country":"Spain"},"type":4}]}}}
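Once the response comes back as JSON, the suggestions can be pulled out with the json module. The sketch below assumes the structure shown in the response above and uses a shortened copy of it (first element only); `extract_suggestions` is a hypothetical helper, not part of Scrapy.

```python
import json

# Shortened copy of the successful response shown above (first element only).
raw_response = (
    '{"status":"Ok","result":{"autocomplete":{"elements":['
    '{"type":16,"description":"(BCN) - Barcelona Airport, Barcelona, Spain",'
    '"location":{"lat":41.289545,"lng":2.072639},'
    '"raw":{"name":"(BCN) - Barcelona Airport","city":"Barcelona","country":"Spain"}}'
    ']}}}'
)


def extract_suggestions(body):
    """Return (description, lat, lng) tuples from an autocomplete response body."""
    data = json.loads(body)
    if data.get("status") != "Ok":
        # Error responses carry a "reason" field, e.g. "Cannot validate input".
        raise ValueError("autocomplete failed: %s" % data.get("reason"))
    elements = data["result"]["autocomplete"]["elements"]
    return [
        (element["description"], element["location"]["lat"], element["location"]["lng"])
        for element in elements
    ]


suggestions = extract_suggestions(raw_response)
for description, lat, lng in suggestions:
    print(description, lat, lng)
```

Inside the spider's callback the same function could be applied to the response text instead of printing the raw body.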

session-cookies

scrapy

web-scraping

scrapy-spider