Scrapy 1.8.0 returns error 500, but Python code returns success 200
The URL is: 'https://planningapi.agileapplications.co.uk/api/application/search?reference=GDO+19%2F12'
I can easily download this page with the Python requests library:
```python
import requests

headers = {
    'x-client': 'EXMOOR',
    'x-product': 'CITIZENPORTAL',
    'x-service': 'PA',
}
url = 'https://planningapi.agileapplications.co.uk/api/application/search?reference=GDO+19%2F12'
resp = requests.get(url, headers=headers)
```
Or I can download it just as easily with cURL:
```shell
curl 'https://planningapi.agileapplications.co.uk/api/application/search?reference=GDO+19%2F12' -H 'x-product: CITIZENPORTAL' -H 'x-service: PA' -H 'x-client: EXMOOR'
```
Both return status 200 with this result:
```json
{"total":1,"results":[{"id":18468,"reference":"GDO 19/12","proposal":"Prior notification for excavations to bury tanks and trenches to lay water pipes","location":"Land North West of North and South Ley, Exford, Minehead, Somerset.","username":"","applicantSurname":"Mr & Mrs M Burnett","agentName":"JCH Planning Limited","decisionText":null,"registrationDate":"2019-10-04","decisionDate":"2019-10-30","finalGrantDate":null,"appealLodgedDate":null,"appealDecisionDate":null,"areaId":[],"wardId":[],"parishId":[3],"responded":null,"lastLetterDate":null,"targetResponseDate":null}]}
```
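Once the request succeeds, the payload can be read with the standard `json` module rather than handled as raw text. A minimal sketch, using a trimmed copy of the response above (only a few of its fields are kept); inside a Scrapy callback the string passed to `json.loads` would be `response.text`:

```python
import json

# Trimmed copy of the API response shown above (most fields dropped).
payload = (
    '{"total":1,"results":[{"id":18468,"reference":"GDO 19/12",'
    '"registrationDate":"2019-10-04","decisionDate":"2019-10-30"}]}'
)

data = json.loads(payload)
for result in data["results"]:
    # Print the application reference and its key dates.
    print(result["reference"], result["registrationDate"], result["decisionDate"])
```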
But Scrapy returns status 500:
```python
# Runs inside a spider callback (it uses `response` and `self.ref_result_2`).
formdata = {'reference': 'GDO 19/12', }
headers = {
    'x-client': 'EXMOOR',
    'x-product': 'CITIZENPORTAL',
    'x-service': 'PA',
}
fr = scrapy.FormRequest(
    url='https://planningapi.agileapplications.co.uk/api/application/search',
    method='GET',
    meta=response.meta,
    headers=headers,
    formdata=formdata,
    dont_filter=True,
    callback=self.ref_result_2,
)
yield fr
```
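One way to rule out the URL as the cause: with `method='GET'`, `FormRequest` url-encodes `formdata` and appends it to the URL as a query string. The sketch below assumes Scrapy's encoding matches the standard library's `urllib.parse.urlencode` for this input, and checks that the resulting URL is the same one used with requests and cURL:

```python
from urllib.parse import urlencode

BASE = "https://planningapi.agileapplications.co.uk/api/application/search"
formdata = {"reference": "GDO 19/12"}

# urlencode quotes values with quote_plus: a space becomes '+', '/' becomes '%2F'.
query = urlencode(formdata)
url = f"{BASE}?{query}"
print(url)
```

If the printed URL matches the working one, the remaining difference between the clients is in the headers.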
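Maybe it's because Scrapy capitalizes the header names (I tried un-capitalizing them, but then Twisted does the same thing and capitalizes them again), or maybe it's something else.
How can I adjust my Scrapy 1.8.0 code to get the same successful result as with Python requests?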
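It is indeed caused by Scrapy capitalizing the header names. If you capitalize them in the cURL command, you get the same error as with Scrapy (you can check this in Scrapy by setting `handle_httpstatus_list` in the spider class and printing `response.text` in the parse method). As you already said, Twisted does the same thing, so overriding `scrapy.http.Headers` is not a solution.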
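Roughly, the re-capitalization works like this: each dash-separated part of the name is capitalized, unless the lowercased name has an entry in a table of special spellings. The following is a simplified model written for illustration only; it is not Twisted's source, and `DEFAULT_CASE_MAPPINGS`, `dash_capitalize` and `canonical_name` are hypothetical names. It also shows why an identity entry in such a table would keep a lowercase spelling:

```python
# Simplified, hypothetical model of how an HTTP library might choose the
# on-the-wire spelling of a header name. Not Twisted's actual code.

# Stand-in for a table of names with unconventional capitalization.
DEFAULT_CASE_MAPPINGS = {b"etag": b"ETag"}


def dash_capitalize(name: bytes) -> bytes:
    """Capitalize each dash-separated part: b'x-client' -> b'X-Client'."""
    return b"-".join(part.capitalize() for part in name.split(b"-"))


def canonical_name(name: bytes, case_mappings: dict) -> bytes:
    """Use an explicit table entry if there is one, else dash-capitalize."""
    lowered = name.lower()
    return case_mappings.get(lowered, dash_capitalize(lowered))


# Default behavior: a lowercase name is re-capitalized.
print(canonical_name(b"x-client", DEFAULT_CASE_MAPPINGS))  # b'X-Client'
print(canonical_name(b"etag", DEFAULT_CASE_MAPPINGS))      # b'ETag'

# Identity entries make the lowercase spelling win.
patched = dict(DEFAULT_CASE_MAPPINGS)
patched.update({b"x-client": b"x-client", b"x-product": b"x-product",
                b"x-service": b"x-service"})
print(canonical_name(b"x-client", patched))                # b'x-client'
```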
However, according to this issue comment, you can use a trick to make Twisted leave specific header names uncapitalized:
```python
# -*- coding: utf-8 -*-
from pprint import pprint

import scrapy
from twisted.web.http_headers import Headers as TwistedHeaders

# Tell Twisted to send these header names exactly as written instead of
# capitalizing them. Note: _caseMappings is a private Twisted attribute.
TwistedHeaders._caseMappings.update({
    b'x-client': b'x-client',
    b'x-product': b'x-product',
    b'x-service': b'x-service',
})


class Foo(scrapy.Spider):
    name = 'foo'
    # Let 500 responses reach the callback so their body can be inspected.
    handle_httpstatus_list = [500]

    def start_requests(self):
        formdata = {'reference': 'GDO 19/12'}
        headers = {
            'x-client': 'EXMOOR',
            'x-product': 'CITIZENPORTAL',
            'x-service': 'PA',
        }
        # With method='GET', FormRequest puts formdata in the query string.
        yield scrapy.FormRequest(
            'https://planningapi.agileapplications.co.uk/api/application/search',
            method='GET', headers=headers, formdata=formdata, callback=self.parse)

    def parse(self, response):
        pprint(response.text)
```
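Now you will get the result. On the other hand, according to RFC 7230, Section 3.2, header field names are case-insensitive, so the server should not be treating `x-client` and `X-Client` differently in the first place.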
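Case still matters in practice because a server sees the exact spelling of each field name; only a server that compares names with exact case would treat `x-client` and `X-Client` differently. The sketch below uses the standard library's `email.message_from_string` (Python's own `http.server` parses request headers into a subclass of `email.message.Message`). The `strict_lookup` helper is hypothetical and only illustrates one plausible cause of the 500; the planning API's real server code is not known:

```python
from email import message_from_string

# Two header blocks that differ only in the case of the field names.
lower = message_from_string("x-client: EXMOOR\nx-product: CITIZENPORTAL\n\n")
upper = message_from_string("X-Client: EXMOOR\nX-Product: CITIZENPORTAL\n\n")

# The original spelling of each name is preserved on the parsed object...
print(lower.keys())  # ['x-client', 'x-product']
print(upper.keys())  # ['X-Client', 'X-Product']

# ...but Message lookups ignore case, as RFC 7230 expects for field names.
print(lower["X-CLIENT"], upper["x-client"])  # EXMOOR EXMOOR


def strict_lookup(msg, name):
    """Hypothetical non-compliant lookup that compares names exactly."""
    return dict(msg.items()).get(name)


# A server doing this would only find the header when the case matches.
print(strict_lookup(lower, "x-client"))  # EXMOOR
print(strict_lookup(upper, "x-client"))  # None
```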