Send Post Request in Scrapy
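I am trying to crawl the latest reviews from the Google Play store, and to get them I need to make a POST request.
With Postman it works fine and I get the response I want.
But the same POST request from the terminal gives me a server error.
For example, for this page: https://play.google.com/store/apps/details?id=com.supercell.boombeach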
curl -H "Content-Type: application/json" -X POST -d '{"id": "com.supercell.boombeach", "reviewType": '0', "reviewSortOrder": '0', "pageNum":'0'}' https://play.google.com/store/getreviews
gives a server error, and Scrapy simply ignores this line:
frmdata = {"id": "com.supercell.boombeach", "reviewType": 0, "reviewSortOrder": 0, "pageNum":0}
url = "https://play.google.com/store/getreviews"
yield Request(url, callback=self.parse, method="POST", body=urllib.urlencode(frmdata))
Make sure that every element in your formdata is of type string/unicode:
frmdata = {"id": "com.supercell.boombeach", "reviewType": '0', "reviewSortOrder": '0', "pageNum":'0'}
url = "https://play.google.com/store/getreviews"
yield FormRequest(url, callback=self.parse, formdata=frmdata)
I think this will do. In the Scrapy shell:
In [1]: from scrapy.http import FormRequest
In [2]: frmdata = {"id": "com.supercell.boombeach", "reviewType": '0', "reviewSortOrder": '0', "pageNum":'0'}
In [3]: url = "https://play.google.com/store/getreviews"
In [4]: r = FormRequest(url, formdata=frmdata)
In [5]: fetch(r)
2015-05-20 14:40:09+0530 [default] DEBUG: Crawled (200) <POST https://play.google.com/store/getreviews> (referer: None)
[s] Available Scrapy objects:
[s] crawler <scrapy.crawler.Crawler object at 0x7f3ea4258890>
[s] item {}
[s] r <POST https://play.google.com/store/getreviews>
[s] request <POST https://play.google.com/store/getreviews>
[s] response <200 https://play.google.com/store/getreviews>
[s] settings <scrapy.settings.Settings object at 0x7f3eaa205450>
[s] spider <Spider 'default' at 0x7f3ea3449cd0>
[s] Useful shortcuts:
[s] shelp() Shell help (print this help)
[s] fetch(req_or_url) Fetch request (or URL) and update local objects
[s] view(response) View response in a browser
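The string-only rule for formdata can be enforced with a small helper. This is only a sketch: `stringify_formdata` is a hypothetical name, not part of Scrapy, and it assumes every value has a sensible `str()` form.

```python
def stringify_formdata(data):
    """Return a copy of data with every key and value converted to str."""
    return {str(key): str(value) for key, value in data.items()}


# The dict from the question, with integer values.
frmdata = stringify_formdata({
    "id": "com.supercell.boombeach",
    "reviewType": 0,
    "reviewSortOrder": 0,
    "pageNum": 0,
})
print(frmdata)
```

The resulting dict can then be passed as `formdata=frmdata` to `FormRequest`, as in the answer above.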
Sample page traversal using POST in Scrapy:
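from urllib.parse import urljoin  # Python 3; on Python 2 use: from urlparse import urljoin
from scrapy import Request, FormRequest

def directory_page(self, response):
    if response:
        # Follow every profile link found on the current page.
        profiles = response.xpath("//div[@class='heading-h']/h3/a/@href").extract()
        for profile in profiles:
            yield Request(urljoin(response.url, profile), callback=self.profile_collector)
        # Ask for the next page with a POST, carrying the page number along in meta.
        page = response.meta['page'] + 1
        if page:
            yield FormRequest('https://rotmanconnect.com/AlumniDirectory/getmorerecentjoineduser',
                              formdata={'isSortByName': 'false', 'pageNumber': str(page)},
                              callback=self.directory_page,
                              meta={'page': page})
    else:
        print("No more page available")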
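Note that in the snippet above `page` is always at least 1 after the increment, so `if page` never stops the traversal. One possible stop condition, sketched here as an assumption and not taken from the original answer, is to stop when a page yields no profiles. `next_page_formdata` is a hypothetical helper name.

```python
def next_page_formdata(current_page, profiles_found):
    """Build the formdata for the next page, or return None to stop.

    Assumption: a page with no profiles means there are no more pages.
    """
    if profiles_found == 0:
        return None
    return {"isSortByName": "false", "pageNumber": str(current_page + 1)}


print(next_page_formdata(1, 10))
print(next_page_formdata(3, 0))
```

In the callback, a `FormRequest` would only be yielded when the helper returns a dict.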
The answers above do not really solve the problem. They send the data as form parameters instead of sending JSON as the body of the request.
From http://bajiecc.cc/questions/1135255/scrapy-formrequest-sending-json:
import json
import scrapy

my_data = {'field1': 'value1', 'field2': 'value2'}
# Serialize the dict to JSON and send it as the raw request body.
request = scrapy.Request(url, method='POST',
                         body=json.dumps(my_data),
                         headers={'Content-Type': 'application/json'})
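To make the difference between the two approaches concrete, here is a standard-library sketch of the two request bodies for the payload from the question. It does not call Scrapy. As far as I know `FormRequest` produces a form-encoded body like the first one, but treat that as an assumption and check the request body yourself.

```python
import json
from urllib.parse import urlencode

payload = {
    "id": "com.supercell.boombeach",
    "reviewType": "0",
    "reviewSortOrder": "0",
    "pageNum": "0",
}

# Form-encoded body (Content-Type: application/x-www-form-urlencoded).
form_body = urlencode(payload)

# JSON body (Content-Type: application/json).
json_body = json.dumps(payload)

print(form_body)
print(json_body)
```

A server that expects JSON will generally not parse the first form, which is why the `Content-Type` header and the body encoding have to match.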