Scrapy FormRequest: trying to send a POST request (FormRequest) with currency-change formdata
I have been trying to scrape the following Website, but with the currency changed to 'SAR' via the settings form in the top-left corner. I tried sending a Scrapy request like this:
r = Request(url='https://www.mooda.com/en/',
            cookies=[{'name': 'currency', 'value': 'SAR',
                      'domain': '.www.mooda.com', 'path': '/'},
                     {'name': 'country', 'value': 'SA',
                      'domain': '.www.mooda.com', 'path': '/'}],
            dont_filter=True)
But I still get the prices in EGP:
In [10]: response.css('.price').xpath('text()').extract()
Out[10]:
[u'1,957 EG\xa3',
u'3,736 EG\xa3',
u'2,802 EG\xa3',
u'10,380 EG\xa3',
u'1,823 EG\xa3']
I also tried sending a POST request with the form data specified, like this:
from scrapy.http.request.form import FormRequest
url = 'https://www.mooda.com/en/'
r = FormRequest(url=url,formdata={'selectCurrency':'https://www.mooda.com/en/directory/currency/switch/currency/SAR/uenc/aHR0cHM6Ly93d3cubW9vZGEuY29tL2VuLw,,/'})
fetch(r)
Still, it never works. I also tried FormRequest.from_response(), but that never works either. I would really appreciate some advice; I am new to Scrapy form requests, and I would be grateful if anyone could help.
It is all about the frontend cookie. I will first show you how to do it with requests; the logic is exactly the same with Scrapy:
import requests
from bs4 import BeautifulSoup

head = {"User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:47.0) Gecko/20100101 Firefox/47.0"}

with requests.Session() as s:
    soup = BeautifulSoup(s.get("https://www.mooda.com/en/").content, "lxml")
    # GET the currency-switch url held in the option tag; this sets the cookies we need
    r2 = s.get(soup.select_one("#selectCurrency option[value*=SAR]")["value"])
    # pass those cookies along when requesting the page with ?currency=sar
    r = s.get("https://www.mooda.com/en/", params={"currency": "sar"},
              headers=head, cookies=dict(r2.cookies.items()))
    soup2 = BeautifulSoup(r.content, "lxml")
    print(soup2.select_one(".price").text)
You need to make a request to the url held under the option inside the select with id selectCurrency, then pass the cookies returned from that request when you make your request to https://www.mooda.com/en?currency=sar. There is no POST involved; it is all GET requests, but the frontend cookie in the GET is essential.
If we run the code, you can see it really does give us the correct data:
In [9]: with requests.Session() as s:
...: soup = BeautifulSoup(s.get("https://www.mooda.com/en/").content,"lxml")
...: r2 = s.get(soup.select_one("#selectCurrency option[value*=SAR]")["value"])
...: r = s.get("https://www.mooda.com/en/", params={"currency": "sar"}, headers=head, cookies=dict(r2.cookies.items()))
...: soup2 = BeautifulSoup(r.content,"lxml")
...: print(soup2.select_one(".price").text)
...:
825 SR
Using Scrapy:
from scrapy import Spider, Request


class S(Spider):
    name = "foo"
    allowed_domains = ["www.mooda.com"]
    start_urls = ["https://www.mooda.com/en"]

    def parse(self, resp):
        # the option's value attribute holds the currency-switch url
        curr = resp.css("#selectCurrency option[value*='SAR']::attr(value)").extract_first()
        return Request(curr, callback=self.parse2)

    def parse2(self, resp):
        print(resp.headers.getlist('Set-Cookie'))
        # Scrapy's cookies middleware persists the cookies set by the request above,
        # so they are sent automatically with this next request
        return Request("https://www.mooda.com/en?currency=sar",
                       dont_filter=True, callback=self.parse3)

    def parse3(self, resp):
        print(resp.css('.price').xpath('text()').extract())
If you run it, it will give you:
['frontend=c95er9h1at2srhtqu5rkfo13g0; expires=Wed, 28-Jun-2017 08:56:08 GMT; path=/; domain=www.mooda.com', 'currency=SAR; expires=Wed, 28-Jun-2017 08:56:08 GMT; path=/; domain=www.mooda.com']
[u'825 SR', u'1,575 SR', u'1,181 SR', u'4,377 SR', u'769 SR']
The GET to curr returns nothing; it just sets the cookies.
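Incidentally, if you ever need the cookie values themselves (for example, to pass them explicitly to a later request), the raw Set-Cookie header values like the ones printed above can be reduced to a name/value dict with a small helper. This is just a minimal sketch of the parsing, not a real-library API:

```python
def set_cookie_to_dict(headers):
    """Reduce raw Set-Cookie header values to a {name: value} dict,
    dropping attributes like expires, path and domain."""
    cookies = {}
    for header in headers:
        name_value = header.split(';', 1)[0]      # keep only the "name=value" part
        name, _, value = name_value.partition('=')
        cookies[name.strip()] = value
    return cookies

raw = ['frontend=c95er9h1at2srhtqu5rkfo13g0; expires=Wed, 28-Jun-2017 08:56:08 GMT; path=/; domain=www.mooda.com',
       'currency=SAR; expires=Wed, 28-Jun-2017 08:56:08 GMT; path=/; domain=www.mooda.com']
print(set_cookie_to_dict(raw))
# {'frontend': 'c95er9h1at2srhtqu5rkfo13g0', 'currency': 'SAR'}
```

In practice you rarely need this with Scrapy, since its cookies middleware carries session cookies across requests automatically.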