尝试使用 Scrapy FormRequest 填写表单,出现意外结果
Trying to fill a form with Scrapy FormRequest, unexpected results
我正在尝试填写位于 www.wetseal.com/Stores 的表格,该表格允许选择要显示商店的州。
<form action="http://www.wetseal.com/Stores?dwcont=C73689620" method="post" id="dwfrm_storelocator_state">
<fieldset>
<div class="form-row required ">
<label for="dwfrm_storelocator_address_states_stateUSCA">
<span>State</span>
<span class="required-indicator">*</span>
</label>
<select id="dwfrm_storelocator_address_states_stateUSCA" class="input-select required" name="dwfrm_storelocator_address_states_stateUSCA">
<option value="">Select...</option>
<option value="AK">Alaska</option>
<option value="AZ">Arizona</option>
<option value="AR">Arkansas</option>
<option value="CA">California</option>
<option value="CO">Colorado</option>
<option value="CT">Connecticut</option>
<option value="DE">Delaware</option>
<option value="FL">Florida</option>
<option value="GA">Georgia</option>
<option value="HI">Hawaii</option>
<option value="ID">Idaho</option>
<option value="IL">Illinois</option>
<option value="IN">Indiana</option>
<option value="KS">Kansas</option>
<option value="KY">Kentucky</option>
<option value="MD">Maryland</option>
<option value="MA">Massachusetts</option>
<option value="MI">Michigan</option>
<option value="MN">Minnesota</option>
<option value="MS">Mississippi</option>
<option value="MO">Missouri</option>
<option value="NE">Nebraska</option>
<option value="NV">Nevada</option>
<option value="NH">New Hampshire</option>
<option value="NJ">New Jersey</option>
<option value="NM">New Mexico</option>
<option value="NY">New York</option>
<option value="NC">North Carolina</option>
<option value="ND">North Dakota</option>
<option value="OH">Ohio</option>
<option value="OK">Oklahoma</option>
<option value="OR">Oregon</option>
<option value="PA">Pennsylvania</option>
<option value="PR">Puerto Rico</option>
<option value="RI">Rhode Island</option>
<option value="SC">South Carolina</option>
<option value="SD">South Dakota</option>
<option value="TN">Tennessee</option>
<option value="TX">Texas</option>
<option value="VA">Virginia</option>
<option value="WA">Washington</option>
<option value="WV">West Virginia</option>
<option value="WI">Wisconsin</option>
</select>
</div>
<button type="submit" name="dwfrm_storelocator_findbystate" value="Search">
Search
</button>
</fieldset>
</form>
查看 Chrome 我可以看到正在发出的请求和表单参数:
就是说,我有一个非常简单的蜘蛛,查看文档,向 URL 发送一个 FormRequest 来填写表格(在这种情况下,我正在测试亚利桑那州的商店 - AZ):
class WetSealStoreSpider(Spider):
name = "wetseal_store_spider"
allowed_domains = ["wetseal.com"]
start_urls = [
"http://www.wetseal.com/Stores"
]
def parse(self, response):
yield FormRequest.from_response(response,
formname='dwfrm_storelocator_state',
formdata={'dwfrm_storelocator_address_states_stateUSCA': 'AZ',
'dwfrm_storelocator_findbystate': 'Search'},
callback=self.parse1)
def parse1(self, response):
print response.status
print response.body
当它发出 FormRequest 时,查看响应,一切似乎都正常:
但是在回调方法中,我在响应中看到了这个:
最后好像是GET请求,url全错了:
'http://www.wetseal.com/Search?q=&dwfrm_storelocator_findbystate=Search&dwfrm_storelocator_address_states_stateUSCA=AZ'
知道我做错了什么吗?
谢谢!
您正在使用 formname
但表单没有名称。
请尝试使用 formxpath='id("dwfrm_storelocator_state")'
。
试试这个
states = response.xpath(
".//select[@id='dwfrm_storelocator_address_states_stateUSCA']//option[@value!='']/@value").extract()
url = self.get_text_from_node(response.xpath("//form[@id='dwfrm_storelocator_state']/@action"))
for state in states:
form_data = {'dwfrm_storelocator_address_states_stateUSCA': state,
"dwfrm_storelocator_findbystate": "Search"}
yield FormRequest(url,
formdata=form_data,
callback=self.your_Callback)
我正在尝试填写位于 www.wetseal.com/Stores 的表格,该表格允许选择要显示商店的州。
<form action="http://www.wetseal.com/Stores?dwcont=C73689620" method="post" id="dwfrm_storelocator_state">
<fieldset>
<div class="form-row required ">
<label for="dwfrm_storelocator_address_states_stateUSCA">
<span>State</span>
<span class="required-indicator">*</span>
</label>
<select id="dwfrm_storelocator_address_states_stateUSCA" class="input-select required" name="dwfrm_storelocator_address_states_stateUSCA">
<option value="">Select...</option>
<option value="AK">Alaska</option>
<option value="AZ">Arizona</option>
<option value="AR">Arkansas</option>
<option value="CA">California</option>
<option value="CO">Colorado</option>
<option value="CT">Connecticut</option>
<option value="DE">Delaware</option>
<option value="FL">Florida</option>
<option value="GA">Georgia</option>
<option value="HI">Hawaii</option>
<option value="ID">Idaho</option>
<option value="IL">Illinois</option>
<option value="IN">Indiana</option>
<option value="KS">Kansas</option>
<option value="KY">Kentucky</option>
<option value="MD">Maryland</option>
<option value="MA">Massachusetts</option>
<option value="MI">Michigan</option>
<option value="MN">Minnesota</option>
<option value="MS">Mississippi</option>
<option value="MO">Missouri</option>
<option value="NE">Nebraska</option>
<option value="NV">Nevada</option>
<option value="NH">New Hampshire</option>
<option value="NJ">New Jersey</option>
<option value="NM">New Mexico</option>
<option value="NY">New York</option>
<option value="NC">North Carolina</option>
<option value="ND">North Dakota</option>
<option value="OH">Ohio</option>
<option value="OK">Oklahoma</option>
<option value="OR">Oregon</option>
<option value="PA">Pennsylvania</option>
<option value="PR">Puerto Rico</option>
<option value="RI">Rhode Island</option>
<option value="SC">South Carolina</option>
<option value="SD">South Dakota</option>
<option value="TN">Tennessee</option>
<option value="TX">Texas</option>
<option value="VA">Virginia</option>
<option value="WA">Washington</option>
<option value="WV">West Virginia</option>
<option value="WI">Wisconsin</option>
</select>
</div>
<button type="submit" name="dwfrm_storelocator_findbystate" value="Search">
Search
</button>
</fieldset>
</form>
查看 Chrome 我可以看到正在发出的请求和表单参数:
就是说,我有一个非常简单的蜘蛛,查看文档,向 URL 发送一个 FormRequest 来填写表格(在这种情况下,我正在测试亚利桑那州的商店 - AZ):
class WetSealStoreSpider(Spider):
name = "wetseal_store_spider"
allowed_domains = ["wetseal.com"]
start_urls = [
"http://www.wetseal.com/Stores"
]
def parse(self, response):
yield FormRequest.from_response(response,
formname='dwfrm_storelocator_state',
formdata={'dwfrm_storelocator_address_states_stateUSCA': 'AZ',
'dwfrm_storelocator_findbystate': 'Search'},
callback=self.parse1)
def parse1(self, response):
print response.status
print response.body
当它发出 FormRequest 时,查看响应,一切似乎都正常:
但是在回调方法中,我在响应中看到了这个:
最后好像是GET请求,url全错了:
'http://www.wetseal.com/Search?q=&dwfrm_storelocator_findbystate=Search&dwfrm_storelocator_address_states_stateUSCA=AZ'
知道我做错了什么吗?
谢谢!
您正在使用 formname
但表单没有名称。
请尝试使用 formxpath='id("dwfrm_storelocator_state")'
。
试试这个
states = response.xpath(
".//select[@id='dwfrm_storelocator_address_states_stateUSCA']//option[@value!='']/@value").extract()
url = self.get_text_from_node(response.xpath("//form[@id='dwfrm_storelocator_state']/@action"))
for state in states:
form_data = {'dwfrm_storelocator_address_states_stateUSCA': state,
"dwfrm_storelocator_findbystate": "Search"}
yield FormRequest(url,
formdata=form_data,
callback=self.your_Callback)