Scrapy, how to change value in input form, submit and then scrape page

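I want to enter a value into a text input field, submit the form, and then scrape the new data that appears on the page after the submission. How can I do this?

This is the HTML form on the page. I want to change the input value from 10 to 100 and submit the form.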

<form action="https://de.iss.fst.com/ba-u6-72-nbr-902-112-x-140-x-13-12-mm-simmerringr-ba-a-mit-feder-fst-40411416#product-offers-anchor" method="post" _lpchecked="1">
            <div class="fieldset">
               <div class="field qty">
                  <div class="control">
                        <label class="label" for="qty-2">
                           <span>Preise für</span>
                        </label>
                        <input type="text" name="pieces" class="validate-length maximum-length-10 qty" maxlength="12" id="qty-2" value="10">
                        <label class="label" for="qty-2">
                           <span>Teile</span>
                        </label>
                        <span class="actions">
                           <button type="submit" title="Absenden" class="action">
                              <span>Absenden</span>
                           </button>
                        </span>
                  </div>
               </div>
            </div>
      </form>

Update: new working code.

import scrapy
import pymongo
from scrapy_splash import SplashRequest, SplashFormRequest
from issfst.items import IssfstItem


class IssSpider(scrapy.Spider):
    name = "issfst_spider"
    start_urls = ["https://de.iss.fst.com/dichtungen/radialwellendichtringe/rwdr-mit-geschlossenem-kafig/ba"]
    custom_settings = {
        # specifies exported fields and order
        'FEED_EXPORT_FIELDS': ["imgurl",
                               "Produktdatenblatt",
                               "Materialdatenblatt",]
    }

    def parse(self, response):
        self.log("I just visited: " + response.url)
        urls = response.css('.details-button > a::attr(href)').extract()

        for url in urls:
            formdata = {'pieces': '200'}
            yield SplashFormRequest.from_response(
                response,
                url=url,
                formdata=formdata,
                callback=self.parse_details,
                args={'wait': 3}
            )

        # follow pagination link
        next_page_url = response.css('li.item  > a.next::attr(href)').extract_first()
        if next_page_url:
            next_page_url = response.urljoin(next_page_url)
            yield scrapy.Request(url=next_page_url, callback=self.parse)

    def parse_details(self, response):
        item = IssfstItem()
        # scrape image url
        # (no trailing commas here: a trailing comma would wrap each value in a tuple)
        item['imgurl'] = response.css('img.fotorama__img::attr(src)').extract()
        # scrape download pdf links
        item['Produktdatenblatt'] = response.css('a.action[data-group="productdatasheet"]::attr(href)').extract_first()
        item['Materialdatenblatt'] = response.css('a.action[data-group="materialdatasheet"]::attr(href)').extract_first()
        item['Beschreibung'] = response.css('.description > p::text').extract_first()
        yield item

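You should not rely on the HTML source to find out the parameter names of a POST request. Use the developer tools of your favorite browser and watch the Network tab with the log preserved.

Doing so shows that the request uses the parameters pieces and form_key.

Look for the URL https://de.iss.fst.com/ba-72-nbr-902-155-x-174-x-12-0-mm-simmerringr-ba-a-mit-feder-fst-40411424#product-offers-anchor with the method POST.

Your error came from setting the form data with the wrong name 'value', while the site expects the name 'pieces'.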
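To make the point about parameter names concrete, here is a minimal sketch using only the Python standard library. It collects the named inputs of a form, overrides pieces, and prints the URL-encoded body that a POST would carry. The markup is a trimmed-down, hypothetical stand-in for the real page: the form_key value is a made-up placeholder, and placing that hidden input inside the same form is an assumption. The names FormFieldCollector and build_post_fields are illustrative only.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class FormFieldCollector(HTMLParser):
    """Collect name/value pairs of <input> elements that have a name attribute."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        if name:
            # The POST body is keyed by the name attribute, not by id or "value".
            self.fields[name] = attributes.get("value") or ""


# Hypothetical markup; the form_key value is a placeholder, not a real token.
SAMPLE_FORM = """
<form action="/product#product-offers-anchor" method="post">
  <input type="hidden" name="form_key" value="PLACEHOLDER">
  <input type="text" name="pieces" id="qty-2" value="10">
  <button type="submit">Absenden</button>
</form>
"""


def build_post_fields(html, overrides):
    """Return the form's default fields with the given overrides applied."""
    collector = FormFieldCollector()
    collector.feed(html)
    fields = dict(collector.fields)
    fields.update(overrides)
    return fields


post_fields = build_post_fields(SAMPLE_FORM, {"pieces": "100"})
encoded_body = urlencode(post_fields)
print(post_fields)
print(encoded_body)
```

A key such as 'value' never shows up in the result, because only the name attribute of each input becomes a parameter.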

Now, as a demo in a scrapy shell session:

scrapy shell "https://de.iss.fst.com/ba-72-nbr-902-155-x-174-x-12-0-mm-simmerringr-ba-a-mit-feder-fst-40411424"
... 
from scrapy import FormRequest

##SETTING POST'S PARAMETERS
form_key = response.css('[name="form_key"]::attr(value)').get()
#Note: response.xpath('input[@name="form_key"]/@value') returns nothing because that
#path is relative to the document root; '//input[@name="form_key"]/@value' also matches
pieces = "100"
form_data = {'form_key':form_key,'pieces':pieces} #with the correct names

##POST THE REQUEST
fetch(
    FormRequest(
        'https://de.iss.fst.com/ba-72-nbr-902-155-x-174-x-12-0-mm-simmerringr-ba-a-mit-feder-fst-40411424#product-offers-anchor',
        formdata=form_data,
    )
)  # note the '#product-offers-anchor' appended to the URL; without it, it won't work
view(response)  # to see the page in your default browser

Now you can adapt the above to your own code.