Scrapy: use other fields of an item in an item processor

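I am requesting address information from a web service to cross-check whether the addresses I already have are in the same format as the ones the web service returns.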

For this I have the following item with an input_processor:


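import scrapy
from itemloaders.processors import MapCompose, TakeFirst  # older Scrapy: scrapy.loader.processors


# Must be defined before the item class, which references it at class-creation time.
def MyFunction(scraped_addition):
    # 'addition' is not defined here: an input processor only receives
    # the values of its own field. This is the problem being asked about.
    if scraped_addition == addition:
        return scraped_addition
    else:
        return None


class AdresItem(scrapy.Item):

    postal_code = scrapy.Field()
    house_number = scrapy.Field()
    addition = scrapy.Field()
    scraped_addition = scrapy.Field(
        input_processor=MapCompose(MyFunction),
        output_processor=TakeFirst(),
    )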

Of course, I can't access the original addition this way. What is a good way to use another field of the item inside an input processor?

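Put the value into the ItemLoader's context and read it inside the processor function. A processor function that declares a parameter named loader_context receives the loader's context automatically.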

Example:

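import scrapy
from itemloaders.processors import MapCompose  # older Scrapy: scrapy.loader.processors
from scrapy.loader import ItemLoader


def MyFunction(scraped_addition, loader_context):
    # The ItemLoader passes its context in because the parameter
    # is named exactly 'loader_context'.
    addition = loader_context.get('addition')
    if scraped_addition == addition:
        return scraped_addition
    else:
        return None  # MapCompose drops None values


class ExampleItem(scrapy.Item):
    scraped_addition = scrapy.Field(input_processor=MapCompose(MyFunction))


class ExampleSpider(scrapy.Spider):
    name = 'exampleSpider'
    start_urls = ['https://scrapingclub.com/exercise/detail_basic/']

    def parse(self, response):
        loader = ItemLoader(item=ExampleItem(), response=response)
        # Set the context before adding values: input processors run as soon
        # as add_xpath() is called. Extra keyword arguments to ItemLoader()
        # also end up in the context, e.g. ItemLoader(..., addition='...').
        loader.context['addition'] = 'Long-sleeved Jersey Top'
        loader.add_xpath('scraped_addition', '//h3/text()')
        yield loader.load_item()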