Scrapy: use other variables of an item in an item processor
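I'm requesting address information from a web service to cross-check whether the addresses I already have match the format the web service returns.
For this I have the following item with an input_processor: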
    class AdresItem(scrapy.Item):
        postal_code = scrapy.Field()
        house_number = scrapy.Field()
        addition = scrapy.Field()
        scraped_addition = scrapy.Field(
            input_processor=MapCompose(MyFunction),
            output_processor=TakeFirst()
        )

    def MyFunction(scraped_addition):
        # `addition` is not defined in this scope, which is the problem
        if scraped_addition == addition:
            return scraped_addition
        else:
            return None
Of course, I can't access the original addition this way. What is a good way to use another variable of the item in an input processor?
Set the variable through the item loader context and read it in your function.
Example:
    import scrapy
    from scrapy.loader import ItemLoader
    # Newer Scrapy releases expose the processors as `itemloaders.processors`.
    from scrapy.loader.processors import MapCompose


    def MyFunction(scraped_addition, loader_context):
        # MapCompose passes the loader context to functions that accept `loader_context`.
        addition = loader_context.get('addition')
        if scraped_addition == addition:
            return scraped_addition
        else:
            return None


    class ExampleItem(scrapy.Item):
        scraped_addition = scrapy.Field(input_processor=MapCompose(MyFunction))


    class ExampleSpider(scrapy.Spider):
        name = 'exampleSpider'
        start_urls = ['https://scrapingclub.com/exercise/detail_basic/']

        def parse(self, response):
            l = ItemLoader(item=ExampleItem(), response=response)
            # Store the expected value in the loader context before adding values.
            l.context['addition'] = 'Long-sleeved Jersey Top'
            l.add_xpath('scraped_addition', '//h3/text()')
            yield l.load_item()
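To see why this works without running a spider, here is a rough, standard-library-only sketch of the idea: the processor is called once per scraped value, receives the context only if it declares a `loader_context` parameter, and any `None` result is dropped. The names `map_compose_sketch` and `keep_if_matches` are made up for this illustration, and the sketch is a simplification, not Scrapy's actual implementation.

```python
import inspect


def keep_if_matches(scraped_addition, loader_context):
    """Return the scraped value only if it equals the expected addition."""
    expected = loader_context.get("addition")
    return scraped_addition if scraped_addition == expected else None


def map_compose_sketch(func, values, context):
    """Hypothetical stand-in for MapCompose: apply func to each value.

    The context is passed only when func declares a `loader_context`
    parameter, and None results are dropped.
    """
    wants_context = "loader_context" in inspect.signature(func).parameters
    results = []
    for value in values:
        out = func(value, loader_context=context) if wants_context else func(value)
        if out is not None:
            results.append(out)
    return results


# The context plays the role of `l.context` in the spider above.
context = {"addition": "A"}
matched = map_compose_sketch(keep_if_matches, ["A", "B", "a"], context)
print(matched)  # ['A']
```

Only the value equal to the expected addition survives, so a following `TakeFirst()` output processor would yield that value, or nothing at all if no scraped value matched.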