Scrapy - Activating an Item Pipeline component - ITEM_PIPELINES setting
The Scrapy documentation says the following:
Activating an Item Pipeline component
To activate an Item Pipeline component you must add its class to the ITEM_PIPELINES setting, like in the following example:
ITEM_PIPELINES = {
    'myproject.pipelines.PricePipeline': 300,
    'myproject.pipelines.JsonWriterPipeline': 800,
}
The integer values you assign to classes in this setting determine the order they run in- items go through pipelines from order number low to high. It’s customary to define these numbers in the 0-1000 range.
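I don't understand the last paragraph, mainly "determine the order they run in- items go through pipelines from order number low to high". Can you explain it in other words? Why are numbers used at all, and given the 0-1000 range, how should I choose the values?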
From the docs:
ITEM_PIPELINES
Default: {}
A dict containing the item pipelines to use, and their orders. The dict is empty by default. Order values are arbitrary, but it’s customary to define them in the 0-1000 range.
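A small plain-Python sketch (no Scrapy needed) of why the values themselves are arbitrary: only their relative order matters. `pipeline_order` is a hypothetical helper that simply sorts the setting's entries by value, lowest first, which is the behaviour the docs describe; the two dotted paths are the docs' example ones, and `DedupPipeline` is made up.

```python
# Hypothetical helper: order pipeline paths the way the docs describe
# (lower value first). This is not a Scrapy API, just sorted() on the values.
def pipeline_order(setting):
    return [path for path, _order in sorted(setting.items(), key=lambda kv: kv[1])]


spaced = {
    'myproject.pipelines.PricePipeline': 300,
    'myproject.pipelines.JsonWriterPipeline': 800,
}
compact = {
    'myproject.pipelines.PricePipeline': 1,
    'myproject.pipelines.JsonWriterPipeline': 2,
}

# Both settings give the same order: only the relative values matter.
print(pipeline_order(spaced) == pipeline_order(compact))  # True

# Gaps between values let you slot in a new pipeline later without
# renumbering, e.g. a made-up DedupPipeline that must run between the two.
spaced['myproject.pipelines.DedupPipeline'] = 500
print(pipeline_order(spaced))
# ['myproject.pipelines.PricePipeline', 'myproject.pipelines.DedupPipeline',
#  'myproject.pipelines.JsonWriterPipeline']
```

The 0-1000 range is only a convention, so any integers would work; spacing them out (300, 500, 800) is a practical way to keep room for future pipelines.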
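Because dictionaries in Python are unordered collections (insertion order is only guaranteed since Python 3.7, and Scrapy orders components by these values rather than by their position in the dict) and ITEM_PIPELINES must be a dict (as must many other settings, e.g. SPIDER_MIDDLEWARES), you need some other way to define the order in which the pipelines are applied. That is why you assign each pipeline you define a number, customarily from 0 to 1000: pipelines with lower numbers process the item first, and only the relative order of the numbers matters.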
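A minimal simulation of what "items go through pipelines from order number low to high" means, assuming simplified pipeline classes whose bodies are invented for illustration. The pipelines are sorted by their order values, and each item is passed through them in that order, every pipeline receiving whatever the previous one returned. For brevity the classes are used directly as dict keys instead of the dotted-path strings shown in the docs.

```python
# Simplified, hypothetical stand-ins for the two pipelines named in the docs.
# Real Scrapy pipelines are configured by dotted path and have a richer
# interface; only the "pass the item along" idea is modelled here.
class PricePipeline:
    def process_item(self, item):
        # Made-up step: turn the scraped price string into a number.
        item['price'] = float(item['price'])
        item['seen_by'].append('PricePipeline')
        return item


class JsonWriterPipeline:
    def process_item(self, item):
        # A real writer would serialize the item; this one records the visit.
        item['seen_by'].append('JsonWriterPipeline')
        return item


# The writer is listed first on purpose: position in the dict is irrelevant.
ITEM_PIPELINES = {
    JsonWriterPipeline: 800,
    PricePipeline: 300,
}


def build_chain(setting):
    # One instance per pipeline, sorted by order value (lowest first).
    return [cls() for cls, _order in sorted(setting.items(), key=lambda kv: kv[1])]


def run_pipelines(item, chain):
    # Each pipeline receives the item returned by the previous one.
    for pipeline in chain:
        item = pipeline.process_item(item)
    return item


chain = build_chain(ITEM_PIPELINES)
result = run_pipelines({'price': '10.50', 'seen_by': []}, chain)
print(result)  # {'price': 10.5, 'seen_by': ['PricePipeline', 'JsonWriterPipeline']}
```

PricePipeline runs first because 300 < 800, so the writer already sees the converted price. Giving PricePipeline a value above 800 would make the writer run first and see the raw string instead.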
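FYI, if you look at the Scrapy source code, you will find that the build_component_list() function is called for each of these settings, such as ITEM_PIPELINES. It builds a list (an ordered collection) from your ITEM_PIPELINES dict, sorted by the dict values: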
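from operator import itemgetter

def build_component_list(base, custom):
    """Compose a component list based on a custom and base dict of components
    (typically middlewares or extensions), unless custom is already a list, in
    which case it's returned.
    """
    # A list/tuple is taken to be in the desired order already.
    if isinstance(custom, (list, tuple)):
        return custom
    # Merge the project's settings over the base (default) components.
    compdict = base.copy()
    compdict.update(custom)
    # Drop components whose order value is None (this disables them).
    # The original used six.iteritems(compdict) for Python 2/3 compatibility.
    items = (x for x in compdict.items() if x[1] is not None)
    # Sort (path, order) pairs by order value, lowest first; keep only the paths.
    return [x[0] for x in sorted(items, key=itemgetter(1))]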
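To see the merge, the None filter and the sort working together, the snippet below repeats a condensed Python 3 copy of the function (with dict.items() in place of six.iteritems()) so it runs on its own. The base and custom dicts are hypothetical: the builtin.* paths are placeholders, not real Scrapy components.

```python
from operator import itemgetter


# Condensed Python 3 copy of build_component_list() quoted above.
def build_component_list(base, custom):
    if isinstance(custom, (list, tuple)):
        return custom
    compdict = base.copy()
    compdict.update(custom)
    items = (x for x in compdict.items() if x[1] is not None)
    return [x[0] for x in sorted(items, key=itemgetter(1))]


# Hypothetical defaults and project settings, for illustration only.
base = {
    'builtin.DefaultPipeline': 100,
    'builtin.OtherPipeline': 500,
}
custom = {
    'myproject.pipelines.JsonWriterPipeline': 800,
    'myproject.pipelines.PricePipeline': 300,
    'builtin.OtherPipeline': None,  # None removes this component from the list
}

print(build_component_list(base, custom))
# ['builtin.DefaultPipeline', 'myproject.pipelines.PricePipeline',
#  'myproject.pipelines.JsonWriterPipeline']

# A list is returned unchanged, in the order given.
print(build_component_list(base, ['b', 'a']))  # ['b', 'a']
```

The dict order of custom plays no role: the result follows the values 100, 300 and 800, and the entry set to None is dropped before sorting.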