Scrapy: AttributeError: 'list' object has no attribute 'iteritems'
This is my first question on Stack Overflow. I recently wanted to use the linked-in-scraper, so I downloaded it and ran the command "scrapy crawl linkedin.com", and got the error message below. For reference, I use anaconda 2.3.0 and python 2.7.11. All related packages, including scrapy and six, were updated via pip before running the program.
Traceback (most recent call last):
File "/Users/byeongsuyu/anaconda/bin/scrapy", line 11, in <module>
sys.exit(execute())
File "/Users/byeongsuyu/anaconda/lib/python2.7/site-packages/scrapy/cmdline.py", line 108, in execute
settings = get_project_settings()
File "/Users/byeongsuyu/anaconda/lib/python2.7/site-packages/scrapy/utils/project.py", line 60, in get_project_settings
settings.setmodule(settings_module_path, priority='project')
File "/Users/byeongsuyu/anaconda/lib/python2.7/site-packages/scrapy/settings/__init__.py", line 285, in setmodule
self.set(key, getattr(module, key), priority)
File "/Users/byeongsuyu/anaconda/lib/python2.7/site-packages/scrapy/settings/__init__.py", line 260, in set
self.attributes[name].set(value, priority)
File "/Users/byeongsuyu/anaconda/lib/python2.7/site-packages/scrapy/settings/__init__.py", line 55, in set
value = BaseSettings(value, priority=priority)
File "/Users/byeongsuyu/anaconda/lib/python2.7/site-packages/scrapy/settings/__init__.py", line 91, in __init__
self.update(values, priority)
File "/Users/byeongsuyu/anaconda/lib/python2.7/site-packages/scrapy/settings/__init__.py", line 317, in update
for name, value in six.iteritems(values):
File "/Users/byeongsuyu/anaconda/lib/python2.7/site-packages/six.py", line 599, in iteritems
return d.iteritems(**kw)
AttributeError: 'list' object has no attribute 'iteritems'
I understand that this error arises because d is not a dict but a list. And since the error comes from code inside Scrapy, it may be a problem with the scrapy package or the six package. How can I try to fix this error?
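For illustration, here is a minimal sketch reproducing the failure under Python 2, assuming only that six is importable: six.iteritems() simply delegates to d.iteritems(), which lists do not implement.
import six

six.iteritems({'a': 1})  # works: dicts implement iteritems() on Python 2
six.iteritems(['a'])     # raises AttributeError: 'list' object has no attribute 'iteritems'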
Edit: Here is the code from scrapy.cfg:
# Automatically created by: scrapy startproject
#
# For more information about the [deploy] section see:
# http://doc.scrapy.org/topics/scrapyd.html
[settings]
default = linkedIn.settings
[deploy]
#url = http://localhost:6800/
project = linkedIn
This is caused by your linked-in scraper's settings:
ITEM_PIPELINES = ['linkedIn.pipelines.LinkedinPipeline']
However, ITEM_PIPELINES should be a dict, according to the doc:
To activate an Item Pipeline component you must add its class to the ITEM_PIPELINES
setting, like in the following example:
ITEM_PIPELINES = {
'myproject.pipelines.PricePipeline': 300,
'myproject.pipelines.JsonWriterPipeline': 800,
}
The integer values you assign to classes in this setting determine the order in which they run: items go through from lower valued to higher valued classes. It’s customary to define these numbers in the 0-1000 range.
According to this question, it used to be a list, which explains why this scraper uses a list.
So you will have to either ask your scraper's developers to update their code, or fix the ITEM_PIPELINES setting yourself.
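For example, a minimal sketch of the corrected setting in the project's settings.py, reusing the pipeline path from the question (the order value 300 is an arbitrary choice within the customary 0-1000 range):
# settings.py: ITEM_PIPELINES as a dict, not a list
ITEM_PIPELINES = {
    # pipeline class path -> order (lower values run first); 300 is arbitrary
    'linkedIn.pipelines.LinkedinPipeline': 300,
}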
The short answer is that ITEM_PIPELINES should be a dict rather than a list, with the pipeline classes as keys and integers as values that determine the order in which they run: items go through from lower valued to higher valued classes. It is customary to define these numbers in the 0-1000 range, as @valentin Lorentz explained.