Scrapinghub: getting "Error caught on signal handler: <bound method ?" on yield
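I have a Scrapy script that runs fine locally, but when I deploy it to Scrapinghub it produces nothing but errors. After debugging, the errors turn out to come from yielding the item.
This is the error I get: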
ERROR [scrapy.utils.signal] Error caught on signal handler: <bound method ?.item_scraped of <sh_scrapy.extension.HubstorageExtension object at 0x7fd39e6141d0>>
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/site-packages/twisted/internet/defer.py", line 150, in maybeDeferred
    result = f(*args, **kw)
  File "/usr/local/lib/python2.7/site-packages/pydispatch/robustapply.py", line 55, in robustApply
    return receiver(*arguments, **named)
  File "/usr/local/lib/python2.7/site-packages/sh_scrapy/extension.py", line 45, in item_scraped
    item = self.exporter.export_item(item)
  File "/usr/local/lib/python2.7/site-packages/scrapy/exporters.py", line 304, in export_item
    result = dict(self._get_serialized_fields(item))
  File "/usr/local/lib/python2.7/site-packages/scrapy/exporters.py", line 75, in _get_serialized_fields
    value = self.serialize_field(field, field_name, item[field_name])
  File "/usr/local/lib/python2.7/site-packages/scrapy/exporters.py", line 284, in serialize_field
    return serializer(value)
  File "/usr/local/lib/python2.7/site-packages/scrapy/exporters.py", line 290, in _serialize_value
    return dict(self._serialize_dict(value))
  File "/usr/local/lib/python2.7/site-packages/scrapy/exporters.py", line 300, in _serialize_dict
    key = to_bytes(key) if self.binary else key
  File "/usr/local/lib/python2.7/site-packages/scrapy/utils/python.py", line 117, in to_bytes
    'object, got %s' % type(text).__name__)
TypeError: to_bytes must receive a unicode, str or bytes object, got int
It doesn't say which field is the problem, but by process of elimination I worked out that it is this part of the code:
    try:
        item["media"] = {}
        media_index = 0
        media_content = response.xpath("//audio/source/@src").extract_first()
        if media_content is not None:
            item["media"][media_index] = {}
            preview = item["media"][media_index]
            preview["Media URL"] = media_content
            preview["Media Type"] = "Audio"
            media_index += 1
    except IndexError:
        print "Index error for media " + item["asset_url"]
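I stripped out some parts to make this easier to follow, but this section is essentially where the problem lies. It doesn't like something about the item's media field.
I'm a beginner in both Python and Scrapy, so apologies if this turns out to be a silly basic Python mistake. Any ideas?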
Edit: after getting the answer from ThunderMind, the solution was simply to use str(media_index) as the key.
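A minimal sketch of that fix, written so it runs without Scrapy. The helper build_media and its media_urls argument are illustrative names that are not in the original spider; in the spider the URLs would come from response.xpath(...). The only substantive change from the code above is that each key is str(media_index) rather than the bare int.

```python
def build_media(media_urls):
    """Build the nested "media" dict using string keys.

    build_media and media_urls are hypothetical names used for this
    sketch; they stand in for the spider code in the question.
    """
    media = {}
    media_index = 0
    for media_content in media_urls:
        if media_content is not None:
            # str() turns the counter 0, 1, 2... into "0", "1", "2"...
            media[str(media_index)] = {
                "Media URL": media_content,
                "Media Type": "Audio",
            }
            media_index += 1
    return media


# Placeholder URL, used only to exercise the helper.
item = {"media": build_media(["http://example.com/a.mp3"])}
```

Every key in item["media"] is now a string, so a key check that only accepts string types has nothing to reject.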
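Yes, the problem is right here:
    item["media"][media_index] = {}
media_index is an int. An int is a perfectly valid (immutable, hashable) key for a plain Python dict, which is why the spider runs locally. On Scrapinghub, however, the traceback shows the exporter passing each dict key to to_bytes, which accepts only unicode, str or bytes objects and raises the TypeError for an int. Use a string key instead, for example str(media_index).
Read up on Python dicts to see what can be used as a key.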
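To illustrate the difference between "valid dict key" and "key the exporter accepts", here is a rough sketch. check_key is a simplified stand-in for the kind of type check the traceback points at, not Scrapy's actual to_bytes code, and stringify_keys is a hypothetical helper for converting the keys of a nested dict that has already been built.

```python
def check_key(key):
    # Simplified stand-in for the type check seen in the traceback;
    # this is NOT Scrapy's real to_bytes implementation.
    if not isinstance(key, (str, bytes)):
        raise TypeError("key must be str or bytes, got %s" % type(key).__name__)
    return key


def stringify_keys(value):
    # Hypothetical helper: recursively convert every dict key to str,
    # descending into nested dicts and lists.
    if isinstance(value, dict):
        return {str(k): stringify_keys(v) for k, v in value.items()}
    if isinstance(value, list):
        return [stringify_keys(v) for v in value]
    return value


media = {0: {"Media URL": "http://example.com/a.mp3", "Media Type": "Audio"}}

# An int key works fine in plain Python...
assert media[0]["Media Type"] == "Audio"

# ...but it fails a string-only key check.
try:
    check_key(0)
    int_key_rejected = False
except TypeError:
    int_key_rejected = True

# After conversion, every key passes the same check.
safe_media = stringify_keys(media)
for key in safe_media:
    check_key(key)
```

Converting at the point where the key is created, as in str(media_index), is the simpler option; a recursive helper like this is only worth considering when the nested dict comes from code you do not control.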