Scrapy Pipeline to Parse
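I wrote a pipeline that sends my Scrapy data to my Parse backend.
PARSE = 'api.parse.com'
PORT = 443
However, I can't find the right way to POST the data to Parse: every time, it creates undefined objects in my Parse database.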
class Newscrawlbotv01Pipeline(object):
    def process_item(self, item, spider):
        for data in item:
            if not data:
                raise DropItem("Missing data!")
        connection = httplib.HTTPSConnection(
            settings['PARSE'],
            settings['PORT']
        )
        connection.connect()
        connection.request('POST', '/1/classes/articlulos', json.dumps({item}), {
            "X-Parse-Application-Id": "XXXXXXXXXXXXXXXX",
            "X-Parse-REST-API-Key": "XXXXXXXXXXXXXXXXXXX",
            "Content-Type": "application/json"
        })
        log.msg("Question added to PARSE !", level=log.DEBUG, spider=spider)
        return item
Example of the error:
TypeError: set([{'image': 'http://apps.site.lefigaro.fr/sites/apps/files/styles/large/public/thumbnails/image/sport24.png?itok=caKsKUzV',
'language': 'FR',
'publishedDate': datetime.datetime(2016, 3, 16, 21, 53, 10, 289000),
'publisher': 'Le Figaro Sport',
'theme': 'Sport',
'title': u'Pogba aurait rencontr\xe9 les dirigeants du PSG',
'url': u'sport24.lefigaro.fr/football/ligue-des-champions/fil-info/prolongation-entre-le-bayern-et-la-juve-796778'}]) is not JSON serializable
It looks like you have a set in item['data'], which JSON does not accept.
You need to convert that field back to a list before it can be serialized to JSON.
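To make the failure concrete, here is a minimal sketch of how a set breaks `json.dumps` and how converting it to a list avoids the error. The `tags` value is a hypothetical stand-in and is not taken from the original item. Note also that braces around a single value, as in `{item}`, form a Python set literal, which would explain the `set([...])` wrapper in the traceback.

```python
import json

# Hypothetical field value used only for illustration.
tags = {"Sport", "FR"}  # braces around plain values build a set, not a dict

try:
    json.dumps({"tags": tags})
    error = None
except TypeError as exc:
    # json has no encoding for sets, so this branch is taken
    error = str(exc)

# Converting the set to a list (sorted here for a stable order) serializes fine
payload = json.dumps({"tags": sorted(tags)})
```

The same reasoning applies to the whole item: passing a mapping such as `dict(item)` gives `json.dumps` something it can encode, whereas `{item}` wraps the item in a set.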
I found the solution:
# Imports assumed for Python 2 and an older Scrapy release
import json
import httplib

from scrapy import log
from scrapy.conf import settings
from scrapy.exceptions import DropItem


class Newscrawlbotv01Pipeline(object):
    def process_item(self, item, spider):
        for data in item:
            if not data:
                raise DropItem("Missing data!")
        connection = httplib.HTTPSConnection(
            settings['PARSE'],
            settings['PORT']
        )
        connection.connect()
        # dict(item) turns the Scrapy item into a plain mapping that json can encode
        connection.request('POST', '/1/classes/Articles', json.dumps(dict(item)), {
            "X-Parse-Application-Id": "WW",
            "X-Parse-REST-API-Key": "WW",
            "Content-Type": "application/json"
        })
        log.msg("Question added to PARSE !", level=log.DEBUG, spider=spider)
        return item
        #self.collection.update({'url': item['url']}, dict(item), upsert=True)
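One thing worth checking with this approach: the item in the error output contains a `datetime` value in `publishedDate`, and `json.dumps` cannot encode `datetime` objects by default. The sketch below shows one possible way to build the request body, assuming ISO 8601 strings are acceptable for that field. The helper names `encode_extra` and `build_body` are hypothetical, and a plain dict stands in for the Scrapy item.

```python
import json
from datetime import datetime


def encode_extra(value):
    """Fallback for json.dumps: turn datetimes into ISO 8601 strings."""
    if isinstance(value, datetime):
        return value.isoformat()
    raise TypeError("%r is not JSON serializable" % (value,))


def build_body(item):
    # dict(item) copies the item's fields into a plain mapping;
    # default= is only called for values json cannot encode itself
    return json.dumps(dict(item), default=encode_extra)


# Plain dict standing in for the scraped item (two fields from the error output)
sample = {
    "publisher": "Le Figaro Sport",
    "publishedDate": datetime(2016, 3, 16, 21, 53, 10, 289000),
}
body = build_body(sample)
```

The resulting string could then be passed as the third argument of `connection.request` in place of `json.dumps(dict(item))`. Whether Parse stores an ISO string as a date or as plain text is not covered here.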