Scrapy-MySQL pipeline does not save data
I am using Scrapy to crawl external links from a website and store them in a MySQL database. I used the snippet below in my code. When I run the spider, I can see the links being scraped, but it raises this error:
2018-03-07 13:33:27 [scrapy.log] ERROR: not all arguments converted during string formatting
It looks to me like the links are not being converted to strings because of the dots, slashes, commas and dashes. How can I pass the links and store them without the error? TIA
pipeline.py
from scrapy import log
from twisted.enterprise import adbapi
import MySQLdb.cursors

class MySQLStorePipeline(object):
    def __init__(self):
        self.dbpool = adbapi.ConnectionPool(
            'MySQLdb', db='usalogic_testdb',
            user='root', passwd='1234',
            cursorclass=MySQLdb.cursors.DictCursor,
            charset='utf8', use_unicode=True)

    def process_item(self, item, spider):
        # run the db query in a thread pool
        query = self.dbpool.runInteraction(self._conditional_insert, item)
        query.addErrback(self.handle_error)
        return item

    def _conditional_insert(self, tx, item):
        # create the record if it doesn't exist;
        # this whole block runs in its own thread
        tx.execute("select * from test where link = %s", (item['link'],))
        result = tx.fetchone()
        if result:
            log.msg("Item already stored in db: %s" % item, level=log.DEBUG)
        else:
            tx.execute(
                "insert into test (link) "
                "values (%s)",
                (item['link'])
            )
            log.msg("Item stored in db: %s" % item, level=log.DEBUG)

    def handle_error(self, e):
        log.err(e)
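For reference, the error is not about the characters in the link. The `substitute` helper below is a simplified, hypothetical model (an assumption, not MySQLdb's real API) of how the driver fills `%s` placeholders: it escapes each element of the parameter *sequence* and then applies `%`-formatting to the query, which is why passing a bare string, a sequence of characters, blows up:

```python
# Simplified model (assumption, not MySQLdb's real API) of %s
# placeholder substitution: escape each element of the parameter
# sequence, then %-format the query string with the results.
def substitute(query, args):
    escaped = tuple("'%s'" % a for a in args)
    return query % escaped

sql = "insert into test (link) values (%s)"
link = "http://example.com/page"

# (link) is just a parenthesised string, so every character counts
# as one parameter -- far more than the single %s placeholder.
try:
    substitute(sql, (link))
except TypeError as err:
    print(err)  # not all arguments converted during string formatting

# (link,) is a 1-tuple: one parameter for one placeholder.
print(substitute(sql, (link,)))
```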
This happens when the crawl command is run.
ITEMS.py
import scrapy

class CollectUrlItem(scrapy.Item):
    link = scrapy.Field()
settings.py
ITEM_PIPELINES = {
'rvca4.pipelines.MySQLStorePipeline': 800,
}
I think it will work if you use a list instead of a tuple:
tx.execute(
    "insert into test (link) "
    "values (%s)",
    [item['link']]
)
Or, add a trailing comma inside the parentheses:
tx.execute(
    "insert into test (link) "
    "values (%s)",
    (item['link'],)
)
because it is the trailing comma, not the parentheses, that actually makes it a tuple. Compare:
(1) # the number 1 (the parentheses are wrapping the expression `1`)
(1,) # a 1-tuple holding a number 1
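The same distinction can be checked directly in a Python shell. This minimal sketch also shows why a list works equally well: `execute()` only needs a sequence holding one parameter per placeholder.

```python
link = "http://example.com/a-b.c/page,1"

assert isinstance((link), str)      # parentheses alone just group
assert isinstance((link,), tuple)   # the trailing comma makes a 1-tuple
assert len((link,)) == 1            # one parameter, as intended
assert len((link)) == len(link)     # a bare string is N "parameters"

# A one-element list is an equally valid parameter sequence:
assert isinstance([link], list) and len([link]) == 1

# Note the dots, slashes, commas and dashes in the URL are irrelevant;
# only the shape of the parameter container matters.
```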