Database insertion fails without error with scrapy

Question

I'm using scrapy and dataset (https://dataset.readthedocs.io/en/latest/quickstart.html#storing-data), which is a layer on top of SQLAlchemy, trying to load data into an SQLite table as a follow-up to .

Using the dataset package, I have:

import dataset
from sqlalchemy.exc import IntegrityError
# settings is the project's own module providing SETTINGS_PATH (not shown)

class DynamicSQLlitePipeline(object):

    def __init__(self, table_name):

        db_path = "sqlite:///"+settings.SETTINGS_PATH+"\data.db"
        db = dataset.connect(db_path)
        self.table = db[table_name].table


    def process_item(self, item, spider):

        try:
            print('TEST DATASET..')
            self.table.insert(dict(name='John Doe', age=46, country='China'))
            print('INSERTED')
        except IntegrityError:
            print('THIS IS A DUP')
        return item

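After my spider runs, I see the print statements from the try/except block printed with no errors, but when it finishes and I look at the table, it contains no data. What am I doing wrong?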

Answer 1

There may be a problem with the DB connection. Put this part of your code inside a try block to check whether it fails:

import traceback

try:
    db_path = "sqlite:///"+settings.SETTINGS_PATH+"\data.db"
    db = dataset.connect(db_path)
    self.table = db[table_name].table
except Exception:
    # print the full stack trace of whatever went wrong
    traceback.print_exc()

Answer 2

The code you posted doesn't work for me as is:

TypeError: __init__() takes exactly 2 arguments (1 given)

That's because the __init__ method requires a table_name argument that is never passed. You need to implement the from_crawler class method in your pipeline, for example:

@classmethod
def from_crawler(cls, crawler):
    return cls(table_name=crawler.spider.name)

This creates a pipeline object that uses the spider's name as the table name; you can of course use any name you want.
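Here is a minimal, self-contained sketch of the whole pattern. Scrapy pipelines are plain classes, so no Scrapy import is needed; a SimpleNamespace stands in for the crawler. As an assumption made only to keep it runnable anywhere, storage uses the standard sqlite3 module in place of dataset, with a fixed (name, age, country) schema; the class name SQLitePipeline is hypothetical.

```python
import sqlite3
from types import SimpleNamespace


class SQLitePipeline:
    """Item pipeline whose table name comes from the spider's name.

    Sketch only: sqlite3 replaces dataset and the schema is fixed.
    """

    def __init__(self, table_name, db_path=":memory:"):
        self.table_name = table_name
        self.conn = sqlite3.connect(db_path)
        # Table names cannot be bound as SQL parameters, so the name is quoted
        self.conn.execute(
            f'CREATE TABLE IF NOT EXISTS "{table_name}" '
            "(name TEXT, age INTEGER, country TEXT)"
        )

    @classmethod
    def from_crawler(cls, crawler):
        # Scrapy calls this factory (when defined) instead of cls() directly,
        # so the constructor can receive arguments such as table_name.
        return cls(table_name=crawler.spider.name)

    def process_item(self, item, spider):
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute(
                f'INSERT INTO "{self.table_name}" (name, age, country) '
                "VALUES (:name, :age, :country)",
                item,
            )
        return item

    def close_spider(self, spider):
        self.conn.close()


# Stand-in for a Scrapy crawler: only crawler.spider.name is used here.
fake_crawler = SimpleNamespace(spider=SimpleNamespace(name="people"))
pipeline = SQLitePipeline.from_crawler(fake_crawler)
returned = pipeline.process_item({"name": "John Doe", "age": 46, "country": "China"}, None)
rows = pipeline.conn.execute('SELECT name, age, country FROM "people"').fetchall()
print(rows)  # [('John Doe', 46, 'China')]
```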

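In addition, the line self.table = db[table_name].table should be replaced with self.table = db[table_name] (https://dataset.readthedocs.io/en/latest/quickstart.html#storing-data). The .table attribute is the underlying SQLAlchemy Table, and its insert() only builds an INSERT statement object (in the SQLAlchemy releases of the time, the dict was accepted as the statement's values); that statement is never executed against the database, which is why no error is raised and no row is written. The dataset table returned by db[table_name] executes the insert itself.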

After these changes, the data is stored in the table.
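The mechanism behind the fix can be seen directly in SQLAlchemy Core. A minimal sketch, assuming SQLAlchemy 1.4 or later and a hypothetical in-memory people table (it does not use dataset): the Insert object returned by Table.insert() writes nothing until it is executed on a connection, which dataset's own insert does for you.

```python
from sqlalchemy import (Column, Integer, MetaData, String, Table,
                        create_engine, func, select)

engine = create_engine("sqlite://")  # in-memory SQLite database
metadata = MetaData()
people = Table(
    "people", metadata,
    Column("name", String),
    Column("age", Integer),
    Column("country", String),
)
metadata.create_all(engine)

# Building the statement only creates an Insert object; nothing runs yet.
stmt = people.insert().values(name="John Doe", age=46, country="China")

with engine.connect() as conn:
    before = conn.execute(select(func.count()).select_from(people)).scalar_one()

# Executing inside engine.begin() runs the INSERT and commits it.
with engine.begin() as conn:
    conn.execute(stmt)

with engine.connect() as conn:
    after = conn.execute(select(func.count()).select_from(people)).scalar_one()

print(before, after)  # 0 1
```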


python

sqlite

sqlalchemy

scrapy

python-dataset