pandas unable to write to Postgres db throws "KeyError: ("SELECT name FROM sqlite_master ..."

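I have created a package that lets users write data to either a sqlite or a Postgres database. One module handles connecting to the database, and a separate module provides the write functionality. In the latter module, the write is a simple call to a pandas built-in method: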

indata.to_sql('pay_' + table, con, if_exists='append', index=False)

Writing to the sqlite database (using a 'sqlite3' connection) succeeds, but writing to the Postgres database fails with the following error:

Traceback (most recent call last):
  File "/anaconda3/envs/PCAN_v1/lib/python3.7/site-packages/pg8000/core.py", line 1778, in execute
    ps = cache['ps'][key]
KeyError: ("SELECT name FROM sqlite_master WHERE type='table' AND name=?;", ((705, 0, <function Connection.__init__.<locals>.text_out at 0x7fc3205fb510>),))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/anaconda3/envs/PCAN_v1/lib/python3.7/site-packages/pandas/io/sql.py", line 1595, in execute
    cur.execute(*args)
  File "/anaconda3/envs/PCAN_v1/lib/python3.7/site-packages/pg8000/core.py", line 861, in execute
    self._c.execute(self, operation, args)
  File "/anaconda3/envs/PCAN_v1/lib/python3.7/site-packages/pg8000/core.py", line 1837, in execute
    self.handle_messages(cursor)
  File "/anaconda3/envs/PCAN_v1/lib/python3.7/site-packages/pg8000/core.py", line 1976, in handle_messages
    raise self.error
pg8000.core.ProgrammingError: {'S': 'ERROR', 'V': 'ERROR', 'C': '42P01', 'M': 'relation "sqlite_master" does not exist', 'P': '18', 'F': 'parse_relation.c', 'L': '1180', 'R': 'parserOpenTable'}

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/anaconda3/envs/PCAN_v1/lib/python3.7/site-packages/pandas/io/sql.py", line 1610, in execute
    raise_with_traceback(ex)
  File "/anaconda3/envs/PCAN_v1/lib/python3.7/site-packages/pandas/compat/__init__.py", line 46, in raise_with_traceback
    raise exc.with_traceback(traceback)
  File "/anaconda3/envs/PCAN_v1/lib/python3.7/site-packages/pandas/io/sql.py", line 1595, in execute
    cur.execute(*args)
  File "/anaconda3/envs/PCAN_v1/lib/python3.7/site-packages/pg8000/core.py", line 861, in execute
    self._c.execute(self, operation, args)
  File "/anaconda3/envs/PCAN_v1/lib/python3.7/site-packages/pg8000/core.py", line 1837, in execute
    self.handle_messages(cursor)
  File "/anaconda3/envs/PCAN_v1/lib/python3.7/site-packages/pg8000/core.py", line 1976, in handle_messages
    raise self.error
pandas.io.sql.DatabaseError: Execution failed on sql 'SELECT name FROM sqlite_master WHERE type='table' AND name=?;': {'S': 'ERROR', 'V': 'ERROR', 'C': '42P01', 'M': 'relation "sqlite_master" does not exist', 'P': '18', 'F': 'parse_relation.c', 'L': '1180', 'R': 'parserOpenTable'}

I traced the error to the following file:

/anaconda3/envs/PCAN_v1/lib/python3.7/site-packages/pandas/io/sql.py

What seems to be happening is that the '.to_sql' function is configured to try to write to a database named 'sqlite_master' at this point in the 'sql.py' file:

    def has_table(self, name, schema=None):
        # TODO(wesm): unused?
        # escape = _get_valid_sqlite_name
        # esc_name = escape(name)

        wld = "?"
        query = (
            "SELECT name FROM sqlite_master " "WHERE type='table' AND name={wld};"
        ).format(wld=wld)

        return len(self.execute(query, [name]).fetchall()) > 0

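Looking closely at the error, you can see that the connection to the database is made correctly, but pandas is looking for a sqlite database.

I know that the database name is one I used when I first started working with sqlite half a year ago, so I think I set a configuration value somewhere. So:

  1. Is my reasoning correct?
  2. If so, how do I change the configuration?
  3. If not, what could be going on?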

According to the pandas.DataFrame.to_sql documentation:

con : sqlalchemy.engine.Engine or sqlite3.Connection

Using SQLAlchemy makes it possible to use any DB supported by that library. Legacy support is provided for sqlite3.Connection objects.

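This means that only SQLite is supported with a raw connection in the to_sql method. Every other RDBMS, including Postgres, must use an SQLAlchemy connection for this method to create structures and append data. So there is no leftover configuration value involved: pandas treats any connection that is not an SQLAlchemy object as a sqlite3 connection and runs SQLite-specific queries against it, and 'sqlite_master' is SQLite's internal catalog table, which Postgres does not have. Note: read_sql does not require SQLAlchemy, since it makes no persistent changes.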
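To make the origin of the error message concrete, here is a minimal sketch that runs the same existence check quoted from 'sql.py' against a real SQLite connection, using only the standard library. The function name sqlite_has_table and the table pay_demo are made up for illustration. The query succeeds here because sqlite_master is a built-in SQLite table; sent to Postgres, the same statement fails with 'relation "sqlite_master" does not exist'.

```python
import sqlite3


def sqlite_has_table(con, name):
    """Hypothetical helper mirroring the SQLite-only check in the traceback."""
    # sqlite_master is SQLite's internal catalog of tables, indexes and views.
    query = "SELECT name FROM sqlite_master WHERE type='table' AND name=?;"
    return len(con.execute(query, [name]).fetchall()) > 0


# An in-memory database is enough to show the check working on SQLite.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE pay_demo (amount REAL)")

print(sqlite_has_table(con, "pay_demo"))  # True
print(sqlite_has_table(con, "missing"))   # False
```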

Therefore, this raw DB-API connection will not work:

import psycopg2
con = psycopg2.connect(host="localhost", port=5432, dbname="mydb", user="myuser", password="mypwd")

indata.to_sql('pay_' + table, con, if_exists='append', index=False)

However, this SQLAlchemy connection will work:

from sqlalchemy import create_engine    

engine = create_engine('postgresql+psycopg2://myuser:mypwd@localhost:5432/mydb')

indata.to_sql('pay_' + table, engine, if_exists='append', index=False)

Ideally, use SQLAlchemy for both databases. Here it is for SQLite:

engine = create_engine("sqlite:///path/to/mydb.db")
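For a package that has to serve both backends, one possible arrangement is for the connection module to build an SQLAlchemy URL and for the write module to create the engine from it, so the to_sql call stays identical for SQLite and Postgres. The sketch below is only an illustration: build_db_url and write_frame are hypothetical names, the default host, port and driver are assumptions, and the traceback in the question suggests the pg8000 driver, which could be passed as driver="pg8000" instead.

```python
from urllib.parse import quote_plus


def build_db_url(backend, database, user=None, password=None,
                 host="localhost", port=5432, driver="psycopg2"):
    """Return an SQLAlchemy URL for 'sqlite' or 'postgres' (hypothetical helper)."""
    if backend == "sqlite":
        # Three slashes followed by the path to the database file.
        return f"sqlite:///{database}"
    if backend == "postgres":
        # Quote the credentials so characters like '@' or ':' do not break the URL.
        return (f"postgresql+{driver}://{quote_plus(user)}:{quote_plus(password)}"
                f"@{host}:{port}/{database}")
    raise ValueError(f"unsupported backend: {backend!r}")


def write_frame(indata, table, url):
    """Append a DataFrame to 'pay_<table>' through an SQLAlchemy engine."""
    # Imported here so the URL helper above works without SQLAlchemy installed.
    from sqlalchemy import create_engine

    engine = create_engine(url)
    indata.to_sql('pay_' + table, engine, if_exists='append', index=False)
```

With this in place, a call such as write_frame(indata, table, build_db_url("postgres", "mydb", "myuser", "mypwd")) goes through SQLAlchemy, so pandas never falls back to its sqlite3 code path.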