Pandas to_sql with parameterized data types like NUMERIC(10,2)
Pandas has a lovely to_sql method for writing DataFrames to any RDBMS supported by SQLAlchemy.
Say I have a DataFrame generated like this:
df = pd.DataFrame([-1.04, 0.70, 0.11, -0.43, 1.0], columns=['value'])
If I write it to the database without any special handling, I get a double precision column type:
df.to_sql('foo_test', an_engine)
If I want a different data type, I can specify it (this works fine):
df.to_sql('foo_test', an_engine, dtype={'value': sqlalchemy.types.NUMERIC})
But if I want to set the precision and scale of the NUMERIC column, it blows up in my face:
df.to_sql('foo_test', an_engine, dtype={'value': sqlalchemy.types.NUMERIC(10,2)})
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-77-dc008463fbfc> in <module>()
1 df = pd.DataFrame([-1.04, 0.70, 0.11, -0.43, 1.0], columns=['value'])
----> 2 df.to_sql('foo_test', cosd_engine, dtype={'value': sqlalchemy.types.NUMERIC(10,2)})
/Users/igazit/.virtualenvs/myproject/lib/python2.7/site-packages/pandas/core/generic.pyc in to_sql(self, name, con, flavor, schema, if_exists, index, index_label, chunksize, dtype)
964 self, name, con, flavor=flavor, schema=schema, if_exists=if_exists,
965 index=index, index_label=index_label, chunksize=chunksize,
--> 966 dtype=dtype)
967
968 def to_pickle(self, path):
/Users/igazit/.virtualenvs/myproject/lib/python2.7/site-packages/pandas/io/sql.pyc in to_sql(frame, name, con, flavor, schema, if_exists, index, index_label, chunksize, dtype)
536 pandas_sql.to_sql(frame, name, if_exists=if_exists, index=index,
537 index_label=index_label, schema=schema,
--> 538 chunksize=chunksize, dtype=dtype)
539
540
/Users/igazit/.virtualenvs/myproject/lib/python2.7/site-packages/pandas/io/sql.pyc in to_sql(self, frame, name, if_exists, index, index_label, schema, chunksize, dtype)
1162 import sqlalchemy.sql.type_api as type_api
1163 for col, my_type in dtype.items():
-> 1164 if not issubclass(my_type, type_api.TypeEngine):
1165 raise ValueError('The type of %s is not a SQLAlchemy '
1166 'type ' % col)
TypeError: issubclass() arg 1 must be a class
I'd like to understand why sqlalchemy.types.NUMERIC passes the test on line 1164 while sqlalchemy.types.NUMERIC(10,2) doesn't. They do have different types (sqlalchemy.sql.visitors.VisitableType vs sqlalchemy.sql.sqltypes.NUMERIC).
Any pointers would be much appreciated!
Update: this bug is fixed in pandas >= 0.16.0.
This is the issue about the same error in the then-recent pandas 0.15.2:
https://github.com/pydata/pandas/issues/9083
A collaborator there suggests monkey-patching to_sql as a workaround:
import warnings

import pandas as pd
from pandas.io.sql import SQLTable


def to_sql(self, frame, name, if_exists='fail', index=True,
           index_label=None, schema=None, chunksize=None, dtype=None):
    """
    patched version of https://github.com/pydata/pandas/blob/v0.15.2/pandas/io/sql.py#L1129
    """
    if dtype is not None:
        # Accept both type classes (NUMERIC) and instances (NUMERIC(10, 2)):
        # to_instance() instantiates a class and passes an instance through.
        from sqlalchemy.types import to_instance, TypeEngine
        for col, my_type in dtype.items():
            if not isinstance(to_instance(my_type), TypeEngine):
                raise ValueError('The type of %s is not a SQLAlchemy '
                                 'type ' % col)

    table = SQLTable(name, self, frame=frame, index=index,
                     if_exists=if_exists, index_label=index_label,
                     schema=schema, dtype=dtype)
    table.create()
    table.insert(chunksize)
    # check for potentially case sensitivity issues (GH7815)
    if name not in self.engine.table_names(schema=schema or self.meta.schema):
        warnings.warn("The provided table name '{0}' is not found exactly "
                      "as such in the database after writing the table, "
                      "possibly due to case sensitivity issues. Consider "
                      "using lower case table names.".format(name), UserWarning)


# Replace the method on pandas 0.15.2's SQLDatabase with the patched version.
pd.io.sql.SQLDatabase.to_sql = to_sql
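As for why the original check fails: sqlalchemy.types.NUMERIC is a class, while sqlalchemy.types.NUMERIC(10,2) is an instance of that class, and issubclass() only accepts a class as its first argument. The sketch below reproduces that difference without needing SQLAlchemy installed. TypeEngine, Numeric and to_instance here are hypothetical stand-ins that only mimic the shape of the real objects (the real to_instance handles more cases), so treat it as an illustration of the idea rather than the library's actual code.

```python
# Stand-ins only: these classes are hypothetical placeholders that mimic
# the shape of SQLAlchemy's types; they are NOT the real implementations.
class TypeEngine:
    """Placeholder for a base column-type class."""


class Numeric(TypeEngine):
    """Placeholder for a column type that takes precision and scale."""

    def __init__(self, precision=None, scale=None):
        self.precision = precision
        self.scale = scale


def to_instance(type_or_obj):
    """Simplified stand-in: instantiate a class, pass anything else through."""
    if isinstance(type_or_obj, type):
        return type_or_obj()
    return type_or_obj


def old_check(my_type):
    """Style of the check on line 1164: only works when given a class."""
    return issubclass(my_type, TypeEngine)


def new_check(my_type):
    """Style of the patched check: accepts a class or an instance."""
    return isinstance(to_instance(my_type), TypeEngine)


# A bare class passes both checks.
assert old_check(Numeric)
assert new_check(Numeric)

# An instance makes issubclass() raise the TypeError seen in the traceback.
try:
    old_check(Numeric(10, 2))
    old_check_error = None
except TypeError as exc:
    old_check_error = str(exc)

# Normalizing to an instance first lets the parameterized type through,
# while a value that is not a type at all is still rejected.
assert new_check(Numeric(10, 2))
assert not new_check("NUMERIC(10,2)")
```

The design point is that the patched check validates an instance in both cases, so it no longer matters whether the caller passes the class or a configured instance.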