pandas to_sql 带有数组数据类型的 postgresql
pandas to_sql postgresql with array dtypes
我有一个数据框,其中包含日期时间值和数组以及未来可能包含的其他数据类型。
我希望 to_sql 它到 PostgreSQL,其中 datetime 是(没有时区的时间戳)并且数组是(字节)类型,但我不知道为 dtype 参数放什么。
有没有一种方法可以根据数据框列的数据类型动态执行数据类型?
table 的外观:
CREATE TABLE IF NOT EXISTS data2_mcw_tmp (
time timestamp without time zone,
u bytea,
v bytea,
w bytea,
spd bytea,
dir bytea,
temp bytea
);
到目前为止我的代码(在用户 rftr 的帮助下):
dtypedict = {}
Data2_mcw_conv = Data2_mcw.copy()
for row in Data2_mcw_conv.index:
for col in Data2_mcw_conv.columns:
value = Data2_mcw_conv[col][row]
try:
if type(Data2_mcw_conv[col].iloc[0]).__module__ == np.__name__:
dtypedict.update({col:BYTEA})
value = Data2_mcw_conv[col].loc[row]
print('before: ')
print (value.flags)
print('---------------------')
value = value.copy(order='C')
print('after: ')
print (value.flags)
print('=====================')
value = pickle.dumps(value)
except:
if isinstance(Data2_mcw_conv[col].iloc[0], datetime.date):
dtypedict.update({col:TIMESTAMP})
Data2_mcw_conv[col][row] = value
Data2_mcw_conv.to_sql(name='data2_mcw_tmp',con=conn,
if_exists = 'replace',
dtype=dtypedict)
但是,我得到这个错误:
Traceback (most recent call last):
File "C:\Users\myname\Desktop\database\pickletopdb2.py", line 145, in <module>
postgres_conv()
File "C:\Users\myname\Desktop\database\pickletopdb2.py", line 124, in postgres_conv
Data2_mcw_conv.to_sql(name='data2_mcw_tmp',con=conn,
File "C:\Python38\lib\site-packages\pandas\core\generic.py", line 2778, in to_sql
sql.to_sql(
File "C:\Python38\lib\site-packages\pandas\io\sql.py", line 590, in to_sql
pandas_sql.to_sql(
File "C:\Python38\lib\site-packages\pandas\io\sql.py", line 1397, in to_sql
table.insert(chunksize, method=method)
File "C:\Python38\lib\site-packages\pandas\io\sql.py", line 831, in insert
exec_insert(conn, keys, chunk_iter)
File "C:\Python38\lib\site-packages\pandas\io\sql.py", line 748, in _execute_insert
conn.execute(self.table.insert(), data)
File "C:\Python38\lib\site-packages\sqlalchemy\engine\base.py", line 1286, in execute
return meth(self, multiparams, params, _EMPTY_EXECUTION_OPTS)
File "C:\Python38\lib\site-packages\sqlalchemy\sql\elements.py", line 325, in _execute_on_connection
return connection._execute_clauseelement(
File "C:\Python38\lib\site-packages\sqlalchemy\engine\base.py", line 1478, in _execute_clauseelement
ret = self._execute_context(
File "C:\Python38\lib\site-packages\sqlalchemy\engine\base.py", line 1842, in _execute_context
self._handle_dbapi_exception(
File "C:\Python38\lib\site-packages\sqlalchemy\engine\base.py", line 2027, in _handle_dbapi_exception
util.raise_(exc_info[1], with_traceback=exc_info[2])
File "C:\Python38\lib\site-packages\sqlalchemy\util\compat.py", line 207, in raise_
raise exception
File "C:\Python38\lib\site-packages\sqlalchemy\engine\base.py", line 1779, in _execute_context
self.dialect.do_executemany(
File "C:\Python38\lib\site-packages\sqlalchemy\dialects\postgresql\psycopg2.py", line 951, in do_executemany
context._psycopg2_fetched_rows = xtras.execute_values(
File "C:\Python38\lib\site-packages\psycopg2\extras.py", line 1267, in execute_values
parts.append(cur.mogrify(template, args))
ValueError: ndarray is not C-contiguous
value.flag 输出 before/after value = value.copy(order='C'):
before:
C_CONTIGUOUS : False
F_CONTIGUOUS : True
OWNDATA : False
WRITEABLE : True
ALIGNED : True
WRITEBACKIFCOPY : False
UPDATEIFCOPY : False
---------------------
after:
C_CONTIGUOUS : True
F_CONTIGUOUS : False
OWNDATA : True
WRITEABLE : True
ALIGNED : True
WRITEBACKIFCOPY : False
UPDATEIFCOPY : False
=====================
为什么会出现此错误,知道如何解决吗?
遵循 and customize the dtypes to their PostgreSQL
equivalents from here 的第二个代码片段。所以在你的情况下,例如:
from sqlalchemy.dialects.postgresql import BYTEA, TIMESTAMP
def sqlcol(dfparam):
# ...
if "datetime" in str(j):
dtypedict.update({i: TIMESTAMP})
if "object" in str(j): # Depending on what your other column's datatypes are
dtypesdict.update({i: BYTEA})
# ...
备注:
- 根据docs,
TIMESTAMP
默认没有时区。
- 您的(字节)列通常在
pandas
中用数据类型 object
表示。如果您将来添加更多数据,您应该考虑到这一点。
我有一个数据框,其中包含日期时间值和数组以及未来可能包含的其他数据类型。 我希望 to_sql 它到 PostgreSQL,其中 datetime 是(没有时区的时间戳)并且数组是(字节)类型,但我不知道为 dtype 参数放什么。
有没有一种方法可以根据数据框列的数据类型动态执行数据类型?
table 的外观:
CREATE TABLE IF NOT EXISTS data2_mcw_tmp (
time timestamp without time zone,
u bytea,
v bytea,
w bytea,
spd bytea,
dir bytea,
temp bytea
);
到目前为止我的代码(在用户 rftr 的帮助下):
dtypedict = {}
Data2_mcw_conv = Data2_mcw.copy()
for row in Data2_mcw_conv.index:
for col in Data2_mcw_conv.columns:
value = Data2_mcw_conv[col][row]
try:
if type(Data2_mcw_conv[col].iloc[0]).__module__ == np.__name__:
dtypedict.update({col:BYTEA})
value = Data2_mcw_conv[col].loc[row]
print('before: ')
print (value.flags)
print('---------------------')
value = value.copy(order='C')
print('after: ')
print (value.flags)
print('=====================')
value = pickle.dumps(value)
except:
if isinstance(Data2_mcw_conv[col].iloc[0], datetime.date):
dtypedict.update({col:TIMESTAMP})
Data2_mcw_conv[col][row] = value
Data2_mcw_conv.to_sql(name='data2_mcw_tmp',con=conn,
if_exists = 'replace',
dtype=dtypedict)
但是,我得到这个错误:
Traceback (most recent call last):
File "C:\Users\myname\Desktop\database\pickletopdb2.py", line 145, in <module>
postgres_conv()
File "C:\Users\myname\Desktop\database\pickletopdb2.py", line 124, in postgres_conv
Data2_mcw_conv.to_sql(name='data2_mcw_tmp',con=conn,
File "C:\Python38\lib\site-packages\pandas\core\generic.py", line 2778, in to_sql
sql.to_sql(
File "C:\Python38\lib\site-packages\pandas\io\sql.py", line 590, in to_sql
pandas_sql.to_sql(
File "C:\Python38\lib\site-packages\pandas\io\sql.py", line 1397, in to_sql
table.insert(chunksize, method=method)
File "C:\Python38\lib\site-packages\pandas\io\sql.py", line 831, in insert
exec_insert(conn, keys, chunk_iter)
File "C:\Python38\lib\site-packages\pandas\io\sql.py", line 748, in _execute_insert
conn.execute(self.table.insert(), data)
File "C:\Python38\lib\site-packages\sqlalchemy\engine\base.py", line 1286, in execute
return meth(self, multiparams, params, _EMPTY_EXECUTION_OPTS)
File "C:\Python38\lib\site-packages\sqlalchemy\sql\elements.py", line 325, in _execute_on_connection
return connection._execute_clauseelement(
File "C:\Python38\lib\site-packages\sqlalchemy\engine\base.py", line 1478, in _execute_clauseelement
ret = self._execute_context(
File "C:\Python38\lib\site-packages\sqlalchemy\engine\base.py", line 1842, in _execute_context
self._handle_dbapi_exception(
File "C:\Python38\lib\site-packages\sqlalchemy\engine\base.py", line 2027, in _handle_dbapi_exception
util.raise_(exc_info[1], with_traceback=exc_info[2])
File "C:\Python38\lib\site-packages\sqlalchemy\util\compat.py", line 207, in raise_
raise exception
File "C:\Python38\lib\site-packages\sqlalchemy\engine\base.py", line 1779, in _execute_context
self.dialect.do_executemany(
File "C:\Python38\lib\site-packages\sqlalchemy\dialects\postgresql\psycopg2.py", line 951, in do_executemany
context._psycopg2_fetched_rows = xtras.execute_values(
File "C:\Python38\lib\site-packages\psycopg2\extras.py", line 1267, in execute_values
parts.append(cur.mogrify(template, args))
ValueError: ndarray is not C-contiguous
value.flag 输出 before/after value = value.copy(order='C'):
before:
C_CONTIGUOUS : False
F_CONTIGUOUS : True
OWNDATA : False
WRITEABLE : True
ALIGNED : True
WRITEBACKIFCOPY : False
UPDATEIFCOPY : False
---------------------
after:
C_CONTIGUOUS : True
F_CONTIGUOUS : False
OWNDATA : True
WRITEABLE : True
ALIGNED : True
WRITEBACKIFCOPY : False
UPDATEIFCOPY : False
=====================
为什么会出现此错误,知道如何解决吗?
遵循 PostgreSQL
equivalents from here 的第二个代码片段。所以在你的情况下,例如:
from sqlalchemy.dialects.postgresql import BYTEA, TIMESTAMP
def sqlcol(dfparam):
# ...
if "datetime" in str(j):
dtypedict.update({i: TIMESTAMP})
if "object" in str(j): # Depending on what your other column's datatypes are
dtypesdict.update({i: BYTEA})
# ...
备注:
- 根据docs,
TIMESTAMP
默认没有时区。 - 您的(字节)列通常在
pandas
中用数据类型object
表示。如果您将来添加更多数据,您应该考虑到这一点。