Unknown encoding while using df.to_sql() to write to MySQL using pyodbc
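I have created a connection to a MySQL database using pyodbc and sqlalchemy. But when I use pd.to_sql(), it gives me an error. It looks as if pandas is trying to do some encoding. The parameters being converted are of string data type, and the database encoding is latin-1.
However, when I run connection.execute(insert query, params) on the same connection, it works fine.
Also, when I use pd.to_sql() with a connection established through sqlalchemy and mysqlconnector, it works well.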
import urllib.parse

import pandas as pd
from sqlalchemy import create_engine

# host, port, db, username and password are defined elsewhere.
params = urllib.parse.quote_plus("DRIVER={MySQL ODBC 8.0 ANSI Driver};"
                                 f"SERVER={host}:{port};"
                                 f"DATABASE={db};"
                                 f"UID={username};"
                                 f"PWD={password};"
                                 f"charset=utf8")
db_engine = create_engine(f"mysql+pyodbc:///?odbc_connect={params}")
connection = db_engine.connect()
# maindf is a pd.DataFrame. It contains a long text field, which is the one affected most of the time;
# the error mostly comes from this column.
maindf = pd.DataFrame()
maindf['transcript'] = ['This is a sample 1', 'This is sample2']
maindf.to_sql("mytable", connection, if_exists="append", index=False, chunksize=1000)
The error is as follows:
  File "C:\ProgramData\Anaconda3\envs\nice_rpa\lib\site-packages\sqlalchemy\engine\base.py", line 1685, in _execute_context
    self.dialect.do_executemany(
  File "C:\ProgramData\Anaconda3\envs\nice_rpa\lib\site-packages\sqlalchemy\engine\default.py", line 713, in do_executemany
    cursor.executemany(statement, parameters)
pyodbc.ProgrammingError: ('42000', "[42000] [MySQL][ODBC 8.0(a) Driver][mysqld-5.7.31-log]You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '4' at line 1 (1064) (SQLParamData)")

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "E:/myusername/cx-index-score/nice_rpa/pipeline.py", line 17, in <module>
    uploader.split_upload(os.path.abspath(Path('./datasets')))
  File "E:\myusername\cx-index-score\nice_rpa\processandupload.py", line 163, in split_upload
    self.writetosandbox()
  File "E:\myusername\cx-index-score\nice_rpa\processandupload.py", line 216, in writetosandbox
    self.maindf.to_sql("nice_daily_update", self.connection, if_exists="append",
  File "C:\ProgramData\Anaconda3\envs\nice_rpa\lib\site-packages\pandas\core\generic.py", line 2779, in to_sql
    sql.to_sql(
  File "C:\ProgramData\Anaconda3\envs\nice_rpa\lib\site-packages\pandas\io\sql.py", line 590, in to_sql
    pandas_sql.to_sql(
  File "C:\ProgramData\Anaconda3\envs\nice_rpa\lib\site-packages\pandas\io\sql.py", line 1405, in to_sql
    raise err
  File "C:\ProgramData\Anaconda3\envs\nice_rpa\lib\site-packages\pandas\io\sql.py", line 1397, in to_sql
    table.insert(chunksize, method=method)
  File "C:\ProgramData\Anaconda3\envs\nice_rpa\lib\site-packages\pandas\io\sql.py", line 831, in insert
    exec_insert(conn, keys, chunk_iter)
  File "C:\ProgramData\Anaconda3\envs\nice_rpa\lib\site-packages\pandas\io\sql.py", line 748, in _execute_insert
    conn.execute(self.table.insert(), data)
  File "C:\ProgramData\Anaconda3\envs\nice_rpa\lib\site-packages\sqlalchemy\engine\base.py", line 1200, in execute
    return meth(self, multiparams, params, _EMPTY_EXECUTION_OPTS)
  File "C:\ProgramData\Anaconda3\envs\nice_rpa\lib\site-packages\sqlalchemy\sql\elements.py", line 313, in _execute_on_connection
    return connection._execute_clauseelement(
  File "C:\ProgramData\Anaconda3\envs\nice_rpa\lib\site-packages\sqlalchemy\engine\base.py", line 1389, in _execute_clauseelement
    ret = self._execute_context(
  File "C:\ProgramData\Anaconda3\envs\nice_rpa\lib\site-packages\sqlalchemy\engine\base.py", line 1748, in _execute_context
    self._handle_dbapi_exception(
  File "C:\ProgramData\Anaconda3\envs\nice_rpa\lib\site-packages\sqlalchemy\engine\base.py", line 1929, in _handle_dbapi_exception
    util.raise_(
  File "C:\ProgramData\Anaconda3\envs\nice_rpa\lib\site-packages\sqlalchemy\util\compat.py", line 211, in raise_
    raise exception
  File "C:\ProgramData\Anaconda3\envs\nice_rpa\lib\site-packages\sqlalchemy\engine\base.py", line 1685, in _execute_context
    self.dialect.do_executemany(
  File "C:\ProgramData\Anaconda3\envs\nice_rpa\lib\site-packages\sqlalchemy\engine\default.py", line 713, in do_executemany
    cursor.executemany(statement, parameters)
sqlalchemy.exc.ProgrammingError: (pyodbc.ProgrammingError) ('42000', "[42000] [MySQL][ODBC 8.0(a) Driver][mysqld-5.7.31-log]You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '33\x03@333333\x03@333333\x03@' at line 1 (1064) (SQLParamData)")
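If you look at the part reported as the syntax error, it should actually be a simple string, which is utf-8. The database encoding is 'latin1'. Interestingly, even though the data and the error are the same every time, the "wrong syntax" part keeps changing. Once it was '4', then '33\x03@333333\x03@333333\x03@'; it changes on every run, although the input data is always the same.
Do you know how to stop pandas from preprocessing my parameters before sending them to the database? If that is not possible, can you suggest an alternative for efficiently writing many columns (in the range of 1000s)?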
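One possible reading of the changing fragment, offered as an observation and not a confirmed diagnosis: the bytes `333333\x03@` are exactly how the float 2.4 is laid out as a little-endian IEEE-754 double. If the real DataFrame has a float column containing 2.4 (an assumption; the sample above shows only strings), that would suggest raw parameter buffers are ending up where the server expects SQL text, which points at the ODBC layer more than at pandas re-encoding strings. A minimal sketch using only the standard library:

```python
import struct

# Tail of the fragment quoted in the second error message, as raw bytes.
fragment = b"333333\x03@333333\x03@"

# Interpret each group of 8 bytes as a little-endian double.
values = struct.unpack("<2d", fragment)
print(values)  # (2.4, 2.4)

# Round trip: packing 2.4 gives back the same 8 bytes.
packed = struct.pack("<d", 2.4)
print(packed)
```

The leading `33` in the reported fragment would then simply be the end of a buffer that was cut at an arbitrary offset, which may also be why the quoted text differs from run to run.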
I was using the wrong driver. The driver on the server is MySQL ODBC 5.1 Driver, while I was connecting with MySQL ODBC 8.0 ANSI Driver. That explains the weird encoding.
import urllib.parse

from sqlalchemy import create_engine

params = urllib.parse.quote_plus("DRIVER={MySQL ODBC 5.1 Driver};"
                                 f"SERVER={host}:{port};"
                                 f"DATABASE={db};"
                                 f"UID={username};"
                                 f"PWD={password};"
                                 f"charset=utf8")
db_engine = create_engine(f"mysql+pyodbc:///?odbc_connect={params}")
connection = db_engine.connect()
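To double-check which driver names are available before hard-coding one in the connection string, something along these lines may help. It is only a sketch: `mysql_drivers` is a hypothetical helper introduced here, and `pyodbc.drivers()` lists the ODBC drivers registered on the machine running the script, not on the database server.

```python
def mysql_drivers(names):
    """Return the driver names that mention MySQL, preserving order."""
    return [name for name in names if "mysql" in name.lower()]


try:
    import pyodbc  # optional; only needed to list locally registered drivers

    installed = pyodbc.drivers()
except ImportError:
    # pyodbc is not available here, so there is nothing to list.
    installed = []

# The DRIVER={...} value in the connection string must match one of these exactly.
print(mysql_drivers(installed))
```

For example, `mysql_drivers(["SQL Server", "MySQL ODBC 5.1 Driver"])` returns `["MySQL ODBC 5.1 Driver"]`.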