SQLAlchemy ORM bulk insert from a Pandas DataFrame when values are np.nan
I am bulk inserting a Pandas DataFrame into a Microsoft SQL Server database using the SQLAlchemy ORM tools:
import time

from sqlalchemy import create_engine
from sqlalchemy.engine import url
from sqlalchemy.orm import sessionmaker

my_engine = create_engine(url.URL(**my_db_url))
Session = sessionmaker(bind=my_engine)
my_session = Session()
start = time.time()
my_session.bulk_insert_mappings(TableObject, mysample)
my_session.commit()
durata = time.time() - start
my_session.close()
Here mysample is a list of dictionaries created with:
mysample = myDataFrame.to_dict(orient='records')
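For illustration, a row with a missing ScoreVal ends up in that list as a float nan rather than None (a minimal sketch with made-up sample data):

import numpy as np
import pandas as pd

# hypothetical sample data; the second row has a missing ScoreVal
myDataFrame = pd.DataFrame({
    'Key1': ['A', 'B'],
    'Key2': [1, 2],
    'Key3': [10, 20],
    'Key4': [100, 200],
    'SCORE_DATE': pd.to_datetime(['2020-01-01', '2020-01-02']),
    'ScoreVal': [0.5, np.nan],
})

mysample = myDataFrame.to_dict(orient='records')
# mysample[1]['ScoreVal'] is nan (a Python float), which the DB driver
# cannot bind as SQL NULL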
It matches TableObject, which is declared as follows:
from sqlalchemy import Column, BigInteger, String, Integer, Sequence, DateTime, Date, Float, ForeignKey, Boolean, VARCHAR, MetaData
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.schema import PrimaryKeyConstraint
from sqlalchemy.orm import sessionmaker
from sqlalchemy.sql import null

Base = declarative_base()

class TableObject(Base):
    __tablename__ = 'mytable'

    Key1 = Column('Key1', String(1), nullable=False)
    Key2 = Column('Key2', Integer, nullable=False)
    Key3 = Column('Key3', Integer, nullable=False)
    Key4 = Column('Key4', BigInteger, nullable=False)
    SCORE_DATE = Column('SCORE_DATE', DateTime)
    ScoreVal = Column('ScoreVal', Float)

    # schema and composite primary key combined in a single __table_args__;
    # assigning __table_args__ twice would silently discard the first value
    __table_args__ = (
        PrimaryKeyConstraint(Key1, Key2, Key3, Key4),
        {"schema": "dbo"},
    )
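If dbo.mytable does not exist yet, it can be created from the declared model (a one-line sketch, assuming the my_engine created above):

Base.metadata.create_all(my_engine)  # creates dbo.mytable if it does not already exist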
ScoreVal may occasionally be np.nan. What is the best way to bulk insert the DataFrame?
As found elsewhere on SO, np.nan may need to be replaced with None before the bulk insert. Since replace is a DataFrame method, the substitution has to be done before converting to record dictionaries:
mysample = myDataFrame.replace({np.nan: None}).to_dict(orient='records')
This works for both MS SQL Server and Oracle.
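Putting it together, a minimal sketch (assuming the my_engine, Session, myDataFrame and TableObject from above) that does the NaN-to-None conversion on the DataFrame before building the record dictionaries; the astype(object).where(...) line is an alternative conversion sometimes used instead of replace:

import numpy as np
import pandas as pd

# convert NaN to None on the DataFrame so the record dicts carry real NULLs
clean = myDataFrame.replace({np.nan: None})
# alternative with the same effect:
# clean = myDataFrame.astype(object).where(pd.notnull(myDataFrame), None)

mysample = clean.to_dict(orient='records')

my_session = Session()
my_session.bulk_insert_mappings(TableObject, mysample)
my_session.commit()
my_session.close()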