sqlalchemy orm bulk insert from pandas data frame when np.nan

I am bulk inserting a pandas DataFrame into a Microsoft SQL Server database with the SQLAlchemy ORM:

import time
from sqlalchemy import create_engine
from sqlalchemy.engine import url
from sqlalchemy.orm import sessionmaker

# my_db_url is a dict of connection parameters expanded into the URL
my_engine = create_engine(url.URL(**my_db_url))
Session = sessionmaker(bind=my_engine)
my_session = Session()
start = time.time()
my_session.bulk_insert_mappings(TableObject, mysample)
my_session.commit()
durata = time.time() - start
my_session.close()

Here mysample is a list of dictionaries created with:

mysample = myDataFrame.to_dict(orient='records')
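
For illustration, a minimal sketch (with hypothetical values, not the real data) of what to_dict(orient='records') produces when the frame contains a NaN; the np.nan value is carried straight into the mapping dictionaries:

import numpy as np
import pandas as pd

# hypothetical two-row frame; ScoreVal holds a NaN just like the real data can
demo = pd.DataFrame({'Key1': ['A', 'B'], 'ScoreVal': [0.5, np.nan]})
print(demo.to_dict(orient='records'))
# [{'Key1': 'A', 'ScoreVal': 0.5}, {'Key1': 'B', 'ScoreVal': nan}]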

which matches TableObject, declared as follows:

from sqlalchemy import Column, BigInteger, String, Integer, Sequence, DateTime, Date, Float, ForeignKey, Boolean, VARCHAR, MetaData
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.schema import PrimaryKeyConstraint
from sqlalchemy.orm import sessionmaker
from sqlalchemy.sql import null

Base = declarative_base()

class TableObject(Base):

    __tablename__ = 'mytable'

    Key1 = Column('Key1', String(1), nullable=False)
    Key2 = Column('Key2', Integer, nullable=False)
    Key3 = Column('Key3', Integer, nullable=False)
    Key4 = Column('Key4', BigInteger, nullable=False)
    SCORE_DATE = Column('SCORE_DATE', DateTime)
    ScoreVal = Column('ScoreVal', Float)

    # schema and composite primary key combined in a single __table_args__
    # (defining __table_args__ twice would silently discard the first definition)
    __table_args__ = (
        PrimaryKeyConstraint(Key1, Key2, Key3, Key4),
        {"schema": "dbo"},
    )

ScoreVal can occasionally be np.nan. What is the best way to bulk insert the DataFrame?

As found elsewhere on SO, before using bulk insert the np.nan values may need to be replaced with None, applied to the DataFrame before converting it to records:

myDataFrame = myDataFrame.replace({np.nan: None})

This works for both MS SQL Server and Oracle.
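
Putting the pieces together, a minimal sketch of the whole round trip, assuming my_db_url, myDataFrame and TableObject are defined as above:

import time
import numpy as np
from sqlalchemy import create_engine
from sqlalchemy.engine import url
from sqlalchemy.orm import sessionmaker

# NaN -> None on the DataFrame first, so the records carry SQL NULLs instead of nan floats
mysample = myDataFrame.replace({np.nan: None}).to_dict(orient='records')

my_engine = create_engine(url.URL(**my_db_url))
my_session = sessionmaker(bind=my_engine)()
start = time.time()
my_session.bulk_insert_mappings(TableObject, mysample)
my_session.commit()
print('insert took %.2f s' % (time.time() - start))
my_session.close()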