(dataframe.to_sql with reference_or_insert): How to automatically insert a missing record in a referenced table when a foreign key is not found?

Question

Description

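I am trying to migrate data from a Pandas DataFrame to a MySQL database table, but the data has some inconsistencies that I want to resolve, and I haven't found a way to do it yet. Any help with this is greatly appreciated.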

Sample of my data:

user_type (table)

code	detail
a	Secretary
b	Accountant

user_df (the DataFrame containing the data I want to migrate to the user table)

id	name	user_type_code (FK: user_type)
1	Jane Doe	a
2	John Doe	a
3	James Doe	b
4	Jeff Doe	c
5	Jennifer Doe	d

As the data above shows, the user_type_code values c and d are not found in the user_type table.

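What I want to achieve is to automatically insert the missing user_type entries with dummy information, so that they can be corrected later, while keeping all the user records.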

user_type table (what I want it to look like at the end)

code	detail
a	Secretary
b	Accountant
c	Unknown c
d	Unknown d

My current implementation

import pandas as pd
from sqlalchemy import create_engine
from sqlalchemy.dialects.mysql import insert
from sqlalchemy.exc import IntegrityError

# I want to add an implementation of inserting the dummy data in the referenced table (user_type) in this function
def insert_ignore_on_duplicates(table, conn, keys, data_iter):
    """Insert rows; on a duplicate primary key, update the existing row instead."""
    # data_iter yields plain tuples in `keys` order, so pair each value with its column name
    data = [dict(zip(keys, row)) for row in data_iter]
    insert_stmt = insert(table.table).values(data)
    on_duplicate_key_stmt = insert_stmt.on_duplicate_key_update(
        {key: insert_stmt.inserted[key] for key in keys}
    )
    try:
        conn.execute(on_duplicate_key_stmt)
    except IntegrityError as error:
        # A user_type_code with no matching user_type row is rejected by MySQL as a
        # foreign key violation, which SQLAlchemy raises as IntegrityError; printing it
        # here means the whole chunk is skipped
        print("Error: {}".format(error))

db_engine = create_engine("mysql+mysqlconnector://username:password@localhost:3306/")

user_df = pd.DataFrame()  # Assume this contains all the users' data

user_df.to_sql(
    "user",
    con=db_engine,
    if_exists="append",
    index=False,
    method=insert_ignore_on_duplicates,
    chunksize=5000,
)

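I'm looking for help on how to modify this insert_ignore_on_duplicates function/method so that it automatically inserts the missing foreign key references, or for any other approach that can do this.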
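pandas calls a `method` callable once per chunk as `method(pd_table, conn, keys, data_iter)`, so the placeholder parent rows can be created inside it, just before the chunk's own rows are inserted. Below is a minimal sketch of that idea, not a tested MySQL implementation: to stay self-contained it runs against an in-memory SQLite database through the standard-library `sqlite3` module (with a plain `sqlite3` connection, pandas hands the callable a cursor as `conn`), the table definitions are assumptions modeled on the question's data, and the names `insert_with_dummy_parents` and `FK_COLUMN` are hypothetical. On MySQL, `conn` would be a SQLAlchemy connection and `INSERT OR IGNORE` would become MySQL's `INSERT IGNORE`.

```python
import sqlite3

import pandas as pd

FK_COLUMN = "user_type_code"  # hypothetical: the child column that references user_type.code


def insert_with_dummy_parents(pd_table, conn, keys, data_iter):
    """to_sql `method` callable: add "Unknown <code>" parent rows, then insert the chunk.

    Sketch for pandas' sqlite3 fallback, where `conn` is a sqlite3 cursor.
    """
    rows = list(data_iter)  # materialize the chunk: it is read twice below
    fk_pos = keys.index(FK_COLUMN)
    codes = sorted({row[fk_pos] for row in rows if row[fk_pos] is not None})
    # OR IGNORE skips codes that already exist, so "a: Secretary" is never overwritten
    conn.executemany(
        "INSERT OR IGNORE INTO user_type (code, detail) VALUES (?, ?)",
        [(code, f"Unknown {code}") for code in codes],
    )
    # Every foreign key in this chunk now has a parent row, so the child insert can proceed
    columns = ", ".join(f'"{key}"' for key in keys)
    placeholders = ", ".join("?" for _ in keys)
    conn.executemany(
        f'INSERT INTO "{pd_table.name}" ({columns}) VALUES ({placeholders})', rows
    )
    return len(rows)


con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
con.executescript(
    """
    CREATE TABLE user_type (code TEXT PRIMARY KEY, detail TEXT NOT NULL);
    CREATE TABLE "user" (
        id INTEGER PRIMARY KEY,
        name TEXT,
        user_type_code TEXT REFERENCES user_type (code)
    );
    INSERT INTO user_type VALUES ('a', 'Secretary'), ('b', 'Accountant');
    """
)

user_df = pd.DataFrame(
    [
        (1, "Jane Doe", "a"),
        (2, "John Doe", "a"),
        (3, "James Doe", "b"),
        (4, "Jeff Doe", "c"),
        (5, "Jennifer Doe", "d"),
    ],
    columns=["id", "name", "user_type_code"],
)
user_df.to_sql(
    "user",
    con,
    if_exists="append",
    index=False,
    method=insert_with_dummy_parents,
    chunksize=2,
)
print(con.execute("SELECT code, detail FROM user_type ORDER BY code").fetchall())
```

Because the placeholders are written per chunk, each chunk only deals with the codes it actually contains, at the cost of one extra `INSERT` statement per chunk.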

Some related questions I found

Does SQLAlchemy have an equivalent of Django's get_or_create?
SQLAlchemy Automatically Create Entry If Doesn't Exist As Foreign Key
Fastest way to insert object if it doesn't exist with SQLAlchemy

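P.S. The reason I need this is that the data is large (>4 million records) and contains many non-existent foreign keys, so checking them manually is practically impossible. Adding this dummy data will help preserve all the data and allow suitable corrections in the future, for example updating the record c: Unknown c to c: Auditor.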

Answer 1

What you really need is the list of codes that are missing from the user_type table. You can get it like this:

import pandas as pd

# example data
user_type = pd.DataFrame(
    [("a", "Secretary"), ("b", "Accountant")], columns=["code", "detail"]
)
# (the above would actually be retrieved via `pd.read_sql_table("user_type", engine)`)
user_df = pd.DataFrame(
    [
        (1, "Jane Doe", "a"),
        (2, "John Doe", "a"),
        (3, "James Doe", "b"),
        (4, "Jeff Doe", "c"),
        (5, "Jennifer Doe", "d"),
    ],
    columns=["id", "name", "user_type_code"],
)

# real code starts here
user_type_code_list = user_type["code"].unique()
user_df_code_list = user_df["user_type_code"].unique()
user_types_to_add = pd.DataFrame(
    [
        (f"{x}", f"Unknown {x}")
        for x in user_df_code_list
        if x not in user_type_code_list
    ],
    columns=["code", "detail"],
)
print(user_types_to_add)
"""
  code     detail
0    c  Unknown c
1    d  Unknown d
"""
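The list comprehension above tests each code with `x not in user_type_code_list`, a linear scan of a NumPy array, and a missing (`None`/`NaN`) `user_type_code` would come out as a placeholder row with the code "None" or "nan". A vectorized variant using `Series.isin` avoids both; this is a sketch, and the function name `missing_user_types` and the extra "Jim Doe" row with no user type are made up for illustration.

```python
import pandas as pd


def missing_user_types(user_df: pd.DataFrame, user_type: pd.DataFrame) -> pd.DataFrame:
    """Return placeholder user_type rows for codes used in user_df but absent from user_type."""
    # unique() first, so the membership test runs once per distinct code rather than per user;
    # dropna() keeps a NULL foreign key from becoming an "Unknown None" row
    codes = pd.Series(user_df["user_type_code"].dropna().unique())
    # Series.isin is a vectorized lookup instead of a Python-level scan per code
    missing = codes[~codes.isin(user_type["code"])].reset_index(drop=True)
    return pd.DataFrame({"code": missing, "detail": "Unknown " + missing})


user_type = pd.DataFrame(
    [("a", "Secretary"), ("b", "Accountant")], columns=["code", "detail"]
)
user_df = pd.DataFrame(
    [
        (1, "Jane Doe", "a"),
        (2, "John Doe", "a"),
        (3, "James Doe", "b"),
        (4, "Jeff Doe", "c"),
        (5, "Jennifer Doe", "d"),
        (6, "Jim Doe", None),  # hypothetical user with no user type
    ],
    columns=["id", "name", "user_type_code"],
)
user_types_to_add = missing_user_types(user_df, user_type)
print(user_types_to_add)
```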

Then you can use

user_types_to_add.to_sql("user_type", db_engine, index=False, if_exists="append")

to add the missing rows to the user_type table, followed by

user_df.to_sql("user", db_engine, index=False, if_exists="append", …)
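The order of those two calls is what makes this work: the parent rows must exist before the child rows that reference them. Below is a small end-to-end sketch of the answer's approach; it uses an in-memory SQLite database with foreign-key enforcement switched on as a stand-in for MySQL, and the table definitions are assumptions modeled on the question's data. It reads only the existing `code` column back from the database instead of the whole table, appends the placeholders, then appends the users.

```python
import sqlite3

import pandas as pd

con = sqlite3.connect(":memory:")  # in-memory SQLite standing in for the MySQL database
con.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
con.executescript(
    """
    CREATE TABLE user_type (code TEXT PRIMARY KEY, detail TEXT NOT NULL);
    CREATE TABLE "user" (
        id INTEGER PRIMARY KEY,
        name TEXT,
        user_type_code TEXT REFERENCES user_type (code)
    );
    INSERT INTO user_type VALUES ('a', 'Secretary'), ('b', 'Accountant');
    """
)
user_df = pd.DataFrame(
    [
        (1, "Jane Doe", "a"),
        (2, "John Doe", "a"),
        (3, "James Doe", "b"),
        (4, "Jeff Doe", "c"),
        (5, "Jennifer Doe", "d"),
    ],
    columns=["id", "name", "user_type_code"],
)

# Fetch only the parent keys that already exist
existing_codes = set(pd.read_sql_query("SELECT code FROM user_type", con)["code"])
missing = sorted(set(user_df["user_type_code"].dropna()) - existing_codes)
user_types_to_add = pd.DataFrame(
    {"code": missing, "detail": [f"Unknown {code}" for code in missing]}
)

# Parent rows first, then child rows, so every foreign key already resolves
user_types_to_add.to_sql("user_type", con, index=False, if_exists="append")
user_df.to_sql("user", con, index=False, if_exists="append")
```

Swapping the two `to_sql` calls makes the `user` insert fail on codes c and d, because their parent rows do not exist yet.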


Tags: python, sqlalchemy, dataframe, pandas, pandas-to-sql