(dataframe.to_sql with reference_or_insert): How to automatically insert a missing record in a referenced table when a foreign key is not found?
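Description
I am trying to migrate data from a Pandas DataFrame to a MySQL database table, but the data has some inconsistencies that I want to resolve, and I have not figured out how. I would greatly appreciate any help with this.
Sample data: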
user_type (table)
code | detail |
---|---|
a | Secretary |
b | Accountant |
user_df (the DataFrame containing the data I want to migrate to the user table)
id | name | user_type_code (FK: user_type) |
---|---|---|
1 | Jane Doe | a |
2 | John Doe | a |
3 | James Doe | b |
4 | Jeff Doe | c |
5 | Jennifer Doe | d |
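As the data above shows, the user_type_code values c and d are not found in the user_type table.
What I want to achieve is to automatically insert the missing user_type rows with dummy information, so that they can be corrected later, while keeping all the user records.
user_type table (how I want it to look in the end)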
code | detail |
---|---|
a | Secretary |
b | Accountant |
c | Unknown c |
d | Unknown d |
My current implementation
import pandas as pd
from sqlalchemy import create_engine
from sqlalchemy.dialects.mysql import insert
from sqlalchemy.exc import NoReferenceError


# I want to add an implementation of inserting the dummy data in the referenced table (user_type) in this function
def insert_ignore_on_duplicates(table, conn, keys, data_iter):
    """ Insert ignore on duplicate primary keys """
    try:
        insert_stmt = insert(table.table).values(list(data_iter))
        on_duplicate_key_stmt = insert_stmt.on_duplicate_key_update(
            insert_stmt.inserted
        )
        conn.execute(on_duplicate_key_stmt)
    except NoReferenceError as error:
        print("Error: {}".format(error))


db_engine = create_engine("mysql+mysqlconnector://username:password@localhost:3306/")
user_df = pd.DataFrame()  # Assume this contains all the users' data
user_df.to_sql(
    "user",
    con=db_engine,
    if_exists="append",
    index=False,
    method=insert_ignore_on_duplicates,
    chunksize=5000,
)
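I am looking for help on how to modify this insert_ignore_on_duplicates function/method to allow automatic insertion of the missing foreign key references, or any other approach that achieves the same result.
Some related questions I found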
- Does SQLAlchemy have an equivalent of Django's get_or_create?
- SQLAlchemy Automatically Create Entry If Doesn't Exist As Foreign Key
- Fastest way to insert object if it doesn't exist with SQLAlchemy
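P.S. The reason I need this implementation is that the data is large (more than 4 million records) and contains many foreign keys that do not exist, so checking it manually is not practical. Adding these dummy rows will help preserve all the data and allow suitable correction in the future, perhaps updating the record c: Unknown c to c: Auditor.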
What you really need is the list of codes that are missing from the user_type table. You can get it like this:
import pandas as pd

# example data
user_type = pd.DataFrame(
    [("a", "Secretary"), ("b", "Accountant")], columns=["code", "detail"]
)
# (the above would actually be retrieved via `pd.read_sql_table("user_type", engine)`)
user_df = pd.DataFrame(
    [
        (1, "Jane Doe", "a"),
        (2, "John Doe", "a"),
        (3, "James Doe", "b"),
        (4, "Jeff Doe", "c"),
        (5, "Jennifer Doe", "d"),
    ],
    columns=["id", "name", "user_type_code"],
)

# real code starts here
user_type_code_list = user_type["code"].unique()
user_df_code_list = user_df["user_type_code"].unique()
user_types_to_add = pd.DataFrame(
    [
        (f"{x}", f"Unknown {x}")
        for x in user_df_code_list
        if x not in user_type_code_list
    ],
    columns=["code", "detail"],
)
print(user_types_to_add)
"""
  code     detail
0    c  Unknown c
1    d  Unknown d
"""
You can then use
user_types_to_add.to_sql("user_type", db_engine, index=False, if_exists="append")
to add the missing rows to the user_type table, followed by
user_df.to_sql("user", db_engine, index=False, if_exists="append", …)
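The order of the two calls matters: the placeholder rows must exist in user_type before the user rows that reference them are inserted. The whole flow might look roughly like the sketch below. It uses an in-memory SQLite database purely as a stand-in for MySQL so that it runs without a server, and the table definitions are assumptions made up to mirror the sample data. The missing codes are found with `isin` rather than a Python loop, which may be preferable for millions of rows.

```python
import sqlite3

import pandas as pd

# In-memory SQLite stands in for the MySQL database (illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.execute("CREATE TABLE user_type (code TEXT PRIMARY KEY, detail TEXT)")
conn.execute(
    'CREATE TABLE "user" ('
    "id INTEGER PRIMARY KEY, name TEXT, "
    "user_type_code TEXT REFERENCES user_type (code))"
)
conn.executemany(
    "INSERT INTO user_type (code, detail) VALUES (?, ?)",
    [("a", "Secretary"), ("b", "Accountant")],
)
conn.commit()

user_df = pd.DataFrame(
    [
        (1, "Jane Doe", "a"),
        (2, "John Doe", "a"),
        (3, "James Doe", "b"),
        (4, "Jeff Doe", "c"),
        (5, "Jennifer Doe", "d"),
    ],
    columns=["id", "name", "user_type_code"],
)

# 1. Codes that already exist in the parent table.
existing_codes = pd.read_sql_query("SELECT code FROM user_type", conn)["code"]

# 2. Distinct codes used by the users but absent from user_type (vectorized).
used_codes = user_df["user_type_code"].dropna().drop_duplicates()
missing_codes = used_codes[~used_codes.isin(existing_codes)].tolist()
user_types_to_add = pd.DataFrame(
    {"code": missing_codes, "detail": [f"Unknown {c}" for c in missing_codes]}
)

# 3. Parent rows first, then the rows that reference them.
user_types_to_add.to_sql("user_type", conn, index=False, if_exists="append")
user_df.to_sql("user", conn, index=False, if_exists="append")

result = pd.read_sql_query(
    "SELECT code, detail FROM user_type ORDER BY code", conn
)
print(result)
```

Against MySQL, the same three steps should apply with `db_engine` in place of `conn`. Reading the existing codes from the database right before the insert also means that re-running the migration does not try to add placeholders that are already there.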