How to structure WHERE NOT EXISTS in Pandas?
Suppose you have dataframe1 and dataframe2.
Then you need to do this:
SELECT col1, col2, col3, col4, col5, col6
FROM dataframe1
WHERE NOT EXISTS (
SELECT 1
FROM dataframe2
WHERE dataframe2.col1 = dataframe1.col1
AND dataframe2.col2 = dataframe1.col2
)
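I already have the result set of the subquery; I just don't know how to translate WHERE NOT EXISTS into Pandas. How can I convert SQL's WHERE NOT EXISTS into something I can do with Pandas? Any guidance is appreciated.
You can use sqlalchemy
and enjoy the best of both worlds: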
import numpy as np
import pandas as pd
import sqlalchemy as sa
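# Generate sample datasets: 100 rows of random integers in [1, 9]
df1 = pd.DataFrame(np.random.randint(1, 10, 600).reshape(100, 6), columns=[f'col{i}' for i in range(1, 7)])
df2 = pd.DataFrame(np.random.randint(1, 10, 200).reshape(100, 2), columns=[f'col{i}' for i in range(1, 3)])
# Load both frames into an in-memory SQLite database
db = sa.create_engine('sqlite://')  # or sqlite:///:memory:
# index=False keeps the DataFrame index out of the SQL tables
df1.to_sql('dataframe1', db, index=False)
df2.to_sql('dataframe2', db, index=False)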
query = '''
SELECT col1, col2, col3, col4, col5, col6
FROM dataframe1
WHERE NOT EXISTS (
SELECT 1
FROM dataframe2
WHERE dataframe2.col1 = dataframe1.col1
AND dataframe2.col2 = dataframe1.col2
)
'''
df_result = pd.read_sql_query(query, db)
+----+------+------+------+------+------+------+
| | col1 | col2 | col3 | col4 | col5 | col6 |
+----+------+------+------+------+------+------+
| 0 | 4 | 3 | 9 | 5 | 7 | 6 |
| 1 | 6 | 7 | 3 | 5 | 5 | 2 |
| 2 | 1 | 5 | 2 | 7 | 5 | 2 |
| 3 | 1 | 3 | 3 | 8 | 6 | 1 |
| 4 | 6 | 1 | 8 | 5 | 7 | 2 |
| 5 | 5 | 4 | 7 | 3 | 2 | 5 |
| 6 | 9 | 5 | 4 | 3 | 5 | 3 |
| 7 | 6 | 3 | 1 | 4 | 2 | 5 |
| 8 | 2 | 2 | 6 | 6 | 1 | 8 |
| 9 | 9 | 9 | 4 | 6 | 4 | 1 |
| 10 | 8 | 2 | 3 | 9 | 6 | 1 |
| 11 | 5 | 1 | 3 | 4 | 6 | 8 |
| 12 | 5 | 2 | 7 | 4 | 3 | 3 |
| 13 | 1 | 6 | 1 | 4 | 5 | 2 |
| 14 | 5 | 7 | 3 | 9 | 1 | 7 |
| 15 | 5 | 2 | 9 | 5 | 9 | 7 |
.
.
.
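If you would rather stay in Pandas, the same filter can probably be written as an anti-join. The sketch below is one possible approach, not the method from the answer above: it left-merges on the key columns with `indicator=True` and keeps the rows marked `left_only`. The helper name `where_not_exists` and the small sample frames are made up for illustration. One caveat to keep in mind: Pandas matches NaN keys to each other in a merge, whereas SQL's `=` never matches NULL, so rows with missing keys may behave differently from the SQL query.

```python
import pandas as pd


def where_not_exists(left, right, keys):
    """Return rows of `left` whose `keys` combination does not occur in `right`."""
    # De-duplicate the key columns so the left join cannot multiply rows of `left`.
    right_keys = right[keys].drop_duplicates()
    # indicator=True adds a `_merge` column; for a left join it is 'both' or 'left_only'.
    merged = left.merge(right_keys, on=keys, how="left", indicator=True)
    # Keep only rows with no partner in `right`, then drop the helper column.
    return merged.loc[merged["_merge"] == "left_only"].drop(columns="_merge")


# Tiny hypothetical frames, just to show the behaviour.
df1 = pd.DataFrame({"col1": [1, 1, 2, 3], "col2": [1, 2, 2, 3], "col3": [10, 20, 30, 40]})
df2 = pd.DataFrame({"col1": [1, 2, 2], "col2": [1, 2, 2]})

result = where_not_exists(df1, df2, ["col1", "col2"])
print(result)
```

With the frames from the answer, the call would be `where_not_exists(df1, df2, ['col1', 'col2'])`. Note that the merge produces a fresh 0-based index, so the original index of `dataframe1` is not preserved.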