How to structure WHERE NOT EXISTS in Pandas?

Suppose you have dataframe1 and dataframe2.

You then need to do the following:

SELECT col1, col2, col3, col4, col5, col6
FROM dataframe1
WHERE NOT EXISTS (
SELECT 1
FROM dataframe2
WHERE dataframe2.col1 = dataframe1.col1
AND dataframe2.col2 = dataframe1.col2
)

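I already have the result set of the subquery. I just don't know how to translate WHERE NOT EXISTS into Pandas. How can I convert a SQL WHERE NOT EXISTS into something I can do with Pandas? Any guidance is appreciated.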

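You can use SQLAlchemy to load both DataFrames into an in-memory SQLite database, run the original SQL unchanged, and read the result back into Pandas.

That way you enjoy the best of both worlds: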
import numpy as np
import pandas as pd
import sqlalchemy as sa

# Generate sample datasets of random integers in the range 1..9
df1 = pd.DataFrame(np.random.randint(1, 10, (100, 6)), columns=[f'col{i}' for i in range(1, 7)])
df2 = pd.DataFrame(np.random.randint(1, 10, (100, 2)), columns=[f'col{i}' for i in range(1, 3)])

# Write both DataFrames to tables in an in-memory SQLite database
db = sa.create_engine('sqlite://')  # or sqlite:///:memory:
df1.to_sql('dataframe1', db)
df2.to_sql('dataframe2', db)

query = '''
SELECT col1, col2, col3, col4, col5, col6
FROM dataframe1
WHERE NOT EXISTS (
SELECT 1
FROM dataframe2
WHERE dataframe2.col1 = dataframe1.col1
AND dataframe2.col2 = dataframe1.col2
)
'''

df_result = pd.read_sql_query(query, db)

Sample output (values vary between runs because the data is random):

+----+------+------+------+------+------+------+
|    | col1 | col2 | col3 | col4 | col5 | col6 |
+----+------+------+------+------+------+------+
|  0 |    4 |    3 |    9 |    5 |    7 |    6 |
|  1 |    6 |    7 |    3 |    5 |    5 |    2 |
|  2 |    1 |    5 |    2 |    7 |    5 |    2 |
|  3 |    1 |    3 |    3 |    8 |    6 |    1 |
|  4 |    6 |    1 |    8 |    5 |    7 |    2 |
|  5 |    5 |    4 |    7 |    3 |    2 |    5 |
|  6 |    9 |    5 |    4 |    3 |    5 |    3 |
|  7 |    6 |    3 |    1 |    4 |    2 |    5 |
|  8 |    2 |    2 |    6 |    6 |    1 |    8 |
|  9 |    9 |    9 |    4 |    6 |    4 |    1 |
| 10 |    8 |    2 |    3 |    9 |    6 |    1 |
| 11 |    5 |    1 |    3 |    4 |    6 |    8 |
| 12 |    5 |    2 |    7 |    4 |    3 |    3 |
| 13 |    1 |    6 |    1 |    4 |    5 |    2 |
| 14 |    5 |    7 |    3 |    9 |    1 |    7 |
| 15 |    5 |    2 |    9 |    5 |    9 |    7 |
.
.
.
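If you would rather stay in Pandas, WHERE NOT EXISTS is an anti-join, and one way to sketch it is a left merge with indicator=True that keeps only the rows marked 'left_only'. The helper name where_not_exists and the small sample frames below are made up for illustration; they are not from the question. One caveat I'm aware of: Pandas merges treat NaN keys as equal to each other, while SQL never treats NULL = NULL as true, so results can differ if the key columns contain missing values.

```python
import pandas as pd


def where_not_exists(left, right, keys):
    """Return the rows of `left` whose `keys` values have no match in `right`.

    Hypothetical helper for illustration; assumes `left` has no column
    named '_merge'.
    """
    # De-duplicate the right-hand keys so the merge cannot multiply left rows.
    right_keys = right[keys].drop_duplicates()
    # The indicator column '_merge' records where each row's keys were found.
    merged = left.merge(right_keys, on=keys, how="left", indicator=True)
    # 'left_only' means no matching key combination exists in `right`.
    unmatched = merged["_merge"] == "left_only"
    return merged.loc[unmatched, left.columns].reset_index(drop=True)


# Small made-up frames standing in for dataframe1 and dataframe2.
df1 = pd.DataFrame({
    "col1": [1, 2, 3, 4],
    "col2": [1, 2, 3, 4],
    "col3": [10, 20, 30, 40],
})
df2 = pd.DataFrame({"col1": [1, 3, 3], "col2": [1, 3, 9]})

result = where_not_exists(df1, df2, ["col1", "col2"])
print(result)
```

Here the (col1, col2) pairs (1, 1) and (3, 3) exist in df2, so only the rows with pairs (2, 2) and (4, 4) should remain. This avoids the round trip through a database, at the cost of the NaN caveat mentioned above.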