Get rows based on the existence of related rows in the same table
This is what my data looks like. If I run the following query:
select * from gdax_trades where order_type='limit' limit 5;
I get a result that looks like this:
row_id | order_id | price | funds | maker_order_id | taker_order_id | trade_id | product_id | client_oid | reason | remaining_size | size | sequence | side | time | order_type | event_type
---------+--------------------------------------+---------+-------+----------------+----------------+----------+------------+--------------------------------------+--------+----------------+------------+------------+------+-------------------------+------------+------------
3697499 | 01d63a5b-a5b7-4153-b93d-bd18c249d9c3 | 4113.06 | | | | | BTC-USD | 50028bab-81da-4842-98f0-2a1206669567 | | | 0.01 | 7446101470 | buy | 2018-11-29 04:15:39.047 | limit | received
3697501 | 9295111b-2e23-445c-9f52-52d2f26fb418 | 4131.93 | | | | | BTC-USD | de58f4a6-4577-4680-b083-df34ade6c001 | | | 0.12792387 | 7446101472 | sell | 2018-11-29 04:15:39.071 | limit | received
3697504 | 4c09878d-8bf9-49d7-9fc7-ca81b7da9e42 | 4131.19 | | | | | BTC-USD | a55e0315-8b65-4525-a7a7-debcf6f17bb5 | | | 0.10898271 | 7446101475 | sell | 2018-11-29 04:15:39.155 | limit | received
3697506 | 0a157570-a811-420e-81ff-0ead9cc34984 | 4132.69 | | | | | BTC-USD | 45086077-34be-441e-947f-99fe60bd88ef | | | 0.12146031 | 7446101477 | sell | 2018-11-29 04:15:39.24 | limit | received
3697508 | e8e1d02f-e627-4eac-a2e5-61c08399d6ef | 4117.83 | | | | | BTC-USD | 00000000-818a-0006-0001-000011037107 | | | 0.001 | 7446101479 | sell | 2018-11-29 04:15:39.259 | limit | received
(5 rows)
There are other rows in the table that correspond to each order_id but do not have order_type='limit'. For example, if I look up all rows corresponding to the first order_id:
select * from gdax_trades where order_id='01d63a5b-a5b7-4153-b93d-bd18c249d9c3';
I get:
row_id | order_id | price | funds | maker_order_id | taker_order_id | trade_id | product_id | client_oid | reason | remaining_size | size | sequence | side | time | order_type | event_type
---------+--------------------------------------+---------+-------+----------------+----------------+----------+------------+--------------------------------------+----------+----------------+------+------------+------+-------------------------+------------+------------
3697499 | 01d63a5b-a5b7-4153-b93d-bd18c249d9c3 | 4113.06 | | | | | BTC-USD | 50028bab-81da-4842-98f0-2a1206669567 | | | 0.01 | 7446101470 | buy | 2018-11-29 04:15:39.047 | limit | received
3697500 | 01d63a5b-a5b7-4153-b93d-bd18c249d9c3 | 4113.06 | | | | | BTC-USD | | | 0.01 | | 7446101471 | buy | 2018-11-29 04:15:39.047 | | open
3697662 | 01d63a5b-a5b7-4153-b93d-bd18c249d9c3 | 4113.06 | | | | | BTC-USD | | canceled | 0.01 | | 7446101633 | buy | 2018-11-29 04:15:40.522 | | done
(3 rows)
What I want is a SQLAlchemy query that returns all rows whose order_id corresponds to a "limit" order. I tried a self-referential join:
GDAXTradeAlias = aliased(GDAXTrade)
orders = (
    sess
    .query(GDAXTrade)
    .filter(GDAXTrade.time.between(start_dt, end_dt))
    .filter(GDAXTrade.order_type == 'limit')
    .join(GDAXTradeAlias, GDAXTrade.order_id == GDAXTradeAlias.order_id)
    .filter(GDAXTrade.time.between(start_dt, end_dt))
    .all()
)
But this did not give me the result I wanted. Does anyone have any suggestions?
There are multiple ways. I suggest an EXISTS semi-join. It is probably the fastest, and it is very clear to read:
SELECT *
FROM   gdax_trades g
WHERE  EXISTS (
   SELECT FROM gdax_trades
   WHERE  order_type = 'limit'
   AND    order_id = g.order_id
   );
The SELECT list of an EXISTS expression can be left empty. Only the existence of at least one row is relevant.
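We need at least one table alias (g in the example) when addressing the same table twice. I did not table-qualify the columns that reference the local table in the subquery, because that table is visible first. Only the reference to the outer query is qualified, as g.order_id. That is the bare minimum needed to be unambiguous. You can be more explicit if you like.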
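Since the question asks for SQLAlchemy, here is a minimal sketch of the same EXISTS semi-join in ORM terms. It assumes SQLAlchemy 1.4 or later and uses a trimmed-down, hypothetical GDAXTrade mapping with made-up rows in an in-memory SQLite database; the real model has more columns, and the time filter from the question is left out to keep it short.

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import Session, aliased, declarative_base

Base = declarative_base()


class GDAXTrade(Base):
    # Hypothetical, trimmed-down mapping; the real table has many more columns.
    __tablename__ = "gdax_trades"
    row_id = Column(Integer, primary_key=True)
    order_id = Column(String, nullable=False)
    order_type = Column(String, nullable=True)
    event_type = Column(String)


engine = create_engine("sqlite://")  # in-memory database, for illustration only
Base.metadata.create_all(engine)

with Session(engine) as sess:
    # Made-up sample rows: order "A" has a 'limit' row, order "B" does not.
    sess.add_all([
        GDAXTrade(row_id=1, order_id="A", order_type="limit", event_type="received"),
        GDAXTrade(row_id=2, order_id="A", order_type=None, event_type="open"),
        GDAXTrade(row_id=3, order_id="A", order_type=None, event_type="done"),
        GDAXTrade(row_id=4, order_id="B", order_type="market", event_type="received"),
        GDAXTrade(row_id=5, order_id="B", order_type=None, event_type="done"),
    ])
    sess.commit()

    # The alias plays the role of the unqualified inner table in the SQL version;
    # GDAXTrade itself plays the role of the outer alias g.
    inner = aliased(GDAXTrade)
    has_limit_row = (
        sess.query(inner)
        .filter(inner.order_type == "limit")
        .filter(inner.order_id == GDAXTrade.order_id)  # correlates to the outer query
        .exists()
    )
    orders = (
        sess.query(GDAXTrade)
        .filter(has_limit_row)
        .order_by(GDAXTrade.row_id)
        .all()
    )
    matched_row_ids = [trade.row_id for trade in orders]

print(matched_row_ids)  # [1, 2, 3]
```

All three rows of order "A" come back, including the ones whose order_type is empty, while order "B" is skipped entirely. A time.between(...) filter could be added to the outer query, the inner one, or both, depending on which rows should be restricted.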
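The EXISTS query includes the "limit" orders themselves in the result. You can easily exclude them by adding a final condition:
...
AND order_type IS DISTINCT FROM 'limit'
I use IS DISTINCT FROM because order_type appears to be nullable (it is unclear whether the empty values in the sample output are '' or NULL). A plain order_type <> 'limit' would also exclude rows where order_type IS NULL.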
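To see the difference between the two comparisons, here is a small sketch using Python's built-in sqlite3 module with made-up rows. Note the assumption: SQLite's IS NOT operator is used as a stand-in for PostgreSQL's IS DISTINCT FROM, since both treat NULL as an ordinary comparable value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gdax_trades (row_id INTEGER PRIMARY KEY, order_type TEXT)")
conn.executemany(
    "INSERT INTO gdax_trades VALUES (?, ?)",
    [(1, "limit"), (2, None), (3, "market")],  # made-up sample rows
)

# Plain inequality: NULL <> 'limit' evaluates to NULL, so row 2 is filtered out.
plain = [r[0] for r in conn.execute(
    "SELECT row_id FROM gdax_trades WHERE order_type <> 'limit' ORDER BY row_id"
)]

# NULL-safe inequality: SQLite's IS NOT stands in for IS DISTINCT FROM here.
null_safe = [r[0] for r in conn.execute(
    "SELECT row_id FROM gdax_trades WHERE order_type IS NOT 'limit' ORDER BY row_id"
)]

print(plain)      # [3]
print(null_safe)  # [2, 3]
```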
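The query returns unique rows from the outer table, even if multiple "limit" orders share the same order_id. Various alternative query techniques with joins or subqueries would return duplicates in that case. Related: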
- How do I (or can I) SELECT DISTINCT on multiple columns?
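The duplicate issue can be reproduced with a small sketch, again using Python's built-in sqlite3 module and made-up rows in which order "A" has two "limit" rows. One assumption to flag: SQLite does not accept an empty SELECT list, so the subquery uses SELECT 1 instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE gdax_trades (row_id INTEGER PRIMARY KEY, order_id TEXT, order_type TEXT)"
)
conn.executemany(
    "INSERT INTO gdax_trades VALUES (?, ?, ?)",
    [
        (1, "A", "limit"),  # two 'limit' rows for the same order_id
        (2, "A", "limit"),
        (3, "A", None),
        (4, "B", None),     # order without any 'limit' row
    ],
)

# Self-join: every outer row is repeated once per matching 'limit' row.
join_ids = [r[0] for r in conn.execute(
    """
    SELECT g.row_id
    FROM gdax_trades g
    JOIN gdax_trades l ON l.order_id = g.order_id AND l.order_type = 'limit'
    ORDER BY g.row_id
    """
)]

# EXISTS semi-join: every outer row appears at most once.
exists_ids = [r[0] for r in conn.execute(
    """
    SELECT g.row_id
    FROM gdax_trades g
    WHERE EXISTS (
        SELECT 1 FROM gdax_trades
        WHERE order_type = 'limit' AND order_id = g.order_id
    )
    ORDER BY g.row_id
    """
)]

print(join_ids)    # [1, 1, 2, 2, 3, 3]
print(exists_ids)  # [1, 2, 3]
```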
I found an answer using a subquery. I am curious what people think of it:
sub_query = (
    sess
    .query(GDAXTrade)
    .filter(GDAXTrade.time.between(start_dt, end_dt))
    .filter(GDAXTrade.order_type == 'limit')
    .subquery()
)
orders = (
    sess
    .query(GDAXTrade)
    .join(sub_query, GDAXTrade.order_id == sub_query.c.order_id, isouter=True)
    .filter(GDAXTrade.order_id == sub_query.c.order_id)
    .filter(GDAXTrade.time.between(start_dt, end_dt))
    .order_by(GDAXTrade.time.asc())
    .all()
)