Select people with latest balance for one credit card being greater than for another
In a PostgreSQL 9.5.3 database, I have a credit_card_balances table referencing a persons table, which tracks the balances of various credit cards associated with a particular person:
CREATE TABLE persons (
id serial PRIMARY KEY,
name text
);
CREATE TABLE credit_card_balances (
id serial PRIMARY KEY,
card_provider text,
person int REFERENCES persons,
balance decimal,
timestamp timestamp
);
Sample row of credit_card_balances:
id | card_provider | person | balance | timestamp
123 | visa | 1234 | 1.00 | 16-07-26 17:00
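I need to retrieve the set of persons who have both 'visa' and 'amex' cards, such that the latest balance on the 'visa' card is greater than the latest balance on the 'amex' card.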
There can be up to roughly 100 rows per (person, card_provider) in the table. Ideally, the output columns would be:
person, provider1_balance, provider2_balance, provider1_timestamp, provider2_timestamp
I know I can do something like
SELECT DISTINCT ON (card_provider) *
FROM credit_card_balances
WHERE person=1234
ORDER BY card_provider, timestamp DESC;
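to get the latest balance of each card for a specific person. But I'm not sure how to do this for all persons and check the condition above, or whether this is even the right approach.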
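Extending this to every person takes three changes: add person to DISTINCT ON, put it at the front of ORDER BY, and drop the WHERE clause. The result is the latest row per (person, card_provider). This is a minimal sketch. It assumes hypothetical sample rows in a CTE that shadows the real table name so the block runs on its own. Remove the CTE to query the actual credit_card_balances table:

```sql
-- Hypothetical sample rows; this CTE shadows the real credit_card_balances table.
WITH credit_card_balances (person, card_provider, balance, timestamp) AS (
   VALUES
     (1, 'amex', 50.00, timestamp '2016-07-25 10:00')
   , (1, 'amex', 10.00, timestamp '2016-07-26 10:00')
   , (1, 'visa', 20.00, timestamp '2016-07-26 09:00')
   , (2, 'amex',  5.00, timestamp '2016-07-25 10:00')
   , (2, 'amex', 30.00, timestamp '2016-07-26 10:00')
   , (2, 'visa', 25.00, timestamp '2016-07-26 11:00')
   , (3, 'visa', 99.00, timestamp '2016-07-26 12:00')
   )
SELECT DISTINCT ON (person, card_provider)         -- one row per (person, card_provider)
       person, card_provider, balance, timestamp
FROM   credit_card_balances
ORDER  BY person, card_provider, timestamp DESC;   -- the first row in this order, i.e. the latest, is kept
```

With these rows it returns five rows. For person 1, the amex row kept is the one with balance 10.00, the later of its two amex entries.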
Edit: As partially suggested in an answer, I could also do something like
SELECT * from credit_card_balances b1, credit_card_balances b2
WHERE b1.person = b2.person
AND (b1.card_provider = 'amex'
AND b1.timestamp in
(SELECT MAX(timestamp)
FROM credit_card_balances
WHERE card_provider = 'amex'))
AND (b2.card_provider = 'visa'
AND <... same as above>)
AND b1.balance > b2.balance;
but I noticed that this leads to terrible performance, so I don't think it is a good option.
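Performance aside, the MAX subquery in that query is not correlated with the outer row. It finds the latest 'amex' timestamp over all persons, so only amex rows carrying that single global timestamp can qualify. The sketch below shows the correlated form, with one subquery per card restricted to each person's own rows. It makes a few assumptions:

* It uses the condition from the question (latest visa balance greater than latest amex balance). The query above compares the other way round.
* Hypothetical rows in a CTE shadow the real table.

It still probes the table repeatedly and is meant to show correctness, not speed:

```sql
-- Hypothetical sample rows; this CTE shadows the real credit_card_balances table.
WITH credit_card_balances (person, card_provider, balance, timestamp) AS (
   VALUES
     (1, 'amex', 50.00, timestamp '2016-07-25 10:00')
   , (1, 'amex', 10.00, timestamp '2016-07-26 10:00')
   , (1, 'visa', 20.00, timestamp '2016-07-26 09:00')
   , (2, 'amex',  5.00, timestamp '2016-07-25 10:00')
   , (2, 'amex', 30.00, timestamp '2016-07-26 10:00')
   , (2, 'visa', 25.00, timestamp '2016-07-26 11:00')
   , (3, 'visa', 99.00, timestamp '2016-07-26 12:00')
   )
SELECT b1.person
     , b1.balance   AS amex_balance
     , b2.balance   AS visa_balance
     , b1.timestamp AS amex_timestamp
     , b2.timestamp AS visa_timestamp
FROM   credit_card_balances b1
JOIN   credit_card_balances b2 ON b2.person = b1.person
WHERE  b1.card_provider = 'amex'
AND    b1.timestamp = (SELECT max(timestamp)     -- latest amex row of the same person
                       FROM   credit_card_balances
                       WHERE  person = b1.person
                       AND    card_provider = 'amex')
AND    b2.card_provider = 'visa'
AND    b2.timestamp = (SELECT max(timestamp)     -- latest visa row of the same person
                       FROM   credit_card_balances
                       WHERE  person = b2.person
                       AND    card_provider = 'visa')
AND    b2.balance > b1.balance;                  -- latest visa greater than latest amex
```

With these rows only person 1 qualifies (visa 20.00 against amex 10.00). Person 2 is excluded because its latest amex balance (30.00) exceeds its latest visa balance (25.00), even though an older amex row (5.00) is lower.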
Use a self-join. Something like:
SELECT * from credit_card_balances b1, credit_card_balances b2
WHERE b1.person = b2.person
AND b1.card_provider = 'amex'
AND b2.card_provider = 'visa'
AND b1.balance > b2.balance;
Combine this with more or less what you have already come up with, using a view to make the query easier to understand:
CREATE VIEW most_recent_balance AS
SELECT DISTINCT ON (person, card_provider) *
FROM credit_card_balances
ORDER BY person, card_provider, timestamp DESC;
Then use this most_recent_balance view in place of the table in the self-join query.
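Put together, both sides of the self-join read from the view. This sketch rests on three assumptions:

* To keep it self-contained, the view's definition is inlined as the CTE most_recent_balance. It omits the GROUP BY, which is redundant because id is the primary key.
* Hypothetical sample rows stand in for the real table.
* The comparison follows the question (latest visa balance greater than latest amex balance), the reverse of b1.balance > b2.balance above.

```sql
-- Hypothetical sample rows; this CTE shadows the real credit_card_balances table.
WITH credit_card_balances (person, card_provider, balance, timestamp) AS (
   VALUES
     (1, 'amex', 50.00, timestamp '2016-07-25 10:00')
   , (1, 'amex', 10.00, timestamp '2016-07-26 10:00')
   , (1, 'visa', 20.00, timestamp '2016-07-26 09:00')
   , (2, 'amex',  5.00, timestamp '2016-07-25 10:00')
   , (2, 'amex', 30.00, timestamp '2016-07-26 10:00')
   , (2, 'visa', 25.00, timestamp '2016-07-26 11:00')
   , (3, 'visa', 99.00, timestamp '2016-07-26 12:00')
   )
, most_recent_balance AS (               -- stands in for the view of the same name
   SELECT DISTINCT ON (person, card_provider) *
   FROM   credit_card_balances
   ORDER  BY person, card_provider, timestamp DESC
   )
SELECT a.person
     , a.balance AS amex_balance
     , v.balance AS visa_balance
FROM   most_recent_balance a
JOIN   most_recent_balance v USING (person)
WHERE  a.card_provider = 'amex'
AND    v.card_provider = 'visa'
AND    v.balance > a.balance;
```

Against a real database, drop both CTEs and the query reads from the view directly. With these rows the result is person 1 only.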
You can do this with a nested SELECT and a window function:
select * from (
select *,
rank() over(partition by card_provider order by balance desc) as rank
from credit_card_balances
) credit_card_balances_ranked
where rank = 1
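As written, this ranks the highest balance per card provider across all persons, which is not what the question asks for. Adapted to the question, the window would be partitioned by (person, card_provider) and ordered by timestamp, so that row number 1 is each person's latest row per card. This sketch uses row_number() instead of rank() so that ties on timestamp still yield a single row. The CTE holds hypothetical rows and shadows the real table:

```sql
-- Hypothetical sample rows; this CTE shadows the real credit_card_balances table.
WITH credit_card_balances (person, card_provider, balance, timestamp) AS (
   VALUES
     (1, 'amex', 50.00, timestamp '2016-07-25 10:00')
   , (1, 'amex', 10.00, timestamp '2016-07-26 10:00')
   , (1, 'visa', 20.00, timestamp '2016-07-26 09:00')
   , (2, 'amex',  5.00, timestamp '2016-07-25 10:00')
   , (2, 'amex', 30.00, timestamp '2016-07-26 10:00')
   , (2, 'visa', 25.00, timestamp '2016-07-26 11:00')
   , (3, 'visa', 99.00, timestamp '2016-07-26 12:00')
   )
, latest AS (
   SELECT person, card_provider, balance, timestamp
   FROM  (
      SELECT *
           , row_number() OVER (PARTITION BY person, card_provider
                                ORDER BY timestamp DESC) AS rn
      FROM   credit_card_balances
      WHERE  card_provider IN ('amex', 'visa')
      ) ranked
   WHERE  rn = 1                         -- latest row per (person, card_provider)
   )
SELECT a.person
     , a.balance AS amex_balance
     , v.balance AS visa_balance
FROM   latest a
JOIN   latest v ON v.person = a.person
WHERE  a.card_provider = 'amex'
AND    v.card_provider = 'visa'
AND    v.balance > a.balance;
```

With these rows only person 1 qualifies.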
This question is a combination of two classics: greatest-n-per-group and relational-division.
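On its own, the relational-division part ("persons who have both cards") can be expressed by grouping and counting distinct providers. This is a minimal sketch on hypothetical rows, with a CTE that shadows the real table:

```sql
-- Hypothetical sample rows; this CTE shadows the real credit_card_balances table.
WITH credit_card_balances (person, card_provider, balance, timestamp) AS (
   VALUES
     (1, 'amex', 50.00, timestamp '2016-07-25 10:00')
   , (1, 'amex', 10.00, timestamp '2016-07-26 10:00')
   , (1, 'visa', 20.00, timestamp '2016-07-26 09:00')
   , (2, 'amex',  5.00, timestamp '2016-07-25 10:00')
   , (2, 'amex', 30.00, timestamp '2016-07-26 10:00')
   , (2, 'visa', 25.00, timestamp '2016-07-26 11:00')
   , (3, 'visa', 99.00, timestamp '2016-07-26 12:00')
   )
SELECT person
FROM   credit_card_balances
WHERE  card_provider IN ('amex', 'visa')
GROUP  BY person
HAVING count(DISTINCT card_provider) = 2;   -- both providers present
```

It returns persons 1 and 2, because person 3 has only a visa card. The queries below fold this check into the latest-balance comparison in two ways:

* In the join variants, the inner joins drop persons lacking either card.
* In the FILTER variant, the HAVING comparison yields NULL when one card is missing, so that group is dropped.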
Given your updated specification and up to ~100 rows per (person, card_provider), I expect this query to be substantially faster than anything we have had so far:
SELECT a.person
, a.balance AS amex_balance
, v.balance AS visa_balance
, a.timestamp AS amex_timestamp
, v.timestamp AS visa_timestamp
FROM persons p
CROSS JOIN LATERAL (
SELECT balance, timestamp
FROM credit_card_balances
WHERE person = p.id
AND card_provider = 'amex' -- more selective credit card first to optimize
ORDER BY timestamp DESC
LIMIT 1
) a
JOIN LATERAL (
SELECT balance, timestamp
FROM credit_card_balances
WHERE person = p.id
AND card_provider = 'visa' -- 2nd cc
ORDER BY timestamp DESC
LIMIT 1
) v ON v.balance > a.balance;
Index support is crucial. This index would be ideal for the case:
CREATE INDEX ON credit_card_balances (person, card_provider, timestamp DESC, balance);
Adding balance as the last index column only makes sense if you get index-only scans out of it.
This assumes timestamp is defined NOT NULL. Otherwise, you may need to add NULLS LAST to both the query and the index.
Related:
Optimize GROUP BY query to retrieve latest record per user
How to filter SQL results in a has-many-through relation
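The NULLS LAST note matters because Postgres sorts NULL values first by default with DESC. If timestamp were nullable, ORDER BY timestamp DESC LIMIT 1 would return a row without a timestamp rather than the latest one. The sketch below uses a throwaway temp table; its name, ccb_demo, and its two rows are hypothetical. The index and the query use the same sort order, so the index can still serve the query:

```sql
-- Hypothetical throwaway table; timestamp is deliberately nullable here.
CREATE TEMP TABLE ccb_demo (
   id            serial PRIMARY KEY
 , card_provider text
 , person        int
 , balance       decimal
 , timestamp     timestamp
);

INSERT INTO ccb_demo (card_provider, person, balance, timestamp) VALUES
   ('amex', 1,  5.00, NULL)                              -- row without a timestamp
 , ('amex', 1, 10.00, timestamp '2016-07-26 10:00');     -- the actual latest row

-- Index sort order matches the query's ORDER BY, including NULLS LAST.
CREATE INDEX ON ccb_demo (person, card_provider, timestamp DESC NULLS LAST, balance);

SELECT balance, timestamp
FROM   ccb_demo
WHERE  person = 1
AND    card_provider = 'amex'
ORDER  BY timestamp DESC NULLS LAST   -- plain DESC would sort the NULL row first
LIMIT  1;
```

The query returns the 10.00 row. With plain DESC it would return the 5.00 row with the NULL timestamp.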
With only a few rows per (person, card_provider), approaches based on DISTINCT ON may be faster. The separate persons table does not help in that case. The sweet spot depends on many factors.
The variants below assume at least a few distinct credit cards.
DISTINCT ON for one credit card, a LATERAL subquery for the other:
SELECT a.person
, a.balance AS amex_balance
, v.balance AS visa_balance
, a.timestamp AS amex_timestamp
, v.timestamp AS visa_timestamp
FROM (
SELECT DISTINCT ON (person)
person, balance, timestamp
FROM credit_card_balances
WHERE card_provider = 'amex' -- the more selective credit card first
ORDER BY person, timestamp DESC
) a
JOIN LATERAL (
SELECT balance, timestamp
FROM credit_card_balances
WHERE card_provider = 'visa'
AND person = a.person
ORDER BY timestamp DESC
LIMIT 1
) v ON v.balance > a.balance;
DISTINCT ON per credit card, then join:
SELECT a.person
, a.balance AS amex_balance
, v.balance AS visa_balance
, a.timestamp AS amex_timestamp
, v.timestamp AS visa_timestamp
FROM (
SELECT DISTINCT ON (person)
person, balance, timestamp
FROM credit_card_balances
WHERE card_provider = 'amex'
ORDER BY person, timestamp DESC
) a
JOIN (
SELECT DISTINCT ON (person)
person, balance, timestamp
FROM credit_card_balances
WHERE card_provider = 'visa'
ORDER BY person, timestamp DESC
) v USING (person)
WHERE v.balance > a.balance;
Or, my favorite: a single DISTINCT ON for both credit cards, then a filtered aggregate with a HAVING condition:
SELECT person
, max(balance) FILTER (WHERE card_provider = 'amex') AS amex_balance
, max(balance) FILTER (WHERE card_provider = 'visa') AS visa_balance
, max(timestamp) FILTER (WHERE card_provider = 'amex') AS amex_timestamp
, max(timestamp) FILTER (WHERE card_provider = 'visa') AS visa_timestamp
FROM (
SELECT DISTINCT ON (person, card_provider)
person, card_provider, balance, timestamp
FROM credit_card_balances
WHERE card_provider IN ('amex', 'visa')
ORDER BY person, card_provider, timestamp DESC
) c
GROUP BY person
HAVING max(balance) FILTER (WHERE card_provider = 'visa')
> max(balance) FILTER (WHERE card_provider = 'amex');
The aggregate FILTER clause requires Postgres 9.4+:
How can I simplify this game statistics query?
Select first row in each GROUP BY group?
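Before 9.4, the same filtered aggregates can be written with CASE expressions, because max() ignores the NULL that CASE yields for non-matching rows. Below is a sketch of the favorite query rewritten that way, on hypothetical rows in a CTE that shadows the real table:

```sql
-- Hypothetical sample rows; this CTE shadows the real credit_card_balances table.
WITH credit_card_balances (person, card_provider, balance, timestamp) AS (
   VALUES
     (1, 'amex', 50.00, timestamp '2016-07-25 10:00')
   , (1, 'amex', 10.00, timestamp '2016-07-26 10:00')
   , (1, 'visa', 20.00, timestamp '2016-07-26 09:00')
   , (2, 'amex',  5.00, timestamp '2016-07-25 10:00')
   , (2, 'amex', 30.00, timestamp '2016-07-26 10:00')
   , (2, 'visa', 25.00, timestamp '2016-07-26 11:00')
   , (3, 'visa', 99.00, timestamp '2016-07-26 12:00')
   )
SELECT person
     , max(CASE WHEN card_provider = 'amex' THEN balance   END) AS amex_balance
     , max(CASE WHEN card_provider = 'visa' THEN balance   END) AS visa_balance
     , max(CASE WHEN card_provider = 'amex' THEN timestamp END) AS amex_timestamp
     , max(CASE WHEN card_provider = 'visa' THEN timestamp END) AS visa_timestamp
FROM  (
   SELECT DISTINCT ON (person, card_provider)
          person, card_provider, balance, timestamp
   FROM   credit_card_balances
   WHERE  card_provider IN ('amex', 'visa')
   ORDER  BY person, card_provider, timestamp DESC
   ) c
GROUP  BY person
HAVING max(CASE WHEN card_provider = 'visa' THEN balance END)
     > max(CASE WHEN card_provider = 'amex' THEN balance END);  -- NULL, hence dropped, if a card is missing
```

With these rows it returns person 1 (amex 10.00, visa 20.00).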