Select people with latest balance for one credit card being greater than for another

In a PostgreSQL 9.5.3 database, I have a credit_card_balances table that references a persons table and tracks the balances of the various credit cards associated with a particular person:

CREATE TABLE persons (
  id serial PRIMARY KEY,
  name text
);

CREATE TABLE credit_card_balances (
  id serial PRIMARY KEY,
  card_provider text, 
  person int REFERENCES persons,
  balance decimal, 
  timestamp timestamp
);

Example row of credit_card_balances:

id  |  card_provider | person  | balance | timestamp
123 |  visa          | 1234    | 1.00    | 16-07-26 17:00

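I need to retrieve the set of people who have both a 'visa' and an 'amex' card, such that the latest balance on the 'visa' card is greater than the latest balance on the 'amex' card.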

For each (person, card_provider) there can be up to about 100 rows in the table. Ideally, the output columns would be:

person, provider1_balance, provider2_balance, provider1_timestamp, provider2_timestamp

I know I can do something like:

SELECT DISTINCT ON (card_provider) *
FROM credit_card_balances 
WHERE person=1234
ORDER BY card_provider, timestamp DESC;

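to get the latest balance for each card of a particular person. But I'm not sure how to do this for all people and check the condition above, or whether this is even the right approach.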

Edit: As partially suggested in the answers, I can also do something like:

SELECT * from credit_card_balances b1, credit_card_balances b2
WHERE b1.person = b2.person
AND (b1.card_provider = 'amex' 
     AND b1.timestamp in
        (SELECT MAX(timestamp)
         FROM credit_card_balances 
         WHERE card_provider = 'amex'))

AND (b2.card_provider = 'visa'
     AND <... same as above>)
AND b1.balance > b2.balance;

But I noticed that this leads to terrible performance, so I don't think it is a very good option.

Use a self-join. Something like:

SELECT * from credit_card_balances b1, credit_card_balances b2
WHERE b1.person = b2.person
  AND b1.card_provider = 'amex'
  AND b2.card_provider = 'visa'
  AND b1.balance > b2.balance;

Combine this with what you have more or less already worked out, using a view to make the query easier to understand.

CREATE VIEW most_recent_balance AS
  SELECT DISTINCT ON (person, card_provider) *
    FROM credit_card_balances
   ORDER BY person, card_provider, timestamp DESC;

Use this most_recent_balance view in place of the table in the self-join query.
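
Put together, the combination might look like the following sketch. It assumes the most_recent_balance view defined above. The aliases a and v and the output column names are illustrative, and the comparison here checks visa against amex; flip it if the opposite direction is wanted.

```sql
-- Sketch: self-join over the view, one side per card provider.
SELECT a.person
     , a.balance   AS amex_balance
     , v.balance   AS visa_balance
     , a.timestamp AS amex_timestamp
     , v.timestamp AS visa_timestamp
FROM   most_recent_balance a
JOIN   most_recent_balance v USING (person)
WHERE  a.card_provider = 'amex'
AND    v.card_provider = 'visa'
AND    v.balance > a.balance;  -- latest visa balance exceeds latest amex balance
```

Because the view already reduces each (person, card_provider) to a single row, the join only has to pair the two providers per person.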

You can do this with a nested select and a window function:

select * from (
     select *,
       -- rank by recency within each person's card, so rank 1 is the latest balance
       rank() over(partition by person, card_provider order by timestamp desc) as rank
     from credit_card_balances
) credit_card_balances_ranked
where rank = 1;
This problem is a combination of two classics: greatest-n-per-group and relational division.

Given your updated specification, with up to about 100 rows per (person, card_provider), I would expect this query to be substantially faster than what we had so far:

SELECT a.person
     , a.balance   AS amex_balance
     , v.balance   AS visa_balance
     , a.timestamp AS amex_timestamp
     , v.timestamp AS visa_timestamp
FROM   persons p
CROSS  JOIN LATERAL (
   SELECT balance, timestamp
   FROM   credit_card_balances 
   WHERE  person = p.id
   AND    card_provider = 'amex'  -- more selective credit card first to optimize
   ORDER  BY timestamp DESC
   LIMIT  1
   ) a
JOIN   LATERAL (
   SELECT balance, timestamp
   FROM   credit_card_balances 
   WHERE  person = p.id
   AND    card_provider = 'visa'  -- 2nd cc
   ORDER  BY timestamp DESC
   LIMIT  1
   ) v ON v.balance > a.balance;

Index support is crucial. This index would be ideal for the case:

CREATE INDEX ON credit_card_balances (person, card_provider, timestamp DESC, balance);

Adding balance as the last index column only makes sense if you get index-only scans out of it.
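
Whether the index really yields index-only scans can be checked with EXPLAIN. A minimal sketch, using the person id 1234 from the example row purely as a placeholder value:

```sql
-- Look for "Index Only Scan" in the plan; "Heap Fetches" should be low.
-- Index-only scans rely on the visibility map, so a recent VACUUM helps.
EXPLAIN (ANALYZE, BUFFERS)
SELECT balance, timestamp
FROM   credit_card_balances
WHERE  person = 1234
AND    card_provider = 'amex'
ORDER  BY timestamp DESC
LIMIT  1;
```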

This assumes timestamp is defined NOT NULL; otherwise you may need to add NULLS LAST to both the query and the index.
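
If timestamp can in fact be NULL, the adjusted index and one of the subqueries might look like this sketch (the person id 1234 is again only a placeholder):

```sql
-- Only needed when timestamp is nullable: sort NULLs last in both places.
CREATE INDEX ON credit_card_balances
       (person, card_provider, timestamp DESC NULLS LAST, balance);

SELECT balance, timestamp
FROM   credit_card_balances
WHERE  person = 1234
AND    card_provider = 'amex'
ORDER  BY timestamp DESC NULLS LAST  -- must match the index sort order
LIMIT  1;
```

Without NULLS LAST, a descending sort puts NULL values first, so a row with a missing timestamp would be picked as the "latest" one.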

Related:

  • Optimize GROUP BY query to retrieve latest record per user

  • How to filter SQL results in a has-many-through relation


With only a few rows per (person, card_provider), an approach using DISTINCT ON may be faster. A separate persons table does not help in that case. The sweet spot depends on many factors.

This assumes there are at least a few different credit cards.

DISTINCT ON for one credit card, a LATERAL subquery for the other:

SELECT a.person
     , a.balance   AS amex_balance
     , v.balance   AS visa_balance
     , a.timestamp AS amex_timestamp
     , v.timestamp AS visa_timestamp
FROM  (
   SELECT DISTINCT ON (person)
          person, balance, timestamp
   FROM   credit_card_balances 
   WHERE  card_provider = 'amex'  -- the more selective credit card first
   ORDER  BY person, timestamp DESC
   ) a
JOIN  LATERAL (
   SELECT balance, timestamp
   FROM   credit_card_balances 
   WHERE  card_provider = 'visa'
   AND    person = a.person
   ORDER  BY timestamp DESC
   LIMIT  1
   ) v ON v.balance > a.balance;

DISTINCT ON for each credit card, then join:

SELECT a.person
     , a.balance   AS amex_balance
     , v.balance   AS visa_balance
     , a.timestamp AS amex_timestamp
     , v.timestamp AS visa_timestamp
FROM  (
   SELECT DISTINCT ON (person)
          person, balance, timestamp
   FROM   credit_card_balances 
   WHERE  card_provider = 'amex'
   ORDER  BY person, timestamp DESC
   ) a
JOIN  (
   SELECT DISTINCT ON (person)
          person, balance, timestamp
   FROM   credit_card_balances 
   WHERE  card_provider = 'visa'
   ORDER  BY person, timestamp DESC
   ) v USING (person)
WHERE  v.balance > a.balance;

Or, my favorite: a single DISTINCT ON for both credit cards, then filtered aggregates with a HAVING condition:

SELECT person
     , max(balance)   FILTER (WHERE card_provider = 'amex') AS amex_balance
     , max(balance)   FILTER (WHERE card_provider = 'visa') AS visa_balance
     , max(timestamp) FILTER (WHERE card_provider = 'amex') AS amex_timestamp
     , max(timestamp) FILTER (WHERE card_provider = 'visa') AS visa_timestamp
FROM  (
   SELECT DISTINCT ON (person, card_provider)
          person, card_provider, balance, timestamp
   FROM   credit_card_balances 
   WHERE  card_provider IN ('amex', 'visa')
   ORDER  BY person, card_provider, timestamp DESC
   ) c
GROUP  BY person
HAVING max(balance) FILTER (WHERE card_provider = 'visa')
     > max(balance) FILTER (WHERE card_provider = 'amex');

The aggregate FILTER clause requires Postgres 9.4+:

  • How can I simplify this game statistics query?

  • Select first row in each GROUP BY group?