SQL Query for sum of unique amounts, remove duplicates

Consider the following MySQL table schema:

id int,
amount decimal,
transaction_no,
location_id int,
created_at datetime

The schema above stores POS receipts for restaurants. It is used for a daily report of receipt counts and their sums. I tried the following query:

SELECT   location_id,
         COUNT(DISTINCT transaction_no) AS count,
         SUM(amount) AS receipt_amount
FROM     `receipts`
WHERE    DATE(`receipts`.`created_at`) = '2015-05-17'
GROUP BY `receipts`.`location_id`

The problem is that receipts with the same transaction number appear multiple times, and the amount may or may not differ each time. The business rule for handling this is that the last receipt we received is the latest one. So the query above does not work.

What I want to do is:

  1. For each location, get all the receipts for the day.
  2. Where a transaction number is duplicated, order by created_at
  3. and keep only the last received receipt.
  4. Sum the amount column.
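The steps above can be sketched end-to-end with an in-memory SQLite database standing in for the MySQL table (the sample data below is invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE receipts (
    id INTEGER,
    amount DECIMAL,
    transaction_no TEXT,
    location_id INTEGER,
    created_at DATETIME
);
-- transaction T1 arrives twice; the later row (amount 12) should win
INSERT INTO receipts VALUES
    (1, 10, 'T1', 1, '2015-05-17 09:00:00'),
    (2, 12, 'T1', 1, '2015-05-17 10:00:00'),
    (3,  5, 'T2', 1, '2015-05-17 11:00:00'),
    (4,  7, 'T3', 2, '2015-05-17 12:00:00');
""")

# For each transaction_no keep only the row with the latest created_at,
# then count and sum per location. The half-open date range keeps the
# predicate index-friendly.
rows = conn.execute("""
    SELECT r.location_id,
           COUNT(*)      AS num_transactions,
           SUM(r.amount) AS receipt_amount
    FROM receipts r
    JOIN (SELECT transaction_no, MAX(created_at) AS last_created_at
          FROM receipts
          WHERE created_at >= '2015-05-17' AND created_at < '2015-05-18'
          GROUP BY transaction_no) v
      ON r.transaction_no = v.transaction_no
     AND r.created_at = v.last_created_at
    GROUP BY r.location_id
    ORDER BY r.location_id
""").fetchall()
print(rows)  # location 1: T1's later amount (12) + T2 (5) = 17; location 2: 7
```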

[Edit]

Here is the query plan:

*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: <derived2>
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 25814155
     filtered: 100.00
        Extra: Using where; Using temporary; Using filesort
*************************** 2. row ***************************
           id: 1
  select_type: PRIMARY
        table: r
         type: ref
possible_keys: punchh_key_location_id_created_at
          key: punchh_key_location_id_created_at
      key_len: 50
          ref: t.punchh_key
         rows: 1
     filtered: 100.00
        Extra: Using index condition; Using where
*************************** 3. row ***************************
           id: 2
  select_type: DERIVED
        table: r
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 25814155
     filtered: 100.00
        Extra: Using temporary; Using filesort
3 rows in set, 1 warning (0.00 sec)

You can also use DISTINCT inside the SUM:

SELECT   location_id,
         COUNT(DISTINCT transaction_no) AS cnt,
         SUM(DISTINCT amount) AS receipt_amount 
FROM     `receipts`  
WHERE    DATE(`receipts`.`created_at`) = '2015-05-17' 
GROUP BY `receipts`.`location_id`
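A caveat, shown with an in-memory SQLite sketch (hypothetical data): SUM(DISTINCT amount) collapses two genuinely different transactions that happen to share the same amount.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE receipts (transaction_no TEXT, amount DECIMAL, location_id INTEGER)")
# Two different transactions that happen to have the same 9.99 amount.
conn.executemany("INSERT INTO receipts VALUES (?, ?, ?)",
                 [('T1', 9.99, 1), ('T2', 9.99, 1)])

distinct_sum, plain_sum = conn.execute(
    "SELECT SUM(DISTINCT amount), SUM(amount) FROM receipts").fetchone()
print(distinct_sum, plain_sum)  # DISTINCT drops the second 9.99
```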

You can get the amount at the last created_at value within the day by joining to an inline view that determines the last created_at for each transaction_no on that day.

This avoids simply using sum(distinct ...), which would otherwise count two different transactions with the same amount (if any exist) only once.

This approach should avoid that problem.

select      r.location_id,
            count(*) as num_transactions,
            sum(r.amount) as receipt_amount
from        receipts r
       join (
                select      transaction_no,
                            max(created_at) as last_created_at_for_trans
                from        receipts
                where       created_at like '2015-05-17%'
                group by    transaction_no
            ) v
         on r.transaction_no = v.transaction_no
        and r.created_at = v.last_created_at_for_trans
where       r.created_at like '2015-05-17%'
group by    r.location_id

Another approach is to use not exists; you may want to test which performs better:

select      r.location_id,
            count(*) as num_transactions,
            sum(r.amount) as receipt_amount
from        receipts r
where       r.created_at like '2015-05-17%'
        and not exists ( select 1
                         from   receipts x
                         where  x.transaction_no = r.transaction_no
                            and x.created_at > r.created_at
                       )
group by    r.location_id
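A quick way to sanity-check the two formulations is to run both against the same data and compare. The SQLite sketch below uses invented sample rows; SQLite happens to support LIKE on its text-stored datetimes, so the queries run as written:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE receipts (amount DECIMAL, transaction_no TEXT,
                       location_id INTEGER, created_at TEXT);
INSERT INTO receipts VALUES
    (10, 'T1', 1, '2015-05-17 09:00:00'),
    (12, 'T1', 1, '2015-05-17 10:00:00'),
    ( 5, 'T2', 2, '2015-05-17 11:00:00');
""")

# Formulation 1: join to an inline view of last created_at per transaction.
join_version = conn.execute("""
    SELECT r.location_id, COUNT(*), SUM(r.amount)
    FROM receipts r
    JOIN (SELECT transaction_no, MAX(created_at) AS last_created_at_for_trans
          FROM receipts
          WHERE created_at LIKE '2015-05-17%'
          GROUP BY transaction_no) v
      ON r.transaction_no = v.transaction_no
     AND r.created_at = v.last_created_at_for_trans
    WHERE r.created_at LIKE '2015-05-17%'
    GROUP BY r.location_id ORDER BY r.location_id
""").fetchall()

# Formulation 2: keep a row only if no later receipt exists for the same transaction.
not_exists_version = conn.execute("""
    SELECT r.location_id, COUNT(*), SUM(r.amount)
    FROM receipts r
    WHERE r.created_at LIKE '2015-05-17%'
      AND NOT EXISTS (SELECT 1 FROM receipts x
                      WHERE x.transaction_no = r.transaction_no
                        AND x.created_at > r.created_at)
    GROUP BY r.location_id ORDER BY r.location_id
""").fetchall()

print(join_version == not_exists_version)  # both keep only the last receipt per transaction
```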

What about transactions that repeat across days?

I suspect you don't actually want to count a transaction just because it is the last one on that day, if more receipts for it arrive the next day. You can get the final record for each transaction in several ways. One typical method uses group by (this is similar to Brian's query, but slightly different):

select r.*
from receipts r join
     (select transaction_no, max(created_at) as maxca
      from receipts r
      group by transaction_no
     ) t
     on r.transaction_no = t.transaction_no and r.created_at = t.maxca;
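To see the cross-day point concretely (SQLite sketch, hypothetical data): a transaction whose last receipt falls on the next day resolves to that later row, so it drops out of the earlier day's report once you filter by date:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE receipts (amount DECIMAL, transaction_no TEXT, created_at TEXT);
INSERT INTO receipts VALUES
    (10, 'T1', '2015-05-17 09:00:00'),
    (11, 'T1', '2015-05-18 08:00:00'),  -- corrected receipt arrives next day
    ( 5, 'T2', '2015-05-17 11:00:00');
""")

# Final record per transaction, with no date filter on the inline view:
# T1 resolves to its 2015-05-18 row, not the 2015-05-17 one.
final = conn.execute("""
    SELECT r.transaction_no, r.amount, r.created_at
    FROM receipts r
    JOIN (SELECT transaction_no, MAX(created_at) AS maxca
          FROM receipts GROUP BY transaction_no) t
      ON r.transaction_no = t.transaction_no AND r.created_at = t.maxca
    ORDER BY r.transaction_no
""").fetchall()
print(final)
```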

The complete query is:

select location_id, count(*) as numtransactions, sum(amount) as receipt_amount
from receipts r join
     (select transaction_no, max(created_at) as maxca
      from receipts r
      group by transaction_no
     ) t
     on r.transaction_no = t.transaction_no and r.created_at = t.maxca
where r.created_at >= date('2015-05-17') and r.created_at < date('2015-05-18')
group by location_id;

A note on date comparisons.

Your original form, date(r.created_at) = '2015-05-17', is logically correct. However, wrapping the column in date() prevents an index from being used. The form with two comparisons against constants allows the query to take advantage of an index on receipts(created_at).

Using like on dates is discouraged. It requires implicitly converting the date to a string and then comparing as strings. That introduces an unnecessary conversion, and in some databases it makes the semantics depend on globalization settings.
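The equivalence of the two forms can be checked with a small SQLite sketch (hypothetical data): the date() form and the half-open range form select exactly the same rows, but only the range form is a plain comparison on the column and can therefore use an index on created_at.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE receipts (id INTEGER, created_at TEXT);
CREATE INDEX idx_receipts_created_at ON receipts(created_at);
INSERT INTO receipts VALUES
    (1, '2015-05-16 23:59:59'),
    (2, '2015-05-17 00:00:00'),
    (3, '2015-05-17 23:59:59'),
    (4, '2015-05-18 00:00:00');
""")

# Function-wrapped form: correct, but hides the column from the index.
func_form = conn.execute(
    "SELECT id FROM receipts WHERE date(created_at) = '2015-05-17' "
    "ORDER BY id").fetchall()

# Half-open range form: same rows, index-friendly.
range_form = conn.execute(
    "SELECT id FROM receipts WHERE created_at >= '2015-05-17' "
    "AND created_at < '2015-05-18' ORDER BY id").fetchall()

print(func_form == range_form)  # same rows either way
```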