SQL Query for sum of unique amounts, remove duplicates
Consider the following MySQL table schema:
id int,
amount decimal,
transaction_no,
location_id int,
created_at datetime
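The schema above stores POS receipts for restaurants. It feeds a daily report that gets the number of receipts and their total. I tried the following query: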
SELECT location_id,count(distinct(transaction_no)) as count,sum(amount) as receipt_amount FROM `receipts` WHERE date(`receipts`.`created_at`) = '2015-05-17' GROUP BY `receipts`.`location_id`
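The problem is that receipts with the same transaction number can arrive several times, and the amount may or may not differ each time. The business rule for handling this is that the last receipt we received is the current one. So the query above does not work.
What I want to do is:
- For each location, get all the receipts for the day.
- If a transaction number is repeated, take the last receipt received, based on created_at.
- Sum the amount column of the remaining receipts.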
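The three steps above can be written out as a small reference implementation, which may help when checking any SQL against the rule. This is only a sketch in plain Python: the sample receipts and the daily_totals helper are invented for illustration, and it assumes a transaction_no is never shared between locations.

```python
from collections import defaultdict
from datetime import date, datetime

# Hypothetical sample receipts: (amount, transaction_no, location_id, created_at).
receipts = [
    (10, "T1", 1, "2015-05-17 09:00:00"),
    (12, "T1", 1, "2015-05-17 09:05:00"),  # replaces the 09:00 receipt
    (12, "T2", 1, "2015-05-17 10:00:00"),
    (7, "T3", 2, "2015-05-17 11:00:00"),
    (99, "T4", 2, "2015-05-18 08:00:00"),  # different day, ignored
]


def daily_totals(rows, day):
    """Return {location_id: (receipt_count, amount_total)} for one day."""
    latest = {}
    for amount, txn, loc, created_at in rows:
        created = datetime.fromisoformat(created_at)
        # Step 1: only receipts created on the requested day.
        if created.date() != day:
            continue
        # Step 2: per transaction_no, keep the receipt with the latest created_at.
        if txn not in latest or created > latest[txn][0]:
            latest[txn] = (created, loc, amount)
    # Step 3: count and sum the surviving receipts per location.
    totals = defaultdict(lambda: [0, 0])
    for _, loc, amount in latest.values():
        totals[loc][0] += 1
        totals[loc][1] += amount
    return {loc: (count, total) for loc, (count, total) in totals.items()}


report = daily_totals(receipts, date(2015, 5, 17))
print(report)
```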
[Edit]
Here is the query plan:
*************************** 1. row ***************************
id: 1
select_type: PRIMARY
table: <derived2>
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 25814155
filtered: 100.00
Extra: Using where; Using temporary; Using filesort
*************************** 2. row ***************************
id: 1
select_type: PRIMARY
table: r
type: ref
possible_keys: punchh_key_location_id_created_at
key: punchh_key_location_id_created_at
key_len: 50
ref: t.punchh_key
rows: 1
filtered: 100.00
Extra: Using index condition; Using where
*************************** 3. row ***************************
id: 2
select_type: DERIVED
table: r
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 25814155
filtered: 100.00
Extra: Using temporary; Using filesort
3 rows in set, 1 warning (0.00 sec)
You can also use the distinct modifier inside sum:
SELECT location_id,
COUNT(DISTINCT transaction_no) AS cnt,
SUM(DISTINCT amount) AS receipt_amount
FROM `receipts`
WHERE DATE(`receipts`.`created_at`) = '2015-05-17'
GROUP BY `receipts`.`location_id`
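You can get the amount that goes with the last created_at value of the day by joining to an inline view that finds the last created_at for each transaction_no on that day.
This avoids simply using sum(distinct ...), which would count two different transactions with the same amount (if any exist) only once.
This approach should avoid that problem.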
select r.location_id,
count(*) as num_transactions,
sum(r.amount) as receipt_amount
from receipts r
join (
select transaction_no,
max(created_at) as last_created_at_for_trans
from receipts
where created_at like '2015-05-17%'
group by transaction_no
) v
on r.transaction_no = v.transaction_no
and r.created_at = v.last_created_at_for_trans
where r.created_at like '2015-05-17%'
group by r.location_id
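To make the difference concrete, here is a small sketch that runs both forms on invented data. It uses Python's sqlite3 module with an in-memory SQLite database as a stand-in for MySQL, whole-number amounts for simplicity, and range comparisons on created_at instead of like; the sample rows are assumptions, not data from the question.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here; the sample rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE receipts ("
    " id INTEGER PRIMARY KEY, amount NUMERIC, transaction_no TEXT,"
    " location_id INTEGER, created_at TEXT)"
)
conn.executemany(
    "INSERT INTO receipts (amount, transaction_no, location_id, created_at)"
    " VALUES (?, ?, ?, ?)",
    [
        (10, "T1", 1, "2015-05-17 09:00:00"),  # superseded by the next row
        (12, "T1", 1, "2015-05-17 09:05:00"),  # latest receipt for T1
        (12, "T2", 1, "2015-05-17 10:00:00"),  # same amount as T1's latest
        (7, "T3", 2, "2015-05-17 11:00:00"),
    ],
)

# SUM(DISTINCT amount): collapses equal amounts, even across transactions.
distinct_rows = conn.execute(
    """
    SELECT location_id, COUNT(DISTINCT transaction_no), SUM(DISTINCT amount)
    FROM receipts
    WHERE created_at >= '2015-05-17' AND created_at < '2015-05-18'
    GROUP BY location_id
    ORDER BY location_id
    """
).fetchall()

# Join to the latest created_at per transaction_no within the day.
latest_rows = conn.execute(
    """
    SELECT r.location_id, COUNT(*), SUM(r.amount)
    FROM receipts r
    JOIN (SELECT transaction_no, MAX(created_at) AS last_created_at
          FROM receipts
          WHERE created_at >= '2015-05-17' AND created_at < '2015-05-18'
          GROUP BY transaction_no) v
      ON r.transaction_no = v.transaction_no
     AND r.created_at = v.last_created_at
    WHERE r.created_at >= '2015-05-17' AND r.created_at < '2015-05-18'
    GROUP BY r.location_id
    ORDER BY r.location_id
    """
).fetchall()

print(distinct_rows)  # location 1 totals 22: the two 12s are summed once
print(latest_rows)    # location 1 totals 24: T1's latest (12) plus T2 (12)
```

In this sample, location 1 has two transactions whose current amount is 12, so the distinct form loses one of them while the join keeps both.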
Another approach is to use not exists; you may want to test which of the two performs better:
select r.location_id,
count(*) as num_transactions,
sum(r.amount) as receipt_amount
from receipts r
where r.created_at like '2015-05-17%'
and not exists ( select 1
from receipts x
where x.transaction_no = r.transaction_no
and x.created_at > r.created_at
)
group by r.location_id
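How do you want to count transactions that are duplicated across multiple days?
I am guessing that you do not actually want to count a transaction just because it is the last one on that day, if there are receipts for it on the next day. You can get the final record for each transaction in several ways. A typical one uses group by
(this is similar to Brian's query, but subtly different):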
select r.*
from receipts r join
(select transaction_no, max(created_at) as maxca
from receipts r
group by transaction_no
) t
on r.transaction_no = t.transaction_no and r.created_at = t.maxca;
The complete query is:
select location_id, count(*) as numtransactions, sum(amount) as receipt_amount
from receipts r join
(select transaction_no, max(created_at) as maxca
from receipts r
group by transaction_no
) t
on r.transaction_no = t.transaction_no and r.created_at = t.maxca
where r.created_at >= date('2015-05-17') and r.created_at < date('2015-05-18')
group by location_id;
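The subtle difference shows up when a transaction gets a corrected receipt on a later day. The sketch below is an illustration under assumptions, not a MySQL benchmark: it uses Python's sqlite3 module with an in-memory database, invented rows, and a hypothetical daily_report helper whose latest_within_day flag switches between taking the maximum created_at inside the day and taking it over the whole table.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; rows and helper names are invented.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE receipts ("
    " id INTEGER PRIMARY KEY, amount NUMERIC, transaction_no TEXT,"
    " location_id INTEGER, created_at TEXT)"
)
conn.executemany(
    "INSERT INTO receipts (amount, transaction_no, location_id, created_at)"
    " VALUES (?, ?, ?, ?)",
    [
        (10, "T1", 1, "2015-05-17 20:00:00"),
        (15, "T1", 1, "2015-05-18 08:00:00"),  # correction received next day
        (5, "T2", 1, "2015-05-17 21:00:00"),
    ],
)


def daily_report(start, end, latest_within_day):
    """Report rows for [start, end); the flag picks how 'latest' is defined."""
    inner_filter = (
        "WHERE created_at >= :start AND created_at < :end"
        if latest_within_day
        else ""
    )
    sql = f"""
        SELECT r.location_id, COUNT(*), SUM(r.amount)
        FROM receipts r
        JOIN (SELECT transaction_no, MAX(created_at) AS maxca
              FROM receipts
              {inner_filter}
              GROUP BY transaction_no) t
          ON r.transaction_no = t.transaction_no AND r.created_at = t.maxca
        WHERE r.created_at >= :start AND r.created_at < :end
        GROUP BY r.location_id
    """
    return conn.execute(sql, {"start": start, "end": end}).fetchall()


# Latest receipt within the day: T1 still counts on 2015-05-17 with amount 10.
per_day = daily_report("2015-05-17", "2015-05-18", latest_within_day=True)
# Latest receipt overall: T1's final record is on 2015-05-18, so it drops out.
overall = daily_report("2015-05-17", "2015-05-18", latest_within_day=False)
print(per_day, overall)
```

Which of the two is right depends on the reporting rule: the first keeps each day's report stable once the day is over, the second makes a transaction appear only on the day of its final receipt.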
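A note on the date comparison.
Your original form, date(r.created_at) = '2015-05-17', is logically correct. However, wrapping the column in date() generally means an index on it cannot be used. The form with two comparisons against constants allows the query to take advantage of an index on receipts(created_at).
Using like on dates is discouraged. It requires an implicit conversion of the date to a string, which is then compared as a string. That incurs an unnecessary conversion and, in some databases, makes the semantics depend on globalization settings.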
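As a rough illustration of the half-open range form, the sketch below builds the two bounds for a day and checks that the range filter selects the same rows as the date() filter. It runs on SQLite through Python's sqlite3 module, not on MySQL, so the printed plans only hint at the idea; the day_bounds helper, the index name, and the sample rows are assumptions made for the example, and MySQL's own EXPLAIN should be consulted for real tables.

```python
import sqlite3
from datetime import date, timedelta


def day_bounds(day):
    """Return the [start, end) bounds of one calendar day as ISO strings."""
    return day.isoformat(), (day + timedelta(days=1)).isoformat()


# In-memory SQLite stands in for MySQL; the rows and index name are invented.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE receipts ("
    " id INTEGER PRIMARY KEY, amount NUMERIC, transaction_no TEXT,"
    " location_id INTEGER, created_at TEXT)"
)
conn.execute("CREATE INDEX idx_receipts_created_at ON receipts (created_at)")
conn.executemany(
    "INSERT INTO receipts (amount, transaction_no, location_id, created_at)"
    " VALUES (?, ?, ?, ?)",
    [
        (5, "T1", 1, "2015-05-16 23:59:59"),   # just before the day
        (10, "T2", 1, "2015-05-17 00:00:00"),  # first instant of the day
        (7, "T3", 1, "2015-05-17 23:59:59"),   # last second of the day
        (3, "T4", 1, "2015-05-18 00:00:00"),   # first instant of the next day
    ],
)

RANGE_SQL = "SELECT id FROM receipts WHERE created_at >= ? AND created_at < ?"
FUNC_SQL = "SELECT id FROM receipts WHERE date(created_at) = ?"

start, end = day_bounds(date(2015, 5, 17))
range_ids = sorted(row[0] for row in conn.execute(RANGE_SQL, (start, end)))
func_ids = sorted(row[0] for row in conn.execute(FUNC_SQL, (start,)))

# Both filters pick the same receipts; only the range form compares the bare
# column, which is what lets a planner seek into an index on created_at.
print(range_ids, func_ids)

# SQLite's plan output, shown for illustration only (wording varies by version).
for sql, params in ((RANGE_SQL, (start, end)), (FUNC_SQL, (start,))):
    for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params):
        print(row[3])
```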