How to deal with SQL Aggregate Functions when Grouping deeply hierarchical data
Given the following scenario:
I have a database with 5 tables:
- currency (iso_number, iso_code),
- product (id, name, current_price),
- sale (id, time_of_sale, currency_items_sold_in),
- sale_lines (id, sale_id, product_id, price_paid, quantity),
- cash_transactions (id, sale_id, received_currency_id, converted_currency_id, received_amount, converted_amount)
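This setup makes it possible to store which currency the customer originally paid with, which currency it was converted to internally, and both the original (received) amount and the converted amount.
I want to be able to find all sales that match certain criteria (time period, seller, store, etc.; omitted here for simplicity).
For all of these sales I join in the related data, i.e. sale_lines and cash_transactions. The currency on sale_lines always matches the currency of the related sale.
For cash_transactions, however, received_amount/received_currency may differ from the sale currency. converted_currency/converted_amount are also stored on the cash_transaction row, but they are supposed to follow the sale.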
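The problem arises when I try to SUM certain fields: once you start joining one-to-many relations and then apply aggregate functions such as SUM, the server still sums the duplicated rows that the joins produce behind the scenes (the rows you would see if there were no GROUP BY), even when you specify the correct GROUP BY.
The problem is also described here:
https://wikido.isoftdata.com/index.php/The_GROUPing_pitfall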
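To make the row multiplication concrete, here is a minimal sketch that loads the rows from the fiddle DDL below into an in-memory database and runs the fiddle's second query with a COUNT(*) added. It assumes Python's stdlib sqlite3 as a stand-in for PostgreSQL and keeps only the columns the query touches. Each sale has 2 sale lines and 2 cash transactions, so the join produces 2 × 2 = 4 rows per sale and every price_paid value is summed twice.

```python
import sqlite3

# Only the columns the query touches; SQLite stands in for PostgreSQL here.
SETUP = """
CREATE TABLE sale (id INTEGER PRIMARY KEY, currency_items_sold_in TEXT);
CREATE TABLE sale_lines (id INTEGER PRIMARY KEY, sale_id INTEGER, price_paid INTEGER);
CREATE TABLE cash_transactions (id INTEGER PRIMARY KEY, sale_id INTEGER,
    received_currency_id TEXT, converted_currency_id TEXT,
    received_amount INTEGER, converted_amount INTEGER);
INSERT INTO sale VALUES (1, '208'), (2, '752');
INSERT INTO sale_lines VALUES (1, 1, 200), (2, 1, 300), (3, 2, 100), (4, 2, 100);
INSERT INTO cash_transactions VALUES
    (1, 1, '208', '208', 200, 200), (2, 1, '752', '208', 400, 300),
    (3, 2, '572', '208', 150, 100), (4, 2, '208', '208', 100, 100);
"""

def fanned_out_sums(conn):
    """Join both one-to-many tables, then SUM. Every sale line is repeated
    once per cash transaction of the same sale before SUM sees it."""
    rows = conn.execute("""
        SELECT s.currency_items_sold_in,
               COUNT(*)          AS joined_rows,
               SUM(sl.price_paid) AS price_paid
        FROM sale s
        LEFT JOIN sale_lines sl ON sl.sale_id = s.id
        LEFT JOIN cash_transactions ct ON ct.sale_id = s.id
        GROUP BY s.currency_items_sold_in
        ORDER BY s.currency_items_sold_in
    """).fetchall()
    return {cur: (n, total) for cur, n, total in rows}

conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
print(fanned_out_sums(conn))
# {'208': (4, 1000), '752': (4, 400)}  (the real line totals are 500 and 200)
```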
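Following the solution from that article, in my case I should aggregate per sale and LEFT JOIN those aggregated results into the outer query.
But what do I do when the sale_lines currency matches the sale, but the cash_transactions currency may differ from the sale?
I tried to create the following SQL Fiddle, which inserts some test data and highlights the problem: http://sqlfiddle.com/#!17/54a7b/15
In the fiddle I created 2 sales, with the items sold in DKK (208) and SEK (752) respectively.
The first sale has 2 sale lines and 2 cash transactions: the first transaction is a direct DKK => DKK, the second is SEK => DKK.
The second sale also has 2 sale lines and 2 cash transactions: the first transaction is NOK => DKK, the second is a direct DKK => DKK.
In the last query of the fiddle you can see that total_received_amount is bogus: it is a mix of DKK, SEK and NOK, so it does not provide much value.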
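The mixing can be reproduced with a small sketch that runs the fiddle's last query (per-sale pre-aggregation, then GROUP BY the sale currency) against the same rows. As before, this assumes stdlib sqlite3 as a stand-in for PostgreSQL and keeps only the columns that are used.

```python
import sqlite3

# Only the columns the query touches; SQLite stands in for PostgreSQL here.
SETUP = """
CREATE TABLE sale (id INTEGER PRIMARY KEY, currency_items_sold_in TEXT);
CREATE TABLE sale_lines (id INTEGER PRIMARY KEY, sale_id INTEGER, price_paid INTEGER);
CREATE TABLE cash_transactions (id INTEGER PRIMARY KEY, sale_id INTEGER,
    received_currency_id TEXT, converted_currency_id TEXT,
    received_amount INTEGER, converted_amount INTEGER);
INSERT INTO sale VALUES (1, '208'), (2, '752');
INSERT INTO sale_lines VALUES (1, 1, 200), (2, 1, 300), (3, 2, 100), (4, 2, 100);
INSERT INTO cash_transactions VALUES
    (1, 1, '208', '208', 200, 200), (2, 1, '752', '208', 400, 300),
    (3, 2, '572', '208', 150, 100), (4, 2, '208', '208', 100, 100);
"""

# The fiddle's last query: each child table is first collapsed to one row
# per sale, so joining both of them to sale no longer multiplies rows.
PER_SALE_QUERY = """
SELECT s.currency_items_sold_in,
       SUM(sla.price_paid)       AS total_price_paid,
       SUM(cta.converted_amount) AS total_converted_amount,
       SUM(cta.received_amount)  AS total_received_amount
FROM sale s
LEFT JOIN (SELECT sale_id, SUM(price_paid) AS price_paid
           FROM sale_lines GROUP BY sale_id) AS sla ON sla.sale_id = s.id
LEFT JOIN (SELECT sale_id,
                  SUM(converted_amount) AS converted_amount,
                  SUM(received_amount)  AS received_amount
           FROM cash_transactions GROUP BY sale_id) AS cta ON cta.sale_id = s.id
GROUP BY s.currency_items_sold_in
ORDER BY s.currency_items_sold_in
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
for row in conn.execute(PER_SALE_QUERY):
    print(row)
# ('208', 500, 500, 600)   600 = 200 DKK + 400 SEK
# ('752', 200, 200, 250)   250 = 150 NOK + 100 DKK
```

total_price_paid is now correct (500 and 200), but total_received_amount adds up amounts in different currencies. Also note that in the fiddle data every converted_currency_id is '208', including for the SEK sale, so the converted 200 in the '752' row is actually DKK.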
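I would like advice on how to fetch this data correctly. I don't mind doing some extra "logic" on the server side (PHP) to remove duplicated data, as long as the sums are correct.
Any advice is much appreciated.
DDL from the FIDDLE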
CREATE TABLE currency (
    iso_number CHARACTER VARYING(3) PRIMARY KEY,
    iso_code CHARACTER VARYING(3)
);
INSERT INTO currency(iso_number, iso_code) VALUES ('208','DKK'), ('752','SEK'), ('572','NOK');
CREATE TABLE product (
    id SERIAL PRIMARY KEY,
    name CHARACTER VARYING(12),
    current_price INTEGER
);
INSERT INTO product(id,name,current_price) VALUES (1,'icecream',200), (2,'sunglasses',300);
CREATE TABLE sale (
    id SERIAL PRIMARY KEY,
    time_of_sale TIMESTAMP,
    currency_items_sold_in CHARACTER VARYING(3)
);
INSERT INTO sale(id, time_of_sale, currency_items_sold_in)
VALUES
    (1, CURRENT_TIMESTAMP, '208'),
    (2, CURRENT_TIMESTAMP, '752')
;
CREATE TABLE sale_lines (
    id SERIAL PRIMARY KEY,
    sale_id INTEGER,
    product_id INTEGER,
    price_paid INTEGER,
    quantity FLOAT
);
INSERT INTO sale_lines(id, sale_id, product_id, price_paid, quantity)
VALUES
    (1, 1, 1, 200, 1.0),
    (2, 1, 2, 300, 1.0),
    (3, 2, 1, 100, 1.0),
    (4, 2, 1, 100, 1.0)
;
CREATE TABLE cash_transactions (
    id SERIAL PRIMARY KEY,
    sale_id INTEGER,
    received_currency_id CHARACTER VARYING(3),
    converted_currency_id CHARACTER VARYING(3),
    received_amount INTEGER,
    converted_amount INTEGER
);
INSERT INTO cash_transactions(id, sale_id, received_currency_id, converted_currency_id, received_amount, converted_amount)
VALUES
    (1, 1, '208', '208', 200, 200),
    (2, 1, '752', '208', 400, 300),
    (3, 2, '572', '208', 150, 100),
    (4, 2, '208', '208', 100, 100)
;
Queries from the FIDDLE
--SELECT * FROM currency;
--SELECT * FROM product;
--SELECT * FROM sale;
--SELECT * FROM sale_lines;
--SELECT * FROM cash_transactions;
--- Showing the sales with duplicated lines to
--- fit joined data for OneToMany SaleLines, and OneToMany cash transactions.
SELECT *
FROM sale s
LEFT JOIN sale_lines sl ON sl.sale_id = s.id
LEFT JOIN cash_transactions ct ON ct.sale_id = s.id;
--- Grouping the data by the important identifier "currency_items_sold_in".
--- The SUM of sl.price_paid is wrong, as it SUMs the duplicated lines as well.
SELECT
    s.currency_items_sold_in,
    SUM(sl.price_paid) as "price_paid"
FROM sale s
LEFT JOIN sale_lines sl ON sl.sale_id = s.id
LEFT JOIN cash_transactions ct ON ct.sale_id = s.id
GROUP BY s.currency_items_sold_in;
--- To solve this, the SUMs can be joined via the "Monkey-Poop" method.
--- Here the problem arises: the SUMs for cash_transaction.received_amount and cash_transaction.converted_amount cannot be relied upon,
--- as those fields themselves depend on cash_transaction.received_currency_id and cash_transaction.converted_currency_id.
SELECT
    s.currency_items_sold_in,
    SUM(sale_line_aggregates.price_paid) as "total_price_paid",
    SUM(cash_transaction_aggregates.converted_amount) as "total_converted_amount",
    SUM(cash_transaction_aggregates.received_amount) as "total_received_amount"
FROM sale s
LEFT JOIN (
    SELECT
        sale_id,
        SUM(price_paid) AS price_paid
    FROM sale_lines
    GROUP BY sale_id
) AS sale_line_aggregates ON sale_line_aggregates.sale_id = s.id
LEFT JOIN (
    SELECT
        sale_id,
        SUM(converted_amount) as converted_amount,
        SUM(received_amount) as received_amount
    FROM cash_transactions
    GROUP BY sale_id
) AS cash_transaction_aggregates ON cash_transaction_aggregates.sale_id = s.id
GROUP BY s.currency_items_sold_in;
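You can calculate each amount, grouped by currency, in its own subquery.
Then LEFT JOIN those subqueries to the currency table.
Using a CTE, you can make sure that every subquery works on the same set of sales, so the sale filters only have to be written once, inside the CTE.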
WITH CTE_SALE AS (
    SELECT
        id as sale_id,
        currency_items_sold_in AS iso_number
    FROM sale
)
SELECT curr.iso_code AS currency
     , COALESCE(line.price_paid, 0) as total_price_paid
     , COALESCE(received.amount, 0) as total_received_amount
     , COALESCE(converted.amount, 0) as total_converted_amount
FROM currency AS curr
LEFT JOIN (
    SELECT s.iso_number
         , SUM(sl.price_paid) AS price_paid
    FROM sale_lines sl
    JOIN CTE_SALE s ON s.sale_id = sl.sale_id
    GROUP BY s.iso_number
) AS line
    ON line.iso_number = curr.iso_number
LEFT JOIN (
    SELECT tr.received_currency_id as iso_number
         , SUM(tr.received_amount) AS amount
    FROM cash_transactions tr
    JOIN CTE_SALE s ON s.sale_id = tr.sale_id
    GROUP BY tr.received_currency_id
) AS received
    ON received.iso_number = curr.iso_number
LEFT JOIN (
    SELECT tr.converted_currency_id as iso_number
         , SUM(tr.converted_amount) AS amount
    FROM cash_transactions AS tr
    JOIN CTE_SALE s ON s.sale_id = tr.sale_id
    GROUP BY tr.converted_currency_id
) AS converted
    ON converted.iso_number = curr.iso_number;
currency | total_price_paid | total_received_amount | total_converted_amount
:------- | ---------------: | --------------------: | ---------------------:
DKK | 500 | 300 | 700
SEK | 200 | 400 | 0
NOK | 0 | 150 | 0
db<>fiddle here
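To check the answer's query against the table above, here is a sketch that runs it on the fiddle rows. It assumes stdlib sqlite3 as a stand-in for PostgreSQL and adds two things that are not in the answer: a hypothetical `WHERE id BETWEEN :first_id AND :last_id` filter inside the CTE, standing in for the real criteria (time period, seller, store), and an ORDER BY so the output order is deterministic. Because the filter lives only in the CTE, all three subqueries aggregate the same sales.

```python
import sqlite3

# Only the columns the query touches; SQLite stands in for PostgreSQL here.
SETUP = """
CREATE TABLE currency (iso_number TEXT PRIMARY KEY, iso_code TEXT);
CREATE TABLE sale (id INTEGER PRIMARY KEY, currency_items_sold_in TEXT);
CREATE TABLE sale_lines (id INTEGER PRIMARY KEY, sale_id INTEGER, price_paid INTEGER);
CREATE TABLE cash_transactions (id INTEGER PRIMARY KEY, sale_id INTEGER,
    received_currency_id TEXT, converted_currency_id TEXT,
    received_amount INTEGER, converted_amount INTEGER);
INSERT INTO currency VALUES ('208', 'DKK'), ('752', 'SEK'), ('572', 'NOK');
INSERT INTO sale VALUES (1, '208'), (2, '752');
INSERT INTO sale_lines VALUES (1, 1, 200), (2, 1, 300), (3, 2, 100), (4, 2, 100);
INSERT INTO cash_transactions VALUES
    (1, 1, '208', '208', 200, 200), (2, 1, '752', '208', 400, 300),
    (3, 2, '572', '208', 150, 100), (4, 2, '208', '208', 100, 100);
"""

# The answer's query plus a hypothetical id-range filter in the CTE.
CURRENCY_TOTALS = """
WITH cte_sale AS (
    SELECT id AS sale_id, currency_items_sold_in AS iso_number
    FROM sale
    WHERE id BETWEEN :first_id AND :last_id  -- stand-in for the real criteria
)
SELECT curr.iso_code AS currency,
       COALESCE(line.price_paid, 0)  AS total_price_paid,
       COALESCE(received.amount, 0)  AS total_received_amount,
       COALESCE(converted.amount, 0) AS total_converted_amount
FROM currency AS curr
LEFT JOIN (SELECT s.iso_number, SUM(sl.price_paid) AS price_paid
           FROM sale_lines sl JOIN cte_sale s ON s.sale_id = sl.sale_id
           GROUP BY s.iso_number) AS line
       ON line.iso_number = curr.iso_number
LEFT JOIN (SELECT tr.received_currency_id AS iso_number, SUM(tr.received_amount) AS amount
           FROM cash_transactions tr JOIN cte_sale s ON s.sale_id = tr.sale_id
           GROUP BY tr.received_currency_id) AS received
       ON received.iso_number = curr.iso_number
LEFT JOIN (SELECT tr.converted_currency_id AS iso_number, SUM(tr.converted_amount) AS amount
           FROM cash_transactions tr JOIN cte_sale s ON s.sale_id = tr.sale_id
           GROUP BY tr.converted_currency_id) AS converted
       ON converted.iso_number = curr.iso_number
ORDER BY curr.iso_code
"""

def currency_totals(conn, first_id, last_id):
    """Return {iso_code: (price_paid, received, converted)} for the sales
    whose id lies in [first_id, last_id] (hypothetical filter)."""
    rows = conn.execute(CURRENCY_TOTALS, {"first_id": first_id, "last_id": last_id})
    return {cur: (paid, rec, conv) for cur, paid, rec, conv in rows}

conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
print(currency_totals(conn, 1, 2))
# {'DKK': (500, 300, 700), 'NOK': (0, 150, 0), 'SEK': (200, 400, 0)}
print(currency_totals(conn, 1, 1))
# {'DKK': (500, 200, 500), 'NOK': (0, 0, 0), 'SEK': (0, 400, 0)}
```

With both sales selected the numbers match the answer's table. Restricting the CTE to sale 1 changes all three columns consistently, because no subquery has its own copy of the filter.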