SQL/Hive Query: Calculate the profit information using sales data and multiple cost entries for a day from the costs table

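I have the sales data and purchase (cost) data below, and I tried to calculate the profit using Hive's RANK() OVER (PARTITION BY ...), but it is not working.

Note: the cost information for the same SKU can be updated multiple times within a day.

For order 1011 there is no entry in the cost table, so it should use the cost entered at 2022-05-19 06:50:20.000, which is 32.5.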
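To make the intended rule concrete, here is a minimal Python sketch of how I read it: a sale uses the most recent cost entered at or before its posted time, and a sale that predates every cost entry falls back to the earliest entry. The helper name `cost_in_effect`, the second cost row, and the three sale times are hypothetical; only the 32.5 entry at 2022-05-19 06:50:20.000 comes from the post.

```python
from bisect import bisect_right


def cost_in_effect(updates, posted_at):
    """Return the unit cost that applies to a sale posted at `posted_at`.

    `updates` is a list of (update_timestamp, cost) tuples for one SKU,
    sorted ascending. Timestamps are strings in one fixed format, so
    plain string comparison orders them chronologically.
    """
    if not updates:
        raise ValueError("no cost entries for this SKU")
    timestamps = [ts for ts, _ in updates]
    # Number of cost updates made at or before the sale.
    idx = bisect_right(timestamps, posted_at)
    if idx == 0:
        # Sale predates every update: fall back to the earliest cost.
        return updates[0][1]
    return updates[idx - 1][1]


updates = [
    ("2022-05-19 06:50:20.000", 32.5),  # entry quoted in the post
    ("2022-05-19 12:00:00.000", 35.0),  # hypothetical later update
]

early = cost_in_effect(updates, "2022-05-19 05:00:00.000")  # before any update
mid = cost_in_effect(updates, "2022-05-19 09:30:00.000")    # between the two updates
late = cost_in_effect(updates, "2022-05-19 15:00:00.000")   # after the last update
print(early, mid, late)
```

The SQL has to express the same three cases (before the first update, between two updates, after the last update) with joins instead of a binary search.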

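Below is the sample sales data:

Below is the sample cost information:

I was able to figure it out and implement it using window functions. Below is the complete query: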

-- Branch 1: sales that fall between two consecutive cost updates use the earlier cost.
-- BETWEEN is inclusive on both ends, so a sale stamped exactly at an update time can match two intervals.
SELECT s.sku,
       sum(s.unit_sale_price * s.quantity_sold) AS total_sale,
       sum(s.quantity_sold) AS total_quantity_sold,
       sum(s.quantity_sold * e.considered_cost) AS total_cost,
       sum((s.unit_sale_price * s.quantity_sold) - (s.quantity_sold * e.considered_cost)) AS profit
FROM sales s
INNER JOIN (
    SELECT sku,
           lead(update_timestamp, 1) OVER (PARTITION BY sku ORDER BY update_timestamp DESC) AS from_timestamp,
           update_timestamp AS to_timestamp,
           lead(cost, 1) OVER (PARTITION BY sku ORDER BY update_timestamp DESC) AS considered_cost,
           cost AS original_cost,
           rank() OVER (PARTITION BY sku ORDER BY update_timestamp DESC) AS rk
    FROM cost
) e ON e.sku = s.sku AND s.posted_date BETWEEN e.from_timestamp AND e.to_timestamp
GROUP BY s.sku

UNION ALL

-- Branch 2: sales after the latest cost update (rk = 1 in descending order) use that latest cost.
SELECT s.sku,
       sum(s.unit_sale_price * s.quantity_sold) AS total_sale,
       sum(s.quantity_sold) AS total_quantity_sold,
       sum(s.quantity_sold * e.original_cost) AS total_cost,
       sum((s.unit_sale_price * s.quantity_sold) - (s.quantity_sold * e.original_cost)) AS profit
FROM sales s
INNER JOIN (
    SELECT sku,
           lead(update_timestamp, 1) OVER (PARTITION BY sku ORDER BY update_timestamp DESC) AS from_timestamp,
           update_timestamp AS to_timestamp,
           lead(cost, 1) OVER (PARTITION BY sku ORDER BY update_timestamp DESC) AS considered_cost,
           cost AS original_cost,
           rank() OVER (PARTITION BY sku ORDER BY update_timestamp DESC) AS rk
    FROM cost
) e ON e.sku = s.sku AND s.posted_date > e.to_timestamp AND e.rk = 1
GROUP BY s.sku

UNION ALL

-- Branch 3: sales before the earliest cost update (rk = 1 in ascending order) use that earliest cost.
SELECT s.sku,
       sum(s.unit_sale_price * s.quantity_sold) AS total_sale,
       sum(s.quantity_sold) AS total_quantity_sold,
       sum(s.quantity_sold * e.original_cost) AS total_cost,
       sum((s.unit_sale_price * s.quantity_sold) - (s.quantity_sold * e.original_cost)) AS profit
FROM sales s
INNER JOIN (
    SELECT sku,
           lead(update_timestamp, 1) OVER (PARTITION BY sku ORDER BY update_timestamp ASC) AS from_timestamp,
           update_timestamp AS to_timestamp,
           lead(cost, 1) OVER (PARTITION BY sku ORDER BY update_timestamp ASC) AS considered_cost,
           cost AS original_cost,
           rank() OVER (PARTITION BY sku ORDER BY update_timestamp ASC) AS rk
    FROM cost
) e ON e.sku = s.sku AND s.posted_date < e.to_timestamp AND e.rk = 1
GROUP BY s.sku

Explanation:

First, I transformed the cost table into from_timestamp and to_timestamp ranges, as shown below.

Cost table transformation query:

SELECT sku,
       lead(update_timestamp, 1) OVER (PARTITION BY sku ORDER BY update_timestamp DESC) AS from_timestamp,
       update_timestamp AS to_timestamp,
       lead(cost, 1) OVER (PARTITION BY sku ORDER BY update_timestamp DESC) AS considered_cost,
       cost AS original_cost,
       rank() OVER (PARTITION BY sku ORDER BY update_timestamp DESC) AS rk
FROM cost

The result has the columns: sku, from_timestamp, to_timestamp, considered_cost, original_cost, rk
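To see what the transformation produces, the sketch below runs the same window query against a small in-memory table. It uses Python's built-in sqlite3 module (SQLite 3.25 or newer is needed for window functions) as a stand-in for Hive, which is an assumption on my part. The SKU name "A1" and the two later cost rows are hypothetical; only the 32.5 entry at 2022-05-19 06:50:20.000 is taken from the post.

```python
import sqlite3

# SQLite stands in for Hive here (assumption); LEAD and RANK are standard window functions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cost (sku TEXT, update_timestamp TEXT, cost REAL)")
conn.executemany(
    "INSERT INTO cost VALUES (?, ?, ?)",
    [
        ("A1", "2022-05-19 06:50:20.000", 32.5),  # value quoted in the post; SKU name is hypothetical
        ("A1", "2022-05-19 12:00:00.000", 35.0),  # hypothetical update
        ("A1", "2022-05-19 18:00:00.000", 30.0),  # hypothetical update
    ],
)

TRANSPOSE_SQL = """
SELECT sku,
       LEAD(update_timestamp, 1) OVER (PARTITION BY sku ORDER BY update_timestamp DESC) AS from_timestamp,
       update_timestamp AS to_timestamp,
       LEAD(cost, 1) OVER (PARTITION BY sku ORDER BY update_timestamp DESC) AS considered_cost,
       cost AS original_cost,
       RANK() OVER (PARTITION BY sku ORDER BY update_timestamp DESC) AS rk
FROM cost
ORDER BY sku, rk
"""

# Each row is (sku, from_timestamp, to_timestamp, considered_cost, original_cost, rk).
intervals = conn.execute(TRANSPOSE_SQL).fetchall()
for row in intervals:
    print(row)
```

Because the rows are ordered newest first, LEAD reaches back to the previous update, so considered_cost is the cost that was in effect during the interval from from_timestamp to to_timestamp. The oldest row has no predecessor, so its from_timestamp and considered_cost are NULL, which is why sales before the first update need a separate branch.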

Then I applied an inner join with the sales table and built the aggregation logic to calculate the profit.
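The join and aggregation step can be sketched end to end on toy data. This is a variation rather than the exact query from the post: it matches each sale to a cost first and aggregates once at the end, so each SKU comes out as a single row, and it uses half-open intervals (>= from, < to) so that a sale stamped exactly at an update time matches only one branch. As before, sqlite3 stands in for Hive, and the SKU "A1", the sales rows, and the two later cost rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite 3.25+ assumed as a stand-in for Hive
conn.execute("CREATE TABLE cost (sku TEXT, update_timestamp TEXT, cost REAL)")
conn.execute(
    "CREATE TABLE sales (order_id INTEGER, sku TEXT, posted_date TEXT, "
    "unit_sale_price REAL, quantity_sold INTEGER)"
)
conn.executemany("INSERT INTO cost VALUES (?, ?, ?)", [
    ("A1", "2022-05-19 06:50:20.000", 32.5),  # value quoted in the post
    ("A1", "2022-05-19 12:00:00.000", 35.0),  # hypothetical
    ("A1", "2022-05-19 18:00:00.000", 30.0),  # hypothetical
])
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?, ?)", [
    (1011, "A1", "2022-05-19 05:00:00.000", 50.0, 2),  # before the first update (hypothetical time)
    (1012, "A1", "2022-05-19 09:00:00.000", 50.0, 1),  # between updates 1 and 2
    (1013, "A1", "2022-05-19 13:00:00.000", 50.0, 3),  # between updates 2 and 3
    (1014, "A1", "2022-05-19 20:00:00.000", 50.0, 1),  # after the last update
])

PROFIT_SQL = """
WITH e AS (
    SELECT sku,
           LEAD(update_timestamp, 1) OVER (PARTITION BY sku ORDER BY update_timestamp DESC) AS from_timestamp,
           update_timestamp AS to_timestamp,
           LEAD(cost, 1) OVER (PARTITION BY sku ORDER BY update_timestamp DESC) AS considered_cost,
           cost AS original_cost,
           RANK() OVER (PARTITION BY sku ORDER BY update_timestamp DESC) AS rk_desc,
           RANK() OVER (PARTITION BY sku ORDER BY update_timestamp ASC) AS rk_asc
    FROM cost
),
matched AS (
    -- Sale falls inside an interval: use the cost entered at the start of it.
    SELECT s.sku, s.unit_sale_price, s.quantity_sold, e.considered_cost AS unit_cost
    FROM sales s JOIN e ON e.sku = s.sku
     AND s.posted_date >= e.from_timestamp AND s.posted_date < e.to_timestamp
    UNION ALL
    -- Sale at or after the latest update: use the latest cost.
    SELECT s.sku, s.unit_sale_price, s.quantity_sold, e.original_cost
    FROM sales s JOIN e ON e.sku = s.sku
     AND e.rk_desc = 1 AND s.posted_date >= e.to_timestamp
    UNION ALL
    -- Sale before the earliest update: fall back to the earliest cost.
    SELECT s.sku, s.unit_sale_price, s.quantity_sold, e.original_cost
    FROM sales s JOIN e ON e.sku = s.sku
     AND e.rk_asc = 1 AND s.posted_date < e.to_timestamp
)
SELECT sku,
       SUM(unit_sale_price * quantity_sold) AS total_sale,
       SUM(quantity_sold) AS total_quantity_sold,
       SUM(quantity_sold * unit_cost) AS total_cost,
       SUM(unit_sale_price * quantity_sold - quantity_sold * unit_cost) AS profit
FROM matched
GROUP BY sku
"""

result = conn.execute(PROFIT_SQL).fetchall()
print(result)
```

With this toy data, orders 1011 and 1012 are both costed at 32.5, order 1013 at 35.0, and order 1014 at 30.0, so the single output row for "A1" combines all three branches. Whether Hive accepts the non-equality conditions inside ON depends on the Hive version, so treat the join syntax as something to verify on your cluster.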