SQL: clean up repeated intermediate values in a history table

Question

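I noticed that my table contains many redundant values that need to be cleaned up. It is a table that records price changes, so I would like to turn this: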

product | price | date
------------------------
1       | 1     | 1
1       | 1     | 2
1       | 1     | 3
1       | 1     | 4
2       | 77    | 5
1       | 1     | 6
1       | 2     | 7
1       | 2     | 8
1       | 1     | 9
1       | 1     | 10
1       | 1     | 11
1       | 1     | 12
1       | 3     | 13

into this:

product | price | date
------------------------
1       | 1     | 1
2       | 77    | 5
1       | 2     | 7
1       | 1     | 9
1       | 3     | 13

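Also assume that in this case the id column is the same as date.

SELECT DISTINCT ON (product, price) won't work, because for product 1 it would drop either the row at date 1 or the row at date 9. The problem is that I want to group by product and price, but only within the intervals delimited by the relevant changes over date.

Even if it is possible to order by product, it is hard to ignore the order of the date and price changes.

The objective is to delete every id that is not in the expected result table.

Does anyone have any suggestions?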

Answer 1

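This is a gaps-and-islands problem: you want to group together adjacent rows of the same product that have the same price.

Here is an approach that uses the difference between two row numbers to define the groups: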

select product, price, min(date) date
from (
    select 
        t.*,
        row_number() over(partition by product order by date) rn1,
        row_number() over(partition by product, price order by date) rn2
    from mytable t
) t
group by product, price, rn1 - rn2
order by min(date)

Demo on DB Fiddle:

product | price | date
------: | ----: | ---:
      1 |     1 |    1
      2 |    77 |    5
      1 |     2 |    7
      1 |     1 |    9
      1 |     3 |   13
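
To see why the row-number difference identifies each island, it can help to print it per row. The following is a minimal sketch, not part of the original answer: it loads the question's sample rows into an in-memory SQLite database through Python's sqlite3 module (assuming the bundled SQLite is 3.25 or newer, which is required for window functions) and uses the hypothetical column names grp and first_date.

```python
import sqlite3

# Sample rows from the question: (product, price, date).
ROWS = [
    (1, 1, 1), (1, 1, 2), (1, 1, 3), (1, 1, 4), (2, 77, 5),
    (1, 1, 6), (1, 2, 7), (1, 2, 8), (1, 1, 9), (1, 1, 10),
    (1, 1, 11), (1, 1, 12), (1, 3, 13),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table mytable (product integer, price integer, date integer)")
conn.executemany("insert into mytable values (?, ?, ?)", ROWS)

# rn1 counts rows per product, rn2 counts rows per (product, price).
# Within a run of consecutive rows that share a price, both counters advance
# together, so their difference (grp) stays constant for the whole run.
per_row = conn.execute("""
    select date, product, price,
           row_number() over (partition by product order by date)
         - row_number() over (partition by product, price order by date) as grp
    from mytable
    order by date
""").fetchall()

# Keep the earliest date of each island (product, price, grp).
islands = conn.execute("""
    select product, price, min(date) as first_date
    from (
        select t.*,
               row_number() over (partition by product order by date) as rn1,
               row_number() over (partition by product, price order by date) as rn2
        from mytable t
    ) t
    group by product, price, rn1 - rn2
    order by first_date
""").fetchall()

for row in per_row:
    print(row)
print(islands)
```

For product 1, the rows at dates 1 to 6 get grp 0, dates 7 and 8 get 5, dates 9 to 12 get 2, and date 13 gets 11, so the two separate runs at price 1 end up in different groups.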

Answer 2

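Delete the duplicate rows and keep one (use min() or max() to keep the oldest or newest row).

You choose which columns determine a duplicate by listing them in the GROUP BY: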

DELETE FROM MyTable WHERE RowId NOT IN (SELECT MIN(RowId) FROM MyTable GROUP BY Col1, Col2, Col3);
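
As a rough check of how this pattern behaves on the question's data, the sketch below maps RowId to date and the grouping columns to product and price. That mapping is my assumption, not something the answer states, and the harness is Python's sqlite3 module with an in-memory database. Because the grouping ignores adjacency, one row survives per (product, price) pair, so the return to price 1 at date 9 is deleted as well.

```python
import sqlite3

# Sample rows from the question: (product, price, date).
ROWS = [
    (1, 1, 1), (1, 1, 2), (1, 1, 3), (1, 1, 4), (2, 77, 5),
    (1, 1, 6), (1, 2, 7), (1, 2, 8), (1, 1, 9), (1, 1, 10),
    (1, 1, 11), (1, 1, 12), (1, 3, 13),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table mytable (product integer, price integer, date integer)")
conn.executemany("insert into mytable values (?, ?, ?)", ROWS)

# Assumed mapping: RowId -> date, (Col1, Col2) -> (product, price).
conn.execute("""
    delete from mytable
    where date not in (
        select min(date) from mytable group by product, price
    )
""")

remaining_dates = [r[0] for r in conn.execute("select date from mytable order by date")]
print(remaining_dates)
```

Under these assumptions the surviving dates are 1, 5, 7 and 13, which differs from the expected result in the question by the missing row at date 9.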

Answer 3

You seem to want the first row each time the price changes. If so, I would suggest lag():

select t.product, t.price, t.date
from (select t.*,
             lag(price) over (partition by product order by date) as prev_price
      from t
     ) t
where prev_price is null or prev_price <> price;

No aggregation is needed. This solution should be better than one that combines aggregation with window functions.
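
Since the stated objective is to delete the redundant rows rather than select the ones to keep, the same lag() comparison can be inverted into a DELETE. This is a minimal sketch under a few assumptions that go beyond the answer: date is unique (the question says id equals date), price is never NULL, and the statement runs in SQLite 3.25 or newer through Python's sqlite3 module. Other databases may need a different DELETE syntax.

```python
import sqlite3

# Sample rows from the question: (product, price, date).
ROWS = [
    (1, 1, 1), (1, 1, 2), (1, 1, 3), (1, 1, 4), (2, 77, 5),
    (1, 1, 6), (1, 2, 7), (1, 2, 8), (1, 1, 9), (1, 1, 10),
    (1, 1, 11), (1, 1, 12), (1, 3, 13),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table mytable (product integer, price integer, date integer)")
conn.executemany("insert into mytable values (?, ?, ?)", ROWS)

# A row is redundant when the previous row of the same product (by date)
# carries the same price. The first row of each product has prev_price NULL,
# so the equality test is not true for it and the row is kept.
conn.execute("""
    delete from mytable
    where date in (
        select date
        from (
            select date, price,
                   lag(price) over (partition by product order by date) as prev_price
            from mytable
        ) as t
        where prev_price = price
    )
""")

remaining = conn.execute(
    "select product, price, date from mytable order by date"
).fetchall()
print(remaining)
```

Identifying rows by date works only because date doubles as the id here; with a real key column, that column would take its place in the subquery.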

sql

window-functions

gaps-and-islands