How can I create a column that counts only the changes in another column on Redshift?

Question

I have this dataset:

product   customer    date                        value     buyer_position
A         123455      2020-01-01 00:01:01         100       1
A         123456      2020-01-02 00:02:01         100       2
A         523455      2020-01-02 00:02:05         100       NULL
A         323455      2020-01-03 00:02:07         100       NULL
A         423455      2020-01-03 00:09:01         100       3
B         100455      2020-01-01 00:03:01         100       1
B         999445      2020-01-01 00:04:01         100       NULL
B         122225      2020-01-01 00:04:05         100       2
B         993848      2020-01-01 10:04:05         100       3
B         133225      2020-01-01 11:04:05         100       NULL
B         144225      2020-01-01 12:04:05         100       4

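The dataset contains the products a company sells and the customers who saw each product. A customer can see more than one product, but no product + customer combination is repeated. I want to know how many people had bought a product before a given customer saw it.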

This would be the ideal output:

product   customer    date                        value     buyer_position     people_before
A         123455      2020-01-01 00:01:01         100       1                  0
A         123456      2020-01-02 00:02:01         100       2                  1
A         523455      2020-01-02 00:02:05         100       NULL               2
A         323455      2020-01-03 00:02:07         100       NULL               2
A         423455      2020-01-03 00:09:01         100       3                  2
B         100455      2020-01-01 00:03:01         100       1                  0
B         999445      2020-01-01 00:04:01         100       NULL               1
B         122225      2020-01-01 00:04:05         100       2                  1
B         993848      2020-01-01 10:04:05         100       3                  2
B         133225      2020-01-01 11:04:05         100       NULL               3
B         144225      2020-01-01 12:04:05         100       4                  3

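As you can see, when customer 122225 saw the product he wanted, two people had already bought it. In the case of customer 323455, two people had already bought product A.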

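I think I should use a window function such as lag(). But lag() only looks at a single earlier row, so it does not give me this "cumulative" information. So I'm a bit lost.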

Answer 1

This looks like a window count of the non-null values of buyer_position over the preceding rows:

select t.*,
    coalesce(count(buyer_position) over(
        partition by product
        order by date
        rows between unbounded preceding and 1 preceding
    ), 0) as people_before
from mytable t
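
To check the query above against the sample data, here is a minimal sketch that runs it in an in-memory SQLite database through Python's sqlite3 module. SQLite is only a stand-in for Redshift here (an assumption of this sketch), and the helper name people_before is hypothetical. The SQLite build needs window-function support (3.25 or later).

```python
import sqlite3

# Sample rows from the question: (product, customer, date, value, buyer_position)
ROWS = [
    ("A", 123455, "2020-01-01 00:01:01", 100, 1),
    ("A", 123456, "2020-01-02 00:02:01", 100, 2),
    ("A", 523455, "2020-01-02 00:02:05", 100, None),
    ("A", 323455, "2020-01-03 00:02:07", 100, None),
    ("A", 423455, "2020-01-03 00:09:01", 100, 3),
    ("B", 100455, "2020-01-01 00:03:01", 100, 1),
    ("B", 999445, "2020-01-01 00:04:01", 100, None),
    ("B", 122225, "2020-01-01 00:04:05", 100, 2),
    ("B", 993848, "2020-01-01 10:04:05", 100, 3),
    ("B", 133225, "2020-01-01 11:04:05", 100, None),
    ("B", 144225, "2020-01-01 12:04:05", 100, 4),
]

# Same window as in the answer: count non-null buyer_position values
# in all rows of the same product that come strictly before the current row.
QUERY = """
select t.customer,
       coalesce(count(buyer_position) over (
           partition by product
           order by date
           rows between unbounded preceding and 1 preceding
       ), 0) as people_before
from mytable t
order by product, date
"""


def people_before():
    """Load the sample rows into SQLite and return (customer, people_before) pairs."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table mytable ("
        "product text, customer integer, date text, "
        "value integer, buyer_position integer)"
    )
    conn.executemany("insert into mytable values (?, ?, ?, ?, ?)", ROWS)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for customer, count in people_before():
        print(customer, count)
```

The frame ends at 1 preceding, so the current row's own purchase is never counted, and count() skips the NULL rows of customers who only viewed the product.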

Answer 2

Hmm... If I understand correctly, you want the maximum buyer position for the customer/product, minus 1:

select t.*,
       max(buyer_position) over (partition by customer, product order by date rows between unbounded preceding and current row) - 1
from t;
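
Since the question states that each product + customer pair is unique, partitioning by both columns leaves a single row in every window. The sketch below is a variant of the max-based idea, not the answer's query: it assumes the window is partitioned by product only, that the frame stops one row before the current row, and that buyer_position numbers the buyers 1, 2, 3, ... in date order, as in the sample data. It runs on SQLite through Python's sqlite3 as a stand-in for Redshift, and the helper name people_before_max is hypothetical.

```python
import sqlite3

# Sample rows from the question: (product, customer, date, value, buyer_position)
ROWS = [
    ("A", 123455, "2020-01-01 00:01:01", 100, 1),
    ("A", 123456, "2020-01-02 00:02:01", 100, 2),
    ("A", 523455, "2020-01-02 00:02:05", 100, None),
    ("A", 323455, "2020-01-03 00:02:07", 100, None),
    ("A", 423455, "2020-01-03 00:09:01", 100, 3),
    ("B", 100455, "2020-01-01 00:03:01", 100, 1),
    ("B", 999445, "2020-01-01 00:04:01", 100, None),
    ("B", 122225, "2020-01-01 00:04:05", 100, 2),
    ("B", 993848, "2020-01-01 10:04:05", 100, 3),
    ("B", 133225, "2020-01-01 11:04:05", 100, None),
    ("B", 144225, "2020-01-01 12:04:05", 100, 4),
]

# Assumption: the highest buyer_position seen so far equals the number of buyers so far.
# The frame excludes the current row; coalesce turns the empty first frame into 0.
QUERY = """
select t.customer,
       coalesce(max(buyer_position) over (
           partition by product
           order by date
           rows between unbounded preceding and 1 preceding
       ), 0) as people_before
from t
order by product, date
"""


def people_before_max():
    """Load the sample rows into SQLite and return (customer, people_before) pairs."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table t ("
        "product text, customer integer, date text, "
        "value integer, buyer_position integer)"
    )
    conn.executemany("insert into t values (?, ?, ?, ?, ?)", ROWS)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for customer, count in people_before_max():
        print(customer, count)
```

If buyer_position could skip numbers or arrive out of date order, the maximum would no longer equal the number of earlier buyers, and a count of non-null values would be the safer choice.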

Tags: sql, count, lag, window-functions, amazon-redshift