How can I create a column that counts only the changes in another column on Redshift?
I have this dataset:
product customer date value buyer_position
A 123455 2020-01-01 00:01:01 100 1
A 123456 2020-01-02 00:02:01 100 2
A 523455 2020-01-02 00:02:05 100 NULL
A 323455 2020-01-03 00:02:07 100 NULL
A 423455 2020-01-03 00:09:01 100 3
B 100455 2020-01-01 00:03:01 100 1
B 999445 2020-01-01 00:04:01 100 NULL
B 122225 2020-01-01 00:04:05 100 2
B 993848 2020-01-01 10:04:05 100 3
B 133225 2020-01-01 11:04:05 100 NULL
B 144225 2020-01-01 12:04:05 100 4
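The dataset contains the products the company sells and the customers who saw each product. A customer can see more than one product, but no product + customer combination is repeated. I want to know how many people had already bought a product before each customer saw it.
This would be the ideal output: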
product customer date value buyer_position people_before
A 123455 2020-01-01 00:01:01 100 1 0
A 123456 2020-01-02 00:02:01 100 2 1
A 523455 2020-01-02 00:02:05 100 NULL 2
A 323455 2020-01-03 00:02:07 100 NULL 2
A 423455 2020-01-03 00:09:01 100 3 2
B 100455 2020-01-01 00:03:01 100 1 0
B 999445 2020-01-01 00:04:01 100 NULL 1
B 122225 2020-01-01 00:04:05 100 2 1
B 993848 2020-01-01 10:04:05 100 3 2
B 133225 2020-01-01 11:04:05 100 NULL 3
B 144225 2020-01-01 12:04:05 100 4 3
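As you can see, when customer 122225 saw the product he wanted, one person had already bought it. Likewise, by the time customer 323455 saw product A, two people had already bought it.
I think I should use a window function such as lag(), but lag() does not give me this "cumulative" information, so I'm a bit lost.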
This looks like a window count of the non-null values of buyer_position over the preceding rows:
select t.*,
coalesce(count(buyer_position) over(
partition by product
order by date
rows between unbounded preceding and 1 preceding
), 0) as people_before
from mytable t
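To check the query above against the sample data, here is a minimal sketch that runs it in an in-memory SQLite database from Python. SQLite is only a stand-in for Redshift here (an assumption made so the example is self-contained; it needs SQLite 3.25 or later for window functions), and the helper name people_before and the final "order by product, date" are illustrative additions, not part of the answer. It relies on count(buyer_position) skipping NULLs, so only rows that represent a purchase are counted, and on the frame ending at "1 preceding" so the current row is excluded.

```python
import sqlite3

# Sample rows copied from the question:
# (product, customer, date, value, buyer_position)
ROWS = [
    ("A", 123455, "2020-01-01 00:01:01", 100, 1),
    ("A", 123456, "2020-01-02 00:02:01", 100, 2),
    ("A", 523455, "2020-01-02 00:02:05", 100, None),
    ("A", 323455, "2020-01-03 00:02:07", 100, None),
    ("A", 423455, "2020-01-03 00:09:01", 100, 3),
    ("B", 100455, "2020-01-01 00:03:01", 100, 1),
    ("B", 999445, "2020-01-01 00:04:01", 100, None),
    ("B", 122225, "2020-01-01 00:04:05", 100, 2),
    ("B", 993848, "2020-01-01 10:04:05", 100, 3),
    ("B", 133225, "2020-01-01 11:04:05", 100, None),
    ("B", 144225, "2020-01-01 12:04:05", 100, 4),
]

# The window query from the answer, plus an ORDER BY for stable output.
QUERY = """
select t.*,
       coalesce(count(buyer_position) over (
           partition by product
           order by date
           rows between unbounded preceding and 1 preceding
       ), 0) as people_before
from mytable t
order by product, date
"""


def people_before(rows):
    """Return {customer: people_before} as computed by the window query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table mytable ("
        "product text, customer integer, date text, "
        "value integer, buyer_position integer)"
    )
    conn.executemany("insert into mytable values (?, ?, ?, ?, ?)", rows)
    # Column 1 is customer, column 5 is the computed people_before.
    result = {row[1]: row[5] for row in conn.execute(QUERY)}
    conn.close()
    return result


if __name__ == "__main__":
    for customer, count in people_before(ROWS).items():
        print(customer, count)
```

Because product + customer is unique in the sample, keying the result by customer is safe here; with real data you would key on both columns.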
Hmm... If I understand correctly, you want the maximum buyer position for the customer/product, minus 1:
select t.*,
max(buyer_position) over (partition by customer, product order by date rows between unbounded preceding and current row) - 1
from t;