Partition by with condition statement

Question

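I have data on products sold across various stores. In some stores they are sold at a discount, which is indicated by PROMO_FLG. I want to display two COUNT ... OVER (PARTITION BY ...) columns.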

+-------------------------+--------------+---------------------+
| Store                   | Item         | PROMO_FLG           |
|-------------------------+--------------+---------------------|
| 1                       |            1 |                   0 |
| 2                       |            1 |                   1 |
| 3                       |            1 |                   0 |
| 4                       |            1 |                   0 |
| 5                       |            1 |                   1 |
| 6                       |            1 |                   1 |
| 7                       |            1 |                   1 |
| 8                       |            1 |                   0 |
| 9                       |            1 |                   0 |
| 10                      |            1 |                   0 |
+-------------------------+--------------+---------------------+

The first one shows all stores that carry the item (done):

COUNT(DISTINCT STORE) OVER (PARTITION BY ITEM) gives 10.

The second one, which is what I am looking for, should count only the stores where PROMO_FLG = 1.

That should give a value of 4.

Answer 1

I think you want:

select t.*,
       count(*) over (partition by item) as num_stores,
       sum(promo_flg) over (partition by item) as num_promo_1
from t;
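Note that sum(promo_flg) only equals the number of promo stores if each store has one row per item and the flag is always 0 or 1. As a hedged sketch, assuming the same table t, a conditional count avoids relying on the flag's numeric value (the alias num_promo_rows is only illustrative):

```sql
-- Counts rows whose flag is exactly 1, ignoring any other flag values.
-- It still counts rows, not distinct stores, so duplicate store rows are counted twice.
select t.*,
       count(*) over (partition by item) as num_stores,
       count(case when promo_flg = 1 then 1 end) over (partition by item) as num_promo_rows
from t;
```

On the ten sample rows from the question this yields 10 and 4, the same as the query above.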

If you really need distinct counts:

select t.*,
       count(distinct store) over (partition by item) as num_stores,
       count(distinct case when promo_flg = 1 then store end) over (partition by item) as num_promo_1
from t;

Here is a db<>fiddle. The fiddle uses Oracle because it supports COUNT(DISTINCT) as a window function.

If the window functions do not work, here is an alternative:

select *
from t join
     (select item, count(distinct store) as num_stores, count(distinct case when promo_flg = 1 then store end) as num_stores_promo
      from t
      group by item
     ) tt
     using (item);
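To try the join approach without creating a table, the sample rows can be supplied inline. This is a minimal sketch assuming a database that accepts a VALUES list inside a common table expression (PostgreSQL and SQLite do; other engines may need a different syntax for the inline rows):

```sql
-- Sample rows from the question; the CTE is named t to match the query above.
with t (store, item, promo_flg) as (
    values (1, 1, 0), (2, 1, 1), (3, 1, 0), (4, 1, 0), (5, 1, 1),
           (6, 1, 1), (7, 1, 1), (8, 1, 0), (9, 1, 0), (10, 1, 0)
)
select t.store, t.item, t.promo_flg, tt.num_stores, tt.num_stores_promo
from t
join (
    -- One row per item: all distinct stores, and distinct stores with promo_flg = 1.
    select item,
           count(distinct store) as num_stores,
           count(distinct case when promo_flg = 1 then store end) as num_stores_promo
    from t
    group by item
) tt on tt.item = t.item
order by t.store;
```

Every row should come back with num_stores = 10 and num_stores_promo = 4, since stores 2, 5, 6 and 7 are the only ones with the flag set.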

Answer 2

A second answer, using Gordon's SQL but showing it working in Snowflake:

select v.*
    ,count(distinct store) over (partition by item) as num_stores
    ,count(distinct iff(promo_flg = 1, store, null)) over (partition by item) as num_dis_promo_stores
    ,sum(iff(promo_flg = 1, 1, 0)) over (partition by item) as num_sum_promo_stores
from values
  (1 , 1, 0 ),
  (2 , 1, 1 ),
  (3 , 1, 0 ),
  (4 , 1, 0 ),
  (5 , 1, 1 ),
  (6 , 1, 1 ),
  (7 , 1, 1 ),
  (8 , 1, 0 ),
  (9 , 1, 0 ),
  (10, 1, 0 )
  v(store, item, promo_flg) ;

gives:

STORE   ITEM    PROMO_FLG   NUM_STORES  NUM_DIS_PROMO_STORES    NUM_SUM_PROMO_STORES
1       1       0           10          4                       4
2       1       1           10          4                       4
3       1       0           10          4                       4
4       1       0           10          4                       4
5       1       1           10          4                       4
6       1       1           10          4                       4
7       1       1           10          4                       4
8       1       0           10          4                       4
9       1       0           10          4                       4
10      1       0           10          4                       4

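So it depends on whether you want a distinct count or a sum. I used iff, a non-standard SQL form that Snowflake supports, because I prefer the shorter SQL. But you can see that both work.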

Testing Gordon's second case, count(distinct case when promo_flg = 1 then store end) over (partition by item) as num_promo_1, shows that it works fine.

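In response to Marcin2x4's question about Gordon's answer: the methods give different results if/when the data differs from the way you describe it, for example if a store has multiple rows for the same item, or if promo_flg has values other than 0 or 1: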

select v.*
    ,count(distinct store) over (partition by item) as num_stores
    ,count(distinct iff(promo_flg = 1, store, null)) over (partition by item) as num_dis_promo_stores
    ,sum(iff(promo_flg <> 0, 1, 0)) over (partition by item) as num_sum_promo_stores
    ,sum(promo_flg) over (partition by item) as num_promo_1_sum
    ,count(distinct case when promo_flg = 1 then store end) over (partition by item) as num_promo_1_distinct
from values
  (1 , 1, 0 ),
  (2 , 1, 1 ),
  (3 , 1, 0 ),
  (4 , 1, 0 ),
  (5 , 1, 1 ),
  (6 , 1, 1 ),
  (7 , 1, 1 ),
  (8 , 1, 0 ),
  (9 , 1, 0 ),
  (10, 1, 0 ),
  (7, 1, 1 ),
  (7, 1, 2 )
  v(store, item, promo_flg) ;

Then num_dis_promo_stores and num_promo_1_distinct (Gordon's count(distinct ...)) give 4, num_sum_promo_stores gives 6, and num_promo_1_sum (Gordon's sum(promo_flg)) gives 7.
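Before choosing between the sum and the distinct count, it can help to check whether the data actually violates these assumptions. The following is a rough sketch, assuming the rows live in a table named t with the columns store, item and promo_flg (the table name is taken from the first answer, not from the asker's schema):

```sql
-- Store/item pairs with more than one row; these inflate sum() and count(*).
select store, item, count(*) as row_count
from t
group by store, item
having count(*) > 1;

-- Rows whose flag is neither 0 nor 1 (including missing flags).
select store, item, promo_flg
from t
where promo_flg not in (0, 1)
   or promo_flg is null;
```

With the twelve rows above, the first query returns store 7 with three rows and the second returns the row (7, 1, 2). If both queries return nothing, the sum and the distinct count agree.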


sql

database

analysis

snowflake-cloud-data-platform