A column to count the number of distinct values in column X based on column Y?
In SSMS 2016, I have a select statement with various joins that returns the following data:
| box_barcode | order_number | order_shipment_id | item | qty |
|-------------|--------------|-------------------|----------|-----|
| 3330000001 | 0000105 | FP001 | tshirt-S | 1 |
| 3330000001 | 0000105 | FP001 | tshirt-M | 2 |
| 3330000001 | 0000105 | FP001 | tshirt-L | 2 |
| 3330000005 | 0000108 | FP002 | shorts-S | 2 |
| 3330000005 | 0000108 | FP002 | shorts-M | 1 |
| 3330000005 | 0000120 | FP002 | shorts-S | 1 |
| 3330000010 | 0000120 | FP003 | shirts-M | 2 |
| 3330000010 | 0000120 | FP003 | shirts-L | 2 |
| 3330000010 | 0000121 | FP003 | shirts-S | 3 |
| 3330000010 | 0000121 | FP003 | shirts-M | 3 |
| 3330000010 | 0000122 | FP003 | shirts-S | 2 |
I want to add a column that counts the number of distinct order_number values for each box_barcode, to get the desired result:
| box_barcode | order_number | order_shipment_id | item | qty | count |
|-------------|--------------|-------------------|----------|-----|-------|
| 3330000001 | 0000105 | FP001 | tshirt-S | 1 | 1 |
| 3330000001 | 0000105 | FP001 | tshirt-M | 2 | 1 |
| 3330000001 | 0000105 | FP001 | tshirt-L | 2 | 1 |
| 3330000005 | 0000108 | FP002 | shorts-S | 2 | 2 |
| 3330000005 | 0000108 | FP002 | shorts-M | 1 | 2 |
| 3330000005 | 0000120 | FP002 | shorts-S | 1 | 2 |
| 3330000010 | 0000120 | FP003 | shirts-M | 2 | 3 |
| 3330000010 | 0000120 | FP003 | shirts-L | 2 | 3 |
| 3330000010 | 0000121 | FP003 | shirts-S | 3 | 3 |
| 3330000010 | 0000121 | FP003 | shirts-M | 3 | 3 |
| 3330000010 | 0000122 | FP003 | shirts-S | 2 | 3 |
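I am struggling to find the best way of achieving this. I know of count(distinct ...), but would I have to put my current query in a subquery so that the count runs against the results of that query first?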
Alas, SQL Server does not support count(distinct) as a window function. But it is easy to emulate:
    -- t stands for the existing query; seqnum = 1 marks the first row of each (box_barcode, order_number) pair
    select t.*,
           sum(case when seqnum = 1 then 1 else 0 end) over (partition by box_barcode) as distinct_count
    from (select t.*,
                 row_number() over (partition by box_barcode, order_number order by box_barcode) as seqnum
          from t
         ) t;
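To try the row_number approach without the real tables, here is a minimal sketch that uses a subset of the question's rows. The CTE names `box_items` and `numbered` are hypothetical; `box_items` stands in for the existing joined select, and `order by (select null)` is used only because the order inside each partition does not matter here.

```sql
-- Hypothetical stand-in for the existing joined query, filled with
-- a subset of the sample rows from the question.
with box_items as (
    select *
    from (values
        ('3330000001', '0000105', 'tshirt-S', 1),
        ('3330000001', '0000105', 'tshirt-M', 2),
        ('3330000005', '0000108', 'shorts-S', 2),
        ('3330000005', '0000120', 'shorts-S', 1),
        ('3330000010', '0000120', 'shirts-M', 2),
        ('3330000010', '0000121', 'shirts-S', 3),
        ('3330000010', '0000122', 'shirts-S', 2)
    ) as v (box_barcode, order_number, item, qty)
),
-- seqnum = 1 exactly once per (box_barcode, order_number) pair.
numbered as (
    select b.*,
           row_number() over (partition by box_barcode, order_number
                              order by (select null)) as seqnum
    from box_items as b
)
-- Summing the "first row" flags per box gives the distinct order count.
select box_barcode, order_number, item, qty,
       sum(case when seqnum = 1 then 1 else 0 end)
           over (partition by box_barcode) as distinct_count
from numbered
order by box_barcode, order_number, item;
```

With these rows, distinct_count should come out as 1 for box 3330000001, 2 for 3330000005, and 3 for 3330000010, matching the desired result in the question.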
There is also the option of combining dense_rank and max.
    -- t stands for the existing query; rnk numbers the distinct order_number values within each box
    select t.*,
           max(rnk) over (partition by box_barcode) as distinct_count
    from (select t.*,
                 dense_rank() over (partition by box_barcode order by order_number) as rnk
          from t
         ) t;
The highest rank within each box_barcode (as assigned by dense_rank) is the number of distinct order numbers for that box_barcode.
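On the subquery part of the question: another option, sketched here under the same assumptions (the hypothetical CTE `box_items` stands in for the existing joined select), is to write the query once as a CTE and join it to a grouped count(distinct). This is not taken from the answers above; it is shown only for comparison.

```sql
-- Hypothetical stand-in for the existing joined query.
with box_items as (
    select *
    from (values
        ('3330000001', '0000105', 'tshirt-S', 1),
        ('3330000001', '0000105', 'tshirt-M', 2),
        ('3330000005', '0000108', 'shorts-S', 2),
        ('3330000005', '0000120', 'shorts-S', 1),
        ('3330000010', '0000120', 'shirts-M', 2),
        ('3330000010', '0000121', 'shirts-S', 3),
        ('3330000010', '0000122', 'shirts-S', 2)
    ) as v (box_barcode, order_number, item, qty)
)
select b.box_barcode, b.order_number, b.item, b.qty, c.distinct_count
from box_items as b
join (
    -- One row per box with its distinct order count.
    select box_barcode, count(distinct order_number) as distinct_count
    from box_items
    group by box_barcode
) as c
  on c.box_barcode = b.box_barcode
order by b.box_barcode, b.order_number, b.item;
```

One difference worth keeping in mind: count(distinct) ignores NULL order_number values, whereas the row_number and dense_rank versions count NULL as one more distinct value. If order_number can be NULL in the real data, the approaches can disagree by one for the affected boxes.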