Unable to calculate median - SQL Server 2017
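I'm trying to calculate the median number of transactions within each category.
A few caveats (the dataset below is a small excerpt of a much larger one):
- One employee can belong to multiple categories
- The median of the transactions should be > 0
- Not every person appears in every category
The data is set up like this: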
| Person | Category | Transaction |
|:-------:|:--------:|:-----------:|
| PersonA | Sales | 27 |
| PersonB | Sales | 75 |
| PersonC | Sales | 87 |
| PersonD | Sales | 36 |
| PersonE | Sales | 70 |
| PersonB | Buys | 60 |
| PersonC | Buys | 92 |
| PersonD | Buys | 39 |
| PersonA | HR | 59 |
| PersonB | HR | 53 |
| PersonC | HR | 98 |
| PersonD | HR | 54 |
| PersonE | HR | 70 |
| PersonA | Other | 46 |
| PersonC | Other | 66 |
| PersonD | Other | 76 |
| PersonB | Other | 2 |
The ideal output would look like this:
| Category | Median | Average |
|:--------:|:------:|:-------:|
| Sales | 70 | 59 |
| Buys | 60 | 64 |
| HR | 59 | 67 |
| Other | 56 | 48 |
I can get the average with:

    SELECT
        Category,
        AVG(Transaction) AS Average_Transactions
    FROM
        table
    GROUP BY
        Category

Which works great!
This post tried to help me find the median. Here is what I wrote:

    SELECT
        Category,
        PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY Transaction) OVER (PARTITION BY Category) AS Median_Transactions
    FROM
        table
    GROUP BY
        Category

But I get an error:
Msg 8120: Column 'Transactions' is invalid in the select list because it is not contained in either an aggregate function or the **GROUP BY** clause
How can I fix this?
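You can do what you want with SELECT DISTINCT:

    SELECT DISTINCT Category,
           PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY Transaction) OVER (PARTITION BY Category) AS Median_Transactions
    FROM table;

Unfortunately, SQL Server offers the PERCENTILE_ functions only as window functions, not as aggregate functions, and it has no MEDIAN() aggregate function. That is why the expression cannot simply be combined with GROUP BY. You can also do this using subqueries and counts.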
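One thing worth checking against the desired output: it shows 56 for Other, which is the midpoint of 46 and 66. PERCENTILE_DISC always returns a value that exists in the set (46 for Other), whereas PERCENTILE_CONT interpolates between the two middle values, so it may be closer to what the question asks for. Below is a minimal sketch that also returns the average in the same result set. The table name dbo.Transactions and the column name Amount are placeholders, not names from the question.

```sql
-- Hypothetical names: dbo.Transactions(Category, Amount) stand in for the real table.
SELECT
    Category,
    MAX(Median_Amount) AS Median_Amount,   -- the median is identical on every row of a category
    AVG(1.0 * Amount)  AS Average_Amount   -- 1.0 * avoids integer truncation when Amount is int
FROM (
    SELECT
        Category,
        Amount,
        -- Window function: computed per row, so it must live in a derived table
        -- before the outer GROUP BY collapses the rows.
        PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY Amount)
            OVER (PARTITION BY Category) AS Median_Amount
    FROM dbo.Transactions
) AS t
GROUP BY Category;
```

Wrapping the window function in a derived table and then grouping is an alternative to SELECT DISTINCT. Both collapse the repeated per-row values into one row per category.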
This is not the best solution, but here is one that works:

    SELECT DISTINCT
        category,
        PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY val) OVER (PARTITION BY category) AS Median_Transactions,
        AVG(val) OVER (PARTITION BY d.category) [AVG]
    FROM #data d;
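The answer does not show how #data was created. To try the query above against the sample rows from the question, a setup along these lines should do. The column names person, category and val are inferred from the query, and the types are assumptions.

```sql
-- Assumed schema: inferred from the query above, not given in the answer.
CREATE TABLE #data (
    person   varchar(20) NOT NULL,
    category varchar(20) NOT NULL,
    val      int         NOT NULL
);

-- Sample rows copied from the question.
INSERT INTO #data (person, category, val) VALUES
    ('PersonA', 'Sales', 27),
    ('PersonB', 'Sales', 75),
    ('PersonC', 'Sales', 87),
    ('PersonD', 'Sales', 36),
    ('PersonE', 'Sales', 70),
    ('PersonB', 'Buys',  60),
    ('PersonC', 'Buys',  92),
    ('PersonD', 'Buys',  39),
    ('PersonA', 'HR',    59),
    ('PersonB', 'HR',    53),
    ('PersonC', 'HR',    98),
    ('PersonD', 'HR',    54),
    ('PersonE', 'HR',    70),
    ('PersonA', 'Other', 46),
    ('PersonC', 'Other', 66),
    ('PersonD', 'Other', 76),
    ('PersonB', 'Other',  2);
```

With val declared as int, AVG(val) returns an integer, so the averages are truncated rather than rounded. Cast val to a decimal type first if fractional averages matter.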
I don't think this is pretty, but it does work. I didn't take the time to polish it.

    with
    avg_t as
    (   -- average per category
        select category, avg(sales) as avg_sales
        from sample
        group by category   -- ordinal "group by 1" is not valid in SQL Server
    ),
    mn as
    (   -- median per category: the middle row (odd count) or the mean of the two middle rows (even count)
        select category, avg(sales) as median_sales
        from (
            select category, sales,
                   row_number() over (partition by category order by sales asc) as r,
                   count(person) over (partition by category) as total_count
            from sample
        ) mn_sub
        where (total_count % 2 = 0 and r in ((total_count / 2), ((total_count / 2) + 1)))
           or (total_count % 2 <> 0 and r = ((total_count + 1) / 2))
        group by category
    )
    select avg_t.category, avg_t.avg_sales, mn.median_sales
    from avg_t
    inner join mn
        on avg_t.category = mn.category