SQL query for finding the median of a set of data belonging to a category

I have a table containing binned data for different categories, for example:

category, bin, frequency
a, 0, 10
a, 1, 20
a, 2, 30
a, 3, 15
b, 0, 18
b, 1, 54
b, 2, 33
b, 3, 24

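I need to find the approximate median of each category. To do this, I want to compute a cumulative percentage histogram for each category and take the first bin whose cumulative percentage is at or above 50%. I know how to do this for one category: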

SELECT category, bin as approx_median
FROM (
SELECT category, bin, frequency,
    (SELECT SUM(frequency) FROM table sub WHERE sub.bin <= base.bin) 
    / (SELECT SUM(frequency) FROM table) 
    * 100 as running_percent    
FROM table base
WHERE category = 'a'
ORDER BY bin ) p
WHERE p.running_percent >= 50.0
LIMIT 1

The question is: how can I do this for all categories at once, to get the result

category, approx_median
a, 2
b, 1

Thanks for any suggestions.
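The approach described in the question (accumulate each category's frequencies in bin order and take the first bin whose cumulative share reaches 50%) can be sketched in plain Python, which is handy as a cross-check for any SQL version. The function name approx_median is hypothetical; it compares 2 * cumulative >= total instead of a percentage to stay in integer arithmetic.

```python
from itertools import accumulate


def approx_median(bins):
    """Return the first bin whose cumulative share of the total frequency is >= 50%.

    `bins` is a list of (bin, frequency) pairs; it is sorted by bin before accumulating.
    """
    ordered = sorted(bins)
    total = sum(freq for _, freq in ordered)
    running = accumulate(freq for _, freq in ordered)
    for (bin_value, _), cumulative in zip(ordered, running):
        # cumulative / total >= 0.5  <=>  2 * cumulative >= total (no floats needed)
        if 2 * cumulative >= total:
            return bin_value
    return None  # only reached for an empty list (assuming non-negative frequencies)


data = {
    "a": [(0, 10), (1, 20), (2, 30), (3, 15)],
    "b": [(0, 18), (1, 54), (2, 33), (3, 24)],
}
medians = {category: approx_median(rows) for category, rows in data.items()}
print(medians)  # {'a': 2, 'b': 1}
```

For category a the running totals are 10, 30, 60, 75 out of 75, so bin 2 is the first to reach half; for b they are 18, 72, ... out of 129, so bin 1 already passes 50%.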

You could use the IN operator, though I'm not sure whether it will work. Give it a try.

SELECT category, bin as approx_median
FROM (
SELECT category, bin, frequency,
    (SELECT SUM(frequency) FROM table sub WHERE sub.bin <= base.bin) 
    / (SELECT SUM(frequency) FROM table) 
    * 100 as running_percent    
FROM table base
WHERE category in (select distinct category from table)
ORDER BY bin ) p
WHERE p.running_percent >= 50.0
LIMIT 1

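If the query you posted matches your actual requirement, just remove the condition WHERE category = a and give it a try. In any case, your running_percent calculation is based only on the bin column, not on the category. You can additionally order the outer query by category to make the output look nicer.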
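To illustrate why a running_percent that depends only on bin is a problem, here is a minimal sketch using SQLite through Python's sqlite3 module (an assumption; the thread does not name a database for the question). It compares the question's uncorrelated subqueries with versions restricted to the outer row's category. The table name bins and the helper first_bin_at_or_above_50 are hypothetical; bins stands in for the question's table, since table is a reserved word.

```python
import sqlite3

ROWS = [("a", 0, 10), ("a", 1, 20), ("a", 2, 30), ("a", 3, 15),
        ("b", 0, 18), ("b", 1, 54), ("b", 2, 33), ("b", 3, 24)]

conn = sqlite3.connect(":memory:")
# "bins" is a hypothetical table name standing in for the question's "table".
conn.execute("CREATE TABLE bins (category TEXT, bin INTEGER, frequency INTEGER)")
conn.executemany("INSERT INTO bins VALUES (?, ?, ?)", ROWS)

# As in the question: neither subquery looks at category, so both the running
# total and the grand total mix rows from categories a and b.
UNCORRELATED = """
SELECT base.bin,
       100.0 * (SELECT SUM(sub.frequency) FROM bins sub WHERE sub.bin <= base.bin)
             / (SELECT SUM(frequency) FROM bins) AS running_percent
FROM bins base
WHERE base.category = ?
ORDER BY base.bin
"""

# Both subqueries restricted to the outer row's category.
CORRELATED = """
SELECT base.bin,
       100.0 * (SELECT SUM(sub.frequency) FROM bins sub
                WHERE sub.bin <= base.bin AND sub.category = base.category)
             / (SELECT SUM(sub.frequency) FROM bins sub
                WHERE sub.category = base.category) AS running_percent
FROM bins base
WHERE base.category = ?
ORDER BY base.bin
"""


def first_bin_at_or_above_50(query, category):
    """Return the first bin (in bin order) whose running_percent is >= 50."""
    for bin_value, percent in conn.execute(query, (category,)):
        if percent >= 50.0:
            return bin_value
    return None


for category in ("a", "b"):
    print(category,
          first_bin_at_or_above_50(UNCORRELATED, category),
          first_bin_at_or_above_50(CORRELATED, category))
```

With the sample data the pooled running total for bins 0 and 1 is 102 out of 204, exactly 50%, so the uncorrelated version picks bin 1 for category a, while the category-filtered version picks the expected bin 2.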

What you probably want is something like this:

SELECT category, MIN(bin) AS approx_median
FROM (
    SELECT base.category,
           base.bin,
           -- multiply by 100.0 first: if frequency is an integer column, int / int truncates
           100.0 * (SELECT SUM(sub.frequency) FROM [table] sub
                    WHERE sub.bin <= base.bin AND sub.category = base.category)
           / (SELECT SUM(sub.frequency) FROM [table] sub
              WHERE sub.category = base.category) AS running_percent
    FROM [table] base
) p
WHERE running_percent >= 50.0
GROUP BY category
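A quick way to check the grouped query against the sample data is a minimal sketch in SQLite via Python's sqlite3 (SQLite is an assumption here; the answer targets SQL Server). SQLite accepts SQL Server-style [brackets], so [table] can be kept as the table name. The sketch runs the running_percent expression twice: once as it appears in the answer, (sum) / (sum) * 100, and once as 100.0 * (sum) / (sum). In both SQL Server and SQLite, dividing two integers truncates, so the first form yields 0 for every bin except the last. The names QUERY_TEMPLATE, AS_WRITTEN and FLOAT_SAFE are hypothetical.

```python
import sqlite3

ROWS = [("a", 0, 10), ("a", 1, 20), ("a", 2, 30), ("a", 3, 15),
        ("b", 0, 18), ("b", 1, 54), ("b", 2, 33), ("b", 3, 24)]

conn = sqlite3.connect(":memory:")
# SQLite accepts SQL Server-style [brackets], so the reserved word "table" works as a name.
conn.execute("CREATE TABLE [table] (category TEXT, bin INTEGER, frequency INTEGER)")
conn.executemany("INSERT INTO [table] VALUES (?, ?, ?)", ROWS)

# Outer query from the answer, plus ORDER BY so the output order is deterministic.
QUERY_TEMPLATE = """
SELECT category, MIN(bin) AS approx_median
FROM (
    SELECT base.category, base.bin, {percent_expr} AS running_percent
    FROM [table] base
) p
WHERE running_percent >= 50.0
GROUP BY category
ORDER BY category
"""

# As written in the answer: integer / integer truncates, so the ratio is 0
# for every bin except the last one (where it is 1, i.e. 100 percent).
AS_WRITTEN = """(SELECT SUM(sub.frequency) FROM [table] sub
                 WHERE sub.bin <= base.bin AND sub.category = base.category)
               / (SELECT SUM(sub.frequency) FROM [table] sub
                  WHERE sub.category = base.category GROUP BY sub.category) * 100"""

# Multiplying by 100.0 first makes the whole expression floating point.
FLOAT_SAFE = """100.0 * (SELECT SUM(sub.frequency) FROM [table] sub
                         WHERE sub.bin <= base.bin AND sub.category = base.category)
               / (SELECT SUM(sub.frequency) FROM [table] sub
                  WHERE sub.category = base.category)"""

as_written = conn.execute(QUERY_TEMPLATE.format(percent_expr=AS_WRITTEN)).fetchall()
float_safe = conn.execute(QUERY_TEMPLATE.format(percent_expr=FLOAT_SAFE)).fetchall()
print(as_written)  # [('a', 3), ('b', 3)]
print(float_safe)  # [('a', 2), ('b', 1)]
```

Only the floating-point form reproduces the expected result from the question (a → 2, b → 1); if frequency were already a decimal or float column, the two forms would agree.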

You need to group by category and reference it in the aggregate. If you are using SQL Server 2012 or later, you can use window functions instead; see the example "ABC-Analysis with Window Function".
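For comparison, here is a minimal sketch of the window-function variant. It is not the ABC-analysis example mentioned above, just the same idea applied to this table: a running SUM(...) OVER (PARTITION BY category ORDER BY bin) divided by the per-category total SUM(...) OVER (PARTITION BY category). The OVER clause shown should be valid in SQL Server 2012+; here it is run in SQLite via Python's sqlite3, assuming SQLite 3.25 or later (the first version with window functions). The table name bins is hypothetical.

```python
import sqlite3

ROWS = [("a", 0, 10), ("a", 1, 20), ("a", 2, 30), ("a", 3, 15),
        ("b", 0, 18), ("b", 1, 54), ("b", 2, 33), ("b", 3, 24)]

conn = sqlite3.connect(":memory:")
# "bins" is a hypothetical table name; window functions need SQLite 3.25+.
conn.execute("CREATE TABLE bins (category TEXT, bin INTEGER, frequency INTEGER)")
conn.executemany("INSERT INTO bins VALUES (?, ?, ?)", ROWS)

WINDOW_QUERY = """
SELECT category, MIN(bin) AS approx_median
FROM (
    SELECT category,
           bin,
           -- running total per category in bin order, divided by the category total
           100.0 * SUM(frequency) OVER (PARTITION BY category ORDER BY bin
                                        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
                 / SUM(frequency) OVER (PARTITION BY category) AS running_percent
    FROM bins
) p
WHERE running_percent >= 50.0
GROUP BY category
ORDER BY category
"""

result = conn.execute(WINDOW_QUERY).fetchall()
print(result)  # [('a', 2), ('b', 1)]
```

Each row is read once per window instead of once per correlated subquery, which is the usual reason to prefer this form on larger tables.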