SQL query for finding the median of a set of data belonging to a category
I have a table containing binned data for different categories, for example:
category, bin, frequency
a, 0, 10
a, 1, 20
a, 2, 30
a, 3, 15
b, 0, 18
b, 1, 54
b, 2, 33
b, 3, 24
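I need to find the approximate median for each category. To do this, I want to compute the cumulative percentage histogram for each category and take the first bin whose cumulative percentage reaches 50% or more. I know how to do this for a single category: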
SELECT category, bin as approx_median
FROM (
SELECT category, bin, frequency,
(SELECT SUM(frequency) FROM table sub WHERE sub.bin <= base.bin)
/ (SELECT SUM(frequency) FROM table)
* 100 as running_percent
FROM table base
WHERE category = 'a'
ORDER BY bin ) p
WHERE p.running_percent >= 50.0
LIMIT 1
The question is how to do this for all categories at once, to get the result
category, approx_median
a, 2
b, 1
Any suggestions are appreciated.
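As a cross-check of the expected result, the same "first bin at or above 50% of the cumulative frequency" rule can be worked through outside the database. This is only a sketch in plain Python over the sample rows from the question; the function name `approx_medians` is made up for illustration.

```python
from itertools import accumulate

# Sample rows from the question: (category, bin, frequency)
rows = [
    ("a", 0, 10), ("a", 1, 20), ("a", 2, 30), ("a", 3, 15),
    ("b", 0, 18), ("b", 1, 54), ("b", 2, 33), ("b", 3, 24),
]

def approx_medians(rows):
    """Return {category: first bin whose cumulative share reaches 50%}."""
    by_category = {}
    for category, bin_, frequency in rows:
        by_category.setdefault(category, []).append((bin_, frequency))

    result = {}
    for category, bins in by_category.items():
        bins.sort()  # ascending bin order
        total = sum(freq for _, freq in bins)
        running = accumulate(freq for _, freq in bins)
        for (bin_, _), cumulative in zip(bins, running):
            if 100.0 * cumulative / total >= 50.0:
                result[category] = bin_
                break
    return result

print(approx_medians(rows))  # {'a': 2, 'b': 1}
```

For category a the cumulative counts are 10, 30, 60, 75 out of 75, so bin 2 is the first to reach 50%. For category b they are 18, 72, 105, 129 out of 129, so bin 1 is.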
You could use the IN operator. I'm not sure whether it will work, but give it a try.
SELECT category, bin as approx_median
FROM (
SELECT category, bin, frequency,
(SELECT SUM(frequency) FROM table sub WHERE sub.bin <= base.bin)
/ (SELECT SUM(frequency) FROM table)
* 100 as running_percent
FROM table base
WHERE category in (select distinct category from table)
ORDER BY bin ) p
WHERE p.running_percent >= 50.0
LIMIT 1
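If the query you posted matches your actual requirement, then just remove the WHERE condition on category and give it a try. Either way, your running_percent calculation is based only on the bin column. You can additionally order the outer query by category to make the output easier to read.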
What you probably want to do is something like this:
SELECT category, Min(bin) As approx_median
FROM(
SELECT base.category,
base.bin,
-- 100.0 comes first so the division is not truncated when frequency is an integer column
100.0 * (SELECT SUM(sub.frequency) FROM [table] sub WHERE sub.bin <= base.bin AND sub.category = base.category)
/ (SELECT SUM(sub.frequency) FROM [table] sub WHERE sub.category = base.category) AS running_percent
FROM [table] base
) p
WHERE running_percent >= 50.0
GROUP BY category
You need to group by category and reference it inside the aggregate subqueries.
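To try the correlated-subquery approach on the sample data, here is a minimal sketch that runs it against an in-memory SQLite database from Python. The table name `binned` is an assumption (the question's `table` is a reserved word), and SQLite stands in for whatever database is actually used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "binned" is a hypothetical table name standing in for the question's table.
conn.execute("CREATE TABLE binned (category TEXT, bin INTEGER, frequency INTEGER)")
conn.executemany(
    "INSERT INTO binned VALUES (?, ?, ?)",
    [
        ("a", 0, 10), ("a", 1, 20), ("a", 2, 30), ("a", 3, 15),
        ("b", 0, 18), ("b", 1, 54), ("b", 2, 33), ("b", 3, 24),
    ],
)

QUERY = """
SELECT category, MIN(bin) AS approx_median
FROM (
    SELECT base.category,
           base.bin,
           -- running total within the category, as a percentage of the category total
           100.0 * (SELECT SUM(sub.frequency) FROM binned sub
                    WHERE sub.category = base.category AND sub.bin <= base.bin)
                 / (SELECT SUM(sub.frequency) FROM binned sub
                    WHERE sub.category = base.category) AS running_percent
    FROM binned base
) p
WHERE running_percent >= 50.0
GROUP BY category
ORDER BY category
"""

medians = conn.execute(QUERY).fetchall()
print(medians)  # [('a', 2), ('b', 1)]
```

Both subqueries filter on `sub.category = base.category`, which is what makes the running percentage per category rather than over the whole table.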
If you are on SQL Server 2012 or later, you can use window functions instead. See the example "ABC-Analysis with Window Function".
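A window-function version might look like the following sketch. It is not taken from the linked example. It assumes a hypothetical table name `binned` and is run here through Python's sqlite3 module, which needs SQLite 3.25 or newer for window functions. The `SUM(...) OVER (PARTITION BY ... ORDER BY ...)` syntax itself is standard and should carry over to SQL Server 2012+.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "binned" is a hypothetical table name standing in for the question's table.
conn.execute("CREATE TABLE binned (category TEXT, bin INTEGER, frequency INTEGER)")
conn.executemany(
    "INSERT INTO binned VALUES (?, ?, ?)",
    [
        ("a", 0, 10), ("a", 1, 20), ("a", 2, 30), ("a", 3, 15),
        ("b", 0, 18), ("b", 1, 54), ("b", 2, 33), ("b", 3, 24),
    ],
)

WINDOW_QUERY = """
SELECT category, MIN(bin) AS approx_median
FROM (
    SELECT category,
           bin,
           -- running sum per category divided by the category total
           100.0 * SUM(frequency) OVER (PARTITION BY category ORDER BY bin)
                 / SUM(frequency) OVER (PARTITION BY category) AS running_percent
    FROM binned
) p
WHERE running_percent >= 50.0
GROUP BY category
ORDER BY category
"""

window_medians = conn.execute(WINDOW_QUERY).fetchall()
print(window_medians)  # [('a', 2), ('b', 1)]
```

The window form reads the table once instead of running two correlated subqueries per row, which is usually the reason to prefer it on larger tables.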