Calculating Percentage of Various Categories that Meet Conditions with SQL Server
I have a table containing information about flights between cities, which looks like this:
origin_city    dest_city         time
Dothan AL      Atlanta GA        171
Dothan AL      Atlanta GA        171
Dothan AL      Elsewhere AL      2
Dothan AL      Elsewhere AL      2
Dothan AL      Elsewhere AL      2
Boston MA      New York NY       5
Boston MA      City MA           1
New York NY    Boston MA         5
New York NY    Boston MA         5
New York NY    Boston MA         5
New York NY    Poughkipsie NY    2
For each origin city, I want to find the percentage of flights with a flight time of less than 3 hours. So the result would look like this:
Dothan AL 60
Boston MA 50
New York NY 25
The code I thought would work looks like this:
SELECT F.origin_city as origin_city,
((SELECT COUNT(*) FROM Flights as F2
WHERE F2.actual_time < 3) / (SELECT COUNT(*) FROM Flights as F3)) * 100
AS percentage
FROM Flights as F
GROUP BY F.origin_city
ORDER BY percentage;
GO
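When I run it, I get a list of origin cities and a percentage column, as expected, but the percentage is always 0. I'm still pretty confused by subqueries (as you can see).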
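Your percentage is computed over the whole table rather than per origin city. In addition, dividing one integer count by another in SQL Server is integer division, which truncates to 0 whenever the ratio is below 1, so multiply by 100.0 before dividing. Try something like this:
SELECT F.origin_city as origin_city,
100.0 * SUM(CASE WHEN F.actual_time < 3 THEN 1 ELSE 0 END) / COUNT(*) AS percentage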
FROM Flights as F
GROUP BY F.origin_city
ORDER BY percentage;
GO
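Integer division is easy to trip over in percentage queries like this one, so it may help to see it on the question's sample data. The following is a minimal sketch in Python, not a SQL Server test: it uses the standard-library sqlite3 module as a stand-in for SQL Server (an assumption; SQLite also truncates integer division, but other behaviour may differ), and the table layout is inferred from the queries, with the time column named actual_time.

```python
import sqlite3

# Sample rows from the question: (origin_city, dest_city, actual_time).
ROWS = (
    [("Dothan AL", "Atlanta GA", 171)] * 2
    + [("Dothan AL", "Elsewhere AL", 2)] * 3
    + [("Boston MA", "New York NY", 5), ("Boston MA", "City MA", 1)]
    + [("New York NY", "Boston MA", 5)] * 3
    + [("New York NY", "Poughkipsie NY", 2)]
)

# Assumption: an in-memory SQLite database stands in for SQL Server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Flights (origin_city TEXT, dest_city TEXT, actual_time INTEGER)"
)
conn.executemany("INSERT INTO Flights VALUES (?, ?, ?)", ROWS)

# Integer / integer truncates, so every ratio below 1 becomes 0.
truncated = dict(conn.execute("""
    SELECT origin_city,
           (SUM(CASE WHEN actual_time < 3 THEN 1 ELSE 0 END) / COUNT(*)) * 100
    FROM Flights
    GROUP BY origin_city
"""))

# Multiplying by 100.0 first moves the arithmetic out of integer math.
correct = dict(conn.execute("""
    SELECT origin_city,
           100.0 * SUM(CASE WHEN actual_time < 3 THEN 1 ELSE 0 END) / COUNT(*)
    FROM Flights
    GROUP BY origin_city
"""))

print(truncated)
print(correct)
```

On the sample data, the first query yields 0 for every city, while the second yields 60, 50 and 25 for Dothan AL, Boston MA and New York NY.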
FWIW, the problem with your current subqueries is that nothing correlates them with the current row. You could rewrite it as:
SELECT F.origin_city as origin_city,
-- 100.0 * ... keeps the division from being truncated as integer division
100.0 * (SELECT COUNT(*) FROM Flights as F2
         WHERE F2.origin_city = F.origin_city AND F2.actual_time < 3)
      / (SELECT COUNT(*) FROM Flights as F3
         WHERE F3.origin_city = F.origin_city)
AS percentage
FROM Flights as F
GROUP BY F.origin_city
ORDER BY percentage;
GO
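But there is no need to re-query the table for every row when the grouped query already has all the data it needs for the calculation, as the first query above shows.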
I would use AVG() over a CASE expression (conditional aggregation):
SELECT F.origin_city as origin_city,
AVG(CASE WHEN F.actual_time < 3 THEN 100.0 ELSE 0 END) as percentage
FROM Flights F
GROUP BY F.origin_city
ORDER BY percentage;
This assumes the time is in hours. According to Google Maps, you can walk from Dothan to Atlanta in 68 hours, so 171 is suspicious.
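The AVG() idea can be checked against the sample data: averaging a value that is 100.0 for short flights and 0 otherwise gives the percentage directly, with no separate count. This is only a sketch under assumptions: Python's standard-library sqlite3 module stands in for SQL Server, and the table is reduced to the two columns the query needs, with the time column named actual_time as in the queries.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for SQL Server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Flights (origin_city TEXT, actual_time INTEGER)")
conn.executemany(
    "INSERT INTO Flights VALUES (?, ?)",
    [("Dothan AL", 171)] * 2 + [("Dothan AL", 2)] * 3
    + [("Boston MA", 5), ("Boston MA", 1)]
    + [("New York NY", 5)] * 3 + [("New York NY", 2)],
)

# Each flight contributes 100.0 (under 3 hours) or 0; the mean per city
# is the percentage of short flights.
rows = conn.execute("""
    SELECT origin_city,
           AVG(CASE WHEN actual_time < 3 THEN 100.0 ELSE 0 END) AS percentage
    FROM Flights
    GROUP BY origin_city
    ORDER BY percentage
""").fetchall()

for city, pct in rows:
    print(city, pct)
```

With the rows from the question this matches the expected 60, 50 and 25, listed in ascending order of percentage.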