Extracting Minimum value after subsetting in PROC SQL
I have a sample dataset named Flights. I want to extract the name of the Origin airport with the fewest departure delays.
Sample flights data:
Date Sched_dep_time dep_time flight origin Dep_delay_min
01-01-2013 5:15 5:17 1545 EWR -2
01-01-2013 5:29 5:33 1714 LGA -4
01-01-2013 5:40 5:42 1141 JFK -2
01-01-2013 21:10 21:04 725 JFK 6
01-01-2013 20:30 21:04 461 LGA -74
01-01-2013 21:06 21:05 1696 EWR 1
01-01-2013 20:55 21:10 507 EWR -55
01-01-2013 20:25 21:14 5708 LGA -89
01-01-2013 21:10 21:15 79 JFK -5
01-01-2013 21:24 21:16 301 LGA 8
01-01-2013 6:00 5:58 49 JFK 42
01-01-2013 6:00 5:58 71 JFK 42
01-01-2013 6:00 5:58 194 JFK 42
Code I tried:
Proc sql;
Create table least_delay as
Select origin,min(number_of_delays)as min_delay from
(Select Origin,Count(Departure_delay_minutes) as Number_of_delays from
Flight
Where (Departure_delay_minutes>0))
Group by Origin
;
Quit;
The output I get is as follows:
Origin min_delay
1 NLI 1135504
2 JFK 1135504
3 LGA 1135504
It shows the same result for every origin!
Can anyone help me fix this?
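The specific problem with your code is that the subquery needs a group by Origin clause. Without it, Count() is computed once over every delayed flight in the table and that single total is attached to each row, which is why every Origin shows the same value. However, adding the group by only returns the number of delays for each Origin, not the Origin with the fewest delays. A small change to the code, adding a having clause, solves this.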
data flight;
input Date :ddmmyy10. (Sched_dep_time dep_time) (:time.) flight origin $ Dep_delay_min;
format date date9. Sched_dep_time dep_time time. ;
datalines;
01-01-2013 5:15 5:17 1545 EWR -2
01-01-2013 5:29 5:33 1714 LGA -4
01-01-2013 5:40 5:42 1141 JFK -2
01-01-2013 21:10 21:04 725 JFK 6
01-01-2013 20:30 21:04 461 LGA -74
01-01-2013 21:06 21:05 1696 EWR 1
01-01-2013 20:55 21:10 507 EWR -55
01-01-2013 20:25 21:14 5708 LGA -89
01-01-2013 21:10 21:15 79 JFK -5
01-01-2013 21:24 21:16 301 LGA 8
01-01-2013 6:00 5:58 49 JFK 42
01-01-2013 6:00 5:58 71 JFK 42
01-01-2013 6:00 5:58 194 JFK 42
;
run;
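proc sql;
  create table least_delay as
  select *
  from (
    /* count delayed departures (dep_delay_min > 0) for each origin */
    select
      origin,
      count(*) as num_delays
    from
      flight
    where
      dep_delay_min > 0
    group by
      origin
  )
  /* keep only the origin(s) whose count equals the smallest count */
  having num_delays = min(num_delays);
quit;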
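If it helps to see what the having clause is filtering, the same logic can be split into two steps. This is only a sketch, assuming the flight dataset created above; delay_counts is a made-up table name, and the second step uses a scalar subquery instead of having, which is a different but equivalent way to pick the minimum.

```sas
proc sql;
  /* Step 1: count delayed departures (dep_delay_min > 0) per origin.
     delay_counts is a hypothetical table name used only for this sketch. */
  create table delay_counts as
  select origin,
         count(*) as num_delays
  from flight
  where dep_delay_min > 0
  group by origin;

  /* Step 2: keep the origin(s) whose count equals the smallest count.
     A scalar subquery is used here instead of HAVING. */
  create table least_delay as
  select origin, num_delays
  from delay_counts
  where num_delays = (select min(num_delays) from delay_counts);
quit;
```

With the sample rows above, delay_counts should hold EWR 1, JFK 4 and LGA 1, so least_delay ends up with both EWR and LGA because they tie for the minimum. Note that an origin with no delayed flights at all is dropped by the where clause and so never appears in either table.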