Extracting the minimum value after subsetting in PROC SQL

I have a sample dataset named Flights. From it, I want to extract the name of the origin airport (Origin) that has the fewest departure delays.

Sample flight data:

Date        Sched_dep_time  dep_time  flight  origin  Dep_delay_min
01-01-2013  5:15            5:17      1545    EWR     -2
01-01-2013  5:29            5:33      1714    LGA     -4
01-01-2013  5:40            5:42      1141    JFK     -2
01-01-2013  21:10           21:04     725     JFK     6
01-01-2013  20:30           21:04     461     LGA     -74
01-01-2013  21:06           21:05     1696    EWR     1
01-01-2013  20:55           21:10     507     EWR     -55
01-01-2013  20:25           21:14     5708    LGA     -89
01-01-2013  21:10           21:15     79      JFK     -5
01-01-2013  21:24           21:16     301     LGA     8
01-01-2013  6:00            5:58      49      JFK     42
01-01-2013  6:00            5:58      71      JFK     42
01-01-2013  6:00            5:58      194     JFK     42

The code I tried:

Proc sql;
Create table least_delay as
Select origin, min(number_of_delays) as min_delay from
(Select Origin, Count(Departure_delay_minutes) as Number_of_delays from Flight
Where (Departure_delay_minutes>0))
Group by Origin
;
Quit;

The output I get is as follows:

    Origin  min_delay
1   NLI      1135504
2   JFK      1135504
3   LGA      1135504

It shows the same result for every origin!

Can anyone help me solve this?

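The specific problem in your code is that the subquery needs a group by Origin clause. Without it, PROC SQL computes a single count over the whole filtered table and remerges it onto every row, which is why every origin shows the same value. However, adding the group by only returns the number of delays for each Origin, not the Origin with the fewest delays. A small change to the code, adding a having clause, solves this.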

data flight;
input Date :ddmmyy10. (Sched_dep_time dep_time) (:time.) flight origin $ Dep_delay_min;
format date date9. Sched_dep_time dep_time time. ;
datalines;
01-01-2013  5:15             5:17   1545    EWR       -2
01-01-2013  5:29             5:33   1714    LGA       -4
01-01-2013  5:40             5:42   1141    JFK       -2
01-01-2013  21:10           21:04   725     JFK        6
01-01-2013  20:30           21:04   461     LGA      -74
01-01-2013  21:06           21:05   1696    EWR        1
01-01-2013  20:55           21:10   507     EWR      -55
01-01-2013  20:25           21:14   5708    LGA      -89
01-01-2013  21:10           21:15   79      JFK       -5
01-01-2013  21:24           21:16   301     LGA        8
01-01-2013  6:00             5:58   49      JFK        42
01-01-2013  6:00             5:58   71      JFK        42
01-01-2013  6:00             5:58   194     JFK        42
;
run;

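proc sql;
create table least_delay
as select *
from (
  /* count delayed departures (dep_delay_min > 0) for each origin */
  select
    origin,
    count(0) as num_delays
  from
    flight
  where
    dep_delay_min>0
  group by
    origin
    )
  /* keep only the origin(s) whose count equals the overall minimum */
having num_delays = min(num_delays);
quit;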
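
The having clause above has no group by of its own, so it relies on PROC SQL remerging min(num_delays) back onto every row of the subquery result. If that feels too implicit, the same logic can be split into two explicit steps. This is only a sketch, assuming the flight dataset created by the data step above; the table names delay_counts and least_delay_alt are made up for illustration.

```sas
proc sql;
  /* Step 1: number of delayed departures (dep_delay_min > 0) per origin.
     delay_counts is a hypothetical intermediate table name. */
  create table delay_counts as
  select origin,
         count(*) as num_delays
  from flight
  where dep_delay_min > 0
  group by origin;

  /* Step 2: keep the origin(s) whose count equals the overall minimum,
     using an explicit scalar subquery instead of a remerged summary.
     least_delay_alt is a hypothetical output table name. */
  create table least_delay_alt as
  select origin, num_delays
  from delay_counts
  where num_delays = (select min(num_delays) from delay_counts);
quit;
```

With the 13 sample rows, EWR and LGA each have one delayed departure and JFK has four, so both versions return two rows (EWR and LGA) rather than a single airport. Note also that an origin with no delayed departures at all is removed by the where clause and never appears in the result.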