Aliases and Group By Statements in SAS Proc SQL

I am using PROC SQL in SAS, and one of my queries behaves strangely.

I have a large dataset (about 1 million rows) that looks like this:

apple_key    profit    price    cost    months    date      
golden_d     0.03      12       4       3         01/12
golden_d     0.03      8        0       2         01/12
granny_s     0.05      15       5       5         02/12
red_d        0.04      13       0       1         01/12
golden_d     0.02      1        2       12        03/14

On this dataset I run the following query:

%let picking_date = 01/12; /* I simplify here - this part of my code definitely works */

proc sql; 
    CREATE TABLE output AS 
    SELECT 
        (CASE apple_key
              WHEN "golden_d" THEN 1
              WHEN "granny_s" THEN 2
              WHEN "red_d"    THEN 3
        END) AS apple_id,
        apple_key AS apple_name,
        (CASE WHEN cost= 0 THEN 0 
            ELSE 1 
        END) AS cost_flag,
        (CASE 
            WHEN CEIL(months / 2) < 5 THEN CEIL(months / 2) 
            ELSE 5
        END) AS age, 
        "McDonalds" as farm, 
        sum(profit*price)/sum(price) as price_weighted_profit
    FROM input_table
    WHERE date = "&picking_date."d
        AND price > cost
        AND cost >= 0
        AND cost >= 0
    GROUP BY apple_id, apple_name, cost_flag, age, farm
    ; 
run; 

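When I run this, my GROUP BY statement does not work. I get many entries for a single group: apple_id, apple_name, cost_flag, age and farm are all identical across those rows, but the rows are not aggregated.

However, when I run the GROUP BY as a separate step (as below), everything is fine. I get one "price weighted profit" entry per group: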

proc sql; 
    CREATE TABLE output_tmp AS 
    SELECT 
        (CASE apple_key
              WHEN "golden_d" THEN 1
              WHEN "granny_s" THEN 2
              WHEN "red_d"    THEN 3
        END) AS apple_id,
        apple_key AS apple_name,
        (CASE WHEN cost= 0 THEN 0 
            ELSE 1 
        END) AS cost_flag,
        (CASE 
            WHEN CEIL(months / 2) < 5 THEN CEIL(months / 2) 
            ELSE 5
        END) AS age, 
        "McDonalds" as farm
    FROM input_table
    WHERE date = "&picking_date."d
        AND price > cost
        AND cost >= 0
        AND cost >= 0
   ;

    CREATE TABLE output AS
    SELECT 
        apple_id, 
        apple_name, 
        cost_flag, 
        age, 
        farm,
        sum(profit*price)/sum(price) as price_weighted_profit
    FROM output_tmp
    GROUP BY apple_id, apple_name, cost_flag, age, farm
    ;
quit;

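Why does this happen, and how can I fix it? This is driving me a little crazy... Thanks in advance for your help.

It does not work because the GROUP BY is not treating sum(profit*price)/sum(price) as an aggregate function. It fails to do so because of the aliases, such as age, cost_flag, etc.

In any case, the following query is correct: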

proc sql;
    CREATE TABLE output AS
    SELECT
        apple_id,
        apple_name,
        cost_flag,
        age,
        farm,
        sum(profit*price)/sum(price) AS price_weighted_profit
    FROM
        (
        SELECT
            (CASE apple_key
                  WHEN "golden_d" THEN 1
                  WHEN "granny_s" THEN 2
                  WHEN "red_d"    THEN 3
            END) AS apple_id,
            apple_key AS apple_name,
            (CASE WHEN cost = 0 THEN 0
                ELSE 1
            END) AS cost_flag,
            (CASE
                WHEN CEIL(months / 2) < 5 THEN CEIL(months / 2)
                ELSE 5
            END) AS age,
            "McDonalds" AS farm
        FROM input_table
        WHERE date = "&picking_date."d
            AND price > cost
            AND cost >= 0
            AND cost >= 0
        ) a
    GROUP BY apple_id, apple_name, cost_flag, age, farm;
quit;

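Let me know if you have any questions.

Rule of thumb: whenever you use an aggregate function in the SELECT clause, all the remaining columns should be part of the GROUP BY. In the query you posted, you apply sum(profit*price)/sum(price) without a GROUP BY, which causes the problem.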

Proc sql;
    CREATE TABLE output AS 
        SELECT 
            (CASE apple_key
                  WHEN "golden_d" THEN 1
                  WHEN "granny_s" THEN 2
                  WHEN "red_d"    THEN 3
            END) AS apple_id,
            apple_key AS apple_name,
            (CASE WHEN cost= 0 THEN 0 
                ELSE 1 
            END) AS cost_flag,
            (CASE 
                WHEN CEIL(months / 2) < 5 THEN CEIL(months / 2) 
                ELSE 5
            END) AS age, 
            "McDonalds" as farm, 
            sum(profit*price)/sum(price) as price_weighted_profit
        FROM input_table
        WHERE date = "&picking_date."d
            AND price > cost
            AND cost >= 0
            AND cost >= 0
        GROUP BY apple_id, apple_name, cost_flag, age, farm;
quit;

I suspect that what is happening is remerging. SAS proc sql accepts code like this:

proc sql;
    select a.*, count(*)
    from a;
quit;

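This does not summarize the data. Instead, it puts the overall count on every row. In other words, if the keys in the SELECT do not exactly match the keys in the GROUP BY, the aggregate functions are computed per GROUP BY key, but the results are placed on the individual rows. Other databases do this with a subset of window functions.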
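To see remerging in isolation, here is a minimal sketch on a toy table. The table work.apples and the column total_price are made up for illustration and are not the asker's data. The first query selects price, which is neither grouped nor aggregated, so SAS should keep every input row and write a note about remerging summary statistics to the log. The second query uses the NOREMERGE option of PROC SQL, which, as far as I know, makes the same query fail with an error instead, so it can serve as a quick check.

```sas
/* Hypothetical toy data, only to demonstrate remerging */
data work.apples;
    input apple_key $ price;
    datalines;
golden_d 12
golden_d 8
granny_s 15
;
run;

proc sql;
    /* price is not in the GROUP BY and is not aggregated, so the      */
    /* per-group sum is remerged onto each of the three original rows  */
    select apple_key, price, sum(price) as total_price
    from work.apples
    group by apple_key;
quit;

proc sql noremerge;
    /* Same query: with NOREMERGE, PROC SQL refuses to remerge and     */
    /* reports an error rather than silently returning one row per     */
    /* input row                                                       */
    select apple_key, price, sum(price) as total_price
    from work.apples
    group by apple_key;
quit;
```

Running the original query under NOREMERGE, or searching its log for the remerging note, is one way to confirm whether this is what is happening with the real table.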

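In your case the remerging is not obvious. I think there is confusion over the keys, because you are using the same names in the SELECT as in the original data. My suggestion is to change the aliases so that they are unambiguous, and to make sure that the keys in the GROUP BY are unambiguous as well.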
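As a rough sketch of that suggestion, the query could be written with aliases that cannot collide with any column of input_table. The out_ prefix is just a naming convention I am assuming here, and the date filter is copied as-is from the simplified question. Whether this alone fixes the problem depends on the column names in the real table, so treat it as something to try rather than a confirmed fix.

```sas
proc sql;
    create table output as
    select
        (case apple_key
              when "golden_d" then 1
              when "granny_s" then 2
              when "red_d"    then 3
         end) as out_apple_id,
        apple_key as out_apple_name,
        (case when cost = 0 then 0
              else 1
         end) as out_cost_flag,
        (case
              when ceil(months / 2) < 5 then ceil(months / 2)
              else 5
         end) as out_age,
        "McDonalds" as out_farm,
        sum(profit*price)/sum(price) as price_weighted_profit
    from input_table
    /* filter reproduced from the question, including its simplified macro variable */
    where date = "&picking_date."d
        and price > cost
        and cost >= 0
    /* every non-aggregated SELECT item appears here under its new, unique alias */
    group by out_apple_id, out_apple_name, out_cost_flag, out_age, out_farm;
quit;
```

If the original column names are needed downstream, they can be restored afterwards, for example with a RENAME= data set option on the output table.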