Aliases and Group By Statements in SAS Proc SQL
I'm using PROC SQL in SAS, and one of my PROC SQL queries is behaving strangely:
I have a large dataset (about 1 million rows) that looks like this:
apple_key profit price cost months date
golden_d 0.03 12 4 3 01/12
golden_d 0.03 8 0 2 01/12
granny_s 0.05 15 5 5 02/12
red_d 0.04 13 0 1 01/12
golden_d 0.02 1 2 12 03/14
On this dataset, I am running the following query:
%let picking_date = 01/12; /* I simplify here - this part of my code definitely works */
proc sql;
CREATE TABLE output AS
SELECT
(CASE apple_key
WHEN "golden_d" THEN 1
WHEN "granny_s" THEN 2
WHEN "red_d" THEN 3
END) AS apple_id,
apple_key AS apple_name,
(CASE WHEN cost= 0 THEN 0
ELSE 1
END) AS cost_flag,
(CASE
WHEN CEIL(months / 2) < 5 THEN CEIL(months / 2)
ELSE 5
END) AS age,
"McDonalds" as farm,
sum(profit*price)/sum(price) as price_weighted_profit
FROM input_table
WHERE date = "&picking_date."d
AND price > cost
AND cost >= 0
GROUP BY apple_id, apple_name, cost_flag, age, farm
;
quit;
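When I run this, my GROUP BY statement does not work. I get a bunch of entries for a single group (where apple_id, apple_name, cost_flag, age and farm are all the same), so the aggregation is not happening.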
However, when I run the GROUP BY as a separate step (as below), everything works fine and I get one "price weighted profit" entry per group:
proc sql;
CREATE TABLE output_tmp AS
SELECT
(CASE apple_key
WHEN "golden_d" THEN 1
WHEN "granny_s" THEN 2
WHEN "red_d" THEN 3
END) AS apple_id,
apple_key AS apple_name,
(CASE WHEN cost= 0 THEN 0
ELSE 1
END) AS cost_flag,
(CASE
WHEN CEIL(months / 2) < 5 THEN CEIL(months / 2)
ELSE 5
END) AS age,
"McDonalds" as farm,
profit, /* carried through so the second step can aggregate it */
price
FROM input_table
WHERE date = "&picking_date."d
AND price > cost
AND cost >= 0
;
CREATE TABLE output AS
SELECT
apple_id,
apple_name,
cost_flag,
age,
farm,
sum(profit*price)/sum(price) as price_weighted_profit
FROM output_tmp
GROUP BY apple_id, apple_name, cost_flag, age, farm
;
quit;
Why is this happening, and how do I fix it? This is driving me a bit crazy... Thanks in advance for your help.
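It doesn't work because the GROUP BY is not treating the sum(profit*price)/sum(price) expression as an aggregate over your groups. That happens because of the aliases such as age, cost_flag, etc.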
Anyway, here is the corrected query:
Proc sql;
CREATE TABLE output AS
SELECT
apple_id,
apple_name,
cost_flag,
age,
farm,
sum(profit*price)/sum(price) as price_weighted_profit
FROM
(
SELECT
(CASE apple_key
WHEN "golden_d" THEN 1
WHEN "granny_s" THEN 2
WHEN "red_d" THEN 3
END) AS apple_id,
apple_key AS apple_name,
(CASE WHEN cost= 0 THEN 0
ELSE 1
END) AS cost_flag,
(CASE
WHEN CEIL(months / 2) < 5 THEN CEIL(months / 2)
ELSE 5
END) AS age,
"McDonalds" as farm,
profit, /* needed by the outer sum(profit*price)/sum(price) */
price
FROM input_table
WHERE date = "&picking_date."d
AND price > cost
AND cost >= 0
) a
GROUP BY apple_id, apple_name, cost_flag, age, farm;
quit;
Let me know if you have any questions.
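The corrected query above derives the keys in an inner query and aggregates them in the outer one. This is the same split as the asker's two-step version. As a rough illustration only, here is a plain-Python model of that split (not SAS) applied to the sample rows from the question. It assumes the dates are the literal strings shown and that picking_date is "01/12". The names derive, aggregate and APPLE_IDS are hypothetical.

```python
import math
from collections import defaultdict

# Sample rows from the question: (apple_key, profit, price, cost, months, date).
# Dates are kept as the plain strings shown (assumption; the real storage is unknown).
ROWS = [
    ("golden_d", 0.03, 12, 4, 3, "01/12"),
    ("golden_d", 0.03, 8, 0, 2, "01/12"),
    ("granny_s", 0.05, 15, 5, 5, "02/12"),
    ("red_d", 0.04, 13, 0, 1, "01/12"),
    ("golden_d", 0.02, 1, 2, 12, "03/14"),
]

APPLE_IDS = {"golden_d": 1, "granny_s": 2, "red_d": 3}


def derive(rows, picking_date):
    """Step 1 (like the inner query / output_tmp): filter and compute derived keys."""
    out = []
    for key, profit, price, cost, months, date in rows:
        if date != picking_date or not (price > cost and cost >= 0):
            continue
        group = (
            APPLE_IDS.get(key),             # CASE apple_key ... END (None if unmatched)
            key,                            # apple_name
            0 if cost == 0 else 1,          # cost_flag
            min(math.ceil(months / 2), 5),  # age, capped at 5
            "McDonalds",                    # farm
        )
        out.append((group, profit, price))
    return out


def aggregate(derived):
    """Step 2 (outer query): one row per group with sum(profit*price)/sum(price)."""
    num = defaultdict(float)
    den = defaultdict(float)
    for group, profit, price in derived:
        num[group] += profit * price
        den[group] += price
    return {group: num[group] / den[group] for group in num}


result = aggregate(derive(ROWS, "01/12"))
for group, weighted in sorted(result.items()):
    print(group, round(weighted, 4))
```

Because the keys are already plain columns when the grouping happens, there is nothing left for the GROUP BY to resolve ambiguously. Each derived key combination yields exactly one output row.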
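Rule of thumb: whenever you use an aggregate function in the SELECT clause, all the other columns should be part of the GROUP BY. In the query you posted, you are applying sum(profit*price)/sum(price) without a matching group, which causes the problem.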
Proc sql;
CREATE TABLE output AS
SELECT
(CASE apple_key
WHEN "golden_d" THEN 1
WHEN "granny_s" THEN 2
WHEN "red_d" THEN 3
END) AS apple_id,
apple_key AS apple_name,
(CASE WHEN cost= 0 THEN 0
ELSE 1
END) AS cost_flag,
(CASE
WHEN CEIL(months / 2) < 5 THEN CEIL(months / 2)
ELSE 5
END) AS age,
"McDonalds" as farm,
sum(profit*price)/sum(price) as price_weighted_profit
FROM input_table
WHERE date = "&picking_date."d
AND price > cost
AND cost >= 0
GROUP BY apple_id, apple_name, cost_flag, age, farm;
quit;
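The rule of thumb above can be expressed as a simple name check. This is a minimal sketch: remerge_risk is a hypothetical helper, not part of SAS. It only compares column names. It cannot detect the case where a name in the GROUP BY resolves to a different column than the alias of the same name, which is exactly the situation raised in the next answer.

```python
def remerge_risk(select_columns, aggregated, group_by):
    """Return non-aggregated SELECT columns that are missing from GROUP BY.

    By the rule of thumb, any column returned here means the summary
    statistics would be remerged back onto the detail rows instead of
    producing one row per group.
    """
    grouped = set(group_by)
    return [c for c in select_columns if c not in aggregated and c not in grouped]


SELECT_COLS = ["apple_id", "apple_name", "cost_flag", "age", "farm", "price_weighted_profit"]
AGGREGATED = {"price_weighted_profit"}

# The question's GROUP BY covers every non-aggregated column by name...
print(remerge_risk(SELECT_COLS, AGGREGATED, ["apple_id", "apple_name", "cost_flag", "age", "farm"]))
# ...whereas dropping keys from the GROUP BY would leave columns to remerge.
print(remerge_risk(SELECT_COLS, AGGREGATED, ["apple_id", "apple_name"]))
```

By names alone, the question's query passes this check. This suggests the duplicated rows come from how the names are resolved rather than from a missing GROUP BY key.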
I suspect what is happening is remerging. SAS PROC SQL accepts code like this:
proc sql;
select a.*, count(*)
from a;
quit;
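This does not summarize the data. Instead, it puts the overall count on every row. In other words, if the keys in the SELECT do not exactly match the keys in the GROUP BY, the aggregate functions are computed over the GROUP BY keys, but the results are placed on the individual rows. Other databases use a subset of window functions to do this.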
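The remerging behaviour described above can be modelled in plain Python. This is not SAS, only an illustration of the semantics.

- **Overall count:** the first part mirrors `select a.*, count(*) from a;` using the question's five sample rows.
- **Group-level remerge:** the second part is a hypothetical variant that groups by apple_key only, assumed here purely for illustration. It contrasts the collapsed result (one row per group) with the remerged one (group aggregate attached to every detail row).

```python
from collections import defaultdict

# (apple_key, profit, price) from the question's sample data.
ROWS = [
    ("golden_d", 0.03, 12),
    ("golden_d", 0.03, 8),
    ("granny_s", 0.05, 15),
    ("red_d", 0.04, 13),
    ("golden_d", 0.02, 1),
]

# select a.*, count(*) from a;  -> every detail row carries the overall count
remerged_count = [row + (len(ROWS),) for row in ROWS]

# Weighted profit per apple_key, computed once per group...
num, den = defaultdict(float), defaultdict(float)
for key, profit, price in ROWS:
    num[key] += profit * price
    den[key] += price
weighted = {key: num[key] / den[key] for key in num}

# ...then either collapsed (one row per group, what GROUP BY normally returns)
collapsed = sorted(weighted.items())
# ...or remerged onto the detail rows (one row per input row)
remerged = [row + (weighted[row[0]],) for row in ROWS]

print(len(collapsed), len(remerged))  # prints: 3 5
```

The aggregate values are identical in both shapes. Only the number of output rows differs, which is why a remerged result looks like "several entries for one group".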
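In your case, the remerging is not obvious. I think there is some confusion of keys, because you are using the same names in the SELECT as in the original data. My suggestion is to change the aliases so that they are unambiguous, and to make sure the keys in the GROUP BY are unambiguous.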
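To see why key confusion produces several output rows that look like one group, here is a hypothetical plain-Python model (not SAS).

- **Intended:** rows are grouped on the derived cost_flag.
- **Mix-up:** rows are grouped on the raw cost column, while the derived flag is still displayed.

The question's sample table has no column literally named cost_flag. Raw cost stands in for a same-named input column here, which is an assumption. The helper names group_weighted and cost_flag are illustrative only.

```python
from collections import defaultdict

# (profit, price, cost) for the question's five sample rows.
ROWS = [(0.03, 12, 4), (0.03, 8, 0), (0.05, 15, 5), (0.04, 13, 0), (0.02, 1, 2)]
KEPT = [r for r in ROWS if r[1] > r[2] and r[2] >= 0]  # price > cost AND cost >= 0


def cost_flag(cost):
    """Derived key, as in the CASE expression aliased cost_flag."""
    return 0 if cost == 0 else 1


def group_weighted(rows, key_fn):
    """Group rows on key_fn(cost); return (displayed cost_flag, weighted profit) per group."""
    num, den, flag = defaultdict(float), defaultdict(float), {}
    for profit, price, cost in rows:
        k = key_fn(cost)
        num[k] += profit * price
        den[k] += price
        flag[k] = cost_flag(cost)
    return sorted((flag[k], num[k] / den[k]) for k in num)


# Intended: group on the derived alias.
by_alias = group_weighted(KEPT, cost_flag)
# Hypothetical mix-up: the grouping key resolves to the raw column instead.
by_raw = group_weighted(KEPT, lambda cost: cost)

print(by_alias)
print(by_raw)
```

Grouping on the raw column yields two rows that both display cost_flag = 1, each with its own weighted profit. Distinct alias names remove that ambiguity.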