SAS PROC MEANS equivalent in Redshift
I want to convert SAS code to Redshift code.
We have the following PROC MEANS step in SAS:
proc means data=table0 noprint nway missing;
id a b;
class x;
var qty;
output out=table1 sum=;
run;
Example:
Input:
Desired output:
I want the equivalent code in Redshift. The usual GROUP BY does not work here.
When you specify only one variable in the ID statement, the value of
the ID variable for a given observation is the maximum (minimum) value
found in the corresponding group of observations in the input data
set. When you specify multiple variables in the ID statement, PROC
MEANS selects the maximum value by processing the variables in the ID
statement in the order in which you list them. PROC MEANS determines
which observation to use from all the ID variables by comparing the
values of the first ID variable. If more than one observation contains
the same maximum (minimum) ID value, then PROC MEANS uses the second
and subsequent ID variable values as “tiebreakers.” In any case, all
ID values are taken from the same observation for any given BY group
or classification level within a type.
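So the ID values are chosen by taking the maximum value of the first variable (a); if that maximum occurs with more than one value of b, the largest of those b values is used.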
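To make the tiebreaking rule concrete, here is a minimal Python sketch (not SAS or Redshift code) that picks the ID values for each class level of the test data used in this answer. The helper name `pick_id_values` is made up for illustration. It relies on Python comparing tuples element by element, which mirrors comparing a first and using b as the tiebreaker.

```python
from collections import defaultdict

# Rows of (x, a, b), copied from the test dataset in this answer.
rows = [
    (1, 1, 1), (1, 1, 2), (1, 1, 3), (1, 2, 1), (1, 2, 2),
    (2, 1, 1), (2, 1, 2), (2, 1, 3), (2, 2, 1), (2, 2, 2),
    (2, 2, 3), (2, 3, 1),
]


def pick_id_values(rows):
    """Return {x: (a, b)} with the lexicographically largest (a, b) per x.

    Hypothetical helper: it only illustrates the rule quoted above.
    """
    groups = defaultdict(list)
    for x, a, b in rows:
        groups[x].append((a, b))
    # Tuple comparison looks at a first and falls back to b on ties,
    # so both values always come from the same input row.
    return {x: max(pairs) for x, pairs in groups.items()}


id_values = pick_id_values(rows)
print(id_values)
```

For x=1 this selects a=2, b=2, and for x=2 it selects a=3, b=1. Note that for x=2 the largest b overall is 3, but it is not chosen because it does not occur on a row with the largest a.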
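One way to do this is to run the GROUP BY first, ignoring the ID variables, then work out the ID variable values separately, and finally join the two results together.
Something like this: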
data test;
input x a b;
var=0;
datalines;
1 1 1
1 1 2
1 1 3
1 2 1
1 2 2
2 1 1
2 1 2
2 1 3
2 2 1
2 2 2
2 2 3
2 3 1
;;;;
run;
proc means data=test nway;
id a b;
class x;
var var;
output out=test_out sum=;
run;
proc sql;
select m_a.x, m_a.a, m_b.b from (
select x, max(a) as a
from test
group by x ) m_a
left join
( select x, a, max(b) as b
from test
group by x, a
) m_b
on m_a.x=m_b.x and m_a.a=m_b.a
;
quit;
Then you join that back to the normal 'group by' dataset, the one grouped only by the class variable.
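Putting the pieces together, the final join might look like the following sketch. It runs the SQL from Python against an in-memory SQLite database so that it can be checked locally. The query uses only plain GROUP BY and joins, so it should carry over to Redshift, but that is an assumption and it has not been run there. The `qty` column, set to 1 on every row, is a made-up stand-in for the real analysis variable.

```python
import sqlite3

# Rows of (x, a, b), copied from the test dataset above.
rows = [
    (1, 1, 1), (1, 1, 2), (1, 1, 3), (1, 2, 1), (1, 2, 2),
    (2, 1, 1), (2, 1, 2), (2, 1, 3), (2, 2, 1), (2, 2, 2),
    (2, 2, 3), (2, 3, 1),
]

QUERY = """
SELECT s.x, ids.a, ids.b, s.qty
FROM (
    -- the normal aggregate, grouped only by the class variable
    SELECT x, SUM(qty) AS qty FROM test GROUP BY x
) AS s
JOIN (
    -- ID values: max(a) per x, then max(b) among rows with that a
    SELECT m_a.x, m_a.a, m_b.b
    FROM (SELECT x, MAX(a) AS a FROM test GROUP BY x) AS m_a
    JOIN (SELECT x, a, MAX(b) AS b FROM test GROUP BY x, a) AS m_b
      ON m_a.x = m_b.x AND m_a.a = m_b.a
) AS ids
  ON s.x = ids.x
ORDER BY s.x
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (x INTEGER, a INTEGER, b INTEGER, qty INTEGER)")
# qty = 1 on every row is illustrative data, not part of the original post.
conn.executemany("INSERT INTO test (x, a, b, qty) VALUES (?, ?, ?, 1)", rows)
result = conn.execute(QUERY).fetchall()
conn.close()
print(result)
```

With qty set to 1 the sum is simply the row count per x, so the result has one row per class level with the chosen a and b alongside it. An equality join does not match rows where x is NULL, so if missing class values matter (the MISSING option in the question), the join condition would need extra handling.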