SAS proc means equivalent in Redshift

I want to convert SAS code to Redshift code. In SAS we have this PROC MEANS step:

proc means data=table0 noprint nway missing;
  id a b;
  class x;
  var qty;
  output out=table1 sum=;
run;

Example input:

Desired output:

I want the equivalent code in Redshift. A plain GROUP BY doesn't work here.

From the SAS documentation:

When you specify only one variable in the ID statement, the value of the ID variable for a given observation is the maximum (minimum) value found in the corresponding group of observations in the input data set. When you specify multiple variables in the ID statement, PROC MEANS selects the maximum value by processing the variables in the ID statement in the order in which you list them. PROC MEANS determines which observation to use from all the ID variables by comparing the values of the first ID variable. If more than one observation contains the same maximum (minimum) ID value, then PROC MEANS uses the second and subsequent ID variable values as “tiebreakers.” In any case, all ID values are taken from the same observation for any given BY group or classification level within a type.

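So the ID values come from the row with the maximum value of the first ID variable (a); if several rows share that maximum, PROC MEANS picks the one with the largest b among them.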
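To check the rule, here is a minimal Python sketch (not SAS or Redshift code) that applies it to the x/a/b rows from the test data step below. For each class value it keeps the (a, b) pair that is largest when a is compared first and b breaks ties, which is exactly Python's tuple ordering. It only models the default maximum case and assumes there are no missing values.

```python
# x, a, b rows copied from the answer's test data step
rows = [
    (1, 1, 1), (1, 1, 2), (1, 1, 3), (1, 2, 1), (1, 2, 2),
    (2, 1, 1), (2, 1, 2), (2, 1, 3), (2, 2, 1), (2, 2, 2), (2, 2, 3), (2, 3, 1),
]


def id_values(data):
    """Return {x: (a, b)} following the PROC MEANS ID rule:
    max of a, with b used only as a tiebreaker.

    Comparing (a, b) tuples is lexicographic, so a decides first
    and b is looked at only when the a values are equal.
    """
    best = {}
    for x, a, b in data:
        key = (a, b)
        if x not in best or key > best[x]:
            best[x] = key
    return best


print(id_values(rows))  # {1: (2, 2), 2: (3, 1)}
```

For x = 2, a plain per-group `max(b)` would return 3. The ID rule gives 1, because b is taken from the row that has the maximum a (3). This is one reason independent MAX() columns in a simple GROUP BY do not reproduce the ID output.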

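One approach is to do the GROUP BY first while ignoring the ID variables, then determine the ID variable values separately, and finally join the two results together.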

Something like this:

data test;
  input x a b;
  var=0;
  datalines;
1 1 1
1 1 2
1 1 3
1 2 1
1 2 2
2 1 1
2 1 2
2 1 3
2 2 1
2 2 2
2 2 3
2 3 1
;;;;
run;

proc means data=test nway;
  id a b;
  class x;
  var var;
  output out=test_out sum=;
run;


proc sql;
  select m_a.x, m_a.a, m_b.b from (
  select x, max(a)  as a
  from test
  group by x ) m_a
  left join 
  ( select x, a, max(b) as b
    from test
    group by x, a
  ) m_b
  on m_a.x=m_b.x and m_a.a=m_b.a
  ;
quit;

  

You then join that result back to the ordinary GROUP BY dataset, which is grouped only by the class variable.
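For Redshift itself, the three steps (GROUP BY, ID lookup, join back) can also be collapsed into one query. The sketch below uses a different technique from the one described above: window functions. ROW_NUMBER() ordered by `a DESC, b DESC` picks the ID row, and SUM()/COUNT() OVER (PARTITION BY x) produce the totals, so no join on x is needed. A join with `=` would drop a NULL class value under the MISSING option, because NULL = NULL is not true in SQL. The explicit `NULLS LAST` prevents a NULL (SAS missing) from being picked as the maximum, since Redshift sorts NULLs first under DESC by default.

The query is wrapped in Python's `sqlite3` only so it can be run locally. `table0` and `qty` come from the question. The data and the qty value of 1 per row are made up. The `_freq_` column mimics PROC MEANS' `_FREQ_`, and the constant `_TYPE_` column is omitted. Treat this as an untested sketch for Redshift.

```python
import sqlite3

# Hypothetical stand-in for table0: the x/a/b rows from the answer's test data,
# with a made-up qty of 1 per row (the answer's own test used var=0).
ROWS = [
    (1, 1, 1), (1, 1, 2), (1, 1, 3), (1, 2, 1), (1, 2, 2),
    (2, 1, 1), (2, 1, 2), (2, 1, 3), (2, 2, 1), (2, 2, 2), (2, 2, 3), (2, 3, 1),
]

# Window-function sketch of:
#   proc means data=table0 noprint nway missing;
#     id a b; class x; var qty; output out=table1 sum=;
# The SQL text itself is meant to be portable to Redshift.
QUERY = """
SELECT x, a, b, _freq_, qty_sum AS qty   -- sum= keeps the original name qty
FROM (
    SELECT x, a, b,
           COUNT(*) OVER (PARTITION BY x) AS _freq_,   -- like PROC MEANS _FREQ_
           SUM(qty) OVER (PARTITION BY x) AS qty_sum,
           ROW_NUMBER() OVER (
               PARTITION BY x
               ORDER BY a DESC NULLS LAST, b DESC NULLS LAST  -- max a, then max b
           ) AS rn
    FROM table0
) t
WHERE rn = 1
ORDER BY x
"""


def run(rows):
    """Load rows into an in-memory table0 and return the query result."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE table0 (x INTEGER, a INTEGER, b INTEGER, qty INTEGER)")
    con.executemany("INSERT INTO table0 VALUES (?, ?, ?, 1)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


print(run(ROWS))  # [(1, 2, 2, 5, 5), (2, 3, 1, 7, 7)]
```

Each output row holds the class value, the ID values chosen by the max-a-then-max-b rule, the row count, and the summed qty. This matches the ID values that the PROC SQL join above produces for the same data. Running it locally needs SQLite 3.30 or newer for `NULLS LAST`.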