如何 select 列名的所有组合并在 SAS 中对这些列进行计算

how to select all combination of column names and do calculation on those columns in SAS

我有一个数据集,其中包含 81 列个人 ID、79 个二进制变量和一个成本变量:

id h1 h2 h3 ... h79  cost
1  1  0  1      1     15
2  1  1  1      1     80
3  0  1  1      0     10
...

每个人id有一行记录。现在我想选择两个 h(binary) 变量中的哪一个具有超过 50 个唯一的人 ID。如果那么计算他们的总成本。 我想解决它的一个好方法是创建一个包含所有 h 变量的数组并使用两个 DO LOOPS?但是,如果我想查看一组三个或四个或五个变量怎么办?还有我要如何存储变量名称的组合,这样我才能知道这个变量组合有这个总成本。所以我认为最终输出将如下所示:

combinations              total cost
h1&h3                       95
h2&h3                       90
h1&h2&h3.                   80

感谢您的帮助!

听起来您只想使用 PROC SUMMARY。

data have ;
input id h1 h2 h3 h79 cost ;
cards;
1  1  0  1      1     15
2  1  1  1      1     80
3  0  1  1      0     10
;

proc summary data=have chartype ;
   class h1-h3 ;
   var cost ;
   output out=cost_summary sum= ;
run;

但您只对所有贡献 class 变量的值为 1 的结果感兴趣。

proc print data=cost_summary ;
  where min(h1,h2,h3) = 1 ;
run;

结果:

Obs    h1    h2    h3    _TYPE_    _FREQ_    cost

  2     .     .     1     001         3       105
  4     .     1     .     010         2        90
  6     .     1     1     011         2        90
  8     1     .     .     100         2        95
 10     1     .     1     101         2        95
 13     1     1     .     110         1        80
 16     1     1     1     111         1        80

DATA 步可以使用 ALLCOMBALLCOMBI 例程迭代大小为 n 的数组的 k-subset 组合。哈希可用于累积每个特定 k-subset 断言所有真实条件的计数和总成本。

options mprint;

data have (keep=id flag: cost);
  do id = 1 to 3;
    array flag(79) flag01-flag79;
    do i = 1 to dim(flag);
      flag(i) = ranuni(1) < 0.5;
    end;
    cost = ceil(10+100*ranuni(123));
    output;
  end;
run;

示例

data _null_;
  if 0 then set have;* prep pdv;

  array x flag:;
  n = dim(x);

  k = 2; ways2 = comb(dim(x),k); put 'NOTE: ' n= k= ways2=;
  k = 3; ways3 = comb(dim(x),k); put 'NOTE: ' n= k= ways3=;
  k = 4; ways4 = comb(dim(x),k); put 'NOTE: ' n= k= ways4=;
  k = 5; ways5 = comb(dim(x),k); put 'NOTE: ' n= k= ways5=;

  array var(5) ;
  length count cost_sum 8;

  declare hash all_true(hashexp:15, ordered:'A');
  all_true.defineKey('var1', 'var2', 'var3', 'var4', 'var5');
  all_true.defineData('var1', 'var2', 'var3', 'var4', 'var5', 'count', 'cost_sum');
  all_true.defineDone();

  do until (end);
    set have end=end;
    array f flag:;

    %macro track_all_true(K=);

      array index&K._[&K];
      call missing (of index&K._[*]);  %* reset search tracking variables;
      call missing (of var[*]);        %* reset search tracking variables;

      %* search all combinations for those that are all true;
      do p = 1 to comb(n,&K);

        call allcombi(n, &K, of index&K._[*], add, remove);

        %* check each item in the combination;
        do q = 1 to &K while(x[index&K._[q]]);
        end;

        if q > &K then do; %* each item was true;
          do q = 1 to &K;
            which_index = index&K._[q];
            which_var = vname( x[which_index] );
            var(q) = which_var;
          end;

          if all_true.find() ne 0 then do; %* track first occurrence of the combination;
            cost_sum = cost;
            count = 1;
            all_true.add();
          end;
          else do; %* accumulate count and cost information for the combination;
            cost_sum + cost;
            count + 1;
            all_true.replace();
          end;
        end;

      end;

    %mend;

    %track_all_true(K=2)
    %track_all_true(K=3)

    %track_all_true(K=4)
    %track_all_true(K=5)

  end;

  all_true.output(dataset:'count_cost');
  stop;
run;