Derive attributes based on multiple events

I have data that I want to transpose so that I can visualize the state of a single ID at any point in time.

I have been trying to follow @Joe's answer, but I am struggling to handle attributes that can have multiple modalities.

Here is the event-based data I have:

data have;
infile datalines delimiter="|";
/* The character informat widths below are assumed. */
input attrib :$9. multiple_attr :$1. id :$6. attrib_id :8. member_value :$50. type :$5. dt_event :datetime18.;
format dt_event datetime20.;
datalines;
TYPE|N|ABC123|111|MEDIUM|Start|01DEC2014:00:00:00
TYPE|N|ABC123|111|MEDIUM|End|18APR2021:00:00:00
TYPE|N|ABC123|111|BIG|Start|19APR2021:00:00:00
TYPE|N|ABC123|111|BIG|End|31DEC2030:00:00:00
POSITION|N|ABC123|222|TOP|Start|01DEC2014:00:00:00
POSITION|N|ABC123|222|TOP|End|31DEC2030:00:00:00
IS_ACTIVE|N|ABC123|333|YES|Start|01DEC2014:00:00:00
IS_ACTIVE|N|ABC123|333|YES|End|31DEC2030:00:00:00
LEVELS|Y|ABC123|1|ALONE|Start|01DEC2014:00:00:00
LEVELS|Y|ABC123|1|BOTH|Start|01DEC2014:00:00:00
LEVELS|Y|ABC123|1|BOTH|End|18APR2021:00:00:00
LEVELS|Y|ABC123|1|ALONE|End|31DEC2030:00:00:00
TYPE|N|DEF456|111|MEDIUM|Start|01DEC2014:00:00:00
TYPE|N|DEF456|111|MEDIUM|End|31DEC2030:00:00:00
POSITION|N|DEF456|222|MID|Start|01DEC2014:00:00:00
POSITION|N|DEF456|222|MID|End|31DEC2030:00:00:00
IS_ACTIVE|N|DEF456|333|YES|Start|01MAR2014:00:00:00
IS_ACTIVE|N|DEF456|333|YES|End|31DEC2030:00:00:00
LEVELS|Y|DEF456|1|ALONE|Start|01MAR2014:00:00:00
LEVELS|Y|DEF456|1|BOTH|Start|01MAR2014:00:00:00
LEVELS|Y|DEF456|1|BOTH|End|31MAR2018:00:00:00
LEVELS|Y|DEF456|1|BOTH|Start|20AUG2018:00:00:00
LEVELS|Y|DEF456|1|ALONE|End|31DEC2030:00:00:00
LEVELS|Y|DEF456|1|BOTH|End|31DEC2030:00:00:00
;

Using @Joe's approach:

proc sort data=have;
    by id attrib_id dt_event member_value;
run;

data want;
  set have(rename=member_value=in_value);
  by id attrib_id dt_event;
  retain start_date end_date member_value orig_value;
  format member_value new_value $50.; * width assumed, wide enough for the concatenated values;

  * First row per attrib_id is easy, just start it off with a START;
  if first.attrib_id then do;
    start_date = dt_event;
    member_value = in_value;
  end;     
  else do; *Now is the harder part;
    * For ENDs, we want to remove the current member_value from the concatenated value string, always, and then if it is the last row for that dt_event, we want to output a new record;
    if type='End' then do;
    
        *remove the current (in_)value;
        if first.dt_event then orig_value = member_value;
        do _i = 1 to countw(member_value,';');
            if scan(orig_value,_i,';') ne in_value then do;
                if orig_value > scan(orig_value,_i,';') then new_value = catx('; ',scan(orig_value,_i,';'),new_value);
                else new_value = catx('; ',new_value,scan(orig_value,_i,';'));
            end;
        end;
        orig_value = new_value;
 
        if last.dt_event then do;
            end_date = dt_event;
            output;
            start_date = dt_event + 86400;
            member_value = new_value;
            orig_value = ' ';
        end;
    end;
    else do;
        * For START, we want to be more careful about outputting, as this will output lots of unwanted rows if we do not take care;
        end_date = dt_event - 86400;
        if start_date < end_date and not missing(member_value) then output;
        if member_value > in_value then member_value = catx('; ',in_value,member_value);
        else member_value = catx('; ',member_value,in_value);
        start_date = dt_event;
        end_date = .;
    end;
  end;

  format start_date end_date datetime20.;
  keep id multiple_attr attrib_id member_value start_date end_date;
run;

I end up with:


+---------------+--------+-----------+--------------------+--------------------+-------------------+
| multiple_attr |   id   | attrib_id |     start_date     |      end_date      |   member_value    |
+---------------+--------+-----------+--------------------+--------------------+-------------------+
| Y             | ABC123 |         1 | 01DEC2014:00:00:00 | 18APR2021:00:00:00 | ALONE; BOTH       |
| Y             | ABC123 |         1 | 19APR2021:00:00:00 | 31DEC2030:00:00:00 | BOTH; ALONE       |
| N             | ABC123 |       111 | 01DEC2014:00:00:00 | 18APR2021:00:00:00 | MEDIUM            |
| N             | ABC123 |       111 | 19APR2021:00:00:00 | 31DEC2030:00:00:00 | BIG               |
| N             | ABC123 |       222 | 01DEC2014:00:00:00 | 31DEC2030:00:00:00 | TOP               |
| N             | ABC123 |       333 | 01DEC2014:00:00:00 | 31DEC2030:00:00:00 | YES               |
| Y             | DEF456 |         1 | 01MAR2014:00:00:00 | 31MAR2018:00:00:00 | ALONE; BOTH       |
| Y             | DEF456 |         1 | 01APR2018:00:00:00 | 19AUG2018:00:00:00 | BOTH; ALONE       |
| Y             | DEF456 |         1 | 20AUG2018:00:00:00 | 31DEC2030:00:00:00 | BOTH; BOTH; ALONE |
| N             | DEF456 |       111 | 01DEC2014:00:00:00 | 31DEC2030:00:00:00 | MEDIUM            |
| N             | DEF456 |       222 | 01DEC2014:00:00:00 | 31DEC2030:00:00:00 | MID               |
| N             | DEF456 |       333 | 01MAR2014:00:00:00 | 31DEC2030:00:00:00 | YES               |
+---------------+--------+-----------+--------------------+--------------------+-------------------+

You can see that the multi-modal attributes (where multiple_attr = "Y") are not handled correctly.

The desired output would look like this:


+---------------+--------+-----------+--------------------+--------------------+--------------+
| multiple_attr |   id   | attrib_id |     start_date     |      end_date      | member_value |
+---------------+--------+-----------+--------------------+--------------------+--------------+
| Y             | ABC123 |         1 | 01DEC2014:00:00:00 | 18APR2021:00:00:00 | ALONE; BOTH  |
| Y             | ABC123 |         1 | 19APR2021:00:00:00 | 31DEC2030:00:00:00 | ALONE        |
| N             | ABC123 |       111 | 01DEC2014:00:00:00 | 18APR2021:00:00:00 | MEDIUM       |
| N             | ABC123 |       111 | 19APR2021:00:00:00 | 31DEC2030:00:00:00 | BIG          |
| N             | ABC123 |       222 | 01DEC2014:00:00:00 | 31DEC2030:00:00:00 | TOP          |
| N             | ABC123 |       333 | 01DEC2014:00:00:00 | 31DEC2030:00:00:00 | YES          |
| Y             | DEF456 |         1 | 01MAR2014:00:00:00 | 31MAR2018:00:00:00 | ALONE; BOTH  |
| Y             | DEF456 |         1 | 01APR2018:00:00:00 | 19AUG2018:00:00:00 | ALONE        |
| Y             | DEF456 |         1 | 20AUG2018:00:00:00 | 31DEC2030:00:00:00 | ALONE; BOTH  |
| N             | DEF456 |       111 | 01DEC2014:00:00:00 | 31DEC2030:00:00:00 | MEDIUM       |
| N             | DEF456 |       222 | 01DEC2014:00:00:00 | 31DEC2030:00:00:00 | MID          |
| N             | DEF456 |       333 | 01MAR2014:00:00:00 | 31DEC2030:00:00:00 | YES          |
+---------------+--------+-----------+--------------------+--------------------+--------------+

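Is there a way to handle attributes with multiple modalities? I cannot find a way to delete a member value once that modality of the attribute ends (i.e., to switch from ALONE; BOTH to ALONE after the End event).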

I'm not 100% sure I understand all of it, but I think this is at least one of the problems.

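Look at the place where the value is removed: because of the space, you need to use strip() or something similar. I removed the space from the catx() delimiter and added strip() here: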

        if strip(scan(orig_value,_i,';')) ne strip(in_value) then do;
            if strip(orig_value) > strip(scan(orig_value,_i,';')) then new_value = catx(';',scan(orig_value,_i,';'),new_value);
            else new_value = catx(';',new_value,scan(orig_value,_i,';'));
        end;

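Otherwise it compares words that have a leading space against words that do not. In some cases the two are the same (or SAS treats them as the same), but in others they are not, and that is what causes some of your problems here. For example, when I run this, I get "Alone" on the second row.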
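To see the leading-space problem in isolation, here is a minimal sketch that is not part of the original code. It removes BOTH from the list 'ALONE; BOTH' twice, once with a plain comparison and once with strip(). The variable names piece, naive and fixed are made up for this illustration, and the $50 length is an assumption.

```sas
data _null_;
  length orig_value in_value piece naive fixed $50;
  orig_value = 'ALONE; BOTH';  /* list as built by catx('; ', ...) */
  in_value   = 'BOTH';         /* value whose End event just arrived */

  do _i = 1 to countw(orig_value, ';');
    /* Only ';' is a delimiter, so the second piece is ' BOTH' */
    piece = scan(orig_value, _i, ';');

    /* Plain comparison: the leading blank makes ' BOTH' differ from 'BOTH' */
    if piece ne in_value then naive = catx('; ', naive, piece);

    /* Stripped comparison: blanks are removed on both sides first */
    if strip(piece) ne strip(in_value) then fixed = catx('; ', fixed, piece);
  end;

  put naive= fixed=;
run;
```

The log should show naive=ALONE; BOTH and fixed=ALONE. SAS ignores trailing blanks when comparing character values but not leading ones, so only the stripped comparison actually drops BOTH.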