Duration of state in SAS
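I have a question about SAS and analysing how long a variable stays in a specific state. For each person in my data set, I want to know how long they remain continuously in state a until state b occurs. If state c occurs after state a, the duration should be set to zero. Note that I would also set the duration to zero if the pre_period is in state a, but if another state a occurs later on, that one should be counted.
The data looks something like this: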
     pre_period  week1  week2  week3  week4  week5  week6  week7  ...
id1  b           b      a      a      a      b      c      c      ...
id2  a           a      a      a      b      a      b      b      ...
id3  b           b      a      a      b      a      a      b      ...
id4  c           c      c      a      a      a      a      a      ...
id5  a           b      a      b      b      a      a      b      ...
id6  b           a      a      a      a      a      a      a      ...
The sample set as SAS code:
data work.sample_data;
input id $ pre_period $ (week1-week7) ($);
datalines;
id1 b b a a a b c c
id2 a a a a b a b b
id3 b b a a b a a b
id4 c c c a a a a a
id5 a b a b b a a b
id6 b a a a a a a a
;
So for id1 this should give me a duration of 3, for id2 1, for id3 3 and 1, for id4 5, for id5 1 and 2, and for id6 7.
The output should therefore look like this:
     dur1  dur2  dur3  dur4  ...
id1  3     .     .     .     ...
id2  1     .     .     .     ...
id3  3     1     .     .     ...
id4  5     .     .     .     ...
id5  1     2     .     .     ...
id6  7     .     .     .     ...
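I am a SAS beginner and have not found a way to solve this. Note that the data set contains a few thousand rows and about a thousand columns, so a single person may have several state intervals, all of which I want to capture (hence the several duration variables in the output).
I appreciate any suggestions. Thank you!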
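In cases like this it is wise to think in terms of a finite-state machine. That way the state machine is easy to extend if your requirements change later.
A duration is valid in three cases (one of them implicit, inferred from your result set):
- A consecutive run of state a should be counted if
  - it is ended by state b, or
  - the data set ends while it is still in state a,
  - and provided it did not begin in the first week with the pre-period already in state a.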
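First we take care of the pre-period requirement. We can call this state pre_period_locked_state. It is left for duration_state when a run of a starts without a pre-period of a, and for no_duration_state once a run that began with a pre-period of a is over (the week loop opened here is closed in the complete program further down):
do week = 1 to last_week;
    week_state = weeks{week};
    if current_state = pre_period_locked_state then do;
        if 'a' not = pre_period and 'a' = week_state then do;
            current_state = duration_state;
        end;
        else if 'a' = pre_period and 'a' not = week_state then do;
            current_state = no_duration_state;
        end;
    end;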
Next to dissect is the case where the state is not a, here called no_duration_state:
    if current_state = no_duration_state then do;
        if 'a' = week_state then do;
            current_state = duration_state;
        end;
    end;
This is our idle state; it only changes when a new duration starts. The next state is named duration_state and is defined as:
    if current_state = duration_state then do;
        if 'a' = week_state then do;
            duration_count = duration_count + 1;
        end;
        if ('a' not = week_state or week = last_week) and 0 < duration_count then do;
            current_state = dispatch_state;
        end;
    end;
The first part is probably self-explanatory: it is the duration counter. The second part detects when a duration ends.
Now on to dispatch_state:
    if current_state = dispatch_state then do;
        if 'b' = week_state or ('a' = week_state and week = last_week) then do;
            duration{duration_index} = duration_count;
            duration_index = duration_index + 1;
        end;
        duration_count = 0;
        current_state = no_duration_state;
    end;
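This handles the indexing of the output table and also makes sure that only valid durations are stored.
I added id7 below, because the sample data did not contain any duration that is ended by a state other than b.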
data work.sample_data;
input id $ pre_period $ (week1-week7) ($);
datalines;
id1 b b a a a b c c
id2 a a a a b a b b
id3 b b a a b a a b
id4 c c c a a a a a
id5 a b a b b a a b
id6 b a a a a a a a
id7 b a a c a a a a
;
The complete SAS code for the state machine:
data work.duration_fsm;
    set work.sample_data;
    array weeks{*} week1-week7;
    array duration{*} dur1-dur7;

    * states;
    initial_reset_state = 'initial_reset_state';
    pre_period_locked_state = 'pre_period_locked_state';
    duration_state = 'duration_state';
    no_duration_state = 'no_duration_state';
    dispatch_state = 'dispatch_state';
    length current_state $ 50;

    * initial values;
    current_state = initial_reset_state;
    last_week = dim(weeks);
    keep id dur1-dur7;

    do week = 1 to last_week;
        * reset the counters once per row;
        if current_state = initial_reset_state then do;
            duration_count = 0;
            duration_index = 1;
            current_state = pre_period_locked_state;
        end;
        week_state = weeks{week};

        * a run that starts in week 1 after a pre-period of a is not counted;
        if current_state = pre_period_locked_state then do;
            if 'a' not = pre_period and 'a' = week_state then do;
                current_state = duration_state;
            end;
            else if 'a' = pre_period and 'a' not = week_state then do;
                current_state = no_duration_state;
            end;
        end;

        * idle until a new run of a starts;
        if current_state = no_duration_state then do;
            if 'a' = week_state then do;
                current_state = duration_state;
            end;
        end;

        * count the run and detect its end;
        if current_state = duration_state then do;
            if 'a' = week_state then do;
                duration_count = duration_count + 1;
            end;
            if ('a' not = week_state or week = last_week) and 0 < duration_count then do;
                current_state = dispatch_state;
            end;
        end;

        * store the run only if it was ended by b or by the end of the data;
        if current_state = dispatch_state then do;
            if 'b' = week_state or ('a' = week_state and week = last_week) then do;
                duration{duration_index} = duration_count;
                duration_index = duration_index + 1;
            end;
            duration_count = 0;
            current_state = no_duration_state;
        end;
    end;
run;
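This outputs work.duration_fsm. Note that the sample row for id3 contains two runs of a, each two weeks long, so its result is 2 and 2 rather than the 3 and 1 stated in the question: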
+-----+------+------+------+------+------+------+------+
| id | dur1 | dur2 | dur3 | dur4 | dur5 | dur6 | dur7 |
+-----+------+------+------+------+------+------+------+
| id1 | 3 | | | | | | |
| id2 | 1 | | | | | | |
| id3 | 2 | 2 | | | | | |
| id4 | 5 | | | | | | |
| id5 | 1 | 2 | | | | | |
| id6 | 7 | | | | | | |
| id7 | 4 | | | | | | |
+-----+------+------+------+------+------+------+------+
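For comparison, the same three rules can also be written without named states, as a plain run counter. The following is only a minimal sketch and not part of the original answer: the data set name work.duration_runs, the macro variable n_weeks and the variables run_length, dur_index, locked and w are assumptions introduced for illustration. The macro variable is there because the real data is said to have about a thousand week columns rather than seven.

```sas
/* Assumption: number of week columns; set this to match the real data. */
%let n_weeks = 7;

data work.duration_runs;
    set work.sample_data;
    array weeks{*} $ week1-week&n_weeks;
    array dur{*} dur1-dur&n_weeks;      /* more slots than runs can ever occur */
    keep id dur1-dur&n_weeks;

    run_length = 0;                     /* length of the current run of a */
    dur_index  = 0;                     /* number of durations stored so far */
    locked     = (pre_period = 'a');    /* 1 while a leading run must be ignored */

    do w = 1 to dim(weeks);
        if weeks{w} = 'a' then do;
            /* count the week unless the run continues a pre-period of a */
            if not locked then run_length = run_length + 1;
        end;
        else do;
            /* only a run ended by b is stored; any other state discards it */
            if weeks{w} = 'b' and run_length > 0 then do;
                dur_index = dur_index + 1;
                dur{dur_index} = run_length;
            end;
            run_length = 0;
            locked = 0;                 /* the first non-a week lifts the lock */
        end;
    end;

    /* a run still open when the weeks end is stored as well */
    if run_length > 0 then do;
        dur_index = dur_index + 1;
        dur{dur_index} = run_length;
    end;
run;
```

On the seven sample rows above, this sketch should produce the same durations as the state machine. The state machine remains the easier of the two to extend if further states or rules are added later.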