Calculate the % difference between two rows in SAS
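I have customer-level data containing each customer's Pre-Covid, In-Covid, and Post-Covid balances. The data looks like this:

Account ID | Covid flag | Balance |
---|---|---|
123 | Pre-Covid | 100 |
123 | In-Covid | 200 |
123 | Post-Covid | 400 |

I need to create a new column containing the percentage difference between these covid flags. The extra column should hold the % difference from Pre-Covid to In-Covid (row 1 to row 2), from In-Covid to Post-Covid (row 2 to row 3), and finally from Pre-Covid to Post-Covid (row 1 to row 3).

The final data should look like this:

Account ID | Covid flag | % difference |
---|---|---|
123 | Pre to In Covid | 100% |
123 | In to Post Covid | 100% |
123 | Pre to Post Covid | 300% |

How do I create the % difference column and the new covid flag?

The only thing I can think of is the lag function. I can use lag for rows 1 to 2 and 2 to 3, but how do I do it for rows 1 to 3?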
Since there are only three values, we can use some simple data step logic to store each value we're interested in into a temporary variable as we find it, then output all the comparisons at once on the last row of each account ID. To illustrate, here are the background calculations as we read row by row:

    accountid  covid_flag         balance  pre_covid  in_covid  post_covid  pct_diff
    123        Pre-Covid          100      100        .         .           .
    123        In-Covid           200      100        200       .           .
    123        Post-Covid         400      100        200       400         .
    ---------------------------------------------------------------------------------
    Point where we calculate % diff and output
    ---------------------------------------------------------------------------------
    123        pre to in Covid    400      100        200       400         100%
    123        In to Post Covid   400      100        200       400         100%
    123        pre to Post Covid  400      100        200       400         300%
The code looks like this:

    data want;
        /* Widen covid_flag before SET so the longer labels assigned below
           are not truncated (assumes the input length is shorter than 20) */
        length covid_flag $20;

        set have;        /* have must be sorted by accountid */
        by accountid;

        /* Temporary variables to hold the balance found in each row */
        retain pre_covid in_covid post_covid;

        /* Reset temporary variables at the start of each account ID */
        if first.accountid then call missing(pre_covid, in_covid, post_covid);

        /* Save each covid flag balance to temporary variables */
        select (upcase(covid_flag));
            when ('PRE-COVID')  pre_covid  = balance;
            when ('IN-COVID')   in_covid   = balance;
            when ('POST-COVID') post_covid = balance;
            otherwise;   /* ignore unexpected flags instead of raising an error */
        end;

        /* Uncomment to view intermediate steps */
        /* output; */

        /* At the last row of each account, calculate each difference
           and output one row per comparison */
        if last.accountid then do;
            covid_flag = 'pre to in Covid';
            pct_diff = (in_covid - pre_covid) / pre_covid;
            output;

            covid_flag = 'In to Post Covid';
            pct_diff = (post_covid - in_covid) / in_covid;
            output;

            covid_flag = 'pre to Post Covid';
            pct_diff = (post_covid - pre_covid) / pre_covid;
            output;
        end;

        format pct_diff percent8.;
    run;
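For comparison, the lag-based idea from the question can also cover the row 1 to row 3 case, because `LAG2` reaches back two rows. The following is only a sketch under assumptions, not part of the answer above: it assumes every account has exactly three rows in Pre, In, Post order, and the names `want_lag`, `new_flag`, `row`, `prev1`, and `prev2` are made up for illustration. The `have` step simply recreates the sample table from the question so the block can be run on its own.

```sas
/* Sample input matching the question's table (assumed variable names) */
data have;
    length covid_flag $20;
    infile datalines dlm=',';
    input accountid covid_flag $ balance;
    datalines;
123,Pre-Covid,100
123,In-Covid,200
123,Post-Covid,400
;
run;

/* Lag-based alternative: assumes exactly three ordered rows per account */
data want_lag;
    set have;
    by accountid;
    length new_flag $20;

    /* Call LAG and LAG2 on every row so their queues stay in step */
    prev1 = lag(balance);    /* balance one row back  */
    prev2 = lag2(balance);   /* balance two rows back */

    /* Row counter within each account (sum statement retains row) */
    if first.accountid then row = 0;
    row + 1;

    if row = 2 then do;
        new_flag = 'Pre to In Covid';
        pct_diff = (balance - prev1) / prev1;
        output;
    end;
    else if row = 3 then do;
        new_flag = 'In to Post Covid';
        pct_diff = (balance - prev1) / prev1;
        output;

        new_flag = 'Pre to Post Covid';
        pct_diff = (balance - prev2) / prev2;
        output;
    end;

    format pct_diff percent8.;
    keep accountid new_flag pct_diff;
run;
```

Because the lagged values are only used on rows 2 and 3 of an account, they never reach back into the previous account. If an account can have missing or out-of-order rows, the retained-variable approach above is likely the safer choice, since it keys on the flag value rather than on row position.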