Translating a SQL update to SAS DI

Suppose we have a table P_DEF in which we want to update the values of column RUN_ID for a subset of rows that is stored in another table, TMP. In SQL, I would do it like this:

update P_DEF
set RUN_ID = (-1) * TMP.RUN_ID /* change the sign of the value */
from P_DEF
inner join TMP
on P_DEF.RUN_ID = TMP.RUN_ID
and P_DEF.ITEM_ID = TMP.ITEM_ID
and P_DEF.ITEM_TITLE = TMP.ITEM_TITLE

Now the big question: as far as I know, PROC SQL does not support this kind of filtered update. So how do I do this with a minimal number of transformations in SAS DI(S)?

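SAS does not support SQL updates through a join, but you can perform a correlated update, that is, update a column with the value returned by a correlated subquery: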

data P_DEF;
infile cards;
length RUN_ID_ORIG 8;
/* ITEM_TITLE is character with embedded blanks; the $20. width is an assumption */
input RUN_ID ITEM_ID ITEM_TITLE $20.;
RUN_ID_ORIG = RUN_ID;
cards;
1 1 some title
1 1 should be negative
1 2 another title
1 3 should be negative
4 44 another title
5 44 should be negative
;
run;

data TMP;
infile cards;
/* ITEM_TITLE is character with embedded blanks; the $20. width is an assumption */
input RUN_ID ITEM_ID ITEM_TITLE $20. @30 NEW_ID;
cards;
1 1 should be negative       100
1 3 should be negative       123
5 44 should be negative      188
;
run;

proc sql;
/* this unintentionally updates all records: nonmatched rows will be set to null */
update P_DEF
set RUN_ID = (select NEW_ID from TMP
            where P_DEF.RUN_ID = TMP.RUN_ID
            and P_DEF.ITEM_ID = TMP.ITEM_ID
            and P_DEF.ITEM_TITLE = TMP.ITEM_TITLE )
;
select * from P_DEF
;
quit;

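When there are non-matching rows, the correlated update alone is not enough, so you need to add a filter that restricts the update to the matched rows. When joining on multiple columns, I usually rely on catx to build a unique key (depending on your data, you may need a different numeric format in the put function):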

proc sql;
update P_DEF set RUN_ID = RUN_ID_ORIG; /* reset RUN_ID */
quit;


/* correct "inner join" update */
proc sql;
update P_DEF
set RUN_ID = (select NEW_ID from TMP
            where P_DEF.RUN_ID = TMP.RUN_ID
            and P_DEF.ITEM_ID = TMP.ITEM_ID
            and P_DEF.ITEM_TITLE = TMP.ITEM_TITLE )
where
          catx('#', put(RUN_ID, 16.), put(ITEM_ID, 16.), ITEM_TITLE)
in ( select catx('#', put(RUN_ID, 16.), put(ITEM_ID, 16.), ITEM_TITLE)
from TMP ) /* the subquery after IN must be enclosed in parentheses */
;
select * from P_DEF;
quit;

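The version above differs slightly from your exact example in order to show how to take a value from the subquery: the NEW_ID column.

A simplified version for your case, where the lookup table is used only to identify the rows to update, looks like this: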

proc sql;
update P_DEF set RUN_ID = RUN_ID_ORIG; /* reset RUN_ID */
quit;

proc sql;
/* simplified for your case:
you don't actually use any value from TMP that does not already exist in P_DEF */
update P_DEF
set RUN_ID = -1 * RUN_ID
where
   RUN_ID > 0 /* so we can rerun this if needed */
   and      catx('#', put(RUN_ID, 16.), put(ITEM_ID, 16.), ITEM_TITLE)
in ( select catx('#', put(RUN_ID, 16.), put(ITEM_ID, 16.), ITEM_TITLE)
from TMP )
;
select * from P_DEF;
quit;

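As you can see, a correlated update may need two subqueries to update a single column, so do not expect it to perform well on larger tables. You may be better off with a DATA step approach: the MERGE, MODIFY or UPDATE statements.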
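
For illustration, here is a minimal sketch of the MERGE variant, using the sample P_DEF and TMP tables from above. It assumes that each key combination (RUN_ID, ITEM_ID, ITEM_TITLE) occurs at most once in TMP and that ITEM_TITLE has the same length in both tables. The flag names in_def and in_tmp are made up for this example. Note that it re-sorts and overwrites P_DEF.

```sas
/* MERGE needs both inputs sorted by the join keys */
proc sort data=P_DEF; by RUN_ID ITEM_ID ITEM_TITLE; run;
proc sort data=TMP;   by RUN_ID ITEM_ID ITEM_TITLE; run;

data P_DEF;
  merge P_DEF (in=in_def)
        TMP   (in=in_tmp keep=RUN_ID ITEM_ID ITEM_TITLE);
  by RUN_ID ITEM_ID ITEM_TITLE;
  if in_def;                            /* drop TMP rows with no partner in P_DEF */
  if in_tmp then RUN_ID = -1 * RUN_ID;  /* flip the sign only for matched rows */
run;
```

Rows without a match in TMP pass through unchanged, which is the behaviour of the filtered SQL update. To take the value from the lookup table instead, add NEW_ID to the keep= list and assign RUN_ID = NEW_ID for the matched rows.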

As for the SAS Data Integration Studio transformation you asked about, I believe you can achieve this with the SCD Type 1 Loader, which generates some of the code I mentioned.