Translating a SQL update to SAS DI
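Suppose we have a table P_DEF in which we want to update the values of the column RUN_ID for a certain subset of rows, and that subset is stored in another table, TMP. In SQL, this is what I would do: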
update P_DEF
set RUN_ID = (-1) * TMP.RUN_ID /* change the sign of the value */
from P_DEF
inner join TMP
on P_DEF.RUN_ID = TMP.RUN_ID
and P_DEF.ITEM_ID = TMP.ITEM_ID
and P_DEF.ITEM_TITLE = TMP.ITEM_TITLE
Now the big problem: as far as I know, PROC SQL does not support this kind of filtered update. So how do I do this with a minimal number of transformations in SAS DI(S)?
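SAS does not support an update through a join in PROC SQL, but you can do a correlated update, that is, set the column from the value returned by a correlated subquery: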
data P_DEF;
infile cards;
length RUN_ID_ORIG 8;
input RUN_ID ITEM_ID ITEM_TITLE $20.; /* informat width was lost in the source; $20. is an assumption */
RUN_ID_ORIG = RUN_ID;
cards;
1 1 some title
1 1 should be negative
1 2 another title
1 3 should be negative
4 44 another title
5 44 should be negative
;
run;
data TMP;
infile cards;
input RUN_ID ITEM_ID ITEM_TITLE $20. @30 NEW_ID; /* $20. is an assumed width; NEW_ID is read from column 30 onward */
cards;
1 1 should be negative        100
1 3 should be negative        123
5 44 should be negative       188
;
run;
proc sql;
/* this unintentionally updates every row: unmatched rows get RUN_ID set to missing */
update P_DEF
set RUN_ID = (select NEW_ID from TMP
where P_DEF.RUN_ID = TMP.RUN_ID
and P_DEF.ITEM_ID = TMP.ITEM_ID
and P_DEF.ITEM_TITLE = TMP.ITEM_TITLE )
;
select * from P_DEF
;
quit;
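When some rows have no match, the correlated update alone is not enough, so you need to add a filter that restricts the update to the matching rows.
When matching on multiple columns, I usually rely on catx to build a unique value
(depending on your data, you may need a different numeric format in the put function):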
proc sql;
update P_DEF set RUN_ID = RUN_ID_ORIG; /* reset RUN_ID */
quit;
/* correct "inner join" update */
proc sql;
update P_DEF
set RUN_ID = (select NEW_ID from TMP
where P_DEF.RUN_ID = TMP.RUN_ID
and P_DEF.ITEM_ID = TMP.ITEM_ID
and P_DEF.ITEM_TITLE = TMP.ITEM_TITLE )
where
catx('#', put(RUN_ID, 16.), put(ITEM_ID, 16.), ITEM_TITLE)
in ( select catx('#', put(RUN_ID, 16.), put(ITEM_ID, 16.), ITEM_TITLE)
from TMP )
;
select * from P_DEF;
quit;
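The version above differs slightly from your exact example in that it shows how to take a value from the subquery: the NEW_ID column.
A simplified version, where the lookup table is used only to identify the rows to update, looks like this: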
proc sql;
update P_DEF set RUN_ID = RUN_ID_ORIG; /* reset RUN_ID */
quit;
proc sql;
/* simplified for your case:
you don't actually use any value from TMP that does not already exist in P_DEF */
update P_DEF
set RUN_ID = -1 * RUN_ID
where
RUN_ID > 0 /* so we can rerun this if needed */
and catx('#', put(RUN_ID, 16.), put(ITEM_ID, 16.), ITEM_TITLE)
in ( select catx('#', put(RUN_ID, 16.), put(ITEM_ID, 16.), ITEM_TITLE)
from TMP )
;
select * from P_DEF;
quit;
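As you can see, a correlated update may need two subqueries to update a single column, so don't expect it to perform well on larger tables. You may be better off with a data step approach: the MERGE, MODIFY, or UPDATE statement.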
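For the data step route, a minimal sketch with MERGE might look like the following. It assumes the P_DEF and TMP tables built above, that each (RUN_ID, ITEM_ID, ITEM_TITLE) combination appears at most once in TMP, and that re-sorting P_DEF is acceptable. The in_def and in_tmp flag names are only illustrative.

```sas
/* Sketch only: both tables must be sorted by the join keys before MERGE */
proc sort data=P_DEF; by RUN_ID ITEM_ID ITEM_TITLE; run;
proc sort data=TMP;   by RUN_ID ITEM_ID ITEM_TITLE; run;

data P_DEF;
  /* TMP contributes only the keys, so it acts purely as a lookup */
  merge P_DEF (in=in_def)
        TMP   (in=in_tmp keep=RUN_ID ITEM_ID ITEM_TITLE);
  by RUN_ID ITEM_ID ITEM_TITLE;
  if in_def;                        /* keep only rows that exist in P_DEF */
  if in_tmp and RUN_ID > 0 then
    RUN_ID = -RUN_ID;               /* flip the sign for matched rows only */
run;
```

This reads each table once instead of evaluating a subquery per row, at the cost of the two sorts. Note that P_DEF is no longer in key order afterwards, because the sign change happens after the rows are merged.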
As for the SAS Data Integration Studio transformation you asked about, I believe you can do this with the SCD Type 1 Loader, which generates some of the code I mentioned.