How to count distinct values in SAS PROC SQL over a rolling date window?
I have a SAS dataset that looks like this:
MEMBER_ID VAR1 VAR2 DATE
1 12 5 01/04/2020
1 12 5 02/06/2020
1 16 5 04/14/2020
1 12 7 09/10/2020
2 10 5 02/20/2020
2 10 6 04/20/2020
2 10 5 04/25/2020
2 10 5 05/15/2020
3 15 3 01/15/2020
3 16 4 01/25/2020
4 10 5 05/15/2020
5 11 7 03/03/2020
5 12 8 04/03/2020
5 13 9 05/03/2020
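For anyone who wants to reproduce this, the sample could be loaded with a DATA step along the following lines. This is only a sketch: it assumes DATE is meant to be stored as a numeric SAS date (read here with the MMDDYY10. informat), which is what makes date arithmetic such as DATE - 180 meaningful.

```sas
/* Sketch: load the sample rows into WORK.HAVE.                 */
/* Assumption: DATE is a numeric SAS date, read as MM/DD/YYYY.  */
DATA WORK.HAVE;
    INPUT MEMBER_ID VAR1 VAR2 DATE :MMDDYY10.;
    FORMAT DATE MMDDYY10.;
    DATALINES;
1 12 5 01/04/2020
1 12 5 02/06/2020
1 16 5 04/14/2020
1 12 7 09/10/2020
2 10 5 02/20/2020
2 10 6 04/20/2020
2 10 5 04/25/2020
2 10 5 05/15/2020
3 15 3 01/15/2020
3 16 4 01/25/2020
4 10 5 05/15/2020
5 11 7 03/03/2020
5 12 8 04/03/2020
5 13 9 05/03/2020
;
RUN;
```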
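My goal is to count the distinct values of VAR1 and VAR2, grouped by MEMBER_ID, within a rolling 180-day date window. So if the date in row 2 is within 180 days of the date in row 1 for member 1, both rows are included in the (distinct) count. My current code looks like this: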
PROC SQL;
CREATE TABLE WORK.WANT AS
SELECT t1.MEMBER_ID,
t1.VAR1,
t1.VAR2,
t1.DATE,
/* var1Count */
(COUNT(DISTINCT(t1.VAR1))) FORMAT=10. LABEL="var1Count " WHERE (t1 BETWEEN t1.DATE- 180 AND t1.DATE) AS var1Count ,
/* var2Count */
(COUNT(DISTINCT(t1.VAR2))) FORMAT=10. LABEL="var2Count " WHERE (t1 BETWEEN t1.DATE- 180 AND t1.DATE) AS var2Count ,
FROM WORK.HAVE t1
GROUP BY t1.MEMBER_ID
HAVING (CALCULATED var1Count ) >= 2 AND (CALCULATED var2Count ) >= 2
ORDER BY t1.MEMBER_ID,
t1.DATE;
QUIT;
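I thought this WHERE clause inside the column calculation might work in regular SQL, but here it gives me errors. Any other ideas? It may be that I need to do the COUNT(DISTINCT VAR) in a separate SAS DATA step, but I'm not sure (I'm still pretty new to SAS in this respect). Any help is much appreciated!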
I think you need correlated subqueries for this in SAS:
PROC SQL;
    SELECT h.*,
           /* distinct VAR1 values for the same member in the 180 days up to this row's date */
           (SELECT COUNT(DISTINCT h2.VAR1)
            FROM WORK.HAVE h2
            WHERE h2.MEMBER_ID = h.MEMBER_ID AND
                  h2.DATE BETWEEN h.DATE - 180 AND h.DATE
           ) AS var1Count,
           /* same window, counting distinct VAR2 values */
           (SELECT COUNT(DISTINCT h2.VAR2)
            FROM WORK.HAVE h2
            WHERE h2.MEMBER_ID = h.MEMBER_ID AND
                  h2.DATE BETWEEN h.DATE - 180 AND h.DATE
           ) AS var2Count
    FROM WORK.HAVE h;
QUIT;
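If you want to filter on the counts, you can wrap this query in a subquery and apply the condition in the outer WHERE clause.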
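As a rough sketch of that filtering step, the query can be placed in an inline view and the counts tested in the outer query. The output table name WORK.WANT and the >= 2 thresholds are taken from the question's code; the alias c is just an illustrative name, and the sketch assumes DATE is a numeric SAS date so that h.DATE - 180 is valid.

```sas
PROC SQL;
    CREATE TABLE WORK.WANT AS
    SELECT c.*
    FROM (
        /* inner query: rolling 180-day distinct counts per row */
        SELECT h.*,
               (SELECT COUNT(DISTINCT h2.VAR1)
                FROM WORK.HAVE h2
                WHERE h2.MEMBER_ID = h.MEMBER_ID AND
                      h2.DATE BETWEEN h.DATE - 180 AND h.DATE
               ) AS var1Count,
               (SELECT COUNT(DISTINCT h2.VAR2)
                FROM WORK.HAVE h2
                WHERE h2.MEMBER_ID = h.MEMBER_ID AND
                      h2.DATE BETWEEN h.DATE - 180 AND h.DATE
               ) AS var2Count
        FROM WORK.HAVE h
    ) AS c
    /* outer query: keep only rows where both counts reach the threshold */
    WHERE c.var1Count >= 2 AND c.var2Count >= 2
    ORDER BY c.MEMBER_ID, c.DATE;
QUIT;
```

Working through the sample data by hand, this should keep four rows: member 1 on 09/10/2020, member 3 on 01/25/2020, and member 5 on 04/03/2020 and 05/03/2020. Member 2 never qualifies because VAR1 is always 10.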