Find Accounts with X Number of Transactions within Y Days of Each Other in a Larger Date Range

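I am trying to write a SQL statement that finds accounts with 3 or more transactions within 3 days of each other, each with an absolute value greater than $10.00, inside a given week, and then returns those transactions.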

Consider this data...

TransactionID  AccountNumber  TransactionDate  TransactionAmount
-------------  -------------  ---------------  -----------------
1              0123           2020-09-01       45.75
2              0123           2020-09-02       5.23
3              0123           2020-09-03       9.94
4              0123           2020-09-05       8.35
5              0123           2020-09-06       -16.23
6              0123           2020-09-07       14.71
7              0123           2020-09-08       15.03
8              0123           2020-09-08       23.10
9              0123           2020-09-09       94.20
10             0123           2020-09-09       5.01
11             0123           2020-09-10       3.02
12             0123           2020-09-11       4.37
13             0123           2020-09-12       4.54
14             9876           2020-09-01       -45.75
15             9876           2020-09-02       5.27
16             9876           2020-09-05       19.79
17             9876           2020-09-05       -11.64
18             9876           2020-09-06       12.42

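If the week under review is 2020-09-01 through 2020-09-07, I would expect only AccountNumber 9876 to meet the criteria, with TransactionIDs 16, 17, and 18 being the 3 transactions within 3 days that have an absolute value greater than $10.00.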

It seems like I should be able to use window functions (and perhaps framing), but I can't figure out how to get started.

Based on the answers to this question, I made some attempts without window functions...

multiple transactions within a certain time period, limited by date range

DECLARE
    @BeginDate       DATE
  , @EndDate         DATE
  , @ThresholdAmount DECIMAL(10, 2)
  , @ThresholdCount  INT
  , @NumberOfDays    INT;

SET @BeginDate = '2020-09-01';
SET @EndDate = '2020-09-07';
SET @ThresholdAmount = 10.00;
SET @ThresholdCount = 3;
SET @NumberOfDays = 3;

SELECT t.*
FROM   (
           SELECT
                 t1.*
               , (
                     SELECT COUNT(*)
                     FROM   Transactions t2
                     WHERE  t2.AccountNumber   = t1.AccountNumber
                        AND t2.TransactionID   <> t1.TransactionID
                        AND t2.TransactionDate >= t1.TransactionDate
                        AND t2.TransactionDate < DATEADD(DAY, @NumberOfDays, t1.TransactionDate)
                        AND ABS(t2.TransactionAmount) > @ThresholdAmount
                 ) AS NumberWithinXDays
           FROM  Transactions t1
           WHERE t1.TransactionDate BETWEEN @BeginDate AND @EndDate
             AND ABS(t1.TransactionAmount) > @ThresholdAmount
       ) t
WHERE  t.NumberWithinXDays >= @ThresholdCount;

SELECT *
FROM   Transactions t
WHERE  EXISTS (
                  SELECT *
                  FROM   (
                             SELECT     t1.AccountNumber
                             FROM       Transactions t1
                             INNER JOIN Transactions t2 ON t1.AccountNumber = t2.AccountNumber
                                                       AND t1.TransactionID <> t2.TransactionID
                                                       AND DATEDIFF(DAY, t1.TransactionDate, t2.TransactionDate) BETWEEN 0 AND (@NumberOfDays-1)
                             WHERE      t1.TransactionDate BETWEEN @BeginDate AND @EndDate
                                    AND t2.TransactionDate BETWEEN @BeginDate AND @EndDate
                                    AND ABS(t1.TransactionAmount) > @ThresholdAmount
                                    AND ABS(t2.TransactionAmount) > @ThresholdAmount
                             GROUP BY   t1.AccountNumber
                             HAVING     COUNT(t1.TransactionID) >= @ThresholdCount
                         ) x
                  WHERE  x.AccountNumber = t.AccountNumber
              )
   AND t.TransactionDate BETWEEN @BeginDate AND @EndDate
   AND ABS(t.TransactionAmount) > @ThresholdAmount

My first query returns...

TransactionID  AccountNumber  TransactionDate  TransactionAmount  NumberWithinXDays
-------------  -------------  ---------------  -----------------  -----------------
5              0123           2020-09-06       -16.23             3
6              0123           2020-09-07       14.71              3

That is not even close. The second query returns...

TransactionID  AccountNumber  TransactionDate  TransactionAmount
-------------  -------------  ---------------  -----------------
14             9876           2020-09-01       -45.75
16             9876           2020-09-05       19.79
17             9876           2020-09-05       -11.64
18             9876           2020-09-06       12.42

That is closer, but it is not limited to transactions within 3 days of each other. This is the result I want.

TransactionID  AccountNumber  TransactionDate  TransactionAmount
-------------  -------------  ---------------  -----------------
16             9876           2020-09-05       19.79
17             9876           2020-09-05       -11.64
18             9876           2020-09-06       12.42

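It is quite possible that I am not implementing the suggested queries correctly. Or perhaps I am missing some nuance that makes them unsuitable for my situation.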

Any suggestions on fixing my attempted queries, or on a completely different approach, with or without window functions?

Here is a complete dbfiddle of my code.
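For anyone who wants to reproduce this locally, the sample data above can be loaded with a setup script along these lines. This is only a sketch: the column types are assumptions (the question does not state them), and AccountNumber is assumed to be a character column because the values keep their leading zero.

```sql
-- Assumed schema: the types are guesses based on the sample data shown above.
CREATE TABLE Transactions (
    TransactionID     INT            NOT NULL PRIMARY KEY
  , AccountNumber     CHAR(4)        NOT NULL  -- character type keeps the leading zero in '0123'
  , TransactionDate   DATE           NOT NULL
  , TransactionAmount DECIMAL(10, 2) NOT NULL
);

-- Sample rows copied from the table in the question.
INSERT INTO Transactions (TransactionID, AccountNumber, TransactionDate, TransactionAmount)
VALUES
    (1,  '0123', '2020-09-01',  45.75)
  , (2,  '0123', '2020-09-02',   5.23)
  , (3,  '0123', '2020-09-03',   9.94)
  , (4,  '0123', '2020-09-05',   8.35)
  , (5,  '0123', '2020-09-06', -16.23)
  , (6,  '0123', '2020-09-07',  14.71)
  , (7,  '0123', '2020-09-08',  15.03)
  , (8,  '0123', '2020-09-08',  23.10)
  , (9,  '0123', '2020-09-09',  94.20)
  , (10, '0123', '2020-09-10',   5.01)
  , (11, '0123', '2020-09-10',   3.02)
  , (12, '0123', '2020-09-11',   4.37)
  , (13, '0123', '2020-09-12',   4.54)
  , (14, '9876', '2020-09-01', -45.75)
  , (15, '9876', '2020-09-02',   5.27)
  , (16, '9876', '2020-09-05',  19.79)
  , (17, '9876', '2020-09-05', -11.64)
  , (18, '9876', '2020-09-06',  12.42);
```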

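I was unable to come up with a solution using window functions. When I thought about it some more, I figured I could use a CTE, but I couldn't work that out either.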

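I solved it using a couple of subqueries. Given that my transaction table has 86 million rows, I was concerned about performance. However, it runs in under 30 seconds, which is good enough for me.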

-- DISTINCT is needed because a particular transaction may fit into more than
-- one transaction window, but we only want to see it once in the results
SELECT     DISTINCT
           t.TransactionID
         , t.AccountNumber
         , t.TransactionDate
         , t.TransactionAmount
FROM       (
               SELECT
                          t1.AccountNumber
                        , t1.TransactionDateWindowBegin
                        , t1.TransactionDateWindowEnd
                        , COUNT(DISTINCT t2.TransactionID) AS Count
               FROM       (
                              -- establish the transaction window for each transaction within the 
                              -- larger date range and an absolute value above the threshold 
                              SELECT
                                    TransactionID
                                  , AccountNumber
                                  , TransactionDate                                  AS [TransactionDateWindowBegin]
                                  , DATEADD(DAY, @NumberOfDays - 1, TransactionDate) AS [TransactionDateWindowEnd]
                                  , TransactionAmount
                              FROM  Transactions
                              WHERE TransactionDate BETWEEN @BeginDate AND @EndDate
                                AND ABS(TransactionAmount) > @ThresholdAmount
                          )             t1
               -- join back to the transaction table to find transactions within the transaction window for 
               -- each transaction, count them, and only keep those that are above the threshold count
               INNER JOIN Transactions t2 ON t1.AccountNumber              = t2.AccountNumber
                                         AND t1.TransactionDateWindowBegin <= t2.TransactionDate
                                         AND t1.TransactionDateWindowEnd   >= t2.TransactionDate
               WHERE      t2.TransactionDate BETWEEN @BeginDate AND @EndDate
                      AND ABS(t2.TransactionAmount) > @ThresholdAmount
               GROUP BY   t1.AccountNumber
                        , t1.TransactionDateWindowBegin
                        , t1.TransactionDateWindowEnd
               HAVING     COUNT(DISTINCT t2.TransactionID) >= @ThresholdCount
           )             x
-- join back to the transaction table again to get the details for the
-- transactions that meet the threshold amount and count criteria; the date
-- range filter excludes rows after @EndDate when a window extends past it
INNER JOIN Transactions t ON x.AccountNumber              = t.AccountNumber
                         AND x.TransactionDateWindowBegin <= t.TransactionDate
                         AND x.TransactionDateWindowEnd   >= t.TransactionDate
                         AND t.TransactionDate BETWEEN @BeginDate AND @EndDate
                         AND ABS(t.TransactionAmount)     > @ThresholdAmount;

Here is the complete demo.
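For comparison, here is one way the same idea might be written with a window function. It is a sketch, not a tested replacement for the query above: it assumes SQL Server 2012 or later (for LEAD), a @ThresholdCount of at least 2, and the same Transactions table and thresholds used in the question. The CTE names Filtered and Starts and the columns NthDate, WindowBegin, and WindowEnd are made up for illustration. The idea is that a transaction opens a qualifying window when the transaction (@ThresholdCount - 1) positions after it, in date order within the same account, still falls inside the @NumberOfDays window.

```sql
DECLARE
    @BeginDate       DATE           = '2020-09-01'
  , @EndDate         DATE           = '2020-09-07'
  , @ThresholdAmount DECIMAL(10, 2) = 10.00
  , @ThresholdCount  INT            = 3
  , @NumberOfDays    INT            = 3;

WITH Filtered AS (
    -- transactions inside the larger date range and above the amount threshold
    SELECT
          TransactionID
        , AccountNumber
        , TransactionDate
        , TransactionAmount
          -- date of the row (@ThresholdCount - 1) positions later for the same
          -- account; NULL when fewer than that many rows follow
        , LEAD(TransactionDate, @ThresholdCount - 1) OVER (
              PARTITION BY AccountNumber
              ORDER BY     TransactionDate, TransactionID
          ) AS NthDate
    FROM  Transactions
    WHERE TransactionDate BETWEEN @BeginDate AND @EndDate
      AND ABS(TransactionAmount) > @ThresholdAmount
)
, Starts AS (
    -- rows whose window contains at least @ThresholdCount filtered transactions
    SELECT
          AccountNumber
        , TransactionDate                                  AS WindowBegin
        , DATEADD(DAY, @NumberOfDays - 1, TransactionDate) AS WindowEnd
    FROM  Filtered
    WHERE NthDate <= DATEADD(DAY, @NumberOfDays - 1, TransactionDate)
)
-- EXISTS returns each transaction once even if it falls into several windows,
-- so no DISTINCT is needed here
SELECT   f.TransactionID
       , f.AccountNumber
       , f.TransactionDate
       , f.TransactionAmount
FROM     Filtered f
WHERE    EXISTS (
             SELECT 1
             FROM   Starts s
             WHERE  s.AccountNumber = f.AccountNumber
                AND f.TransactionDate BETWEEN s.WindowBegin AND s.WindowEnd
         )
ORDER BY f.AccountNumber, f.TransactionDate, f.TransactionID;
```

Against the sample data in the question, this should return TransactionIDs 16, 17, and 18. Because the final SELECT reads from the already filtered set, transactions dated after @EndDate cannot appear even when a window extends past the end of the review period. Whether it performs better than the subquery version on an 86-million-row table would need to be measured.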