Find Accounts with X Number of Transactions within Y Days of Each Other in a Larger Date Range
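I am trying to write a SQL statement that finds accounts with 3 or more transactions within 3 days of each other, each with an absolute value greater than $10.00, during a one-week period, and then returns those transactions.
Consider this data...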
    TransactionID AccountNumber TransactionDate TransactionAmount
    ------------- ------------- --------------- -----------------
    1             0123          2020-09-01      45.75
    2             0123          2020-09-02      5.23
    3             0123          2020-09-03      9.94
    4             0123          2020-09-05      8.35
    5             0123          2020-09-06      -16.23
    6             0123          2020-09-07      14.71
    7             0123          2020-09-08      15.03
    8             0123          2020-09-08      23.10
    9             0123          2020-09-09      94.20
    10            0123          2020-09-09      5.01
    11            0123          2020-09-10      3.02
    12            0123          2020-09-11      4.37
    13            0123          2020-09-12      4.54
    14            9876          2020-09-01      -45.75
    15            9876          2020-09-02      5.27
    16            9876          2020-09-05      19.79
    17            9876          2020-09-05      -11.64
    18            9876          2020-09-06      12.42
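To try the queries in this post locally, the sample rows can be loaded with a script along these lines. This is only a sketch: the post does not show the table definition, so the column types here are assumptions. In particular, AccountNumber is stored as a string so the leading zero in 0123 survives, and TransactionDate is assumed to be a DATE with no time part.

```sql
-- Assumed schema; the real Transactions table may use different types.
CREATE TABLE Transactions (
    TransactionID     INT            NOT NULL PRIMARY KEY
    , AccountNumber     VARCHAR(10)    NOT NULL
    , TransactionDate   DATE           NOT NULL
    , TransactionAmount DECIMAL(10, 2) NOT NULL
);

-- Sample rows copied from the table above.
INSERT INTO Transactions (TransactionID, AccountNumber, TransactionDate, TransactionAmount)
VALUES
    (1,  '0123', '2020-09-01', 45.75)
    , (2,  '0123', '2020-09-02', 5.23)
    , (3,  '0123', '2020-09-03', 9.94)
    , (4,  '0123', '2020-09-05', 8.35)
    , (5,  '0123', '2020-09-06', -16.23)
    , (6,  '0123', '2020-09-07', 14.71)
    , (7,  '0123', '2020-09-08', 15.03)
    , (8,  '0123', '2020-09-08', 23.10)
    , (9,  '0123', '2020-09-09', 94.20)
    , (10, '0123', '2020-09-09', 5.01)
    , (11, '0123', '2020-09-10', 3.02)
    , (12, '0123', '2020-09-11', 4.37)
    , (13, '0123', '2020-09-12', 4.54)
    , (14, '9876', '2020-09-01', -45.75)
    , (15, '9876', '2020-09-02', 5.27)
    , (16, '9876', '2020-09-05', 19.79)
    , (17, '9876', '2020-09-05', -11.64)
    , (18, '9876', '2020-09-06', 12.42);
```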
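If the week under review is 2020-09-01 through 2020-09-07, I would expect only AccountNumber 9876 to meet the criteria, with TransactionIDs 16, 17, and 18 being the 3 transactions within 3 days that have an absolute value greater than $10.00.
It seems like I should be able to use window functions (and maybe frames), but I can't figure out how to start.
Based on the answers to this question, I tried without window functions...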
multiple transactions within a certain time period, limited by date range
    DECLARE
        @BeginDate DATE
        , @EndDate DATE
        , @ThresholdAmount DECIMAL(10, 2)
        , @ThresholdCount INT
        , @NumberOfDays INT;

    SET @BeginDate = '09/01/2020';
    SET @EndDate = '09/07/2020';
    SET @ThresholdAmount = 10.00;
    SET @ThresholdCount = 3;
    SET @NumberOfDays = 3;

    -- First attempt: correlated subquery that counts the other transactions
    -- in the days following each transaction
    SELECT t.*
    FROM (
        SELECT
            t1.*
            , (
                SELECT COUNT(*)
                FROM Transactions t2
                WHERE t2.AccountNumber = t1.AccountNumber
                    AND t2.TransactionID <> t1.TransactionID
                    AND t2.TransactionDate >= t1.TransactionDate
                    AND t2.TransactionDate < DATEADD(DAY, @NumberOfDays, t1.TransactionDate)
                    AND ABS(t2.TransactionAmount) > @ThresholdAmount
            ) AS NumberWithinXDays
        FROM Transactions t1
        WHERE t1.TransactionDate BETWEEN @BeginDate AND @EndDate
            AND ABS(t1.TransactionAmount) > @ThresholdAmount
    ) t
    WHERE t.NumberWithinXDays >= @ThresholdCount;

    -- Second attempt: self-join grouped by account
    SELECT *
    FROM Transactions t
    WHERE EXISTS (
            SELECT *
            FROM (
                SELECT t1.AccountNumber
                FROM Transactions t1
                INNER JOIN Transactions t2 ON t1.AccountNumber = t2.AccountNumber
                    AND t1.TransactionID <> t2.TransactionID
                    AND DATEDIFF(DAY, t1.TransactionDate, t2.TransactionDate) BETWEEN 0 AND (@NumberOfDays - 1)
                WHERE t1.TransactionDate BETWEEN @BeginDate AND @EndDate
                    AND t2.TransactionDate BETWEEN @BeginDate AND @EndDate
                    AND ABS(t1.TransactionAmount) > @ThresholdAmount
                    AND ABS(t2.TransactionAmount) > @ThresholdAmount
                GROUP BY t1.AccountNumber
                HAVING COUNT(t1.TransactionID) >= @ThresholdCount
            ) x
            WHERE x.AccountNumber = t.AccountNumber
        )
        AND t.TransactionDate BETWEEN @BeginDate AND @EndDate
        AND ABS(t.TransactionAmount) > @ThresholdAmount;
My first query returns...
    TransactionID AccountNumber TransactionDate TransactionAmount NumberWithinXDays
    ------------- ------------- --------------- ----------------- -----------------
    5             0123          2020-09-06      -16.23            3
    6             0123          2020-09-07      14.71             3
That is not even close. The second query returns...
    TransactionID AccountNumber TransactionDate TransactionAmount
    ------------- ------------- --------------- -----------------
    14            9876          2020-09-01      -45.75
    16            9876          2020-09-05      19.79
    17            9876          2020-09-05      -11.64
    18            9876          2020-09-06      12.42
That is closer, but it is not limited to transactions within 3 days of each other. This is the result I want:
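    TransactionID AccountNumber TransactionDate TransactionAmount
    ------------- ------------- --------------- -----------------
    16            9876          2020-09-05      19.79
    17            9876          2020-09-05      -11.64
    18            9876          2020-09-06      12.42
It is quite possible that I did not implement the suggested queries correctly, or that I am missing some nuance that makes them unsuitable for my situation.
Any suggestions on fixing the queries I tried, or on a completely different approach, with or without window functions?
Here is the complete dbfiddle with my code.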
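I was not able to come up with a solution using window functions. After thinking about it some more, I figured I could use a CTE, but I could not work that out either.
I solved it with a few subqueries. With 86 million rows in my transactions table, I was worried about performance. However, it runs in under 30 seconds, which is good enough for me.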
    -- DISTINCT is needed because a particular transaction may fit into more than
    -- one transaction window, but we only want to see it once in the results
    SELECT DISTINCT
        t.TransactionID
        , t.AccountNumber
        , t.TransactionDate
        , t.TransactionAmount
    FROM (
        SELECT
            t1.AccountNumber
            , t1.TransactionDateWindowBegin
            , t1.TransactionDateWindowEnd
            , COUNT(DISTINCT t2.TransactionID) AS Count
        FROM (
            -- establish the transaction window for each transaction within the
            -- larger date range and an absolute value above the threshold
            SELECT
                TransactionID
                , AccountNumber
                , TransactionDate AS [TransactionDateWindowBegin]
                , DATEADD(DAY, @NumberOfDays - 1, TransactionDate) AS [TransactionDateWindowEnd]
                , TransactionAmount
            FROM Transactions
            WHERE TransactionDate BETWEEN @BeginDate AND @EndDate
                AND ABS(TransactionAmount) > @ThresholdAmount
        ) t1
        -- join back to the transaction table to find transactions within the transaction window for
        -- each transaction, count them, and only keep those that meet the threshold count
        INNER JOIN Transactions t2 ON t1.AccountNumber = t2.AccountNumber
            AND t1.TransactionDateWindowBegin <= t2.TransactionDate
            AND t1.TransactionDateWindowEnd >= t2.TransactionDate
        WHERE t2.TransactionDate BETWEEN @BeginDate AND @EndDate
            AND ABS(t2.TransactionAmount) > @ThresholdAmount
        GROUP BY t1.AccountNumber
            , t1.TransactionDateWindowBegin
            , t1.TransactionDateWindowEnd
        HAVING COUNT(DISTINCT t2.TransactionID) >= @ThresholdCount
    ) x
    -- join back to the transaction table again to get the details for the
    -- transactions that meet the threshold amount and count criteria
    INNER JOIN Transactions t ON x.AccountNumber = t.AccountNumber
        AND x.TransactionDateWindowBegin <= t.TransactionDate
        AND x.TransactionDateWindowEnd >= t.TransactionDate
        -- a window that starts near @EndDate can extend past it, so keep the
        -- returned transactions inside the larger date range as well
        AND t.TransactionDate BETWEEN @BeginDate AND @EndDate
        AND ABS(t.TransactionAmount) > @ThresholdAmount;
Here is the complete demo.
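For readers who still want a window-function variant, one possible sketch follows. It is not the approach used above and has not been benchmarked against a large table. It assumes SQL Server 2012 or later, a Transactions table with the columns shown in the question, and a TransactionDate column of type DATE (no time part). The CTE and column names (Filtered, Starts, Covered, WindowStart, LatestWindowStart) are made up for illustration. The idea is to mark a transaction as the start of a qualifying window when the transaction @ThresholdCount - 1 rows later falls within @NumberOfDays days of it, and then keep every transaction that lies within @NumberOfDays days of the most recent such start.

```sql
DECLARE
    @BeginDate DATE = '2020-09-01'
    , @EndDate DATE = '2020-09-07'
    , @ThresholdAmount DECIMAL(10, 2) = 10.00
    , @ThresholdCount INT = 3
    , @NumberOfDays INT = 3;

WITH Filtered AS (
    -- only transactions inside the larger date range and above the amount threshold
    SELECT TransactionID, AccountNumber, TransactionDate, TransactionAmount
    FROM Transactions
    WHERE TransactionDate BETWEEN @BeginDate AND @EndDate
        AND ABS(TransactionAmount) > @ThresholdAmount
),
Starts AS (
    -- WindowStart is set when the row (@ThresholdCount - 1) positions later
    -- is fewer than @NumberOfDays days away; otherwise it stays NULL
    SELECT
        f.*
        , CASE
            WHEN DATEDIFF(DAY, f.TransactionDate,
                    LEAD(f.TransactionDate, @ThresholdCount - 1) OVER (
                        PARTITION BY f.AccountNumber
                        ORDER BY f.TransactionDate, f.TransactionID)) < @NumberOfDays
            THEN f.TransactionDate
          END AS WindowStart
    FROM Filtered f
),
Covered AS (
    -- latest qualifying window start on or before each transaction's date;
    -- RANGE (not ROWS) so that rows sharing the same date see each other
    SELECT
        s.*
        , MAX(s.WindowStart) OVER (
            PARTITION BY s.AccountNumber
            ORDER BY s.TransactionDate
            RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS LatestWindowStart
    FROM Starts s
)
SELECT TransactionID, AccountNumber, TransactionDate, TransactionAmount
FROM Covered
WHERE DATEDIFF(DAY, LatestWindowStart, TransactionDate) < @NumberOfDays
ORDER BY TransactionID;
```

Traced by hand against the sample data in the question, this keeps TransactionIDs 16, 17, and 18 for account 9876 and nothing for account 0123. Because each transaction is read once per CTE rather than joined back to the table, DISTINCT is not needed here, though whether it is actually faster on 86 million rows would have to be measured.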