SQL lag window and custom logic
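I'm working with this dataset:
https://dbfiddle.uk/?rdbms=sqlserver_2014&fiddle=447a5d2c33b04346e70dab0a8d098655
Custom logic:
Group by Name, testcentre, coursename and testtype.
If a retest was taken, compare the scores: prefer the retest if its score is higher, otherwise keep the original.
Lag:
From the rows selected above, if any of the remaining records fall within a 4-day lag window of each other, the record with the highest score should be selected.
Any suggestions would be appreciated.
Sample data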
+----------+-------+------------+------------+----------+-----------+------------+------------+------------------+-----------------------------------------------+
| recordid | Name | testcentre | coursename | testtype | testscore | testdate | retestflag | Preferred_Output | RejectReason |
+----------+-------+------------+------------+----------+-----------+------------+------------+------------------+-----------------------------------------------+
| 1 | Sam | Paris | English | IELTS | 90 | 01/02/2019 | 0 | 0 | |
| 3 | Sam | Paris | English | IELTS | 95 | 02/02/2019 | 1 | 1 | Better score in retest |
| 4 | Sam | Paris | English | TOEFL | 80 | 04/02/2019 | 0 | 0 | Within 4 days of previous test |
| 21 | Sam | Paris | English | IELTS | 95 | 02/02/2018 | 1 | 1 | marked as retest without base.needs inclusion |
| 5 | Jack | London | English | IELTS | 90 | 01/02/2019 | 0 | 1 | Same or bad score in retest |
| 8 | Jack | London | English | IELTS | 90 | 02/02/2019 | 1 | 0 | Same or bad score in retest |
| 7 | Louis | Brazil | English | IELTS | 70 | 01/02/2019 | 0 | 1 | Same score in retest |
| 11 | Louis | Brazil | English | IELTS | 70 | 02/02/2019 | 1 | 0 | Same score in retest |
| 13 | Louis | Brazil | English | TOEFL | 100 | 04/02/2019 | 0 | 0 | Within 4 days of previous test |
| 55 | Sam | Paris | English | IELTS | 90 | 01/02/2016 | 0 | 1 | Older test, no follow on |
| 60 | Sam | Paris | English | IELTS | 95 | 01/08/2019 | 0 | 1 | same score in retest |
| 61 | Sam | Paris | English | IELTS | 95 | 02/08/2019 | 1 | 0 | |
| 62 | Sam | Paris | English | TOEFL | 80 | 04/01/2020 | 0 | 1 | More than 4 days, included |
+----------+-------+------------+------------+----------+-----------+------------+------------+------------------+-----------------------------------------------+
Expected output
+----------+-------+------------+------------+----------+-----------+------------+------------+------------------+
| recordid | Name | testcentre | coursename | testtype | testscore | testdate | retestflag | Preferred_Output |
+----------+-------+------------+------------+----------+-----------+------------+------------+------------------+
| 3 | Sam | Paris | English | IELTS | 95 | 02/02/2019 | 1 | 1 |
| 21 | Sam | Paris | English | IELTS | 95 | 02/02/2018 | 1 | 1 |
| 5 | Jack | London | English | IELTS | 90 | 01/02/2019 | 0 | 1 |
| 7 | Louis | Brazil | English | IELTS | 70 | 01/02/2019 | 0 | 1 |
| 55 | Sam | Paris | English | IELTS | 90 | 01/02/2016 | 0 | 1 |
| 60 | Sam | Paris | English | IELTS | 95 | 01/08/2019 | 0 | 1 |
| 62 | Sam | Paris | English | TOEFL | 80 | 04/01/2020 | 0 | 1 |
+----------+-------+------------+------------+----------+-----------+------------+------------+------------------+
select *
from (select *,
             row_number() over (partition by name order by testscore desc) rn
      from test
     ) t
where rn = 1
Based on your sample data and description, this seems to do what you want:
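select t.*
from (select t.*,
             -- rank rows within each group, highest score first;
             -- on equal scores the original test (retestflag = 0) sorts before the retest
             row_number() over (partition by name, testcentre, coursename, testtype
                                order by testscore desc, retestflag) as seqnum,
             -- number of tests in the group
             count(*) over (partition by name, testcentre, coursename, testtype) as cnt
      from test t
     ) t
where seqnum = 1 and cnt >= 2;  -- keep the top row of groups with at least two tests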
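This does not include the "within 4 days" condition, because that condition is not clearly explained. For example, what happens if there is a series of 5 tests, each 3 days apart?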
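If it helps to pin the rule down, here is a rough sketch of one possible reading of "within 4 days", not a definitive answer. It assumes that testdate is stored as a date (convert it first if it is a string), that the gap is measured against the previous test for the same name, testcentre and coursename regardless of testtype (the sample rejects a TOEFL row for being close to an earlier IELTS row), and that "within" means 4 days or fewer. The column names days_since_prev and within_4_days are made up for illustration.

```sql
-- Sketch only: flag each row with the gap in days to the previous test
-- taken by the same person at the same centre for the same course.
select t.*,
       case when t.days_since_prev is not null and t.days_since_prev <= 4
            then 1 else 0 end as within_4_days   -- hypothetical flag column
from (select t.*,
             -- lag() returns NULL for the first test in a partition,
             -- so datediff() is NULL there as well
             datediff(day,
                      lag(testdate) over (partition by name, testcentre, coursename
                                          order by testdate, recordid),
                      testdate) as days_since_prev
      from test t
     ) t
order by name, testcentre, coursename, testdate, recordid;
```

Under this reading, every test after the first in a chain of tests spaced 3 days apart gets flagged, because each one is within 4 days of its predecessor. If the window should instead be anchored to the first test of the chain, lag() alone is not enough and the rule needs to be spelled out first.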