如何在 SQL 中过滤出 partionned ordered table 中组的第一个记录为空的记录?
How to filter out in SQL the records in a partionned ordered table where first records of group are null?
数据
ROW YEAR PROD KEY DATE
1 2011 APPLE TIME 2011-11-18 00:00:00.000
2 2011 APPLE TIME 2011-11-19 00:00:00.000
3 2013 APPLE NULL 2011-11-18 00:00:00.000
4 2013 APPLE NULL 2011-11-19 00:00:00.000
5 2013 APPLE TIME 2014-04-08 00:00:00.000
6 2013 APPLE DIM 2014-04-09 00:00:00.000
7 2013 APPLE TIME 2014-11-10 10:50:14.113
8 2013 APPLE TIME 2014-11-12 10:46:04.947
9 2013 MELON JAK 2011-10-17 11:01:19.657
10 2013 MELON TIME 2014-11-18 11:19:35.547
11 2013 MELON NULL 2014-11-19 11:19:35.547
12 2013 MELON TIME 2014-11-21 10:32:36.017
13 2014 APPLE JAK 2003-04-10 00:00:00.000
14 2014 APPLE DIM 2003-04-11 00:00:00.000
15 2015 APPLE TIME 2002-09-27 00:00:00.000
16 2015 APPLE NULL 2004-09-28 00:00:00.000
ROW 不是 table 中的列。只是为了显示我想要的记录。
问题
以上数据按(YEAR,PROD)分区,按 DATE 排序。
我需要根据以下逻辑保留除第 3 行和第 4 行之外的所有行:
- 如果组的第一行(此处为 (YEAR, PROD))为 NULL,则丢弃它们。
- 11 和 16 为空,但我们保留它们,因为它们不是其组中的第一个。
每个组都必须从 KEY 不为空的记录开始
==> 否则丢弃
换句话说,我可以有:not null, null, not null, null
但我不能有 : null, not null, null, not null
预期结果
ROW YEAR PROD KEY DATE
1 2011 APPLE TIME 2011-11-18 00:00:00.000
2 2011 APPLE TIME 2011-11-19 00:00:00.000
5 2013 APPLE TIME 2014-04-08 00:00:00.000
6 2013 APPLE DIM 2014-04-09 00:00:00.000
7 2013 APPLE TIME 2014-11-10 10:50:14.113
8 2013 APPLE TIME 2014-11-12 10:46:04.947
9 2013 MELON JAK 2011-10-17 11:01:19.657
10 2013 MELON TIME 2014-11-18 11:19:35.547
11 2013 MELON TIME 2014-11-19 11:19:35.547
12 2013 MELON TIME 2014-11-21 10:32:36.017
13 2014 APPLE JAK 2003-04-10 00:00:00.000
14 2014 APPLE DIM 2003-04-11 00:00:00.000
15 2015 APPLE TIME 2002-09-27 00:00:00.000
16 2015 APPLE TIME 2004-09-28 00:00:00.000
我想这样做,所以后来我在每个组的开头总是有一个非空键。
这样,我以后总是可以使用前一行来填充具有空值的后续记录(在本例中为 11 和 16)
如有任何意见或建议,我们将不胜感激!
可能有更好的解决方案,但本质上(如果 KEY、DATE 等不是您产品中的保留字,您可以删除方括号 - 我使用的是 TSQL):
select *
from Tbl T1
where
/* Do not include if... */
NOT (
t1.[KEY] is null
/* This is part of the first KEY=NULL rows for this group
(no preceding record with KEY<>NULL) */
and not exists
(select 1
from Tbl T3
where T3.[YEAR]=T1.[YEAR]
and T3.PROD=T1.PROD
and T3.[DATE] < T1.[DATE]
and T3.[KEY] is not null
)
/* There are KEY<>NULL values further down */
and exists
(select 1
from Tbl T2
where T2.[YEAR]=T1.[YEAR]
and T2.PROD=T1.PROD
and T2.[DATE] > T1.[DATE]
and T2.[KEY] is not null
)
)
这种查询可以帮助:
select YEAR, PROD, KEY, DATE
from (
select YEAR, PROD, KEY, DATE,
MIN(CASE WHEN KEY IS NULL THEN DATE ELSE NULL END)
OVER(PARTITION BY YEAR, PROD) AS MIN_NULL_KEY_DATE,
ROW_NUMBER() OVER(PARTITION BY YEAR, PROD ORDER BY DATE ASC) RN
from your_table yt
)rpr
where 1 = 1
and CASE WHEN RN = 1 AND DATE = MIN_NULL_KEY_DATE THEN 0 ELSE 1 END = 1
所以我在这里尝试实现什么:当键列为空时,我们只是根据年份和产品列找到了最小日期。并检查该行是否是该组的第一行。如果 rn = 1 并且日期等于键为空时的最小日期值,则忽略它们以防万一。
下面得到你想要的输出。我正在检查无界行和当前行之间的键列的值,并且由于 NULL 具有最高等级,如果前面的行不为空,它将使用 NOT NULL 列填充字段 min_val。
select * from (
select year,prod,key1,date1
,min(key1) over(partition by year,prod order by date1 asc) as min_val
from t
)x
where x.min_val is not null
+------+-------+------+-------------------------+---------+
| year | prod | key1 | date1 | min_val |
+------+-------+------+-------------------------+---------+
| 2011 | APPLE | TIME | 2011-11-18 00:00:00.000 | TIME |
| 2011 | APPLE | TIME | 2011-11-19 00:00:00.000 | TIME |
| 2013 | APPLE | TIME | 2014-04-08 00:00:00.000 | TIME |
| 2013 | APPLE | DIM | 2014-04-09 00:00:00.000 | DIM |
| 2013 | APPLE | TIME | 2014-11-10 10:50:14.113 | DIM |
| 2013 | APPLE | TIME | 2014-11-12 10:46:04.947 | DIM |
| 2013 | MELON | JAK | 2011-10-17 11:01:19.657 | JAK |
| 2013 | MELON | TIME | 2014-11-18 11:19:35.547 | JAK |
| 2013 | MELON | | 2014-11-19 11:19:35.547 | JAK |
| 2013 | MELON | TIME | 2014-11-21 10:32:36.017 | JAK |
| 2014 | APPLE | JAK | 2003-04-10 00:00:00.000 | JAK |
| 2014 | APPLE | DIM | 2003-04-11 00:00:00.000 | DIM |
| 2015 | APPLE | TIME | 2002-09-27 00:00:00.000 | TIME |
| 2015 | APPLE | | 2004-09-28 00:00:00.000 | TIME |
+------+-------+------+-------------------------+---------+
link
https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=ae82f64802674aa60005b8e9f534a150
数据
ROW YEAR PROD KEY DATE
1 2011 APPLE TIME 2011-11-18 00:00:00.000
2 2011 APPLE TIME 2011-11-19 00:00:00.000
3 2013 APPLE NULL 2011-11-18 00:00:00.000
4 2013 APPLE NULL 2011-11-19 00:00:00.000
5 2013 APPLE TIME 2014-04-08 00:00:00.000
6 2013 APPLE DIM 2014-04-09 00:00:00.000
7 2013 APPLE TIME 2014-11-10 10:50:14.113
8 2013 APPLE TIME 2014-11-12 10:46:04.947
9 2013 MELON JAK 2011-10-17 11:01:19.657
10 2013 MELON TIME 2014-11-18 11:19:35.547
11 2013 MELON NULL 2014-11-19 11:19:35.547
12 2013 MELON TIME 2014-11-21 10:32:36.017
13 2014 APPLE JAK 2003-04-10 00:00:00.000
14 2014 APPLE DIM 2003-04-11 00:00:00.000
15 2015 APPLE TIME 2002-09-27 00:00:00.000
16 2015 APPLE NULL 2004-09-28 00:00:00.000
ROW 不是 table 中的列。只是为了显示我想要的记录。
问题
以上数据按(YEAR,PROD)分区,按 DATE 排序。
我需要根据以下逻辑保留除第 3 行和第 4 行之外的所有行:
- 如果组的第一行(此处为 (YEAR, PROD))为 NULL,则丢弃它们。
- 11 和 16 为空,但我们保留它们,因为它们不是其组中的第一个。
每个组都必须从 KEY 不为空的记录开始
==> 否则丢弃
换句话说,我可以有:not null, null, not null, null
但我不能有 : null, not null, null, not null
预期结果
ROW YEAR PROD KEY DATE
1 2011 APPLE TIME 2011-11-18 00:00:00.000
2 2011 APPLE TIME 2011-11-19 00:00:00.000
5 2013 APPLE TIME 2014-04-08 00:00:00.000
6 2013 APPLE DIM 2014-04-09 00:00:00.000
7 2013 APPLE TIME 2014-11-10 10:50:14.113
8 2013 APPLE TIME 2014-11-12 10:46:04.947
9 2013 MELON JAK 2011-10-17 11:01:19.657
10 2013 MELON TIME 2014-11-18 11:19:35.547
11 2013 MELON TIME 2014-11-19 11:19:35.547
12 2013 MELON TIME 2014-11-21 10:32:36.017
13 2014 APPLE JAK 2003-04-10 00:00:00.000
14 2014 APPLE DIM 2003-04-11 00:00:00.000
15 2015 APPLE TIME 2002-09-27 00:00:00.000
16 2015 APPLE TIME 2004-09-28 00:00:00.000
我想这样做,所以后来我在每个组的开头总是有一个非空键。 这样,我以后总是可以使用前一行来填充具有空值的后续记录(在本例中为 11 和 16)
如有任何意见或建议,我们将不胜感激!
可能有更好的解决方案,但本质上(如果 KEY、DATE 等不是您产品中的保留字,您可以删除方括号 - 我使用的是 TSQL):
select *
from Tbl T1
where
/* Do not include if... */
NOT (
t1.[KEY] is null
/* This is part of the first KEY=NULL rows for this group
(no preceding record with KEY<>NULL) */
and not exists
(select 1
from Tbl T3
where T3.[YEAR]=T1.[YEAR]
and T3.PROD=T1.PROD
and T3.[DATE] < T1.[DATE]
and T3.[KEY] is not null
)
/* There are KEY<>NULL values further down */
and exists
(select 1
from Tbl T2
where T2.[YEAR]=T1.[YEAR]
and T2.PROD=T1.PROD
and T2.[DATE] > T1.[DATE]
and T2.[KEY] is not null
)
)
这种查询可以帮助:
select YEAR, PROD, KEY, DATE
from (
select YEAR, PROD, KEY, DATE,
MIN(CASE WHEN KEY IS NULL THEN DATE ELSE NULL END)
OVER(PARTITION BY YEAR, PROD) AS MIN_NULL_KEY_DATE,
ROW_NUMBER() OVER(PARTITION BY YEAR, PROD ORDER BY DATE ASC) RN
from your_table yt
)rpr
where 1 = 1
and CASE WHEN RN = 1 AND DATE = MIN_NULL_KEY_DATE THEN 0 ELSE 1 END = 1
所以我在这里尝试实现什么:当键列为空时,我们只是根据年份和产品列找到了最小日期。并检查该行是否是该组的第一行。如果 rn = 1 并且日期等于键为空时的最小日期值,则忽略它们以防万一。
下面得到你想要的输出。我正在检查无界行和当前行之间的键列的值,并且由于 NULL 具有最高等级,如果前面的行不为空,它将使用 NOT NULL 列填充字段 min_val。
select * from (
select year,prod,key1,date1
,min(key1) over(partition by year,prod order by date1 asc) as min_val
from t
)x
where x.min_val is not null
+------+-------+------+-------------------------+---------+
| year | prod | key1 | date1 | min_val |
+------+-------+------+-------------------------+---------+
| 2011 | APPLE | TIME | 2011-11-18 00:00:00.000 | TIME |
| 2011 | APPLE | TIME | 2011-11-19 00:00:00.000 | TIME |
| 2013 | APPLE | TIME | 2014-04-08 00:00:00.000 | TIME |
| 2013 | APPLE | DIM | 2014-04-09 00:00:00.000 | DIM |
| 2013 | APPLE | TIME | 2014-11-10 10:50:14.113 | DIM |
| 2013 | APPLE | TIME | 2014-11-12 10:46:04.947 | DIM |
| 2013 | MELON | JAK | 2011-10-17 11:01:19.657 | JAK |
| 2013 | MELON | TIME | 2014-11-18 11:19:35.547 | JAK |
| 2013 | MELON | | 2014-11-19 11:19:35.547 | JAK |
| 2013 | MELON | TIME | 2014-11-21 10:32:36.017 | JAK |
| 2014 | APPLE | JAK | 2003-04-10 00:00:00.000 | JAK |
| 2014 | APPLE | DIM | 2003-04-11 00:00:00.000 | DIM |
| 2015 | APPLE | TIME | 2002-09-27 00:00:00.000 | TIME |
| 2015 | APPLE | | 2004-09-28 00:00:00.000 | TIME |
+------+-------+------+-------------------------+---------+
link https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=ae82f64802674aa60005b8e9f534a150