SQLite: merge rows in same table using older record to fill blanks in newer record

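I have a SQLite table whose records are identified by a composite key. New records are added for each id on different dates, but a new record often does not fill in every column. The incomplete columns may be null or blank, e.g.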

   | id  | rdate      | data1   | data2   |
   ----------------------------------------
0  | 1   | 01/01/2009 | foo     | boo     |
1  | 1   | 04/01/2010 | foo1    | bar1    |
2  | 1   | 08/01/2010 | fooX    | <null>  |
3  | 2   | 01/01/2010 | foo2    | bar2    |
4  | 2   | 04/01/2010 |         |         |
5  | 3   | 01/01/2010 | foo3    | bar3    |
   ----------------------------------------

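I want to periodically update the records that share an id so that the blank columns in the most recent record (by rdate) are filled with data from the previous record. In the example above, data from row 1 is used to fill the blank column in row 2, and data from row 3 fills row 4.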

So the table, after running the query, would look like this:

   | id  | rdate      | data1   | data2   |
   ----------------------------------------
0  | 1   | 01/01/2009 | foo     | boo     |
1  | 1   | 04/01/2010 | foo1    | bar1    |
2  | 1   | 08/01/2010 | fooX    | bar1    |
3  | 2   | 01/01/2010 | foo2    | bar2    |
4  | 2   | 04/01/2010 | foo2    | bar2    |
5  | 3   | 01/01/2010 | foo3    | bar3    |
   ----------------------------------------

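I have tried to construct a query to do this, but I am struggling to work out how, or even whether it can be done.

I have looked at merging records, but only found it treated from a de-duplication point of view, and I couldn't find anything that does what I need. COALESCE looks promising, but I haven't been able to figure out how to construct a query that uses it.

Any help and advice would be much appreciated.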

You can use update with two correlated subqueries:

update t
    -- keep an existing value; otherwise take the latest earlier non-null one
    set data1 = coalesce(data1,
                         (select t2.data1
                          from t t2
                          where t2.id = t.id and
                                t2.rdate < t.rdate and
                                t2.data1 is not null
                          order by t2.rdate desc
                          limit 1
                         )
                        ),
        data2 = coalesce(data2,
                         (select t2.data2
                          from t t2
                          where t2.id = t.id and
                                t2.rdate < t.rdate and
                                t2.data2 is not null
                          order by t2.rdate desc
                          limit 1
                         )
                        )
    where data1 is null or data2 is null;
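
To see why the subquery should pick the most recent earlier row, here is a minimal sketch using Python's built-in sqlite3 module. It assumes a table named t with a composite primary key on (id, rdate) and dates already stored as 'YYYY-MM-DD'; both are my assumptions for the demo, not something stated in the question. Only data2 is filled, to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema: the composite key (id, rdate) is a guess for this demo.
conn.execute(
    "CREATE TABLE t (id INTEGER, rdate TEXT, data1 TEXT, data2 TEXT, "
    "PRIMARY KEY (id, rdate))"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        (1, "2009-01-01", "foo", "boo"),
        (1, "2010-01-04", "foo1", "bar1"),
        (1, "2010-01-08", "fooX", None),  # data2 is missing here
    ],
)

# Fill data2 from the latest earlier row of the same id that has a value.
conn.execute(
    """
    UPDATE t
    SET data2 = COALESCE(data2,
                         (SELECT t2.data2
                          FROM t AS t2
                          WHERE t2.id = t.id
                            AND t2.rdate < t.rdate
                            AND t2.data2 IS NOT NULL
                          ORDER BY t2.rdate DESC
                          LIMIT 1))
    WHERE data2 IS NULL
    """
)

rows = conn.execute("SELECT rdate, data2 FROM t ORDER BY rdate").fetchall()
for row in rows:
    print(row)
```

The 2010-01-08 row ends up with bar1 (from 2010-01-04) rather than boo (from 2009-01-01), because the subquery orders the earlier rows by rdate descending and keeps only the first.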

To handle blank values as well, you have to modify a lot of small things in the query:

UPDATE tablename as t1
    SET data1 = (CASE WHEN TRIM(t1.data1) <> ''
                      THEN t1.data1
                      ELSE (SELECT t2.data1
                            FROM tablename t2 
                            WHERE t2.id = t1.id AND t2.rdate < t1.rdate AND trim(t2.data1) <> '' 
                            ORDER BY t2.rdate DESC
                            LIMIT 1
                           )
                 END),
        data2 = (CASE WHEN TRIM(t1.data2) <> ''
                      THEN t1.data2
                      ELSE (SELECT t2.data2
                            FROM tablename t2 
                            WHERE t2.id = t1.id AND t2.rdate < t1.rdate AND trim(t2.data2) <> ''
                            ORDER BY t2.rdate DESC
                            LIMIT 1
                           )
                 END)            
    WHERE data1 IS NULL OR data2 IS NULL or trim(data1) = '' or trim(data2) = '';

For each of the columns data1 and data2, use a correlated subquery that returns the last non-empty value of that column:

UPDATE tablename AS t1
SET data1 = COALESCE(
              NULLIF(TRIM(t1.data1), ''),
              (SELECT t2.data1 FROM tablename t2 
               WHERE t2.id = t1.id AND t2.rdate < t1.rdate AND NULLIF(TRIM(t2.data1), '') IS NOT NULL 
               ORDER BY t2.rdate DESC LIMIT 1)
            ),
    data2 = COALESCE(
              NULLIF(TRIM(t1.data2), ''),
              (SELECT t2.data2 FROM tablename t2 
               WHERE t2.id = t1.id AND t2.rdate < t1.rdate AND NULLIF(TRIM(t2.data2), '') IS NOT NULL 
               ORDER BY t2.rdate DESC LIMIT 1)
            )            
WHERE NULLIF(TRIM(t1.data1), '') IS NULL OR NULLIF(TRIM(t1.data2), '') IS NULL

See the demo.
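
The query relies on NULLIF(TRIM(x), '') treating null, empty and whitespace-only values the same way. A quick sketch with Python's sqlite3 module shows what that expression returns for a few sample inputs (the inputs are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical inputs: NULL, empty, whitespace-only, padded, and plain text.
samples = [None, "", "   ", " foo ", "bar"]

normalized = [
    conn.execute("SELECT NULLIF(TRIM(?), '')", (value,)).fetchone()[0]
    for value in samples
]

for raw, clean in zip(samples, normalized):
    print(repr(raw), "->", repr(clean))
```

The first three inputs all come back as NULL, so COALESCE falls through to the subquery for them. Note that the UPDATE above writes the trimmed value back, so a padded value such as ' foo ' is stored as 'foo'.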

But it would be better to update the table first so that every blank value is replaced with null:

UPDATE tablename
SET data1 = NULLIF(TRIM(data1), ''),
    data2 = NULLIF(TRIM(data2), '')
WHERE TRIM(data1) = '' OR TRIM(data2) = ''

Then the code can be simplified to:

UPDATE tablename AS t1
SET data1 = COALESCE(
              t1.data1,
              (SELECT t2.data1 FROM tablename t2 
               WHERE t2.id = t1.id AND t2.rdate < t1.rdate AND t2.data1 IS NOT NULL 
               ORDER BY t2.rdate DESC LIMIT 1)
            ),
    data2 = COALESCE(
              t1.data2,
              (SELECT t2.data2 FROM tablename t2 
               WHERE t2.id = t1.id AND t2.rdate < t1.rdate AND t2.data2 IS NOT NULL 
               ORDER BY t2.rdate DESC LIMIT 1)
            )            
WHERE data1 IS NULL OR data2 IS NULL

See the demo.
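
If the table has more than two data columns, the two statements can be generated instead of written by hand. The following is only a sketch: fill_blanks is a hypothetical helper of mine, not part of SQLite or of the answer, and it assumes the table and column names come from trusted code (they are interpolated into the SQL) and that rdate is stored as 'YYYY-MM-DD'.

```python
import sqlite3


def fill_blanks(conn, table, columns, key="id", date_col="rdate"):
    """Turn blank strings into NULL, then fill NULLs from the latest earlier row."""
    # Step 1: normalize empty or whitespace-only values to NULL, column by column.
    for col in columns:
        conn.execute(f"UPDATE {table} SET {col} = NULL WHERE TRIM({col}) = ''")

    # Step 2: one COALESCE + correlated subquery per column.
    def assignment(col):
        return (
            f"{col} = COALESCE({table}.{col}, "
            f"(SELECT t2.{col} FROM {table} AS t2 "
            f"WHERE t2.{key} = {table}.{key} "
            f"AND t2.{date_col} < {table}.{date_col} "
            f"AND t2.{col} IS NOT NULL "
            f"ORDER BY t2.{date_col} DESC LIMIT 1))"
        )

    set_clause = ", ".join(assignment(col) for col in columns)
    where_clause = " OR ".join(f"{col} IS NULL" for col in columns)
    conn.execute(f"UPDATE {table} SET {set_clause} WHERE {where_clause}")


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tablename (id INTEGER, rdate TEXT, data1 TEXT, data2 TEXT)"
)
conn.executemany(
    "INSERT INTO tablename VALUES (?, ?, ?, ?)",
    [
        (1, "2009-01-01", "foo", "boo"),
        (1, "2010-01-04", "foo1", "bar1"),
        (1, "2010-01-08", "fooX", None),
        (2, "2010-01-01", "foo2", "bar2"),
        (2, "2010-01-04", "", "   "),  # blank rather than NULL
        (3, "2010-01-01", "foo3", "bar3"),
    ],
)

fill_blanks(conn, "tablename", ["data1", "data2"])

rows = conn.execute(
    "SELECT id, rdate, data1, data2 FROM tablename ORDER BY id, rdate"
).fetchall()
for row in rows:
    print(row)
```

The generated UPDATE refers to the outer row through the table name instead of an alias, which avoids depending on alias support in UPDATE statements.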

Results:

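| id  | rdate      | data1   | data2   |
----------------------------------------
| 1   | 2009-01-01 | foo     | boo     |
| 1   | 2010-01-04 | foo1    | bar1    |
| 1   | 2010-01-08 | fooX    | bar1    |
| 2   | 2010-01-01 | foo2    | bar2    |
| 2   | 2010-01-04 | foo2    | bar2    |
| 3   | 2010-01-01 | foo3    | bar3    |
----------------------------------------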

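Note that the dates in your sample data are not comparable: SQLite compares them as text, so '01/01/2010' sorts before '04/01/2009'.
Change them to the format 'YYYY-MM-DD'.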
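
One way to do that conversion in place is with substr, sketched below with Python's sqlite3 module. It assumes the stored values are day-first 'DD/MM/YYYY' strings, which is how the results above read the sample data; the three dates used here are made up to make the sorting problem visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (id INTEGER, rdate TEXT)")
# Hypothetical day-first dates chosen so that text order differs from date order.
conn.executemany(
    "INSERT INTO tablename VALUES (?, ?)",
    [(1, "01/01/2010"), (1, "04/01/2009"), (1, "15/06/2009")],
)

# Text comparison puts the 2010 date first, which is wrong chronologically.
before = [r[0] for r in conn.execute("SELECT rdate FROM tablename ORDER BY rdate")]

# Rebuild each value as YYYY-MM-DD; the LIKE pattern skips rows in another shape.
conn.execute(
    """
    UPDATE tablename
    SET rdate = substr(rdate, 7, 4) || '-' ||
                substr(rdate, 4, 2) || '-' ||
                substr(rdate, 1, 2)
    WHERE rdate LIKE '__/__/____'
    """
)

after = [r[0] for r in conn.execute("SELECT rdate FROM tablename ORDER BY rdate")]

print(before)
print(after)
```

After the update, ordering and comparing rdate as text agrees with chronological order, which is what the t2.rdate < t1.rdate condition in the queries above depends on.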