Impute duplicate record ID from conditions on other columns (SQL)
Suppose I have this dataset:
serial_id | name | address_id | id_duplicates | dob
_______________________________________________________
1 | JOHN | QWERTY | NULL | 10/2001
2 | JOHN | QWERTY | NULL | 10/2001
3 | JOHN | AZERTY | NULL | 10/2001
4 | JOHN | QWERTY | NULL | 09/2001
5 | MARY | QWERTY | NULL | 10/2001
6 | MARY | AZERTY | NULL | 10/2001
7 | MARY | AZERTY | NULL | 10/2001
When records match on certain conditions, I want to fill id_duplicates with any one of the matching records' serial_id values.
For example, if I want records with the same name, address_id, and dob to share a single ID taken from the serial_id column, I would expect this result:
serial_id | name | address_id | id_duplicates | dob
_______________________________________________________
1 | JOHN | QWERTY | 1 | 10/2001 --> match
2 | JOHN | QWERTY | 1 | 10/2001 --> match
3 | JOHN | AZERTY | 3 | 10/2001 --> no match on address_id
4 | JOHN | QWERTY | 4 | 09/2001 --> no match on dob
5 | MARY | QWERTY | 5 | 10/2001 --> no match on name
6 | MARY | AZERTY | 6 | 10/2001 --> match
7 | MARY | AZERTY | 6 | 10/2001 --> match
I have been struggling to do this with nested queries, and I am too embarrassed to post them because they make no sense...
Any help would be appreciated!
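You can use dense_rank(). It gives every group of matching rows the same number; note that this is a group number, not necessarily one of the group's serial_id values: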
select t.*,
       dense_rank() over (order by name, address_id, dob) as new_id_duplicates
from t;
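If the shared ID has to be one of the group's own serial_id values, as in the expected output in the question, one option is min(serial_id) over a partition on the matching columns. The sketch below is a minimal check under stated assumptions: it uses Python's built-in sqlite3 module only as a convenient harness (window functions need SQLite 3.25 or later), reuses the table name t from the answer, and the column types are assumed rather than taken from the question.

```python
import sqlite3

# In-memory copy of the sample data from the question (column types are an assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table t (serial_id integer primary key, name text, "
    "address_id text, id_duplicates integer, dob text)"
)
rows = [
    (1, "JOHN", "QWERTY", "10/2001"),
    (2, "JOHN", "QWERTY", "10/2001"),
    (3, "JOHN", "AZERTY", "10/2001"),
    (4, "JOHN", "QWERTY", "09/2001"),
    (5, "MARY", "QWERTY", "10/2001"),
    (6, "MARY", "AZERTY", "10/2001"),
    (7, "MARY", "AZERTY", "10/2001"),
]
conn.executemany(
    "insert into t (serial_id, name, address_id, dob) values (?, ?, ?, ?)", rows
)

# group_no: dense_rank() numbers the groups in sort order (1, 2, 3, ...).
# id_duplicates: the smallest serial_id inside each (name, address_id, dob) group.
query = """
select serial_id,
       dense_rank() over (order by name, address_id, dob) as group_no,
       min(serial_id) over (partition by name, address_id, dob) as id_duplicates
from t
order by serial_id
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Both columns identify the same groups; only min(serial_id) reproduces the values 1, 1, 3, 4, 5, 6, 6 shown in the question.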
If you want to use it in an update, here is one way:
update t
    set id_duplicates = tt.new_id_duplicates
    from (select t.*,
                 dense_rank() over (order by name, address_id, dob) as new_id_duplicates
          from t
         ) tt
    where tt.serial_id = t.serial_id;
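The update ... from form is not available in every database. As a hedged alternative, a correlated subquery can write the smallest serial_id of each group into id_duplicates. This is a sketch, not the answer's method: it again uses Python's sqlite3 module as a test harness, the alias t2 is introduced only for illustration, and the column types are assumed. Rows with a NULL in name, address_id, or dob would not match anything with = and would stay NULL.

```python
import sqlite3

# In-memory copy of the sample data from the question (column types are an assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table t (serial_id integer primary key, name text, "
    "address_id text, id_duplicates integer, dob text)"
)
conn.executemany(
    "insert into t (serial_id, name, address_id, dob) values (?, ?, ?, ?)",
    [
        (1, "JOHN", "QWERTY", "10/2001"),
        (2, "JOHN", "QWERTY", "10/2001"),
        (3, "JOHN", "AZERTY", "10/2001"),
        (4, "JOHN", "QWERTY", "09/2001"),
        (5, "MARY", "QWERTY", "10/2001"),
        (6, "MARY", "AZERTY", "10/2001"),
        (7, "MARY", "AZERTY", "10/2001"),
    ],
)

# For each row, look up the smallest serial_id among rows that share
# the same name, address_id, and dob, and store it in id_duplicates.
conn.execute(
    """
    update t
    set id_duplicates = (
        select min(t2.serial_id)
        from t as t2
        where t2.name = t.name
          and t2.address_id = t.address_id
          and t2.dob = t.dob
    )
    """
)
updated = conn.execute(
    "select serial_id, id_duplicates from t order by serial_id"
).fetchall()
for row in updated:
    print(row)
```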