Consolidating a dataset by a single field (MySQL)

I have a table of transactions ('transactions_2020') that includes email address, transaction details, date, etc. These transactions include address and other PII.

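It is common for a single email address to have multiple transactions in the table. I would like to create a table of unique email addresses ('individuals') and retain all the associated PII. Where an email address has multiple transactions, I would like to keep the column values from the most recent transaction, but only where those fields are not null. The result in my 'individuals' table should be one consolidated row per email address with the best/most recent information, even if that information comes from different transactions. Simple example below (blanks are null):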

Transactions table

email_address   trans_date  address1    address2    birthdate
email1@none.com 2020-10-01                          2000-01-01
email1@none.com 2020-09-01              Box 123 
email1@none.com 2020-08-01  123 Main        
email2@none.com 2020-12-01  456 Elm                 2000-03-01
email2@none.com 2020-07-01  123 Elm                 2000-02-01
email3@none.com 2020-11-01  123 Maple               2000-05-01
email3@none.com 2020-09-01  123 Maple   Box 123 
            

Individuals table

email_address   address1    address2    birthdate   
email1@none.com 123 Main    Box 123     2000-01-01  
email2@none.com 456 Elm                 2000-03-01  
email3@none.com 123 Maple   Box 123     2000-05-01  

You want the latest non-null value of each of the two address columns. Here is an approach using window functions:

select email_address,
    max(case when trans_date = trans_date_address1 then address1 end) as address1,
    max(case when trans_date = trans_date_address2 then address2 end) as address2,
    max(birthdate) as birthdate
from (
    select t.*,
        max(case when address1 is not null then trans_date end) over(partition by email_address) as trans_date_address1,
        max(case when address2 is not null then trans_date end) over(partition by email_address) as trans_date_address2
    from mytable t
) t
group by email_address

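The subquery returns, for each email address, the latest date on which each address column is not null. The outer query then uses those dates to pick the matching values while aggregating.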
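As a rough end-to-end sketch, the same technique can be run against the sample data from the question to build the 'individuals' table. The column types below are assumptions, since the question does not give the schema. This sketch also handles birthdate with the same latest-non-null logic, which is a deviation from the query above: max(birthdate) returns the largest birthdate value, which is not necessarily the one from the most recent transaction.

```sql
-- Sample data from the question; blank cells are stored as NULL.
-- Column types are assumed, not taken from the original schema.
create table transactions_2020 (
    email_address varchar(100) not null,
    trans_date    date         not null,
    address1      varchar(100),
    address2      varchar(100),
    birthdate     date
);

insert into transactions_2020 values
    ('email1@none.com', '2020-10-01', null,        null,      '2000-01-01'),
    ('email1@none.com', '2020-09-01', null,        'Box 123', null),
    ('email1@none.com', '2020-08-01', '123 Main',  null,      null),
    ('email2@none.com', '2020-12-01', '456 Elm',   null,      '2000-03-01'),
    ('email2@none.com', '2020-07-01', '123 Elm',   null,      '2000-02-01'),
    ('email3@none.com', '2020-11-01', '123 Maple', null,      '2000-05-01'),
    ('email3@none.com', '2020-09-01', '123 Maple', 'Box 123', null);

-- One row per email address; each column comes from the most recent
-- transaction in which that column is not null (requires MySQL 8.0).
create table individuals as
select email_address,
    max(case when trans_date = trans_date_address1  then address1  end) as address1,
    max(case when trans_date = trans_date_address2  then address2  end) as address2,
    max(case when trans_date = trans_date_birthdate then birthdate end) as birthdate
from (
    select t.*,
        max(case when address1  is not null then trans_date end) over(partition by email_address) as trans_date_address1,
        max(case when address2  is not null then trans_date end) over(partition by email_address) as trans_date_address2,
        max(case when birthdate is not null then trans_date end) over(partition by email_address) as trans_date_birthdate
    from transactions_2020 t
) t
group by email_address;
```

If a column is null in every transaction for an email address, as address2 is for email2@none.com, its date marker is null, no row matches, and the column stays null in 'individuals'. If blanks are stored as empty strings rather than nulls, wrapping the column in nullif(address1, '') would be needed first.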

The window-function query requires MySQL 8.0. In earlier versions, I would use a couple of correlated subqueries:

select email_address,
    (
        select t1.address1 
        from mytable t1
        where t1.email_address = t.email_address and t1.address1 is not null 
        order by trans_date desc limit 1
    ) as address1,
    (
        select t1.address2
        from mytable t1
        where t1.email_address = t.email_address and t1.address2 is not null 
        order by trans_date desc limit 1
    ) as address2,
    max(birthdate) as birthdate
from mytable t
group by email_address
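Each correlated subquery above runs once per email address, so on a large table a composite index may help it find the most recent rows for an address without scanning the whole table. A minimal sketch, where the index name is hypothetical and the benefit depends on the data and the MySQL version:

```sql
-- Hypothetical index name; lets each subquery read one email address's
-- rows in trans_date order instead of scanning the full table.
create index idx_email_trans_date on mytable (email_address, trans_date);
```

Checking the plan with explain before and after adding the index is the safest way to confirm it is actually used.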