Consolidating a dataset by single field (mysql)
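I have a table of transactions ('transactions_2020') that includes email addresses, transaction details, dates, etc. These transactions include addresses and other PII.
It is common for one email address to have multiple transactions in the table. I want to create a table of unique email addresses ('individuals') that keeps all the associated PII. Where an email address has multiple transactions, I want to keep, for each column, the value from the most recent transaction, but only if that field is not null.
The result should be one consolidated row per email address in my 'individuals' table with the best/most recent information, even if that information comes from different transactions. A simple example is below (blanks are null):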
Transactions table
email_address    trans_date  address1   address2  birthdate
email1@none.com  2020-10-01                       2000-01-01
email1@none.com  2020-09-01             Box 123
email1@none.com  2020-08-01  123 Main
email2@none.com  2020-12-01  456 Elm              2000-03-01
email2@none.com  2020-07-01  123 Elm              2000-02-01
email3@none.com  2020-11-01  123 Maple            2000-05-01
email3@none.com  2020-09-01  123 Maple  Box 123

Individuals table
email_address    address1   address2  birthdate
email1@none.com  123 Main   Box 123   2000-01-01
email2@none.com  456 Elm              2000-03-01
email3@none.com  123 Maple  Box 123   2000-05-01
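The consolidation rule in the question (per email address and per column, take the value from the newest transaction in which that column is not null) can be checked outside the database first. Below is a minimal Python sketch, assuming the sample rows above with None for blank cells and ISO `YYYY-MM-DD` dates, which sort correctly as strings. `TRANSACTIONS`, `FIELDS` and `consolidate` are hypothetical names used only for illustration.

```python
# Sample rows copied from the transactions table; None stands for a blank (NULL) cell.
# Columns: email_address, trans_date, address1, address2, birthdate
TRANSACTIONS = [
    ("email1@none.com", "2020-10-01", None, None, "2000-01-01"),
    ("email1@none.com", "2020-09-01", None, "Box 123", None),
    ("email1@none.com", "2020-08-01", "123 Main", None, None),
    ("email2@none.com", "2020-12-01", "456 Elm", None, "2000-03-01"),
    ("email2@none.com", "2020-07-01", "123 Elm", None, "2000-02-01"),
    ("email3@none.com", "2020-11-01", "123 Maple", None, "2000-05-01"),
    ("email3@none.com", "2020-09-01", "123 Maple", "Box 123", None),
]

FIELDS = ("address1", "address2", "birthdate")


def consolidate(rows):
    """Return {email: {field: value}}, keeping for each field the value from
    the most recent transaction in which that field is not None."""
    result = {}
    # Visit the newest transactions first (ISO date strings sort chronologically).
    for email, _trans_date, *values in sorted(rows, key=lambda r: r[1], reverse=True):
        person = result.setdefault(email, dict.fromkeys(FIELDS))
        for field, value in zip(FIELDS, values):
            # Fill a field only once: from the newest transaction that has a value.
            if person[field] is None and value is not None:
                person[field] = value
    return result


if __name__ == "__main__":
    for email, person in sorted(consolidate(TRANSACTIONS).items()):
        print(email, person)
```

Run on the sample rows, this reproduces the individuals table above, which makes it a convenient reference when testing a SQL version of the same rule.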
You want the latest non-null value of each of the two address columns. Here is one way to do it with window functions:
select email_address,
    max(case when trans_date = trans_date_address1 then address1 end) as address1,
    max(case when trans_date = trans_date_address2 then address2 end) as address2,
    -- max() returns the greatest birthdate, which matches the sample data; use the same
    -- pattern as the address columns if you need the most recent non-null birthdate
    max(birthdate) as birthdate
from (
    select t.*,
        max(case when address1 is not null then trans_date end) over(partition by email_address) as trans_date_address1,
        max(case when address2 is not null then trans_date end) over(partition by email_address) as trans_date_address2
    from mytable t
) t
group by email_address
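The subquery returns, for each address column, the latest date on which that column is not null. The outer query then uses those dates to pick the matching values while aggregating.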
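To check the window-function query against the sample data, here is a sketch that runs it with Python's built-in `sqlite3` module as a stand-in for MySQL. Note that `trans_date_address2` has to test `address2 is not null`; testing `address1` there would leave `address2` null for email1@none.com. Assumptions: SQLite 3.25 or later (the first release with window functions), the question's table name `transactions_2020` in place of `mytable`, dates stored as ISO text, and a hypothetical helper `build_individuals`. The result is materialised with `CREATE TABLE individuals AS SELECT ...`, a form MySQL also accepts.

```python
import sqlite3

# Sample rows from the question; None is stored as SQL NULL (a blank cell).
ROWS = [
    ("email1@none.com", "2020-10-01", None, None, "2000-01-01"),
    ("email1@none.com", "2020-09-01", None, "Box 123", None),
    ("email1@none.com", "2020-08-01", "123 Main", None, None),
    ("email2@none.com", "2020-12-01", "456 Elm", None, "2000-03-01"),
    ("email2@none.com", "2020-07-01", "123 Elm", None, "2000-02-01"),
    ("email3@none.com", "2020-11-01", "123 Maple", None, "2000-05-01"),
    ("email3@none.com", "2020-09-01", "123 Maple", "Box 123", None),
]

# Window-function query; each trans_date_addressN tests its own column.
CONSOLIDATE_SQL = """
SELECT email_address,
       MAX(CASE WHEN trans_date = trans_date_address1 THEN address1 END) AS address1,
       MAX(CASE WHEN trans_date = trans_date_address2 THEN address2 END) AS address2,
       MAX(birthdate) AS birthdate
FROM (
    SELECT t.*,
           MAX(CASE WHEN address1 IS NOT NULL THEN trans_date END)
               OVER (PARTITION BY email_address) AS trans_date_address1,
           MAX(CASE WHEN address2 IS NOT NULL THEN trans_date END)
               OVER (PARTITION BY email_address) AS trans_date_address2
    FROM transactions_2020 t
) t
GROUP BY email_address
"""


def build_individuals(conn):
    """Load the sample rows, create 'individuals' from the query, return its rows."""
    conn.execute(
        "CREATE TABLE transactions_2020 (email_address TEXT, trans_date TEXT, "
        "address1 TEXT, address2 TEXT, birthdate TEXT)"
    )
    conn.executemany("INSERT INTO transactions_2020 VALUES (?, ?, ?, ?, ?)", ROWS)
    conn.execute("CREATE TABLE individuals AS" + CONSOLIDATE_SQL)
    return conn.execute("SELECT * FROM individuals ORDER BY email_address").fetchall()


if __name__ == "__main__":
    for row in build_individuals(sqlite3.connect(":memory:")):
        print(row)
```

If two transactions for the same email share the latest date and both have a non-null value, the outer `MAX` picks the larger of the two values, so a tie-breaking column may be worth adding on real data.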
This requires MySQL 8.0. In earlier versions, I would use several correlated subqueries instead:
select email_address,
    (
        select t1.address1
        from mytable t1
        where t1.email_address = t.email_address and t1.address1 is not null
        order by t1.trans_date desc limit 1
    ) as address1,
    (
        select t1.address2
        from mytable t1
        where t1.email_address = t.email_address and t1.address2 is not null
        order by t1.trans_date desc limit 1
    ) as address2,
    max(birthdate) as birthdate
from mytable t
group by email_address
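The pre-8.0 approach can be sketched the same way. This variant, again run through Python's `sqlite3` as a stand-in for MySQL, builds one correlated `ORDER BY ... LIMIT 1` subquery per column and, unlike the query above, applies it to `birthdate` too, so every column gets its most recent non-null value instead of `max(birthdate)`. The column list, the table name `transactions_2020`, and the helpers `latest_non_null` and `build_individuals` are assumptions for illustration; because column names are interpolated into the SQL text, only a fixed, trusted list should be used.

```python
import sqlite3

# Sample rows from the question; None is stored as SQL NULL (a blank cell).
ROWS = [
    ("email1@none.com", "2020-10-01", None, None, "2000-01-01"),
    ("email1@none.com", "2020-09-01", None, "Box 123", None),
    ("email1@none.com", "2020-08-01", "123 Main", None, None),
    ("email2@none.com", "2020-12-01", "456 Elm", None, "2000-03-01"),
    ("email2@none.com", "2020-07-01", "123 Elm", None, "2000-02-01"),
    ("email3@none.com", "2020-11-01", "123 Maple", None, "2000-05-01"),
    ("email3@none.com", "2020-09-01", "123 Maple", "Box 123", None),
]

# Fixed, trusted list of columns to consolidate (interpolated into the SQL text).
PII_COLUMNS = ("address1", "address2", "birthdate")


def latest_non_null(column):
    """Correlated scalar subquery: the column's value from the newest transaction
    for the same email address in which that column is not NULL."""
    return (
        f"(SELECT t1.{column} FROM transactions_2020 t1 "
        f"WHERE t1.email_address = t.email_address AND t1.{column} IS NOT NULL "
        f"ORDER BY t1.trans_date DESC LIMIT 1) AS {column}"
    )


def build_individuals(conn):
    """Load the sample rows and return one consolidated row per email address."""
    conn.execute(
        "CREATE TABLE transactions_2020 (email_address TEXT, trans_date TEXT, "
        "address1 TEXT, address2 TEXT, birthdate TEXT)"
    )
    conn.executemany("INSERT INTO transactions_2020 VALUES (?, ?, ?, ?, ?)", ROWS)
    select_list = ",\n       ".join(
        ["email_address"] + [latest_non_null(c) for c in PII_COLUMNS]
    )
    sql = (
        f"SELECT {select_list}\n"
        "FROM transactions_2020 t\n"
        "GROUP BY email_address\n"
        "ORDER BY email_address"
    )
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    for row in build_individuals(sqlite3.connect(":memory:")):
        print(row)
```

With this sample data the result matches `max(birthdate)`, because the newest birthdate for email2@none.com is also the largest; the two differ only when an older transaction carries a larger birthdate value.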