MySQL: concatenate duplicate rows while ignoring a given string in the duplicates
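Is there a way to find duplicate data while ignoring a given string?
For example, given a table of names, is there a way to merge the rows whose name is "Ann Smith" while ignoring the string "Dr. "? Rows containing "Ann Smith" and "Dr. Ann Smith" should be merged into a single row named "Dr. Ann Smith". If the names match (minus the "dr." string) and the addresses of the two rows match, the phone numbers should be concatenated. I want to keep the greater of the two names, which I think will involve using MAX.
I currently have a table called t: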
name | phone | address
ann smith | 1234567899 | 123 home address
dr. ann smith | 1234567890 | 123 home address
brian smith | 1235551234 | 789 city street
I would like to get:
name | phone | address
dr. ann smith | 1234567890, 1234567899 | 123 home address
brian smith | 1235551234 | 789 city street
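To do what you want, you would probably need CTEs (Common Table Expressions) and LATERAL
queries. Unfortunately, MySQL 5.x implements neither of them.
The following query finds the duplicate names: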
select plain_name, count(*)
from (
select name, trim(replace(lower(name), lower('Dr.'), '')) as plain_name
from my_table
) x
group by plain_name
having count(*) > 1
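This is a step in the right direction, but you will need further processing to get the result you want.
If you upgrade to MySQL 8 you get CTEs; LATERAL derived tables are only available from MySQL 8.0.14 onward.
Edit: I took it a step further to identify the rows with duplicate names. Without CTEs, this query gets uglier and uglier: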
select z.*, y.times
from (
select name, trim(replace(lower(name), lower('Dr.'), '')) as plain_name
from my_table
) z,
(
select plain_name, count(*) as times
from (
select name, trim(replace(lower(name), lower('Dr.'), '')) as plain_name
from my_table
) x
group by plain_name
having count(*) > 1
) y
where z.plain_name = y.plain_name;
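On MySQL 8.0 or later, the same duplicate lookup can probably be written with CTEs so that the normalised name is computed only once. This is a minimal sketch, not a tested drop-in: it reuses my_table and plain_name from the queries above, while the CTE names plain and dups are made up for illustration.

```sql
-- Sketch for MySQL 8.0+ (CTE support assumed).
-- "plain" and "dups" are illustrative names, not part of the original schema.
with plain as (
    -- normalise each name once: lower-case it and strip the "dr." marker
    select name,
           trim(replace(lower(name), 'dr.', '')) as plain_name
    from my_table
),
dups as (
    -- keep only the normalised names that occur more than once
    select plain_name, count(*) as times
    from plain
    group by plain_name
    having count(*) > 1
)
select p.name, p.plain_name, d.times
from plain p
join dups d on d.plain_name = p.plain_name;
```

The result has the same shape as the nested query: every original name that has a duplicate, together with its normalised form and the number of matching rows.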
Assuming the names are fully nested (the shorter name is contained in the longer one), you can get the "long form" with:
select name,
(select t2.name
from t t2
where t2.name like concat('%', t.name, '%')
order by length(t2.name) desc
limit 1
) as long_form
from t;
You can then use that in an aggregation. I would use a subquery:
select long_form, group_concat(distinct phone) as phones,
group_concat(distinct address) as addresses
from (select t.*,
(select t2.name
from t t2
where t2.name like concat('%', t.name, '%')
order by length(t2.name) desc
limit 1
) as long_form
from t
) tt
group by long_form;
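I ended up using a combination of the answers above. First, I created a temporary table that strips the 'Dr. ' string from the name by replacing it with an empty string.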
-- replace() matches literal text, so the LIKE wildcard % must not appear in its search string
create temporary table if not exists temp_names as (
select *,
case when lower(name) like 'dr. %' then trim(replace(lower(name), 'dr. ', ''))
else lower(name) end as plain_name from t);
Then I used a select with GROUP BY to concatenate the values of rows that share the same plain_name.
select max(name) as name, group_concat(distinct phone order by phone separator ', ') as phone_number, address from temp_names
group by plain_name, address;
This gives the desired result:
name | phone_number | address
dr. ann smith | 1234567890, 1234567899 | 123 home address
brian smith | 1235551234 | 789 city street
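The two steps can probably be folded into one query without a temporary table. The following is a sketch under a few assumptions: the table is t with columns name, phone and address as in the question, the title only ever appears as a leading 'dr. ' prefix, and the derived-table alias x is made up for illustration. Instead of relying on MAX(name), which only picks the titled name when it happens to sort last, it prefers any name in the group that carries the prefix.

```sql
-- Sketch: merge rows that differ only by a leading "dr. " prefix.
-- Assumes table t(name, phone, address); alias x is illustrative.
select
    -- prefer a name that carries the title, otherwise fall back to max(name)
    coalesce(max(case when lower(name) like 'dr. %' then name end),
             max(name)) as name,
    -- collect each phone number once, sorted, separated by ", "
    group_concat(distinct phone order by phone separator ', ') as phone,
    address
from (
    select t.*,
           -- drop the 4-character "dr. " prefix when present
           lower(trim(case when lower(name) like 'dr. %'
                           then substring(name, 5)
                           else name end)) as plain_name
    from t
) x
group by plain_name, address;
```

Because it uses only a derived table, CASE and GROUP_CONCAT, this form should also work on MySQL 5.x. The order of the returned rows is not specified; add an ORDER BY if it matters.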