MySQL: SELECT DISTINCT for words in a different order
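I'm having trouble writing a query that does not return duplicate values from my table. Unfortunately, the Full Name column stores the Name and Surname in different orders.
For example: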
+----+---------------------+
| ID | Full Name           |
+----+---------------------+
| 1  | Marshall Wilson     |
| 2  | Wilson Marshall     |
| 3  | Lori Hill           |
| 4  | Hill Lori           |
| 5  | Casey Dean Davidson |
| 6  | Davidson Casey Dean |
+----+---------------------+
I would like to get this result:
+----+---------------------+
| ID | Full Name           |
+----+---------------------+
| 1  | Marshall Wilson     |
| 3  | Lori Hill           |
| 5  | Casey Dean Davidson |
+----+---------------------+
My goal is a query that works like a SELECT DISTINCT, treating the same Name and Surname as one value even when they appear in a different order.
Any ideas?
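It requires a lot of string operations and the use of multiple derived tables, so it may not be efficient.
We first tokenize FullName into the multiple words it is made of. For that, we use a number generator table, gen. In this case I have assumed a maximum of 3 substrings; you can easily extend it by adding more SELECTs, e.g. SELECT 4 UNION ALL .. and so on.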
We use Substring_Index() with the Replace() function to get each substring out, using a single space character (' ') as the delimiter. Trim() removes any leading/trailing spaces that are left over.
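To make the extraction step easier to follow, here is a minimal Python model of it. This is only a sketch and not MySQL itself: `substring_index`, `nth_word` and `tokenize` are hypothetical helper names, and `substring_index` imitates SUBSTRING_INDEX() for non-negative counts only.

```python
def substring_index(text: str, delim: str, count: int) -> str:
    """Model of MySQL SUBSTRING_INDEX for count >= 0 (negative counts are not modeled)."""
    if count <= 0:
        return ""
    # Everything to the left of the count-th delimiter; the whole string if there are fewer.
    return delim.join(text.split(delim)[:count])


def nth_word(full_name: str, idx: int) -> str:
    """Extract the idx-th word as the query does: prefix(idx) with prefix(idx - 1) removed."""
    head = substring_index(full_name, " ", idx)
    prev = substring_index(full_name, " ", idx - 1)
    # MySQL REPLACE() returns the string unchanged when the search string is empty.
    word = head.replace(prev, "") if prev else head
    return word.strip()  # plays the role of TRIM()


def tokenize(full_name: str, max_words: int = 3) -> list:
    """Pair the name with idx = 1..max_words (the gen table) and drop empty words."""
    words = [nth_word(full_name, idx) for idx in range(1, max_words + 1)]
    return [w for w in words if w != ""]


print(tokenize("Casey Dean Davidson"))  # ['Casey', 'Dean', 'Davidson']
print(tokenize("Lori Hill"))            # ['Lori', 'Hill']
```

One caveat the model makes visible: the replace step removes every occurrence of the earlier prefix, so a name such as "Ann Anna" would yield "a" as its second word. If names like that can occur in your data, this approach may need adjusting.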
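Now, the trick is to use this result set as a derived table and run Group_Concat() on the words, sorted in ascending order. This way, even duplicate names (whose substrings appear in a different order) get the same words_sorted value. Finally, we just Group By words_sorted to weed out the duplicates.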
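The sort-then-group idea can be sketched in Python as well. Again this is an illustration under assumptions, not the MySQL query: `words_sorted` and `dedupe` are hypothetical names, the sample rows are the ones from the question, and Python sorts by code point whereas MySQL sorts according to the column's collation.

```python
from collections import defaultdict

rows = [
    (1, "Marshall Wilson"),
    (2, "Wilson Marshall"),
    (3, "Lori Hill"),
    (4, "Hill Lori"),
    (5, "Casey Dean Davidson"),
    (6, "Davidson Casey Dean"),
]


def words_sorted(full_name: str) -> str:
    """Order-independent key: the words sorted ascending and joined with commas."""
    return ",".join(sorted(full_name.split()))


def dedupe(rows):
    """Group rows by words_sorted and keep MIN(ID) and MIN(FullName) per group."""
    groups = defaultdict(list)
    for row_id, full_name in rows:
        groups[words_sorted(full_name)].append((row_id, full_name))
    # As in the query, the two minimums are taken independently of each other.
    result = [
        (min(i for i, _ in members), min(n for _, n in members))
        for members in groups.values()
    ]
    return sorted(result)  # ORDER BY ID


print(words_sorted("Wilson Marshall"))  # Marshall,Wilson
print(dedupe(rows))
# [(1, 'Marshall Wilson'), (3, 'Hill Lori'), (5, 'Casey Dean Davidson')]
```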
Query #1
    SELECT
      MIN(dt2.ID) AS ID,
      MIN(dt2.FullName) AS FullName
    FROM
    (
      SELECT
        dt1.ID,
        dt1.FullName,
        -- Sorted, comma-separated words; empty tokens become NULL and are skipped
        GROUP_CONCAT(IF(word = '', NULL, word) ORDER BY word ASC) words_sorted
      FROM
      (
        SELECT e.ID,
               e.FullName,
               -- idx-th word: the first idx words with the first idx-1 words removed
               TRIM(REPLACE(
                      SUBSTRING_INDEX(e.FullName, ' ', gen.idx),
                      SUBSTRING_INDEX(e.FullName, ' ', gen.idx-1),
                      '')) AS word
        FROM employees AS e
        CROSS JOIN (SELECT 1 AS idx UNION ALL
                    SELECT 2 UNION ALL
                    SELECT 3) AS gen -- You can add more numbers if more than 3 substrings
      ) AS dt1
      GROUP BY dt1.ID, dt1.FullName
    ) AS dt2
    GROUP BY dt2.words_sorted
    ORDER BY ID;
| ID | FullName |
| --- | ------------------- |
| 1 | Marshall Wilson |
| 3 | Hill Lori |
| 5 | Casey Dean Davidson |
Note that MIN(dt2.ID) and MIN(dt2.FullName) are evaluated independently within each group, so the FullName returned is the alphabetically smallest spelling and not necessarily the one stored with the returned ID. That is why ID 3 appears with Hill Lori rather than Lori Hill.