MySQL: Select Distinct for words in different order

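I'm having trouble writing a query that does not return duplicate values from my table. Unfortunately, the Full Name column stores the Name and Surname in different orders.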

For example:

+----+----------------------+
| ID | Full Name            |
+----+----------------------+
| 1  | Marshall Wilson      |
| 2  | Wilson Marshall      |
| 3  | Lori Hill            |
| 4  | Hill Lori            |
| 5  | Casey Dean Davidson  |
| 6  | Davidson Casey Dean  |
+----+----------------------+

I would like to get this result:

+----+-----------------------+
| ID | Full Name             |
+----+-----------------------+
| 1  | Marshall Wilson       |
| 3  | Lori Hill             |
| 5  | Casey Dean Davidson   |
+----+-----------------------+

My goal is a query that behaves like a SELECT DISTINCT on the name, treating the same Name and Surname as one value regardless of their order.

Any ideas?

This requires a lot of string operations and the use of multiple derived tables, so it may not be efficient.

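We first tokenize FullName into the individual words it is made of. For this, we use a number generator table, gen. In this case, I have assumed that the maximum number of substrings is 3. You can easily extend it by adding more SELECTs, for example SELECT 4 UNION ALL .. and so on.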
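To make the role of gen concrete, here is a minimal sketch of the cross join on its own. The CREATE TABLE and INSERT statements are an assumed setup that mirrors the question's sample data (the table name employees and the column name FullName follow the query in this answer, and the column types are guesses). The generator is extended to 4 purely to illustrate adding one more SELECT.

```sql
-- Assumed sample data, mirroring the table in the question.
CREATE TABLE employees (
  ID INT PRIMARY KEY,
  FullName VARCHAR(100)
);

INSERT INTO employees (ID, FullName) VALUES
  (1, 'Marshall Wilson'),
  (2, 'Wilson Marshall'),
  (3, 'Lori Hill'),
  (4, 'Hill Lori'),
  (5, 'Casey Dean Davidson'),
  (6, 'Davidson Casey Dean');

-- Every employee row is paired with every idx value,
-- so each name gets one candidate slot per possible word position.
SELECT e.ID, e.FullName, gen.idx
FROM employees AS e
CROSS JOIN (SELECT 1 AS idx UNION ALL
            SELECT 2 UNION ALL
            SELECT 3 UNION ALL
            SELECT 4) AS gen
ORDER BY e.ID, gen.idx;
```

With 6 rows and 4 generator values this returns 24 rows. Names with fewer words than generator values simply produce empty slots later on, which is why the generator only needs to be at least as long as the longest name.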

We use the Substring_Index() function together with Replace() to extract a substring, using a single space character (' ') as the delimiter. Trim() is used to remove any leading or trailing spaces that are left over.
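As a small illustration of that extraction step, the sketch below runs the same expression on a string literal with idx fixed at 2. The last column shows a nested SUBSTRING_INDEX() form, which is an alternative I am adding for comparison and not part of the original approach.

```sql
SELECT
  -- Everything up to the 2nd space-delimited word: 'Casey Dean'
  SUBSTRING_INDEX('Casey Dean Davidson', ' ', 2) AS up_to_idx,
  -- Everything up to the 1st word: 'Casey'
  SUBSTRING_INDEX('Casey Dean Davidson', ' ', 1) AS up_to_prev,
  -- Removing the shorter prefix from the longer one leaves ' Dean'; TRIM gives 'Dean'
  TRIM(REPLACE(
    SUBSTRING_INDEX('Casey Dean Davidson', ' ', 2),
    SUBSTRING_INDEX('Casey Dean Davidson', ' ', 1),
    '')) AS word_via_replace,
  -- Alternative (an assumption, not from the original answer):
  -- take the last word of the first 2 words, also 'Dean'
  SUBSTRING_INDEX(SUBSTRING_INDEX('Casey Dean Davidson', ' ', 2), ' ', -1) AS word_via_nesting;
```

One caveat of the REPLACE() form: it removes every occurrence of the prefix, so a name such as 'Ann Annabel' would yield 'abel' as its second word. The nested form avoids that, but it repeats the last word when idx exceeds the number of words, so it would need an extra filter on the word count.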

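Now, the trick is to use this result set as a derived table and apply Group_Concat() to the words so that they are sorted in ascending order. This way, duplicate names (the same substrings in a different order) get the same words_sorted value. Finally, we simply Group By words_sorted to weed out the duplicates.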
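The effect of the sorted Group_Concat() can be seen in isolation with a minimal sketch like the following, where the inline rows stand in for the tokenized words of IDs 1 and 2 (the derived table t is hypothetical and exists only for this illustration).

```sql
SELECT
  t.ID,
  -- Words are sorted before being joined with the default ',' separator
  GROUP_CONCAT(t.word ORDER BY t.word ASC) AS words_sorted
FROM (
  SELECT 1 AS ID, 'Marshall' AS word UNION ALL
  SELECT 1, 'Wilson' UNION ALL
  SELECT 2, 'Wilson' UNION ALL
  SELECT 2, 'Marshall'
) AS t
GROUP BY t.ID;
-- Both rows come back with words_sorted = 'Marshall,Wilson'
```

Because GROUP_CONCAT() skips NULL values, turning empty words into NULL (as the main query does with IF) keeps them out of the concatenated key.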


Query #1

SELECT 
  MIN(dt2.ID) AS ID, 
  MIN(dt2.FullName) AS FullName 
FROM 
(
  -- dt2: one row per employee, with its words sorted and concatenated
  SELECT 
    dt1.ID, 
    dt1.FullName, 
    GROUP_CONCAT(IF(dt1.word = '', NULL, dt1.word) ORDER BY dt1.word ASC) AS words_sorted 
  FROM 
  (
    -- dt1: one row per (employee, idx), holding the idx-th word of FullName
    SELECT e.ID, 
           e.FullName, 
           TRIM(REPLACE(
             SUBSTRING_INDEX(e.FullName, ' ', gen.idx), 
             SUBSTRING_INDEX(e.FullName, ' ', gen.idx - 1),
             '')) AS word 
    FROM employees AS e
    CROSS JOIN (SELECT 1 AS idx UNION ALL 
                SELECT 2 UNION ALL 
                SELECT 3) AS gen -- You can add more numbers if more than 3 substrings
  ) AS dt1 
  GROUP BY dt1.ID, dt1.FullName
) AS dt2
GROUP BY dt2.words_sorted
ORDER BY ID;

| ID  | FullName            |
| --- | ------------------- |
| 1   | Marshall Wilson     |
| 3   | Hill Lori           |
| 5   | Casey Dean Davidson |
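Note that the row for ID 3 shows Hill Lori, not Lori Hill as in the desired result. This happens because MIN(ID) and MIN(FullName) are computed independently, so the alphabetically smallest name in a group is paired with the smallest ID. If the name stored on the lowest-ID row is wanted, one possible variant (a sketch that assumes ID is unique in employees; the alias first_ids is hypothetical) is to keep only MIN(ID) per group and join back to the table:

```sql
SELECT e.ID, e.FullName
FROM employees AS e
JOIN (
  -- Lowest ID within each group of names that share the same sorted words
  SELECT MIN(dt2.ID) AS ID
  FROM (
    SELECT dt1.ID,
           GROUP_CONCAT(IF(dt1.word = '', NULL, dt1.word)
                        ORDER BY dt1.word ASC) AS words_sorted
    FROM (
      SELECT e2.ID,
             TRIM(REPLACE(
               SUBSTRING_INDEX(e2.FullName, ' ', gen.idx),
               SUBSTRING_INDEX(e2.FullName, ' ', gen.idx - 1),
               '')) AS word
      FROM employees AS e2
      CROSS JOIN (SELECT 1 AS idx UNION ALL
                  SELECT 2 UNION ALL
                  SELECT 3) AS gen
    ) AS dt1
    GROUP BY dt1.ID
  ) AS dt2
  GROUP BY dt2.words_sorted
) AS first_ids ON first_ids.ID = e.ID
ORDER BY e.ID;
```

On the sample data from the question, this should return Marshall Wilson, Lori Hill and Casey Dean Davidson for IDs 1, 3 and 5.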
