Find SOUNDEX of each word in the column

I have the following data:

create table testing
(
name varchar(100)
);

insert into testing values('Mr.Alex James Henrry');
insert into testing values('Mr John Desto');
insert into testing values('Ms.Lisa Jack Jerry Han');
insert into testing values('Smith White');
insert into testing values('Rowny James Duest');

Note: I want to find the SOUNDEX value of each word in the name field, excluding English honorifics (Mr, Ms, etc.).

Expected result:

name                    name_soundex
-------------------------------------
Mr.Alex James Henrry    A420 J520 H560
Mr John Desto           J500 D230
Ms.Lisa Jack Jerry Han  L200 J200 J600 H500
Smith White             S530 W300
Rowny James Duest       R500 J520 D230

What I tried:

Adding a column to store the SOUNDEX values:

alter table testing
add name_soundex varchar(500);

Update:

update testing
set name_soundex = SOUNDEX(name);

Getting the following output:

name                    name_soundex
-------------------------------------
Mr.Alex James Henrry    M600
Mr John Desto           M600
Ms.Lisa Jack Jerry Han  M200
Smith White             S530
Rowny James Duest       R500

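You need to split the names into their separate parts and then "remerge" them. SQL Server 2008 (which is almost entirely out of support, so you should be looking at your upgrade plans) doesn't have a splitter built in. SQL Server 2016+ does, but it doesn't provide the ordinal position, so I have used DelimitedSplit8K (a web search will find it). If you're on 2012+, I would recommend DelimitedSplit8K_LEAD instead (even on 2016+, as the ordinal position is important):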

WITH VTE AS(
    SELECT *
    FROM (VALUES('Mr.Alex James Henrry'),
                ('Mr John Desto'),
                ('Ms.Lisa Jack Jerry Han'),
                ('Smith White'),
                ('Rowny James Duest')) V([Name]))
SELECT [name],
       STUFF((SELECT ' ' + SOUNDEX(DS.item)
              FROM dbo.DelimitedSplit8K(REPLACE([name],'.',' '),' ') DS
              WHERE DS.item NOT IN ('Mr','Ms','Mrs','Miss','...') --You know what your acceptable titles are
                                                             --Although, seeing as you have both "Mr {name}" and "Mr.{name}", maybe not :/
              ORDER BY DS.itemnumber
              FOR XML PATH('')),1,1,'') AS name_soundex
FROM VTE;
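
For completeness, here is a rough sketch of the same split-and-remerge idea using only built-in functions. It assumes SQL Server 2022 or later (or Azure SQL Database), where STRING_SPLIT accepts a third enable_ordinal argument and returns an ordinal column, and it uses STRING_AGG (2017+) in place of the FOR XML PATH trick. The title list is illustrative only, and the query reads from the testing table in the question rather than the VTE above. I haven't run this against your data, so treat it as a starting point:

```sql
-- Assumption: SQL Server 2022+ / Azure SQL Database.
-- The third argument (enable_ordinal = 1) makes STRING_SPLIT return an "ordinal" column.
SELECT T.[name],
       (SELECT STRING_AGG(SOUNDEX(SS.value), ' ') WITHIN GROUP (ORDER BY SS.ordinal)
        FROM STRING_SPLIT(REPLACE(T.[name], '.', ' '), ' ', 1) SS
        WHERE SS.value NOT IN ('', 'Mr', 'Ms', 'Mrs', 'Miss') -- illustrative title list; '' skips empty tokens
       ) AS name_soundex
FROM testing T;
```

The empty-string filter matters if a name ever contains "Mr. Alex" (period followed by a space), because replacing the period with a space would otherwise produce an empty token between the two spaces.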