How to compare strings by similarity without ignoring typos?
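I need to compare two strings by proximity in case string.equals on the full string fails. I always need to compare the first name plus the middle and/or last name.
I have found some comparison algorithms, but they all tolerate typos in their results, and I need to compare the exact input.
Examples: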
- Maria souza silva = Maria souza silva = ok
- Maria souza silva = Maria silva = ok
- Maria souza silva = Maria Carvalho = Nok
- Maria souza silva = Ana souza silva = Nok
- Maria de souza silva = Maria de = Nok
- Maria de souza silva = Maria souza = OK
I am trying something like this:
String name = "Maria da souza Silva";
String nameRequest = "Maria da Silva";
if (name.equalsIgnoreCase(nameRequest)) {
    System.out.print("ok 0");
}
String[] names = name.split(" ");
int nameLenght = names.length - 1;
if (nameRequest.startsWith(names[0])) {
    System.out.println("ok 1, next");
} else {
    System.out.print("nok, stop");
}
if (nameRequest.endsWith(names[nameLenght])) {
    System.out.print("ok 2");
}
The result is "ok 1, next" and "ok 2".
The first and last names work, but I also need to compare the middle names, ignoring particles such as "de"/"da".
You can use a regular expression like this:
String firstName = "maria";
String lastName = "silva";
String regex = ("^" + firstName + "([ ].*[ ]|[ ])" + lastName + "$");
System.out.println("maria de silva".matches(regex));
System.out.println("maria silva".matches(regex));
System.out.println("maria deb".matches(regex));
System.out.println("a silva".matches(regex));
System.out.println("mariasilva".matches(regex));
true
true
false
false
false
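The regular expression looks for the first name at the start of the string and the last name at the end, with either any characters enclosed between two spaces or just a single space in between.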
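One thing to watch with the concatenated pattern is that any regex metacharacter inside a name would be interpreted as part of the pattern. A minimal sketch of the same idea with the name parts quoted is shown here. The class name `NameRegexSketch` and the helper `firstAndLastMatch` are hypothetical, and the case-insensitive flag is an assumption added so that mixed-case input behaves like the lowercase examples above.

```java
import java.util.regex.Pattern;

class NameRegexSketch {
    // Hypothetical helper: same pattern shape as above, but the name parts are
    // passed through Pattern.quote so they are matched literally.
    static boolean firstAndLastMatch(String firstName, String lastName, String candidate) {
        String regex = "^" + Pattern.quote(firstName)
                + "( .* | )"                 // a middle part between two spaces, or one space
                + Pattern.quote(lastName) + "$";
        // Assumption: ASCII case-insensitive matching is acceptable here.
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE)
                .matcher(candidate)
                .matches();
    }

    public static void main(String[] args) {
        System.out.println(firstAndLastMatch("maria", "silva", "Maria de Silva")); // true
        System.out.println(firstAndLastMatch("maria", "silva", "maria deb"));      // false
        System.out.println(firstAndLastMatch("m.ria", "silva", "maria silva"));    // false
    }
}
```

The last call returns false because the quoted `.` only matches a literal dot. Note that this approach still checks only the first and last name and accepts anything in the middle.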
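At first I intended to use a pure regular expression, and there may be a way to do that, but the code below produces the results you are looking for: it accepts first plus last name, or first plus middle name, and ignores "de" and "da".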
import java.util.regex.Matcher;
import java.util.regex.Pattern;

private void checkName(String target, String source) {
    // First name, then an optional "de"/"da" particle, then the remaining names.
    Pattern pattern = Pattern.compile("^(?<firstName>[^\\s]+)\\s((de|da)(\\s|$))?(?<otherName>.*)$");
    Matcher targetMatcher = pattern.matcher(target.trim().toLowerCase());
    Matcher sourceMatcher = pattern.matcher(source.trim().toLowerCase());
    if (!targetMatcher.matches() || !sourceMatcher.matches()) {
        System.out.println("Nok");
        return; // group() below would throw if matches() had failed
    }
    boolean ok = true;
    if (!sourceMatcher.group("firstName").equals(targetMatcher.group("firstName"))) {
        ok = false;
    } else {
        String[] otherSourceName = sourceMatcher.group("otherName").split("\\s");
        String[] otherTargetName = targetMatcher.group("otherName").split("\\s");
        // Every remaining source name must appear in the target, in the same order.
        int targetIndex = 0;
        for (String s : otherSourceName) {
            boolean hit = false;
            for (; targetIndex < otherTargetName.length; targetIndex++) {
                if (s.equals(otherTargetName[targetIndex])) {
                    hit = true;
                    targetIndex++; // consume the matched target name
                    break;
                }
            }
            if (!hit) {
                ok = false;
                break;
            }
        }
    }
    System.out.println(ok ? "ok" : "Nok");
}
For your examples, the output is:
ok
ok
Nok
Nok
Nok
ok
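The same comparison can also be sketched without a regular expression, by splitting both names into tokens and checking that the request is an ordered subsequence of the full name. This is only an illustration under stated assumptions, not the answer's own method: the class `NameMatchSketch`, the method `sameName`, and the particle list are hypothetical, particles are dropped wherever they occur (not only right after the first name), and the rule that the request needs at least one name besides the first is inferred from the "Maria de" example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class NameMatchSketch {
    // Assumption: only these two particles are ignored; extend as needed.
    private static final Set<String> PARTICLES = new HashSet<>(Arrays.asList("de", "da"));

    // Lowercases, splits on whitespace and drops the ignored particles.
    static List<String> tokens(String name) {
        List<String> out = new ArrayList<>();
        for (String t : name.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!t.isEmpty() && !PARTICLES.contains(t)) {
                out.add(t);
            }
        }
        return out;
    }

    // True when the first names are equal and the remaining request names
    // appear, exactly and in order, among the remaining names of the full name.
    static boolean sameName(String full, String request) {
        List<String> f = tokens(full);
        List<String> r = tokens(request);
        if (f.isEmpty() || r.size() < 2 || !f.get(0).equals(r.get(0))) {
            return false;
        }
        int fi = 1;
        for (int ri = 1; ri < r.size(); ri++) {
            // Skip names of the full name that the request left out.
            while (fi < f.size() && !f.get(fi).equals(r.get(ri))) {
                fi++;
            }
            if (fi == f.size()) {
                return false; // request name not found
            }
            fi++; // consume the matched name
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(sameName("Maria souza silva", "Maria silva"));    // true
        System.out.println(sameName("Maria souza silva", "Maria Carvalho")); // false
        System.out.println(sameName("Maria de souza silva", "Maria de"));    // false
    }
}
```

Because the tokens are compared with `equals`, a typo such as "Maria sliva" is rejected rather than tolerated, which is the behavior the question asks for.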