java 中的 Levenshtein 距离输出错误的数字
Levenshtein distance in java outputting the wrong number
对于我在 java 的大学作业,我被要求提供 "extra analytics functions" 我决定使用 Levenshtein 距离,但我有一个问题,输出到控制台的数字比实际数字少一个回答。所以 "cat" 和 "hat" 之间的距离应该是 1 但它显示为 0
public class Levenshtein {
public Levenshtein(String first, String second) {
char [] s = first.toCharArray();
char [] t = second .toCharArray();
int Subcost = 0;
int[][] array = new int[first.length()][second.length()];
for (int i = 0; i < array[0].length; i++)
{
array[0][i] = i;
}
for (int j = 0; j < array.length; j++)
{
array [j][0]= j;
}
for (int i = 1; i < second.length(); i++)
{
for (int j = 1; j < first.length(); j++)
{
if (s[j] == t [i])
{
Subcost = 0;
}
else
{
Subcost = 1;
}
array [j][i] = Math.min(array [j-1][i] +1,
Math.min(array [j][i-1] +1,
array [j-1][i-1] + Subcost) );
}
}
UI.output("The Levenshtein distance is -> " + array[first.length()-1][second.length()-1]);
}
}
显然您正在使用以下算法:
https://en.wikipedia.org/wiki/Levenshtein_distance#Iterative_with_full_matrix
我认为您对索引的理解不太准确。我不确定问题到底出在哪里,但这是 working version:
public int calculateLevenshteinDistance(String first, String second) {
char[] s = first.toCharArray();
char[] t = second.toCharArray();
int substitutionCost = 0;
int m = first.length();
int n = second.length();
int[][] array = new int[m + 1][n + 1];
for (int i = 1; i <= m; i++) {
array[i][0] = i;
}
for (int j = 1; j <= n; j++) {
array[0][j] = j;
}
for (int j = 1; j <= n; j++) {
for (int i = 1; i <= m; i++) {
if (s[i - 1] == t[j - 1]) {
substitutionCost = 0;
} else {
substitutionCost = 1;
}
int deletion = array[i - 1][j] + 1;
int insertion = array[i][j - 1] + 1;
int substitution = array[i - 1][j - 1] + substitutionCost;
int cost = Math.min(
deletion,
Math.min(
insertion,
substitution));
array[i][j] = cost;
}
}
return array[m][n];
}
对于我在 java 的大学作业,我被要求提供 "extra analytics functions" 我决定使用 Levenshtein 距离,但我有一个问题,输出到控制台的数字比实际数字少一个回答。所以 "cat" 和 "hat" 之间的距离应该是 1 但它显示为 0
public class Levenshtein {
public Levenshtein(String first, String second) {
char [] s = first.toCharArray();
char [] t = second .toCharArray();
int Subcost = 0;
int[][] array = new int[first.length()][second.length()];
for (int i = 0; i < array[0].length; i++)
{
array[0][i] = i;
}
for (int j = 0; j < array.length; j++)
{
array [j][0]= j;
}
for (int i = 1; i < second.length(); i++)
{
for (int j = 1; j < first.length(); j++)
{
if (s[j] == t [i])
{
Subcost = 0;
}
else
{
Subcost = 1;
}
array [j][i] = Math.min(array [j-1][i] +1,
Math.min(array [j][i-1] +1,
array [j-1][i-1] + Subcost) );
}
}
UI.output("The Levenshtein distance is -> " + array[first.length()-1][second.length()-1]);
}
}
显然您正在使用以下算法:
https://en.wikipedia.org/wiki/Levenshtein_distance#Iterative_with_full_matrix
我认为您对索引的理解不太准确。我不确定问题到底出在哪里,但这是 working version:
public int calculateLevenshteinDistance(String first, String second) {
char[] s = first.toCharArray();
char[] t = second.toCharArray();
int substitutionCost = 0;
int m = first.length();
int n = second.length();
int[][] array = new int[m + 1][n + 1];
for (int i = 1; i <= m; i++) {
array[i][0] = i;
}
for (int j = 1; j <= n; j++) {
array[0][j] = j;
}
for (int j = 1; j <= n; j++) {
for (int i = 1; i <= m; i++) {
if (s[i - 1] == t[j - 1]) {
substitutionCost = 0;
} else {
substitutionCost = 1;
}
int deletion = array[i - 1][j] + 1;
int insertion = array[i][j - 1] + 1;
int substitution = array[i - 1][j - 1] + substitutionCost;
int cost = Math.min(
deletion,
Math.min(
insertion,
substitution));
array[i][j] = cost;
}
}
return array[m][n];
}