Iterative version of Damerau–Levenshtein distance
The Levenshtein distance can be computed iteratively using two rows, like this:
https://en.wikipedia.org/wiki/Levenshtein_distance#Iterative_with_two_matrix_rows
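I came across the optimal string alignment distance, which does take transpositions into account. Wikipedia says it can be computed using a straightforward extension of the regular Levenshtein algorithm: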
if i > 1 and j > 1 and a[i-1] = b[j-2] and a[i-2] = b[j-1] then
    d[i, j] := minimum(d[i, j],
                       d[i-2, j-2] + cost)  // transposition
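For reference, here is a minimal sketch of the full-matrix form that this pseudocode extends, which can serve as a baseline when checking an iterative port. The function name `OsaDistanceFullMatrix` is made up for illustration, and the strings are assumed to be indexed from 0 while `i` and `j` count prefix lengths, as in the pseudocode.

```csharp
using System;

// Full-matrix optimal string alignment (OSA) distance.
// d[i, j] is the distance between the first i chars of a and the first j chars of b.
// The name OsaDistanceFullMatrix is hypothetical, chosen for this sketch.
int OsaDistanceFullMatrix(string a, string b)
{
    var d = new int[a.Length + 1, b.Length + 1];

    for (int i = 0; i <= a.Length; i++) d[i, 0] = i; // delete i chars of a
    for (int j = 0; j <= b.Length; j++) d[0, j] = j; // insert j chars of b

    for (int i = 1; i <= a.Length; i++)
    {
        for (int j = 1; j <= b.Length; j++)
        {
            int cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
            d[i, j] = Math.Min(Math.Min(d[i - 1, j] + 1,   // deletion
                                        d[i, j - 1] + 1),  // insertion
                               d[i - 1, j - 1] + cost);    // substitution

            // transposition of two adjacent characters: looks two rows back
            if (i > 1 && j > 1 && a[i - 1] == b[j - 2] && a[i - 2] == b[j - 1])
                d[i, j] = Math.Min(d[i, j], d[i - 2, j - 2] + cost);
        }
    }
    return d[a.Length, b.Length];
}

Console.WriteLine(OsaDistanceFullMatrix("ab", "ba"));  // 1
Console.WriteLine(OsaDistanceFullMatrix("ca", "abc")); // 3
```

The only cell read outside the current and previous rows is `d[i - 2, j - 2]`, which is why an iterative port has to keep one more row than the plain two-row Levenshtein version.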
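However, I have not been able to port the pseudocode extension on that page to the iterative version of the code. Any help is greatly appreciated.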
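You need three rows to compute this new version, because the transposition case reads d[i-2, j-2], which lies two rows back. I haven't been able to test the code, but I'm fairly confident in it: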
// Requires `using System;` for Math.Min.
int DamerauLevenshteinDistance(string s, string t)
{
    // degenerate cases
    if (s == t) return 0;
    if (s.Length == 0) return t.Length;
    if (t.Length == 0) return s.Length;

    // create three work vectors of integer distances
    int[] v0 = new int[t.Length + 1];
    int[] v1 = new int[t.Length + 1];
    int[] v2 = new int[t.Length + 1];

    // initialize v0 (the row two above the current one)
    // this row is A[0][i]: edit distance for an empty s
    // the distance is just the number of characters to delete from t
    for (int i = 0; i < v0.Length; i++)
        v0[i] = i;

    // compute v1 (row A[1][*]); no transposition is possible on this row
    // first element of v1 is A[1][0]: delete 1 char from s to match empty t
    v1[0] = 1;
    // use formula to fill in the rest of the row
    for (int j = 0; j < t.Length; j++)
    {
        var cost = (s[0] == t[j]) ? 0 : 1;
        v1[j + 1] = Math.Min(Math.Min(v1[j] + 1, v0[j + 1] + 1), v0[j] + cost);
    }
    if (s.Length == 1)
    {
        return v1[t.Length];
    }

    for (int i = 1; i < s.Length; i++)
    {
        // calculate v2 (current row distances) from the previous rows v0 and v1
        // first element of v2 is A[i+1][0]
        // edit distance is delete (i+1) chars from s to match empty t
        v2[0] = i + 1;
        // use formula to fill in the rest of the row
        for (int j = 0; j < t.Length; j++)
        {
            var cost = (s[i] == t[j]) ? 0 : 1;
            v2[j + 1] = Math.Min(Math.Min(v2[j] + 1, v1[j + 1] + 1), v1[j] + cost);
            // transposition: A[i+1][j+1] can be reached from A[i-1][j-1], stored in v0[j-1]
            if (j > 0 && s[i] == t[j - 1] && s[i - 1] == t[j])
                v2[j + 1] = Math.Min(v2[j + 1], v0[j - 1] + cost);
        }
        // copy v1 to v0 and v2 (current row) to v1 (previous row) for next iteration
        for (int j = 0; j < v0.Length; j++)
        {
            v0[j] = v1[j];
            v1[j] = v2[j];
        }
    }
    return v2[t.Length];
}
The original code comes from the Wikipedia implementation linked above.
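As a possible variation on the three-row idea, the rows can be rotated by swapping array references instead of copying every element at the end of each outer iteration. The sketch below is only an illustration under that assumption; the name `OsaDistanceThreeRows` and the row names `prev2`, `prev`, and `curr` are invented here. Starting the outer loop at the first character and guarding the transposition with `i > 0` also removes the need for a separate first-row pass.

```csharp
using System;

// Three-row optimal string alignment distance.
// Rows are rotated by swapping references rather than copied element by element.
// OsaDistanceThreeRows, prev2, prev and curr are hypothetical names for this sketch.
int OsaDistanceThreeRows(string s, string t)
{
    int n = t.Length;
    int[] prev2 = new int[n + 1]; // matrix row i-1 (two rows above curr)
    int[] prev = new int[n + 1];  // matrix row i
    int[] curr = new int[n + 1];  // matrix row i+1, the one being filled

    // row 0: distance from the empty prefix of s to each prefix of t
    for (int j = 0; j <= n; j++) prev[j] = j;

    for (int i = 0; i < s.Length; i++)
    {
        curr[0] = i + 1; // delete i+1 chars of s to match empty t
        for (int j = 0; j < n; j++)
        {
            int cost = (s[i] == t[j]) ? 0 : 1;
            curr[j + 1] = Math.Min(Math.Min(curr[j] + 1,      // insertion
                                            prev[j + 1] + 1), // deletion
                                   prev[j] + cost);           // substitution

            // transposition needs the row two back, so it is skipped while i == 0
            if (i > 0 && j > 0 && s[i] == t[j - 1] && s[i - 1] == t[j])
                curr[j + 1] = Math.Min(curr[j + 1], prev2[j - 1] + cost);
        }

        // rotate: prev becomes prev2, curr becomes prev, old prev2 is reused as curr
        int[] tmp = prev2;
        prev2 = prev;
        prev = curr;
        curr = tmp;
    }
    // after the final rotation, prev holds the last computed row
    return prev[n];
}

Console.WriteLine(OsaDistanceThreeRows("ab", "ba"));  // 1
Console.WriteLine(OsaDistanceThreeRows("ca", "abc")); // 3
```

Because the reused array is fully overwritten before it is read again, no clearing step is needed between iterations.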