How to implement word-level diffs in google/diff-match-patch (C#)
I'm trying to implement word-level matching in Google's diff-match-patch, but I'm struggling with it.
The result I get is:
=I've never been =|-a-|=t=|= th=|-e-|=se places=|
=I've never been =|=t=|+o+|= th=|+o+|=se places=|
The result I want is:
=I've never been =|-at these-|= places=|
=I've never been =|+to those+|= places=|
The documentation says:
make a copy of diff_linesToChars and call it diff_linesToWords. Look
for the line that identifies the next line boundary: lineEnd =
text.indexOf('\n', lineStart);
In the C# version, I found the line to change in diff_linesToCharsMunge and changed it to:
lineEnd = text.Replace(@"/[\n\.,;:]/ g"," ").IndexOf(" ", lineStart);
However, the granularity did not change; it still finds differences at the character level.
I am calling:
List<Diff> differences = diffs.diff_main(linepair.Original, linepair.Corrected, true);
diffs.diff_cleanupSemantic(differences);
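I have stepped through the code to make sure it reaches the change I made (incidentally, there is a hard-coded minimum of 100 characters before line mode kicks in).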
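One detail in the changed line is worth checking: string.Replace in C# substitutes a literal string, so the JavaScript-style pattern "/[\n\.,;:]/ g" is searched for verbatim and almost certainly never matches. A minimal sketch of finding the next word boundary with String.IndexOfAny instead; the class name, method name, and delimiter set are hypothetical choices for illustration, not part of diff-match-patch:

```csharp
using System;

// Hypothetical helper, not part of diff-match-patch.
public static class BoundaryDemo
{
    // Assumed delimiter set, taken from the characters in the attempted pattern plus space.
    static readonly char[] Delimiters = { ' ', '\n', '.', ',', ';', ':' };

    // Returns the index of the next delimiter at or after start, or -1 if there is none.
    public static int NextBoundary(string text, int start)
    {
        return text.IndexOfAny(Delimiters, start);
    }

    // Shows that string.Replace treats the JavaScript-style regex as literal text:
    // the input comes back unchanged because that literal never occurs in it.
    public static string LiteralReplace(string text)
    {
        return text.Replace(@"/[\n\.,;:]/ g", " ");
    }
}
```

Even with a working boundary search, the call shown above would probably still produce character-level output for this sample: as far as I can tell from the port, diff_main with checklines set to true only uses the line-based pass when both texts are longer than 100 characters, and it then re-diffs the changed blocks character by character.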
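I created a sample dotnet project using diff-match-patch. It may use an older version of the DiffMatchPatch file, but word and line modes work.
For the sample text above, I get the following output: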
at these | to those
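The idea behind word mode can be sketched without touching the library internals: map every distinct word to a single char, diff the encoded strings, then map the chars back. The WordEncoder class below is a hypothetical helper written for illustration, not the code from the sample project and not part of diff-match-patch; it assumes a word is a run of non-whitespace characters plus the one whitespace character that follows it.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper, not part of diff-match-patch.
public static class WordEncoder
{
    // Splits text into tokens: a run of non-whitespace characters plus the
    // single whitespace character that follows it, if any.
    public static List<string> Tokenize(string text)
    {
        var tokens = new List<string>();
        int start = 0;
        while (start < text.Length)
        {
            int end = start;
            while (end < text.Length && !char.IsWhiteSpace(text[end])) end++;
            if (end < text.Length) end++; // keep one trailing whitespace char with the word
            tokens.Add(text.Substring(start, end - start));
            start = end;
        }
        return tokens;
    }

    // Encodes text as one char per token. The table and index are shared
    // between both texts so equal words map to equal chars.
    public static string Encode(string text, List<string> table, Dictionary<string, int> index)
    {
        if (table.Count == 0) table.Add(""); // reserve id 0 so no '\0' char is emitted
        var sb = new StringBuilder();
        foreach (string token in Tokenize(text))
        {
            if (!index.TryGetValue(token, out int id))
            {
                // A char has only 65,536 values, so this sketch caps the vocabulary.
                if (table.Count > char.MaxValue)
                    throw new InvalidOperationException("Too many distinct words.");
                id = table.Count;
                table.Add(token);
                index[token] = id;
            }
            sb.Append((char)id);
        }
        return sb.ToString();
    }

    // Expands an encoded string back into the original words.
    public static string Decode(string encoded, List<string> table)
    {
        var sb = new StringBuilder();
        foreach (char c in encoded) sb.Append(table[c]);
        return sb.ToString();
    }
}
```

To use it with the library, encode both texts with the same table and index, pass the two encoded strings to diff_main with checklines set to false, and then replace the text of each resulting Diff with WordEncoder.Decode(text, table). This assumes the C# port exposes the Diff text as a writable member, which is worth confirming against the version in use. With this tokenization the trailing space stays attached to each word, so the changed span would come out as "at these " and "to those " rather than exactly the spacing shown in the question.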