Inconsistency in diff results for equivalent changes

Question

Consider the following files and diff results:

a1.txt

a
b
My name is Ian

a2.txt

a
a
b
My name is John

Running diff --side-by-side --suppress-common-lines a1.txt a2.txt produces:

                             >  a
My name is Ian               |  My name is John

which correctly states that a was added to a2.txt and that My name is Ian was changed to My name is John.

However, if I remove b from both files, the result is different:

b1.txt

a
My name is Ian

b2.txt

a
a
My name is John

Running diff --side-by-side --suppress-common-lines b1.txt b2.txt produces:

My name is Ian                |  a
                              >  My name is John

This says that the line My name is Ian was changed to a and that My name is John was added to b2.txt.

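While the result of the second comparison is technically valid, the difference between a1.txt and a2.txt is equivalent to that of b1.txt and b2.txt, so why would the result not be equal?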

Is there anything I can do such that the second comparison produces the same output as the first?

Answer 1

The difference you observe between your two examples is normal; it only clashes with your expectations of what diff does. The diff utility solves the longest-common-subsequence problem, using lines as units/atoms.
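
To make "longest common subsequence of lines" concrete, here is a minimal Python sketch of the textbook dynamic-programming approach. It is only an illustration: real diff implementations use faster algorithms and extra heuristics, and the function name lcs_lines is made up for this example.

```python
def lcs_lines(left, right):
    """Return one longest common subsequence of two lists of lines."""
    # table[i][j] holds the LCS length of left[i:] and right[j:].
    table = [[0] * (len(right) + 1) for _ in range(len(left) + 1)]
    for i in range(len(left) - 1, -1, -1):
        for j in range(len(right) - 1, -1, -1):
            if left[i] == right[j]:
                table[i][j] = 1 + table[i + 1][j + 1]
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])

    # Walk the table from the top-left corner to recover the common lines.
    common = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] == right[j]:
            common.append(left[i])
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return common


a1 = ["a", "b", "My name is Ian"]
a2 = ["a", "a", "b", "My name is John"]
b1 = ["a", "My name is Ian"]
b2 = ["a", "a", "My name is John"]

print(lcs_lines(a1, a2))  # ['a', 'b']
print(lcs_lines(b1, b2))  # ['a']
```

Everything outside the common subsequence is what gets reported as added, deleted, or changed.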

[...] the difference between a1.txt and a2.txt is equivalent to that of b1.txt and b2.txt, so why would the result not be equal?

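Here, the longest common subsequences in your two examples are different and, roughly speaking, don't "line up" the same way. In the first example, you have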

# a1.txt              # a2.txt                   # line in common?
                      a                          n
a                     a                          y 
b                     b                          y
My name is Ian        My name is John            n

whereas, in the second example, you have

# b1.txt              # b2.txt                   # line in common?
a                     a                          y
My name is Ian        a                          n
                      My name is John            n

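So, as far as diff is concerned, the differences between the two pairs of files are not equivalent. diff has no memory that all you did to obtain the b[12].txt files was remove the b line from each of the a[12].txt files. All it sees is that the longest common subsequence now consists of a single line containing a, and it infers the differences between the two b[12].txt files from that.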
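
The tables above can be reproduced with a small sketch that lists every way the longest common subsequence can be lined up, as pairs of 0-based line indexes (left file, right file). The helper lcs_alignments is hypothetical and brute-force, fine for tiny inputs only; which of several equally long alignments a given diff implementation reports is an implementation detail that this sketch does not model.

```python
from functools import lru_cache


def lcs_alignments(left, right):
    """Return all maximal alignments as sorted tuples of (i, j) matched line indexes."""

    @lru_cache(maxsize=None)
    def length(i, j):
        # LCS length of left[i:] and right[j:].
        if i == len(left) or j == len(right):
            return 0
        if left[i] == right[j]:
            return 1 + length(i + 1, j + 1)
        return max(length(i + 1, j), length(i, j + 1))

    def walk(i, j):
        # Collect every set of matches that achieves the LCS length from (i, j).
        if length(i, j) == 0:
            return {()}
        found = set()
        if left[i] == right[j]:
            found |= {((i, j),) + rest for rest in walk(i + 1, j + 1)}
        if length(i + 1, j) == length(i, j):
            found |= walk(i + 1, j)
        if length(i, j + 1) == length(i, j):
            found |= walk(i, j + 1)
        return found

    return sorted(walk(0, 0))


a1 = ["a", "b", "My name is Ian"]
a2 = ["a", "a", "b", "My name is John"]
b1 = ["a", "My name is Ian"]
b2 = ["a", "a", "My name is John"]

print(lcs_alignments(a1, a2))  # [((0, 0), (1, 2)), ((0, 1), (1, 2))]
print(lcs_alignments(b1, b2))  # [((0, 0),), ((0, 1),)]
```

In the a files, every alignment matches the b lines, so My name is Ian always ends up directly opposite My name is John. In the b files nothing pins those two lines together: matching the a in b1.txt with the first a in b2.txt leaves My name is Ian facing both a and My name is John, which is the alignment shown in the second table.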

Is there anything I can do such that the second comparison produces the same output as the first?

Not unless you use a different diff algorithm (or implement your own), I don't think so.

shell

diff