Inconsistency in diff results for equivalent changes

Consider the following files and diff results:

a1.txt

a
b
My name is Ian

a2.txt

a
a
b
My name is John

Running diff --side-by-side --suppress-common-lines a1.txt a2.txt produces:

                             >  a
My name is Ian               |  My name is John

This correctly shows that a was added to a2.txt and that My name is Ian was changed to My name is John.

However, if I remove b from both files, the result is different:

b1.txt

a
My name is Ian

b2.txt

a
a
My name is John

Running diff --side-by-side --suppress-common-lines b1.txt b2.txt produces:

My name is Ian                |  a
                              >  My name is John

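This says that the line My name is Ian was changed to a and that My name is John was added to b2.txt.

While the result of the second comparison is technically valid, the difference between a1.txt and a2.txt is equivalent to that of b1.txt and b2.txt, so why would the result not be equal?

Is there anything I can do such that the second comparison produces the same output as the first?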

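The difference you observe between your two examples is normal; it simply conflicts with your expectations of what diff does. The diff utility solves the longest-common-subsequence problem, using lines as its units (atoms).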

[...] the difference between a1.txt and a2.txt is equivalent to that of b1.txt and b2.txt, so why would the result not be equal?

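Here, the longest common subsequences in your two examples are different and, roughly speaking, don't "line up" the same way. In the first example, you have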

# a1.txt              # a2.txt                   # line in common?
                      a                          n
a                     a                          y 
b                     b                          y
My name is Ian        My name is John            n

whereas in the second example, you have

# b1.txt              # b2.txt                   # line in common?
a                     a                          y
My name is Ian        a                          n
                      My name is John            n

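So, as far as diff is concerned, the differences between the two pairs of files are not equivalent. diff doesn't remember that all you did to obtain the b[12].txt files was remove the b line from each of the a[12].txt files. All it sees is that the longest common subsequence now consists of a single line containing a, and it infers the differences between the two b[12].txt files from that.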
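To make this concrete, here is a minimal Python sketch that computes one longest common subsequence for each pair of files, treating each line as an atom. It is a textbook dynamic-programming version written for illustration, not the code your diff implementation actually runs, and the function name lcs is my own. In the first pair, the common line b sits directly before the two name lines, so they end up opposite each other. In the second pair the only common line is a, which could be matched with either a in b2.txt, so nothing forces My name is Ian to be paired with My name is John.

```python
def lcs(xs, ys):
    """Return one longest common subsequence of two lists of lines."""
    m, n = len(xs), len(ys)
    # table[i][j] holds the LCS length of the suffixes xs[i:] and ys[j:].
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if xs[i] == ys[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])

    # Walk the table from the top-left corner to recover one LCS.
    common = []
    i = j = 0
    while i < m and j < n:
        if xs[i] == ys[j]:
            common.append(xs[i])
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return common


a1 = ["a", "b", "My name is Ian"]
a2 = ["a", "a", "b", "My name is John"]
b1 = ["a", "My name is Ian"]
b2 = ["a", "a", "My name is John"]

lcs_a = lcs(a1, a2)
lcs_b = lcs(b1, b2)
print(lcs_a)  # ['a', 'b']
print(lcs_b)  # ['a']
```

Every line outside the common subsequence has to be reported as added, deleted, or changed, which is why the two pairs produce differently shaped output.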

Is there anything I can do such that the second comparison produces the same output as the first?

Short of using a different diff algorithm (or implementing your own), I don't think so.