Dealing with hunks that only change whitespace

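In code I maintain, I sometimes receive pull requests where the committer has reflowed a paragraph for no apparent reason. Here is an example: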

diff --git a/knuth.tex b/knuth.tex
index 2f6a2f8..7b0827d 100644
--- a/knuth.tex
+++ b/knuth.tex
@@ -1,6 +1,6 @@
 Thus, I came to the conclusion that the designer of a new
 system must not only be the implementer and first
-large||scale user; the designer should also write the first
+large-scale user; the designer should also write the first
 user manual.

 The separation of any of these four components would have
@@ -9,8 +9,7 @@ all these activities, literally hundreds of improvements
 would never have been made, because I would never have
 thought of them or perceived why they were important.

-But a system cannot be successful if it is too strongly
-influenced by a single person. Once the initial design is
-complete and fairly robust, the real test begins as people
-with many different viewpoints undertake their own
-experiments.
+But a system cannot be successful if it is too strongly influenced by
+a single person. Once the initial design is complete and fairly
+robust, the real test begins as people with many different viewpoints
+undertake their own experiments.

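As you can see, the first hunk introduces an actual change by replacing || with -, whereas the second hunk changes nothing apart from line breaks and spaces. In fact, the word-diff of the second hunk would be empty.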

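Is it possible to put a policy in place (say, on GitHub or in my CI) that rejects commits containing such "empty" hunks, or to reformat a patch so that these hunks are ignored entirely and I can git apply it without them?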

Related: How to git-apply a git word diff

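If you're looking for a built-in solution, I'm not aware of one. However, that doesn't mean this can't be built into a CI system relatively easily.

You can pipe the output of an appropriate git diff command into a script like the following, which will exit 1 if the input contains a patch like the second hunk above.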

#!/usr/bin/env ruby

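# Join the hunk's lines, split into paragraphs on blank lines, and collapse
# every run of whitespace (including line breaks) to a single space.
def filter(arr)
  arr.join.split("\n\n").map { |x| x.gsub(/\s+/, ' ') }.join("\n\n")
end

# A hunk is rejected when its old and new sides are identical after
# whitespace normalization, i.e. only wrapping or spacing changed.
def should_reject(before, after)
  return false if before.empty? && after.empty?
  filter(before) == filter(after)
end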

chunk = nil
before = []
after = []
while (line = gets)
  trimmed = line[1..-1]
  case line
  when /^(\+\+\+|---)/
    # Do nothing.
  when /^@@ /
    if should_reject(before, after)
      warn "Useless change to hunk #{chunk}"
      exit 1
    end
    chunk = line
    before = []
    after = []
  when /^ /
    before << trimmed
    after << trimmed
  when /^\+/
    after << trimmed
  when /^-/
    before << trimmed
  end
end

if should_reject(before, after)
  warn "Useless change to hunk #{chunk}"
  exit 1
end

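Essentially, it splits each hunk into chunks separated by a blank line, turns all whitespace into spaces, and then compares them. If they're equal, it complains and exits non-zero. You may want to modify it to be more robust, such as handling CRLF endings and the like, but the approach is viable.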
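To make the comparison step concrete, here is a minimal sketch of the same normalization applied to two plain strings rather than to a diff. The names `normalize` and `whitespace_only?` are hypothetical and not part of the script above, and the extra `strip` is my own assumption, added so trailing whitespace does not count as a difference.

```ruby
# Hypothetical helper: collapse whitespace inside each paragraph, keeping
# the blank-line paragraph boundaries intact. The strip call is an
# assumption beyond what the script above does.
def normalize(text)
  text.split("\n\n").map { |para| para.gsub(/\s+/, ' ').strip }.join("\n\n")
end

# Hypothetical helper: true when two texts differ only in wrapping/spacing.
def whitespace_only?(before, after)
  normalize(before) == normalize(after)
end

old_text = "But a system cannot be successful if it is too strongly\n" \
           "influenced by a single person.\n"
rewrapped = "But a system cannot be successful if it is too strongly influenced by\n" \
            "a single person.\n"
edited = old_text.sub("strongly", "weakly")

puts whitespace_only?(old_text, rewrapped) # only the line breaks moved
puts whitespace_only?(old_text, edited)    # a word actually changed
```

Because paragraphs are split on blank lines before the whitespace is collapsed, merging two paragraphs into one still counts as a real change under this sketch.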

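As a side note, one way to make these changes easier to spot is to use a sentence-per-line style. Each sentence, regardless of length, goes on its own line, and each line holds exactly one sentence. This makes diffing any kind of change much easier and avoids wrapping problems altogether.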
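If you want to convert existing text to that style, something along these lines might be a starting point. This is only a rough sketch: `one_sentence_per_line` is a hypothetical name, and it naively assumes every `.`, `!`, or `?` followed by a space ends a sentence, so abbreviations such as "e.g." will be split incorrectly.

```ruby
# Hypothetical helper: reflow each paragraph so every sentence sits on its
# own line. Paragraphs are assumed to be separated by blank lines.
def one_sentence_per_line(text)
  text.split(/\n{2,}/).map do |para|
    # Undo the existing wrapping first, then break after sentence-ending
    # punctuation (a naive heuristic, see the caveat above).
    para.gsub(/\s+/, ' ').strip.gsub(/(?<=[.!?]) (?=\S)/, "\n")
  end.join("\n\n")
end

puts one_sentence_per_line(<<~TEXT)
  But a system cannot be successful if it is too strongly
  influenced by a single person. Once the initial design is
  complete and fairly robust, the real test begins.
TEXT
```

Once text is stored this way, rewording one sentence touches exactly one line, so a diff never has to show a rewrapped paragraph.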