Example to understand git blame -M / -C

I'm trying to understand how git blame -M and git blame -C work.

I created two files:

FileA:

A
B
C

FileB:

D
E
F

and added (and committed) them to my repository (hash 1234). Then I copied the contents of FileB into FileA, so that it looks like this:

FileA:

A
B
C
D
E
F

and committed the change (hash 4567).

Then I ran git blame -C FileA.

I expected the output:

1234 A
1234 B
1234 C
1234 D
1234 E
1234 F

but got:

1234 A
1234 B
1234 C
4567 D
4567 E
4567 F

The same happens when I move the block D E F to FileA and run git blame -M FileA.

Am I misunderstanding the purpose of -C and -M, or did I miss something when constructing my test files?

Update 1: Setting the value of -C / -M to 3 did not help, and neither did working with larger text (I tried 3 paragraphs of lorem ipsum).

From git help blame:

   -C[<num>]
       In addition to -M, detect lines moved or copied from other files that were modified in the same commit. This is
       useful when you reorganize your program and move code around across files. When this option is given twice, the
       command additionally looks for copies from other files in the commit that creates the file. When this option is given
       three times, the command additionally looks for copies from other files in any commit.

       <num> is optional but it is the lower bound on the number of alphanumeric characters that git must detect as
       moving/copying between files for it to associate those lines with the parent commit. **And the default value is 40**. If
       there are more than one -C options given, the <num> argument of the last -C will take effect.

Notice the default value of 40? Your example shows only a 6 (or 9) character change, well below the threshold of 40...

I suspect your test input isn't large enough for the algorithm to detect the text movement...
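
As a rough sanity check, you can count the alphanumeric characters in the block you copied and compare the result with the threshold. This is only an approximation of what the manual describes, not Git's actual scoring code, and the variable names are made up for illustration:

```shell
# The block copied in the question: three one-letter lines.
copied=$(printf 'D\nE\nF\n')

# Keep only alphanumeric characters, then count them.
score=$(printf '%s' "$copied" | tr -cd '[:alnum:]' | wc -c | tr -d ' ')

echo "alphanumeric characters in copied block: $score"
```

Counting only alphanumerics gives 3 for the D/E/F block, which is even further below the default of 40.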

Edit: There's also the bit in there about "other files that were modified in the same commit". So here's an example:

$ git init /tmp/foo
Initialized empty Git repository in /tmp/foo/.git/
$ cd /tmp/foo
$ cp /etc/motd file1
$ cp /etc/magic file2
$ cp /etc/os-release file3
$ git add file1 file2 file3
$ git commit -m baseline
[master (root-commit) 36a1d7] baseline
 3 files changed, 19 insertions(+)
 create mode 100644 file1
 create mode 100644 file2
 create mode 100644 file3
$ head -5 file2 >> file1
$ head -5 file3 >> file1
$ sed -i 1,5d file3
$ git add file1 file3
$ git commit -m second
[master b7a683] second
 2 files changed, 8 insertions(+), 5 deletions(-)
$ git log --pretty=oneline
 b7a683 (HEAD, master) second
 36a1d7 baseline
$ git blame file1
 ^36a1d7 (Joe User 2015-03-26 17:19:10 -0500  1) 
 ^36a1d7 (Joe User 2015-03-26 17:19:10 -0500  2) The programs included with the Debian GNU/Linux system are free software;
 ^36a1d7 (Joe User 2015-03-26 17:19:10 -0500  3) the exact distribution terms for each program are described in the
 ^36a1d7 (Joe User 2015-03-26 17:19:10 -0500  4) individual files in /usr/share/doc/*/copyright.
 ^36a1d7 (Joe User 2015-03-26 17:19:10 -0500  5) 
 ^36a1d7 (Joe User 2015-03-26 17:19:10 -0500  6) Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
 ^36a1d7 (Joe User 2015-03-26 17:19:10 -0500  7) permitted by applicable law.
 b7a6839 (Joe User 2015-03-26 17:21:41 -0500  8) # Magic local data for file(1) command.
 b7a6839 (Joe User 2015-03-26 17:21:41 -0500  9) # Insert here your local magic data. Format is described in magic(5).
 b7a6839 (Joe User 2015-03-26 17:21:41 -0500 10) 
 b7a6839 (Joe User 2015-03-26 17:21:41 -0500 11) PRETTY_NAME="Debian GNU/Linux 7 (wheezy)"
 b7a6839 (Joe User 2015-03-26 17:21:41 -0500 12) NAME="Debian GNU/Linux"
 b7a6839 (Joe User 2015-03-26 17:21:41 -0500 13) VERSION_ID="7"
 b7a6839 (Joe User 2015-03-26 17:21:41 -0500 14) VERSION="7 (wheezy)"
 b7a6839 (Joe User 2015-03-26 17:21:41 -0500 15) ID=debian
$ git blame -C file1
 ^36a1d7 file1 (Joe User 2015-03-26 17:19:10 -0500  1) 
 ^36a1d7 file1 (Joe User 2015-03-26 17:19:10 -0500  2) The programs included with the Debian GNU/Linux system are free software;
 ^36a1d7 file1 (Joe User 2015-03-26 17:19:10 -0500  3) the exact distribution terms for each program are described in the
 ^36a1d7 file1 (Joe User 2015-03-26 17:19:10 -0500  4) individual files in /usr/share/doc/*/copyright.
 ^36a1d7 file1 (Joe User 2015-03-26 17:19:10 -0500  5) 
 ^36a1d7 file1 (Joe User 2015-03-26 17:19:10 -0500  6) Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
 ^36a1d7 file1 (Joe User 2015-03-26 17:19:10 -0500  7) permitted by applicable law.
 b7a6839 file1 (Joe User 2015-03-26 17:21:41 -0500  8) # Magic local data for file(1) command.
 b7a6839 file1 (Joe User 2015-03-26 17:21:41 -0500  9) # Insert here your local magic data. Format is described in magic(5).
 b7a6839 file1 (Joe User 2015-03-26 17:21:41 -0500 10) 
 ^36a1d7 file3 (Joe User 2015-03-26 17:19:10 -0500 11) PRETTY_NAME="Debian GNU/Linux 7 (wheezy)"
 ^36a1d7 file3 (Joe User 2015-03-26 17:19:10 -0500 12) NAME="Debian GNU/Linux"
 ^36a1d7 file3 (Joe User 2015-03-26 17:19:10 -0500 13) VERSION_ID="7"
 ^36a1d7 file3 (Joe User 2015-03-26 17:19:10 -0500 14) VERSION="7 (wheezy)"
 ^36a1d7 file3 (Joe User 2015-03-26 17:19:10 -0500 15) ID=debian

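Notice that without -C, git blame simply attributes the new lines to the second commit. With -C, it attributes the last 5 lines to file3 in the first commit, because 1) that's where they came from, 2) the segment is large enough, and 3) file3 was also modified in the second commit. The lines from file2 are not identified because, although the segment is large enough, file2 was not modified in the second commit.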
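
If you want to try this without depending on /etc files, here is a minimal sketch of the same scenario in a throwaway repository. The file contents, the identity settings and the helper function sources are all made up for illustration. It also runs -C -C -C, which according to the help text above looks for copies from other files in any commit, so I would expect it to pick up the file2 lines as well:

```shell
#!/bin/sh
# Sketch only: contents, identity and helper names are hypothetical.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q .
git config user.name "Joe User"
git config user.email "joe@example.com"
git config commit.gpgsign false

# Six distinctive lines per file, each with more than 40 alphanumeric characters.
for f in file1 file2 file3; do
  for i in 1 2 3 4 5 6; do
    echo "$f entry $i: sample text that is long enough to be scored as a copy"
  done > "$f"
done
git add file1 file2 file3
git commit -q -m baseline

# Second commit: copy from file2 (left untouched) and move from file3 (modified).
head -n 3 file2 >> file1
head -n 3 file3 >> file1
tail -n +4 file3 > file3.tmp && mv file3.tmp file3
git add file1 file3
git commit -q -m second

# List the distinct source files that blame reports for the lines of file1.
sources() {
  git blame "$@" --line-porcelain file1 | sed -n 's/^filename //p' | sort -u | xargs
}

plain=$(sources)
one_c=$(sources -C)
three_c=$(sources -C -C -C)

echo "no -C:    $plain"
echo "-C:       $one_c"
echo "-C -C -C: $three_c"
```

With a reasonably recent Git this should report only file1 without -C, file1 and file3 with a single -C, and all three files with -C -C -C. I have not checked every Git version, so treat that as an expectation rather than a guarantee.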

Also, note the difference between -M, which detects content moved within a file, and -C, which detects movement/copying between different files.
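
To see -M on its own, here is a small sketch that moves a block inside a single file. The file name notes.txt, its contents and the helper function commits are made up for illustration; the lines are deliberately long so they clear the threshold for moves:

```shell
#!/bin/sh
# Sketch only: file name, contents, identity and helper names are hypothetical.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q .
git config user.name "Joe User"
git config user.email "joe@example.com"
git config commit.gpgsign false

# Ten distinctive lines, committed once.
for i in 1 2 3 4 5 6 7 8 9 10; do
  echo "notes entry $i: sample text that is long enough to be scored as a move"
done > notes.txt
git add notes.txt
git commit -q -m baseline

# Move the last three lines to the top of the same file and commit again.
{ tail -n 3 notes.txt; head -n 7 notes.txt; } > notes.tmp && mv notes.tmp notes.txt
git commit -q -am "move block"

# Count how many distinct commits blame reports for the file.
commits() {
  git blame -s "$@" notes.txt | cut -d' ' -f1 | sort -u | wc -l | tr -d ' '
}

plain=$(commits)
with_m=$(commits -M)

echo "distinct commits without -M: $plain"
echo "distinct commits with -M:    $with_m"
```

Without -M the moved lines should be blamed on the second commit, so two commits show up. With -M they should be traced back to the baseline commit, leaving only one.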