How does Git create unique commit hashes, mainly the first few characters?

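I have a hard time understanding how Git creates hashes that are unique even within their first 4 characters. I can refer to a commit in Git Bash using only its first four characters. Does the algorithm explicitly make the first characters "ultra"-unique so that they never clash with other similar hashes, or does it generate every part of the hash in the same way?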

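Git uses the following information to generate the SHA-1:

  • The source tree of the commit (which unravels to all the subtrees and blobs)
  • The parent commit SHA-1
  • The author info (with timestamp)
  • The committer info (right, those are different!), also with timestamp
  • The commit message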

(For the complete explanation, look here.)
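To make the list above concrete, here is a minimal sketch of how an object ID is derived: Git hashes a small header ("<type> <size>" followed by a NUL byte) plus the object body with plain SHA-1. For a commit, the body is the text that `git cat-file -p <commit>` prints (tree, parent, author, committer, blank line, message). The function names below (`rol`, `sha1_block`, `sha1`, `git_object_hex`) are made up for this illustration and are not Git's internal API, and the SHA-1 routine is a compact textbook version rather than Git's optimized one.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Rotate a 32-bit word left by n bits (0 < n < 32). */
static uint32_t rol(uint32_t x, int n) { return (x << n) | (x >> (32 - n)); }

/* Mix one 64-byte block into the five-word SHA-1 state h. */
static void sha1_block(uint32_t h[5], const unsigned char *p)
{
    uint32_t w[80], a = h[0], b = h[1], c = h[2], d = h[3], e = h[4];
    int i;

    for (i = 0; i < 16; i++)
        w[i] = (uint32_t)p[4 * i] << 24 | (uint32_t)p[4 * i + 1] << 16 |
               (uint32_t)p[4 * i + 2] << 8 | (uint32_t)p[4 * i + 3];
    for (; i < 80; i++)
        w[i] = rol(w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16], 1);

    for (i = 0; i < 80; i++) {
        uint32_t f, k, t;
        if (i < 20)      { f = (b & c) | (~b & d);          k = 0x5A827999u; }
        else if (i < 40) { f = b ^ c ^ d;                   k = 0x6ED9EBA1u; }
        else if (i < 60) { f = (b & c) | (b & d) | (c & d); k = 0x8F1BBCDCu; }
        else             { f = b ^ c ^ d;                   k = 0xCA62C1D6u; }
        t = rol(a, 5) + f + e + k + w[i];
        e = d; d = c; c = rol(b, 30); b = a; a = t;
    }
    h[0] += a; h[1] += b; h[2] += c; h[3] += d; h[4] += e;
}

/* Plain SHA-1 of data[0..len), written as 20 raw bytes to out. */
static void sha1(const unsigned char *data, size_t len, unsigned char out[20])
{
    uint32_t h[5] = { 0x67452301u, 0xEFCDAB89u, 0x98BADCFEu,
                      0x10325476u, 0xC3D2E1F0u };
    unsigned char tail[128] = { 0 };
    size_t rem = len % 64, full = len - rem, i;
    size_t tail_len = rem < 56 ? 64 : 128;  /* room for 0x80 and the length */
    uint64_t bits = (uint64_t)len * 8;

    for (i = 0; i < full; i += 64)
        sha1_block(h, data + i);
    memcpy(tail, data + full, rem);
    tail[rem] = 0x80;
    for (i = 0; i < 8; i++)                 /* big-endian bit count */
        tail[tail_len - 1 - i] = (unsigned char)(bits >> (8 * i));
    for (i = 0; i < tail_len; i += 64)
        sha1_block(h, tail + i);
    for (i = 0; i < 5; i++) {
        out[4 * i]     = (unsigned char)(h[i] >> 24);
        out[4 * i + 1] = (unsigned char)(h[i] >> 16);
        out[4 * i + 2] = (unsigned char)(h[i] >> 8);
        out[4 * i + 3] = (unsigned char)h[i];
    }
}

/*
 * Hex object ID for an object of the given type ("blob", "tree", "commit",
 * ...): SHA-1 over "<type> <size>\0" followed by the body.
 * Returns 0 on success, -1 on failure.
 */
int git_object_hex(const char *type, const void *body, size_t body_len,
                   char hex[41])
{
    char header[64];
    unsigned char raw[20], *buf;
    size_t hsize;
    int n, i;

    assert(type != NULL && hex != NULL);
    n = snprintf(header, sizeof header, "%s %zu", type, body_len);
    if (n < 0 || (size_t)n >= sizeof header)
        return -1;
    hsize = (size_t)n + 1;                  /* keep the terminating NUL */
    buf = malloc(hsize + body_len);
    if (buf == NULL)
        return -1;
    memcpy(buf, header, hsize);
    if (body_len > 0)
        memcpy(buf + hsize, body, body_len);
    sha1(buf, hsize + body_len, raw);
    free(buf);
    for (i = 0; i < 20; i++)
        sprintf(hex + 2 * i, "%02x", raw[i]);
    return 0;
}
```

Calling `git_object_hex("blob", "test content\n", 13, hex)` should give the same ID that `git hash-object` reports for a file with that content; passing "commit" and a commit body works the same way. Every hex digit comes out of the same hash computation, so the first characters get no special treatment: changing any input (a timestamp, the parent, one byte of the message) changes the whole ID unpredictably.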

Git does not guarantee that the first 4 characters are unique. In chapter 7 of the Pro Git Book it is written:

Git can figure out a short, unique abbreviation for your SHA-1 values. If you pass --abbrev-commit to the git log command, the output will use shorter values but keep them unique; it defaults to using seven characters but makes them longer if necessary to keep the SHA-1 unambiguous:

So Git simply makes the abbreviation as long as necessary to keep it unique. They even note that:

Generally, eight to ten characters are more than enough to be unique within a project.

As an example, the Linux kernel, which is a pretty large project with over 450k commits and 3.6 million objects, has no two objects whose SHA-1s overlap more than the first 11 characters.

So in fact Git simply relies on how improbable it is for two objects to have exactly the same SHA (or the same first X characters of one).
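The "as long as necessary" behavior can be sketched as follows: compare the target ID with every other known ID and use one character more than the longest shared prefix, but never fewer than a chosen minimum. This is only an illustration under simplifying assumptions (a flat array of hex strings and a linear scan); the function names are hypothetical and this is not how Git itself looks up objects.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Number of leading characters that a and b have in common. */
static size_t common_prefix_len(const char *a, const char *b)
{
    size_t n = 0;

    while (a[n] != '\0' && a[n] == b[n])
        n++;
    return n;
}

/*
 * Shortest prefix length of target (at least min_len, at most its full
 * length) that no other ID in ids[0..count) shares.
 */
size_t shortest_unique_abbrev(const char *target, const char *const *ids,
                              size_t count, size_t min_len)
{
    size_t full = strlen(target), need = min_len, i;

    assert(min_len >= 1);
    for (i = 0; i < count; i++) {
        size_t shared;

        if (strcmp(ids[i], target) == 0)
            continue;               /* the target itself */
        shared = common_prefix_len(target, ids[i]);
        if (shared + 1 > need)
            need = shared + 1;      /* one character past the overlap */
    }
    return need < full ? need : full;
}
```

With the made-up, shortened IDs "e86ab2c1", "e86a0000" and "5f7817c0", the first one needs 5 characters when the minimum is 4, because it shares "e86a" with the second. The answer depends on which other objects exist, so an abbreviation that is unique today can become ambiguous as the repository grows.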

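April 2017: note that after the whole shattered.io episode (where Google produced a SHA-1 collision), the 20-byte format will not be there forever.

A first step is to replace unsigned char sha1[20], which is hard-coded all over the Git codebase, with a generic object whose definition might change in the future (SHA2?, Blake2, ...)

See commit e86ab2c (21 Feb 2017) by brian m. carlson (bk2204).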

Convert the remaining uses of unsigned char [20] to struct object_id.

This is an example of an ongoing effort started with commit 5f7817c (13 Mar 2015) by brian m. carlson (bk2204), for v2.5.0-rc0, in cache.h:
/* The length in bytes and in hex digits of an object name (SHA-1 value). */
#define GIT_SHA1_RAWSZ 20
#define GIT_SHA1_HEXSZ (2 * GIT_SHA1_RAWSZ)

struct object_id {
    unsigned char hash[GIT_SHA1_RAWSZ];
};
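As a rough illustration of why the wrapper helps, the sketch below repeats the definitions above and adds two helpers that take the length from the struct instead of spelling out 20. Both helpers, sketch_oid_to_hex() and sketch_oid_equal(), are hypothetical names invented for this example; they are not Git's own functions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* The length in bytes and in hex digits of an object name (SHA-1 value). */
#define GIT_SHA1_RAWSZ 20
#define GIT_SHA1_HEXSZ (2 * GIT_SHA1_RAWSZ)

struct object_id {
    unsigned char hash[GIT_SHA1_RAWSZ];
};

/* Hypothetical helper: format an ID as a NUL-terminated hex string. */
void sketch_oid_to_hex(const struct object_id *oid,
                       char out[GIT_SHA1_HEXSZ + 1])
{
    size_t i;

    assert(oid != NULL && out != NULL);
    for (i = 0; i < sizeof oid->hash; i++)
        sprintf(out + 2 * i, "%02x", oid->hash[i]);
}

/* Hypothetical helper: compare two IDs; the size comes from the type. */
int sketch_oid_equal(const struct object_id *a, const struct object_id *b)
{
    return memcmp(a->hash, b->hash, sizeof a->hash) == 0;
}
```

Callers written this way never mention the number 20, and a struct object_id can be copied with plain assignment, so a later change of hash size touches the definition rather than every call site.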

Don't forget that, even with SHA-1, the first 4 characters are not enough to guarantee uniqueness, as I explained in "How much of a git sha is generally considered necessary to uniquely identify a change in a given codebase?".
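A back-of-the-envelope estimate shows why 4 characters run out quickly. Assuming hash values are spread uniformly (an idealization), the expected number of object pairs that share the same first d hex digits among n objects is about n(n-1)/2 divided by 16^d. The function name below is made up for this sketch.

```c
#include <assert.h>

/*
 * Expected number of pairs, among n_objects IDs, that share the same first
 * hex_digits characters, assuming uniformly distributed hashes.
 */
double expected_prefix_collisions(double n_objects, int hex_digits)
{
    double prefixes = 1.0;
    int i;

    assert(n_objects >= 0.0);
    assert(hex_digits >= 1 && hex_digits <= 40);
    for (i = 0; i < hex_digits; i++)
        prefixes *= 16.0;           /* 16 possible values per hex digit */
    return n_objects * (n_objects - 1.0) / 2.0 / prefixes;
}
```

With 4 characters there are only 65,536 possible prefixes, so a repository with 1,000 objects is already expected to contain about 7.6 colliding pairs. For 3.6 million objects and 12 characters the same estimate is roughly 0.02, which is consistent with the Linux kernel figure quoted above.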


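Update Dec. 2017, with Git 2.16 (Q1 2018): an effort to support an alternative SHA is underway: see "".

You will be able to use another hash: SHA-1 is no longer the only one for Git.

Update 2018-2019: the choice has been made in Git 2.19+: SHA-256.
See "hash-function-transition".

This is not yet active (meaning Git 2.21 is still using SHA-1), but the code is being written to support SHA-256 in the future.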


With Git 2.26 (Q1 2020), the work goes on, using "struct object_id" to replace "char *sha1".

See commit 2fecc48, commit 6ac9760, commit b99b6bc, commit 63f4a7f, commit e31c710, commit 500e4f2, commit f66d4e0, commit a93c141, commit 3f83fd5, commit 0763671 (24 Feb 2020) by Jeff King (peff).
(Merged by Junio C Hamano -- gitster -- in commit e8e7184, 05 Mar 2020)

packfile: drop nth_packed_object_sha1()

Signed-off-by: Jeff King

Once upon a time, nth_packed_object_sha1() was the primary way to get the oid of a packfile's index position.
But these days we have the more type-safe nth_packed_object_id() wrapper, and all callers have been converted.

Let's drop the "sha1" version (turning the safer wrapper into a single function) so that nobody is tempted to introduce new callers.


With Git 2.29 (Q4 2020), the "sha1" to "oid" rename continues.

See commit a46d1f7, commit fb07bd4, commit cfaf9f0, commit ef2d554, commit 962dd7e, commit 8f7e3de, commit b1f1ade (27 Sep 2020) by Martin Ågren (none).
(Merged by Junio C Hamano -- gitster -- in commit 07601b5, 05 Oct 2020)

wt-status: replace sha1 mentions with oid

Signed-off-by: Martin Ågren

abbrev_sha1_in_line() uses a struct object_id oid and should be fully prepared to handle non-SHA1 object ids. Rename it to abbrev_oid_in_line().

A few comments in wt_status_get_detached_from() mention "sha1". The variable they refer to was renamed in e86ab2c1cd ("wt-status: convert to struct object_id", 2017-02-21, Git v2.13.0-rc0). Update the comments to reference "oid" instead.