How does Git create unique commit hashes, mainly the first few characters?
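I find it hard to wrap my head around how Git creates fully unique hashes that aren't allowed to be the same even in the first 4 characters. I'm able to refer to commits in Git Bash using only the first four characters. Is it specifically decided in the algorithm that the first characters are "ultra"-unique and will never conflict with other similar hashes, or does the algorithm generate every part of the hash in the same way?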
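Git uses the following information to generate the SHA-1:
- The source tree of the commit (which unravels to all the subtrees and blobs)
- The parent commit SHA-1
- The author info (with timestamp)
- The committer info (right, those are different!, with timestamp too)
- The commit message
(For the complete explanation, look here.)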
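To make the list above concrete, here is a minimal Python sketch of how those fields end up in one hash. It relies on Git's object format, where the object id is the SHA-1 of a header ("<type> <size>" followed by a NUL byte) plus the object body. The helper names git_object_id and commit_body and the author identity are made up for illustration, and real commits can carry extra headers (such as gpgsig or encoding) that this sketch ignores.

```python
import hashlib


def git_object_id(obj_type: bytes, body: bytes) -> str:
    """SHA-1 over '<type> <size>' + NUL + body, which is how Git names an object."""
    header = obj_type + b" " + str(len(body)).encode("ascii") + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


def commit_body(tree: str, parents: list[str], author: str,
                committer: str, message: str) -> bytes:
    """Assemble the text of a commit object from the fields listed above."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines.append(f"author {author}")
    lines.append(f"committer {committer}")
    # A blank line separates the headers from the commit message.
    return ("\n".join(lines) + "\n\n" + message).encode("utf-8")


# The id of the empty tree, used here so the example needs no repository.
empty_tree = git_object_id(b"tree", b"")

who = "A U Thor <author@example.com>"  # hypothetical identity
first = git_object_id(b"commit", commit_body(
    empty_tree, [], f"{who} 1500000000 +0000", f"{who} 1500000000 +0000", "initial\n"))
later = git_object_id(b"commit", commit_body(
    empty_tree, [], f"{who} 1500000001 +0000", f"{who} 1500000001 +0000", "initial\n"))

print(first)
print(later)
```

Moving the timestamp by a single second gives an unrelated-looking id. SHA-1 produces every hex digit the same way, so the first characters are not "more unique" than the rest.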
Git does not guarantee that the first 4 characters will be unique. In chapter 7 of the Pro Git Book it is written:
Git can figure out a short, unique abbreviation for your SHA-1 values.
If you pass --abbrev-commit to the git log command, the output will
use shorter values but keep them unique; it defaults to using seven
characters but makes them longer if necessary to keep the SHA-1
unambiguous:
So Git simply makes the abbreviation as long as necessary to remain unique. They even note that:
Generally, eight to ten characters are more than enough to be unique
within a project.
As an example, the Linux kernel, which is a pretty large project with
over 450k commits and 3.6 million objects, has no two objects whose
SHA-1s overlap more than the first 11 characters.
So in fact, they simply rely on the great improbability of two objects having exactly the same SHA-1 (or the same first X characters of one).
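The "as long as necessary" rule quoted from Pro Git can be sketched in a few lines of Python. This is a simplified illustration, not Git's actual implementation: the function name shortest_unique_abbrev and the sample ids are hypothetical, and the default minimum of seven characters is taken from the quote above.

```python
def shortest_unique_abbrev(target: str, all_ids: list[str], min_len: int = 7) -> str:
    """Return the shortest prefix of `target` (at least min_len chars)
    that no other id in all_ids starts with."""
    others = [oid for oid in all_ids if oid != target]
    for length in range(min_len, len(target) + 1):
        prefix = target[:length]
        if not any(oid.startswith(prefix) for oid in others):
            return prefix
    return target


# Three made-up 40-character ids; the first two share their first 7 characters.
ids = [
    "1234567a" + "0" * 32,
    "1234567b" + "0" * 32,
    "abcdef01" + "0" * 32,
]

for oid in ids:
    print(shortest_unique_abbrev(oid, ids))
```

With these ids, the first two need 8 characters to stay unambiguous, while the third is already unique at 7. Lowering min_len to 4 shows why a four-character prefix often works in a small repository without being guaranteed to.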
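April 2017: note that after the whole shattered.io episode (where Google achieved a SHA-1 collision), the 20-byte format won't be there forever.
A first step is to replace unsigned char sha1[20], which is hard-coded all over the Git codebase, with a generic object whose definition might change in the future (SHA2?, Blake2, ...).
See commit e86ab2c (21 Feb 2017) by brian m. carlson (bk2204).
Convert the remaining uses of unsigned char [20] to struct object_id.
This is one example of an ongoing effort that started with commit 5f7817c (13 Mar 2015) by brian m. carlson (bk2204), for v2.5.0-rc0, in cache.h: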
/* The length in bytes and in hex digits of an object name (SHA-1 value). */
#define GIT_SHA1_RAWSZ 20
#define GIT_SHA1_HEXSZ (2 * GIT_SHA1_RAWSZ)
struct object_id {
	unsigned char hash[GIT_SHA1_RAWSZ];
};
Don't forget that, even with SHA-1, the first 4 characters are not enough to guarantee uniqueness, as I explained in "How much of a git sha is generally considered necessary to uniquely identify a change in a given codebase?".
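A rough way to see why 4 characters are not enough is the birthday approximation. The Python sketch below assumes object ids behave like uniformly random values, which is an idealization, and the function name prefix_collision_probability is made up for this example. The 3.6 million object count is the Linux kernel figure from the quote above.

```python
import math


def prefix_collision_probability(num_objects: int, hex_chars: int) -> float:
    """Birthday-bound approximation of the chance that at least two of
    num_objects uniformly random hashes share their first hex_chars digits."""
    space = 16 ** hex_chars                       # number of distinct prefixes
    pairs = num_objects * (num_objects - 1) / 2   # number of object pairs
    # 1 - exp(-pairs / space), written with expm1 to stay accurate near zero.
    return -math.expm1(-pairs / space)


for chars in (4, 7, 11, 40):
    p = prefix_collision_probability(3_600_000, chars)
    print(f"{chars:>2} hex chars, 3.6 million objects: {p:.3g}")
```

Four hex characters give only 65,536 possible prefixes, so a repository with a thousand objects is already almost certain to contain two that share one. The full 40 characters push the same probability down to a vanishingly small number.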
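Update Dec. 2017, with Git 2.16 (Q1 2018): an effort to support an alternative hash is underway.
You will be able to use another hash: SHA-1 is no longer the only one for Git.
Update 2018-2019: the choice has been made in Git 2.19+: SHA-256.
See "hash-function-transition".
This is not yet active (meaning git 2.21 is still using SHA-1), but the code is being completed to support SHA-256 in the future.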
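As an illustration of what a hash-agnostic object id means in practice, here is a small Python sketch. It assumes, based on my reading of the hash-function-transition design, that the SHA-256 object format keeps the same "<type> <size>" + NUL + body framing and only swaps the hash function. The helper name git_object_id and its algo parameter are hypothetical.

```python
import hashlib


def git_object_id(obj_type: bytes, body: bytes, algo: str = "sha1") -> str:
    """Hash '<type> <size>' + NUL + body with the chosen algorithm.

    Assumption: both object formats share this framing and differ only
    in the hash function applied to it.
    """
    header = obj_type + b" " + str(len(body)).encode("ascii") + b"\x00"
    return hashlib.new(algo, header + body).hexdigest()


sha1_id = git_object_id(b"blob", b"hello\n", "sha1")
sha256_id = git_object_id(b"blob", b"hello\n", "sha256")

print(len(sha1_id), sha1_id)      # 40 hex digits (20 bytes)
print(len(sha256_id), sha256_id)  # 64 hex digits (32 bytes)
```

The longer id is why a hard-coded unsigned char sha1[20] had to go: a struct object_id can grow to hold 32 bytes without every caller changing.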
With Git 2.26 (Q1 2020), the work goes on, replacing "char *sha1" with "struct object_id".
See commit 2fecc48, commit 6ac9760, commit b99b6bc, commit 63f4a7f, commit e31c710, commit 500e4f2, commit f66d4e0, commit a93c141, commit 3f83fd5, commit 0763671 (24 Feb 2020) by Jeff King (peff).
(Merged by Junio C Hamano -- gitster -- in commit e8e7184, 05 Mar 2020)
packfile: drop nth_packed_object_sha1()
Signed-off-by: Jeff King
Once upon a time, nth_packed_object_sha1() was the primary way to get the oid of a packfile's index position.
But these days we have the more type-safe nth_packed_object_id() wrapper, and all callers have been converted.
Let's drop the "sha1" version (turning the safer wrapper into a single function) so that nobody is tempted to introduce new callers.
With Git 2.29 (Q4 2020), the "sha1 to oid" rename continues.
See commit a46d1f7, commit fb07bd4, commit cfaf9f0, commit ef2d554, commit 962dd7e, commit 8f7e3de, commit b1f1ade (27 Sep 2020) by Martin Ågren (none).
(Merged by Junio C Hamano -- gitster -- in commit 07601b5, 05 Oct 2020)
wt-status: replace sha1 mentions with oid
Signed-off-by: Martin Ågren
abbrev_sha1_in_line() uses a struct object_id oid and should be fully prepared to handle non-SHA1 object ids. Rename it to abbrev_oid_in_line().
A few comments in wt_status_get_detached_from() mention "sha1". The variable they refer to was renamed in e86ab2c1cd ("wt-status: convert to struct object_id", 2017-02-21, Git v2.13.0-rc0). Update the comments to reference "oid" instead.