Why doesn't Git use a more modern SHA?
I read that Git uses SHA-1 digests as the IDs for revisions. Why doesn't it use a more modern version of SHA?
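UPDATE: The question above and this answer are from 2015. Since then, Google has announced the first SHA-1 collision: https://security.googleblog.com/2017/02/announcing-first-sha1-collision.html
Obviously, I can only speculate from the outside about why Git continues to use SHA-1, but these may be among the reasons: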
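- Git was Linus Torvalds's creation, and Linus apparently does not want to replace SHA-1 with another hash algorithm at this time.
- He makes the plausible claim that a successful SHA-1 collision-based attack against Git is a good deal harder than achieving the collision itself. Considering that SHA-1 is weaker than it should be rather than completely broken, that puts a workable attack far out of reach, at least today. He also notes that a "successful" attack would achieve very little if the colliding object arrives later than the existing one, because the later object would simply be assumed to be identical to the valid one and ignored (though others have pointed out that the reverse could occur).
- Changing software is time-consuming and error-prone, especially when existing infrastructure and data based on the existing protocols have to be migrated. Even companies whose software and hardware products have cryptographic security as the sole point of the system are still migrating away from SHA-1 and other weak algorithms in places. Just imagine all those hard-coded unsigned char[20] buffers all over the place ;-). It is a lot easier to program for cryptographic agility at the start than to retrofit it later.
- SHA-1 performs better than the various SHA-2 hashes (probably not by enough to be a deal-breaker now, but it may have been a sticking point 10 years ago), and SHA-2 digests take more storage.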
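To make the storage-size point concrete, here is a minimal Python sketch (not Git's own code) that names a blob the way Git does, by hashing the header "blob <size>\0" followed by the content, once with SHA-1 and once with SHA-256. The helper name git_blob_id is made up for this illustration.

```python
import hashlib


def git_blob_id(content: bytes, algo: str = "sha1") -> str:
    """Return the hex object ID of a blob: hash of 'blob <size>' + NUL + content.

    git_blob_id is a hypothetical helper for illustration, not a Git API.
    """
    header = b"blob %d\x00" % len(content)
    return hashlib.new(algo, header + content).hexdigest()


sha1_id = git_blob_id(b"hello\n", "sha1")
sha256_id = git_blob_id(b"hello\n", "sha256")

# Raw digest size in bytes is half the hex length.
print(len(sha1_id) // 2, sha1_id)
print(len(sha256_id) // 2, sha256_id)
```

The SHA-1 name fits in the 20-byte buffer mentioned above, while the SHA-256 name needs 32 bytes, which is why every hard-coded 20 (or 40 for hex) has to be found and replaced before a switch is possible.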
Some links:
- Stack Overflow question on what would happen if a collision did occur in Git
- Newsgroup post showing a brief comment from Linus on the subject a couple of months after the main SHA-1 weakness became known in 2005
- A thread discussing the weakness and possible move to sha-256 (with replies from Linus) in 2006
- NIST statement on SHA-1 deprecation and recommending "to transition rapidly to the stronger SHA-2 family of hash functions"
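My personal view is that practical attacks are probably still some time off, and that even when they do occur, people will probably first mitigate them by means other than changing the hash algorithm itself. Still, if you do care about security, you should err on the side of caution in your choice of algorithms and keep revising your security strength upwards, because the capabilities of attackers also move in only one direction. It would therefore be unwise to take Git as a role model, especially as its use of SHA-1 does not purport to be for cryptographic security.
There is a discussion of the urgency of migrating away from SHA1 for Mercurial, but it applies to Git as well: https://www.mercurial-scm.org/wiki/mpm/SHA1
In short: if you are not extremely diligent today, you have much worse vulnerabilities than sha1. Despite that, Mercurial started preparing to migrate away from sha1 over 10 years ago.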
work has been underway for years to retrofit Mercurial's data structures and protocols for SHA1's successors. Storage space was allocated for larger hashes in our revlog structure over 10 years ago in Mercurial 0.9 with the introduction of RevlogNG. The bundle2 format introduced more recently supports the exchange of different hash types over the network. The only remaining pieces are choice of a replacement function and choosing a backwards-compatibility strategy.
If git does not migrate away from sha1 before Mercurial does, you can always add another level of security by keeping a local Mercurial mirror with hg-git.
There is now a transition plan to a stronger hash, so it looks like in future it will use a more modern hash than SHA-1. From the current transition plan:
Some hashes under consideration are SHA-256, SHA-512/256, SHA-256x16, K12, and BLAKE2bp-256
Why does it not use a more modern version of SHA?
Dec. 2017: It will. Git 2.16 (Q1 2018) is the first release to illustrate and implement that intent.
Note: see Git 2.19 below: it will be SHA-256.
Git 2.16 proposes an infrastructure to define which hash function is used in Git, and starts an effort to plumb it through the various code paths.
See commit c250e02 (28 Nov 2017) by Ramsay Jones.
See commit eb0ccfd, commit 78a6766, commit f50e766, commit abade65 (12 Nov 2017) by brian m. carlson (bk2204).
(Merged by Junio C Hamano -- gitster -- in commit 721cc43, 13 Dec 2017)
Add structure representing hash algorithm
Since in the future we want to support an additional hash algorithm, add a structure that represents a hash algorithm and all the data that must go along with it.
Add a constant to allow easy enumeration of hash algorithms.
Implement function typedefs to create an abstract API that can be used by any hash algorithm, and wrappers for the existing SHA1 functions that conform to this API.
Expose a value for hex size as well as binary size.
While one will always be twice the other, the two values are both used extremely commonly throughout the codebase and providing both leads to improved readability.
Don't include an entry in the hash algorithm structure for the null object ID.
As this value is all zeros, any suitably sized all-zero object ID can be used, and there's no need to store a given one on a per-hash basis.
The current hash function transition plan envisions a time when we will accept input from the user that might be in SHA-1 or in the NewHash format.
Since we cannot know which the user has provided, add a constant representing the unknown algorithm to allow us to indicate that we must look the correct value up.
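The shape of such a structure can be sketched in Python. This is only an illustration of the idea described in the commit message, not Git's actual C definition: the names HashAlgo, HASH_ALGOS, null_oid_hex and hash_bytes are assumptions made for this example, and the SHA-256 entry is included only to show why the table is useful.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass(frozen=True)
class HashAlgo:
    """Everything that travels with one hash algorithm (hypothetical layout)."""
    name: str
    rawsz: int                       # binary digest size in bytes
    hexsz: int                       # hex digest size, always 2 * rawsz
    new_ctx: Callable[..., object]   # factory for an init/update/final context


# Constants that make the algorithms easy to enumerate; index 0 stands for
# "unknown", used when the caller must look the right algorithm up.
HASH_ALGO_UNKNOWN = 0
HASH_ALGO_SHA1 = 1
HASH_ALGO_SHA256 = 2  # illustrative; not part of the 2017 commit

HASH_ALGOS: Tuple[Optional[HashAlgo], ...] = (
    None,
    HashAlgo("sha1", 20, 40, hashlib.sha1),
    HashAlgo("sha256", 32, 64, hashlib.sha256),
)


def null_oid_hex(algo: HashAlgo) -> str:
    """The null object ID is all zeros, so it needs no per-algorithm entry."""
    return "0" * algo.hexsz


def hash_bytes(algo: HashAlgo, data: bytes) -> str:
    """Run the abstract init/update/final sequence for any algorithm."""
    ctx = algo.new_ctx()      # init
    ctx.update(data)          # update
    return ctx.hexdigest()    # final


sha1 = HASH_ALGOS[HASH_ALGO_SHA1]
print(sha1.name, sha1.rawsz, sha1.hexsz, null_oid_hex(sha1))
```

Callers that go through the table never mention 20 or 40 directly, which is the property the later commits in this series rely on.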
Integrate hash algorithm support with repo setup
In future versions of Git, we plan to support an additional hash algorithm.
Integrate the enumeration of hash algorithms with repository setup, and store a pointer to the enumerated data in struct repository.
Of course, we currently only support SHA-1, so hard-code this value in read_repository_format.
In the future, we'll enumerate this value from the configuration.
Add a constant, the_hash_algo, which points to the hash_algo structure pointer in the repository global.
Note that this is the hash which is used to serialize data to disk, not the hash which is used to display items to the user.
The transition plan anticipates that these may be different.
We can add an additional element in the future (say, ui_hash_algo) to provide for this case.
Update Aug. 2018: for Git 2.19 (Q3 2018), Git seems to pick SHA-256 as NewHash.
See commit 0ed8d8d (04 Aug 2018) by Jonathan Nieder (artagnon).
See commit 13f5e09 (25 Jul 2018) by Ævar Arnfjörð Bjarmason (avar).
(Merged by Junio C Hamano -- gitster -- in commit 34f2297, 20 Aug 2018)
doc hash-function-transition: pick SHA-256 as NewHash
From a security perspective, it seems that SHA-256, BLAKE2, SHA3-256, K12, and so on are all believed to have similar security properties.
All are good options from a security point of view.
SHA-256 has a number of advantages:
It has been around for a while, is widely used, and is supported by just about every single crypto library (OpenSSL, mbedTLS, CryptoNG, SecureTransport, etc).
When you compare against SHA1DC, most vectorized SHA-256 implementations are indeed faster, even without acceleration.
If we're doing signatures with OpenPGP (or even, I suppose, CMS), we're going to be using SHA-2, so it doesn't make sense to have our security depend on two separate algorithms when either one of them alone could break the security when we could just depend on one.
So SHA-256 it is.
Update the hash-function-transition design doc to say so.
After this patch, there are no remaining instances of the string "NewHash", except for an unrelated use from 2008 as a variable name in t/t9700/test.pl.
You can see this transition to SHA-256 in progress with Git 2.20 (Q4 2018):
See commit 0d7c419, commit dda6346, commit eccb5a5, commit 93eb00f, commit d8a3a69, commit fbd0e37, commit f690b6b, commit 49d1660, commit 268babd, commit fa13080, commit 7b5e614, commit 58ce21b, commit 2f0c9e9, commit 825544a (15 Oct 2018) by brian m. carlson (bk2204).
See commit 6afedba (15 Oct 2018) by SZEDER Gábor (szeder).
(Merged by Junio C Hamano -- gitster -- in commit d829d49, 30 Oct 2018)
replace hard-coded constants
Replace several 40-based constants with references to GIT_MAX_HEXSZ or the_hash_algo, as appropriate.
Convert all uses of the GIT_SHA1_HEXSZ to use the_hash_algo so that they are appropriate for any given hash length.
Instead of using a hard-coded constant for the size of a hex object ID, switch to use the computed pointer from parse_oid_hex that points after the parsed object ID.
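The idea of relying on "the pointer after the parsed object ID" instead of a fixed offset of 40 can be sketched in Python. This is an analogy to the C helper, not its implementation: the function below borrows the name parse_oid_hex for readability, and its signature, the HEXSZ table and the sample ref line are assumptions made for this example.

```python
import string

# Hex lengths per algorithm (hypothetical lookup table for this sketch).
HEXSZ = {"sha1": 40, "sha256": 64}


def parse_oid_hex(line: str, algo: str = "sha1"):
    """Parse a leading hex object ID and return (oid, rest_of_line).

    Returning the remainder plays the role of the end pointer: callers
    never need to know how many characters the object ID occupied.
    """
    hexsz = HEXSZ[algo]
    candidate = line[:hexsz]
    if len(candidate) != hexsz or any(c not in string.hexdigits for c in candidate):
        raise ValueError("line does not start with a full object ID")
    return candidate.lower(), line[hexsz:]


# A made-up line in "<oid> <refname>" form.
oid, rest = parse_oid_hex("ce013625030ba8dba906f756967f9e9ca394464a refs/heads/main")
print(oid)
print(repr(rest))
```

Code written as line[40:] breaks silently with a 64-character ID, whereas code that continues from rest works for either length.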
GIT_SHA1_HEXSZ is further removed/replaced with Git 2.22 (Q2 2019) and commit d4e568b.
This transition continues with Git 2.21 (Q1 2019), which adds the sha-256 hash and plugs it through the code to allow building Git with the "NewHash".
See commit 4b4e291, commit 27dc04c, commit 13eeedb, commit c166599, commit 37649b7, commit a2ce0a7, commit 50c817e, commit 9a3a0ff, commit 0dab712, commit 47edb64 (14 Nov 2018), and commit 2f90b9d, commit 1ccf07c (22 Oct 2018) by brian m. carlson (bk2204).
(Merged by Junio C Hamano -- gitster -- in commit 33e4ae9, 29 Jan 2019)
Add a base implementation of SHA-256 support (Feb. 2019)
SHA-1 is weak and we need to transition to a new hash function.
For some time, we have referred to this new function as NewHash.
Recently, we decided to pick SHA-256 as NewHash.
The reasons behind the choice of SHA-256 are outlined in this thread and in the commit history for the hash function transition document.
Add a basic implementation of SHA-256 based off libtomcrypt, which is in the public domain.
Optimize it and restructure it to meet our coding standards.
Pull in the update and final functions from the SHA-1 block implementation, as we know these function correctly with all compilers. This implementation is slower than SHA-1, but more performant implementations will be introduced in future commits.
Wire up SHA-256 in the list of hash algorithms, and add a test that the
algorithm works correctly.
Note that with this patch, it is still not possible to switch to using SHA-256 in Git.
Additional patches are needed to prepare the code to handle a larger hash algorithm and further test fixes are needed.
hash: add an SHA-256 implementation using OpenSSL
We already have OpenSSL routines available for SHA-1, so add routines
for SHA-256 as well.
On a Core i7-6600U, this SHA-256 implementation compares favorably to
the SHA1DC SHA-1 implementation:
SHA-1: 157 MiB/s (64 byte chunks); 337 MiB/s (16 KiB chunks)
SHA-256: 165 MiB/s (64 byte chunks); 408 MiB/s (16 KiB chunks)
sha256: add an SHA-256 implementation using libgcrypt
Generally, one gets better performance out of cryptographic routines written in assembly than C, and this is also true for SHA-256.
In addition, most Linux distributions cannot distribute Git linked against
OpenSSL for licensing reasons.
Most systems with GnuPG will also have libgcrypt, since it is a dependency of GnuPG.
libgcrypt is also faster than the SHA1DC implementation for messages of a few KiB and larger.
For comparison, on a Core i7-6600U, this implementation processes 16 KiB
chunks at 355 MiB/s while SHA1DC processes equivalent chunks at 337
MiB/s.
In addition, libgcrypt is licensed under the LGPL 2.1, which is
compatible with the GPL. Add an implementation of SHA-256 that uses
libgcrypt.
The upgrade effort continues with Git 2.24 (Q4 2019).
See commit aaa95df, commit be8e172, commit 3f34d70, commit fc06be3, commit 69fa337, commit 3a4d7aa, commit e0cb7cd, commit 8d4d86b, commit f6ca67d, commit dd336a5, commit 894c0f6, commit 4439c7a, commit 95518fa, commit e84f357, commit fe9fec4, commit 976ff7e, commit 703d2d4, commit 9d958cc, commit 7962e04, commit fee4930 (18 Aug 2019) by brian m. carlson (bk2204).
(Merged by Junio C Hamano -- gitster -- in commit 676278f, 11 Oct 2019)
Instead of using GIT_SHA1_HEXSZ and hard-coded constants, switch to using the_hash_algo.
With Git 2.26 (Q1 2020), the test scripts are ready for the day when object names will use SHA-256.
See commit 277eb5a, commit 44b6c05, commit 7a868c5, commit 1b8f39f, commit a8c17e3, commit 8320722, commit 74ad99b, commit ba1be1a, commit cba472d, commit 82d5aeb, commit 3c5e65c, commit 235d3cd, commit 1d86c8f, commit 525a7f1, commit 7a1bcb2, commit cb78f4f, commit 717c939, commit 08a9dd8, commit 215b60b, commit 194264c (21 Dec 2019) by brian m. carlson (bk2204).
(Merged by Junio C Hamano -- gitster -- in commit f52ab33, 05 Feb 2020)
Example:
t4204: make hash size independent
Signed-off-by: brian m. carlson
Use $OID_REGEX instead of a hard-coded regular expression.
So, instead of using:
grep "^[a-f0-9]\{40\} $(git rev-parse HEAD)$" output
the tests are using:
grep "^$OID_REGEX $(git rev-parse HEAD)$" output
And OID_REGEX comes from commit bdee9cd (13 May 2018) by brian m. carlson (bk2204).
(Merged by Junio C Hamano -- gitster -- in commit 9472b13, 30 May 2018, Git v2.18.0-rc0)
t/test-lib: introduce OID_REGEX
Signed-off-by: brian m. carlson
Currently we have a variable, $_x40, which contains a regex that matches a full 40-character hex constant.
However, with NewHash, we'll have object IDs that are longer than 40 characters.
In such a case, $_x40 will be a confusing name.
Create a $OID_REGEX variable which will always reflect a regex matching the appropriate object ID, regardless of the length of the current hash.
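The same idea, a pattern derived from the active hash instead of a literal 40, can be sketched in Python. This mirrors the intent of the shell variable rather than how the test library actually builds it; oid_regex, matches_oid_pair and the HEXSZ table are names invented for this example, and the object IDs are dummies.

```python
import re

# Hex lengths per algorithm (hypothetical lookup table for this sketch).
HEXSZ = {"sha1": 40, "sha256": 64}


def oid_regex(algo: str) -> str:
    """Build a pattern matching exactly one full hex object ID for algo."""
    return "[0-9a-f]{%d}" % HEXSZ[algo]


def matches_oid_pair(line: str, head: str, algo: str) -> bool:
    """Check for '<any object ID> <head>' on a line, like the grep in the test."""
    pattern = oid_regex(algo) + " " + re.escape(head)
    return re.fullmatch(pattern, line) is not None


sha1_line = "a" * 40 + " " + "b" * 40
sha256_line = "a" * 64 + " " + "b" * 64

print(matches_oid_pair(sha1_line, "b" * 40, "sha1"))
print(matches_oid_pair(sha256_line, "b" * 64, "sha256"))
print(matches_oid_pair(sha256_line, "b" * 64, "sha1"))
```

The last call shows the failure mode the commit avoids: a pattern fixed at 40 hex characters does not match output produced by a repository using a 64-character hash.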
And, still for the tests:
See commit f303765, commit edf0424, commit 5db24dc, commit d341e08, commit 88ed241, commit 48c10cc, commit f7ae8e6, commit e70649b, commit a30f93b, commit a79eec2, commit 796d138, commit 417e45e, commit dfa5f53, commit f743e8f, commit 72f936b, commit 5df0f11, commit 07877f3, commit 6025e89, commit 7b1a182, commit 94db7e3, commit db12505 (07 Feb 2020) by brian m. carlson (bk2204).
(Merged by Junio C Hamano -- gitster -- in commit 5af345a, 17 Feb 2020)
t5703: make test work with SHA-256
Signed-off-by: brian m. carlson
This test used an object ID which was 40 hex characters in length, causing the test not only not to pass, but to hang, when run with SHA-256 as the hash.
Change this value to a fixed dummy object ID using test_oid_init and test_oid.
Furthermore, ensure we extract an object ID of the appropriate length using cut with fields instead of a fixed length.
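The difference between cutting a fixed number of characters and cutting by field can be shown with a small Python sketch. It is an analogy to the shell change, not the test's code; both function names and the sample line are made up for this illustration.

```python
def oid_fixed_width(line: str) -> str:
    """Old approach: assume the object ID is always 40 characters wide."""
    return line[:40]


def oid_by_field(line: str) -> str:
    """New approach: take the first space-separated field, whatever its length."""
    return line.split(" ", 1)[0]


# A dummy SHA-256-sized object ID followed by a ref name.
line = "c" * 64 + " refs/heads/main"

print(len(oid_fixed_width(line)))  # truncated ID
print(len(oid_by_field(line)))     # full ID
```

With a 64-character ID the fixed-width version returns a truncated, invalid object ID, while the field-based version keeps working for either hash.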
Some code paths were given a repository instance as a parameter to work in that repository, but passed the_repository instance to their callees; this has been cleaned up with Git 2.26 (Q1 2020).
See commit b98d188, commit 2dcde20, commit 7ad5c44, commit c8123e7, commit 5ec9b8a, commit a651946, commit eb999b3 (30 Jan 2020) by Matheus Tavares (matheustavares).
(Merged by Junio C Hamano -- gitster -- in commit 78e67cd, 14 Feb 2020)
sha1-file: allow check_object_signature() to handle any repo
Signed-off-by: Matheus Tavares
Some callers of check_object_signature() can work on arbitrary repositories, but the repo does not get passed to this function. Instead, the_repository is always used internally.
To fix possible inconsistencies, allow the function to receive a struct repository and make those callers pass on the repo being handled.
Based on:
sha1-file: pass git_hash_algo to hash_object_file()
Signed-off-by: Matheus Tavares
Allow hash_object_file() to work on arbitrary repos by introducing a git_hash_algo parameter. Change callers which have a struct repository pointer in their scope to pass on the git_hash_algo from the said repo.
For all other callers, pass on the_hash_algo, which was already being used internally at hash_object_file().
This functionality will be used in the following patch to make check_object_signature() be able to work on arbitrary repos (which, in turn, will be used to fix an inconsistency at object.c:parse_object()).