How does Git determine what objects need to be sent between repositories?

Question

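I have looked here but could not quite find what I wanted to know: how does git push or git pull figure out which commit objects are missing on the other side?

Suppose we have a repository with the following commits (letters stand for SHA-1 IDs, and d is refs/heads/master):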

a -> b -> c -> d

The remote, by contrast, has these:

a -> e -> f -> g

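According to the git documentation, the remote tells us that its refs/heads/master is at g, but since we do not know that commit, that does not actually tell us anything. How is that enough to find out which data is missing?

For the other direction, the documentation says: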

At this point, the fetch-pack process looks at what objects it has and responds with the objects that it needs by sending “want” and then the SHA-1 it wants. It sends all the objects it already has with “have” and then the SHA-1. At the end of this list, it writes “done” to initiate the upload-pack process to begin sending the packfile of the data it needs:

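This explains how the remote determines what data to send, but wouldn't it hurt pull performance for repositories with many objects? If not, what does the text actually mean?

Apparently the way data is transferred differs by direction (push vs. pull). What challenges does this design choice run into, how are they addressed, and how should I read the description in the documentation?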

Answer 1

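The magic is in the IDs. A commit ID is made up of many things, but basically it is a SHA-1 hash of the following:

Content (everything, not just the diff)
Author
Date
Log message
Parent IDs

Change any one of these and you have to create a new commit with a new ID. Note that the parent IDs are included.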
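To make the hashing concrete, here is a minimal sketch of how a commit ID could be computed by hand. Git hashes the object type, the body size, a NUL byte, and then the body; in a real commit the "content" is recorded as the ID of a tree object, and a committer line is stored alongside the author. The helper names, the author identity, and the timestamp below are made up for illustration.

```python
import hashlib


def object_id(kind, body):
    """SHA-1 over '<kind> <size>\\0' followed by the raw body bytes."""
    header = f"{kind} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()


def commit_body(tree, parents, who, when, message):
    """Build the text of a commit object (hypothetical helper, no GPG signature etc.)."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines.append(f"author {who} {when}")
    lines.append(f"committer {who} {when}")
    return ("\n".join(lines) + "\n\n" + message + "\n").encode()


# Illustrative values only.
WHO = "Ann Example <ann@example.com>"
WHEN = "1700000000 +0000"

EMPTY_TREE = object_id("tree", b"")
root_a = object_id("commit", commit_body(EMPTY_TREE, [], WHO, WHEN, "first"))
root_b = object_id("commit", commit_body(EMPTY_TREE, [], WHO, WHEN, "first, reworded"))

# Same tree, author, date and message, but a different parent: a different ID.
child_a = object_id("commit", commit_body(EMPTY_TREE, [root_a], WHO, WHEN, "second"))
child_b = object_id("commit", commit_body(EMPTY_TREE, [root_b], WHO, WHEN, "second"))

print(root_a)
print(child_a != child_b)
```

Because each parent ID is part of the hashed text, changing anything in an ancestor changes the ID of every descendant.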

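What does this mean for Git? It means that if I tell you I have commit "ABC123" and you have commit "ABC123", we know we have the same commit, with the same content, same author, same date, same message and the same parents. Those parents have the same IDs, so they have the same content, same author, same date, same message and the same parents. And so on. If the IDs match, the commits must have the same history and there is no need to check any further down the line. This is one of the great strengths of Git, it is woven deeply into its design, and you cannot understand Git without it.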

A pull is a fetch plus a merge. git pull origin master is roughly git fetch origin followed by git merge origin/master on the checked-out branch (or a rebase, with --rebase). A fetch looks like this...

remote @ http://example.com/project.git

                  F - G [bugfix]
                 /
A - B - C - D - E - J [master]
                     \
                      H - I [feature]

local
origin = http://example.com/project.git

                  F - G [origin/bugfix]
                 /
A - B - C - D - E [origin/master] [master]

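[local] Hey remote, what branches do you have?
[remote] I have bugfix at G.
[local] I have bugfix at G too! Done. What else?
[remote] I have feature at I.
[local] I don't have feature, and I don't have commit I either. What is I's parent?
[remote] I's parent is H.
[local] I don't have H. What is H's parent?
[remote] H's parent is J.
[local] I don't have J. What is J's parent?
[remote] J's parent is E.
[local] I have E! Please send me J, H and I.
[remote] OK, here they come.
[local] (adds J, H and I to the repo and puts origin/feature at I) OK, what else have you got?
[remote] I have master at J.
[local] I have master at E, and you already sent me J. Moving origin/master to J. What else?
[remote] That's it!
[local] Kthxbye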
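The conversation above can be sketched as a graph walk: start at each remote branch tip and follow parent links until you hit a commit you already have. This is only a toy model of the idea, not Git's actual wire protocol; the dictionaries and function names are invented for illustration and mirror the diagram.

```python
from collections import deque

# Parent links as the remote knows them (taken from the diagram above).
REMOTE_PARENTS = {
    "A": [], "B": ["A"], "C": ["B"], "D": ["C"], "E": ["D"],
    "F": ["E"], "G": ["F"], "J": ["E"], "H": ["J"], "I": ["H"],
}
REMOTE_BRANCHES = {"bugfix": "G", "feature": "I", "master": "J"}

# Commits already present in the local repository.
LOCAL_COMMITS = {"A", "B", "C", "D", "E", "F", "G"}


def missing_commits(tip, remote_parents, have):
    """Walk back from tip, collecting commits until a known commit is reached."""
    missing, seen, queue = [], set(), deque([tip])
    while queue:
        commit = queue.popleft()
        if commit in have or commit in seen:
            continue  # a known ID implies its whole history is known too
        seen.add(commit)
        missing.append(commit)
        queue.extend(remote_parents[commit])
    return missing


def fetch(branches, remote_parents, have):
    """Return updated remote-tracking refs and the commits that had to be sent."""
    have = set(have)
    tracking, transferred = {}, []
    for name, tip in branches.items():
        new = missing_commits(tip, remote_parents, have)
        have.update(new)
        transferred.extend(new)
        tracking[f"origin/{name}"] = tip
    return tracking, transferred


tracking, transferred = fetch(REMOTE_BRANCHES, REMOTE_PARENTS, LOCAL_COMMITS)
print(transferred)  # ['I', 'H', 'J']
print(tracking)     # {'origin/bugfix': 'G', 'origin/feature': 'I', 'origin/master': 'J'}
```

Note that the walk stops at E without ever looking at A through D, which is the whole point: one matching ID vouches for everything behind it.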

Now local looks like this...

local
origin = http://example.com/project.git

                  F - G [origin/bugfix]
                 /
A - B - C - D - E [master] - J [origin/master]
                              \
                               H - I [origin/feature]

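Then git merge origin/master, run with master checked out, completes the pull by fast-forwarding master to J.

A push is similar, except that the process runs in the opposite direction (local sends commits to the remote) and it will only fast-forward.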
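The "only fast-forward" rule boils down to an ancestry check: the remote branch's current tip has to be reachable from the commit being pushed. Here is a rough sketch using the two histories from the question (local a -> b -> c -> d, remote a -> e -> f -> g). The function names are hypothetical, and the arrows in the question are read here as running from older to newer commits.

```python
# Parent links for both histories from the question, merged into one map.
PARENTS = {
    "a": [],
    "b": ["a"], "c": ["b"], "d": ["c"],   # local line
    "e": ["a"], "f": ["e"], "g": ["f"],   # remote line
}


def is_ancestor(ancestor, descendant, parents):
    """True if ancestor is reachable from descendant by following parent links."""
    stack, seen = [descendant], set()
    while stack:
        commit = stack.pop()
        if commit == ancestor:
            return True
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents.get(commit, []))
    return False


def push_is_fast_forward(remote_tip, pushed_tip, parents):
    """A push fast-forwards only if it keeps everything the remote already has."""
    return is_ancestor(remote_tip, pushed_tip, parents)


print(push_is_fast_forward("g", "d", PARENTS))  # False: g is not in d's history
print(push_is_fast_forward("b", "d", PARENTS))  # True: d simply extends b
```

In the question's scenario the push would be refused, because moving the remote master from g to d would drop e, f and g; you would have to fetch and merge (or rebase) first.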

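This is what Pro Git refers to as "the dumb protocol", and it is used when your remote is a simple HTTP server. The Smart Protocol is what is used more often; it is far less chatty and has many optimizations. But you can see how both can be very efficient. There is no need to communicate the whole history; the two sides only need to exchange 20-byte hash keys until they find a common ancestor.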


git

git-pull

git-push

git-fetch