How do I download a large Git repository?

Question

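I have a Git repository on BitBucket that is larger than 4GB.

I can't clone the repository with the normal Git command because it fails (it appears to work for a long time, but then rolls back).
I also can't download the repository as a zip from the BitBucket interface, because: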

Feature unavailable This repository is too large for us to generate a download.

Is there any way to download a Git repository incrementally?

Answer 1

If you don't need to pull the whole history, you can specify the number of revisions to clone:

git clone <repo_url> --depth=1

Of course, this may not help if your repository contains one particularly large file.
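As a minimal sketch of what a depth-limited clone produces, the script below builds a throwaway local repository (a hypothetical stand-in for the real BitBucket repo), clones it with --depth=1, and checks how much history arrived. The paths and the `demo` identity are assumptions for illustration, and `git rev-parse --is-shallow-repository` needs a reasonably recent Git (2.15 or later).

```shell
set -e
work=$(mktemp -d)

# Throwaway "remote" with three commits (stand-in for the real repository).
git init -q "$work/remote"
git -C "$work/remote" symbolic-ref HEAD refs/heads/main
for i in 1 2 3; do
  echo "rev $i" > "$work/remote/file.txt"
  git -C "$work/remote" add file.txt
  git -C "$work/remote" -c user.name=demo -c user.email=demo@example.com \
    commit -q -m "commit $i"
done

# Shallow clone: only the newest commit is transferred.
# A file:// URL is used because --depth is ignored for plain local paths.
git clone -q --depth=1 "file://$work/remote" "$work/shallow"

commits=$(git -C "$work/shallow" rev-list --count HEAD)
shallow=$(git -C "$work/shallow" rev-parse --is-shallow-repository)
echo "commits=$commits shallow=$shallow"
```

Against a real server, replace the file:// URL with the repository URL; the rest of the check stays the same.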

Answer 2

One potential technique is to clone just a single branch. You can then pull in more later. Run git clone [url_of_remote] --branch [branch_name] --single-branch.
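The "pull in more later" step can be sketched as follows. This is an illustration on a throwaway local repository, not the answer's own commands: the branch names `main` and `feature` and all paths are hypothetical. It uses `git remote set-branches --add` to widen what a single-branch clone tracks.

```shell
set -e
work=$(mktemp -d)
commit() { git -C "$work/remote" -c user.name=demo -c user.email=demo@example.com commit -q -m "$1"; }

# Throwaway "remote" with two branches: main and feature.
git init -q "$work/remote"
git -C "$work/remote" symbolic-ref HEAD refs/heads/main
echo base > "$work/remote/file.txt"
git -C "$work/remote" add file.txt && commit "base"
git -C "$work/remote" checkout -q -b feature
echo extra > "$work/remote/extra.txt"
git -C "$work/remote" add extra.txt && commit "feature work"
git -C "$work/remote" checkout -q main

# Clone only main.
git clone -q --branch main --single-branch "file://$work/remote" "$work/clone"
if git -C "$work/clone" show-ref --verify --quiet refs/remotes/origin/feature
then has_before=yes; else has_before=no; fi

# Later: start tracking one more branch and fetch it.
git -C "$work/clone" remote set-branches --add origin feature
git -C "$work/clone" fetch -q origin
if git -C "$work/clone" show-ref --verify --quiet refs/remotes/origin/feature
then has_after=yes; else has_after=no; fi

echo "feature tracked before: $has_before, after: $has_after"
```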

Large repositories seem to be a major weakness of Git. You can read about that at http://www.sitepoint.com/managing-huge-repositories-with-git/. The article mentions a Git extension called git-annex that can help with large files. Check it out at https://git-annex.branchable.com/. It lets Git manage files without checking them into Git. Disclaimer: I've never tried it myself.

Some of the solutions in "How do I clone a large Git repository on an unreliable connection?" may also help.

EDIT: Since you only need the files, you can try git archive. You would use syntax something like

git archive --remote=ssh://git@bitbucket.org/username/reponame.git --format=tar --output="file.tar" master

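I tried to test this on a repo in my AWS CodeCommit account, but it doesn't seem to allow it. Someone on BitBucket might be able to test. Note that on Windows you would want to use zip rather than tar, and that this all has to be done over an ssh connection, not https.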

Read more about git archive at http://git-scm.com/docs/git-archive
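To show the shape of the git archive call without needing a server, here is a rough sketch that points --remote at a throwaway local repository instead of an ssh URL. The paths, the `main` branch name and `file.txt` are hypothetical; whether a given hosting service accepts `git archive --remote` is up to that service, as noted above.

```shell
set -e
work=$(mktemp -d)

# Throwaway repository standing in for the remote.
git init -q "$work/remote"
git -C "$work/remote" symbolic-ref HEAD refs/heads/main
echo "hello" > "$work/remote/file.txt"
git -C "$work/remote" add file.txt
git -C "$work/remote" -c user.name=demo -c user.email=demo@example.com \
  commit -q -m "initial"

# Export the files of one branch as a tarball: no history, no .git directory.
git archive --remote="$work/remote" --format=tar \
  --output="$work/file.tar" main

# List what ended up in the archive.
listing=$(tar -tf "$work/file.tar")
echo "$listing"
```

For a zip instead of a tarball, swap in --format=zip and a .zip output name.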

Answer 3

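I got it to work using the method from "fatal: early EOF fatal: index-pack failed".

But only after I set up SSL; the method still did not work over HTTP.

BitBucket support was really helpful and pointed me in this direction.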

Answer 4

What worked perfectly for me is the approach described in this answer, with one small improvement to handle a large repo.

First:

git config --global core.compression 0

Then clone just part of your repo:

git clone --depth 1 <repo_URI>

Now "the rest":

git fetch --unshallow

But here is the trick: with a big repo you sometimes have to perform that step multiple times. So, again:

git fetch --unshallow

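and so on.

Try it multiple times. You will probably see that each time you run 'unshallow' you get more and more objects before the error.

And at the end, just to be sure: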

git pull --all
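The "repeat until it goes through" step lends itself to a small loop. The following is only a sketch under stated assumptions: it runs against a throwaway local repository (so the fetch succeeds on the first try), `max_attempts` is an arbitrary hypothetical limit, and `--is-shallow-repository` requires Git 2.15 or later. The loop stops as soon as the clone is no longer shallow, because running `git fetch --unshallow` on a complete repository is an error.

```shell
set -e
work=$(mktemp -d)

# Throwaway "remote" with three commits.
git init -q "$work/remote"
git -C "$work/remote" symbolic-ref HEAD refs/heads/main
for i in 1 2 3; do
  echo "rev $i" > "$work/remote/file.txt"
  git -C "$work/remote" add file.txt
  git -C "$work/remote" -c user.name=demo -c user.email=demo@example.com \
    commit -q -m "commit $i"
done

git clone -q --depth 1 "file://$work/remote" "$work/clone"

is_shallow() { git -C "$work/clone" rev-parse --is-shallow-repository; }

attempt=0
max_attempts=5   # hypothetical limit; raise it for a flaky connection
while [ "$(is_shallow)" = "true" ] && [ "$attempt" -lt "$max_attempts" ]; do
  attempt=$((attempt + 1))
  # A failed attempt is reported and retried instead of aborting the script.
  git -C "$work/clone" fetch -q --unshallow \
    || echo "attempt $attempt failed, retrying" >&2
done

commits=$(git -C "$work/clone" rev-list --count HEAD)
still_shallow=$(is_shallow)
echo "commits=$commits shallow=$still_shallow after $attempt attempt(s)"
```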

Answer 5

BitBucket should have a way to build an archive for a large repo with Git 2.13.x/2.14 (Q3 2017).

See commit 867e40f (30 Apr 2017), commit ebdfa29 (27 Apr 2017), commit 4cdf3f9, commit af95749, commit 3c78fd8, commit c061a14, and commit 758c1f9 by Rene Scharfe.
(Merged by Junio C Hamano -- gitster -- in commit f085834, 16 May 2017)

archive-zip: support files bigger than 4GB

Write a zip64 extended information extra field for big files as part of their local headers and as part of their central directory headers.
Also write a zip64 version of the data descriptor in that case.

If we're streaming then we don't know the compressed size at the time we write the header. Deflate can end up making a file bigger instead of smaller if we're unlucky.
Write a local zip64 header already for files with a size of 2GB or more in this case to be on the safe side.

Both sizes need to be included in the local zip64 header, but the extra field for the directory must only contain 64-bit equivalents for 32-bit values of 0xffffffff.

Answer 6

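You can clone only the first commit, then the second commit, and so on. If the difference between two commits is not very large, each pull will be easier. You can see more details in this answer.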
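One way to approximate this commit-by-commit approach is to start from a depth-1 clone and deepen it a little at a time. The sketch below is an assumption about how that could look, not the linked answer's exact commands: it uses `git fetch --deepen` (available since Git 2.11) on a throwaway local repository, and the step size of 1 and the cap of 20 iterations are hypothetical values.

```shell
set -e
work=$(mktemp -d)

# Throwaway "remote" with five commits.
git init -q "$work/remote"
git -C "$work/remote" symbolic-ref HEAD refs/heads/main
for i in 1 2 3 4 5; do
  echo "rev $i" > "$work/remote/file.txt"
  git -C "$work/remote" add file.txt
  git -C "$work/remote" -c user.name=demo -c user.email=demo@example.com \
    commit -q -m "commit $i"
done

# Start with the newest commit only.
git clone -q --depth=1 "file://$work/remote" "$work/clone"

# Pull in one more layer of history per fetch until nothing is missing.
steps=0
while [ "$(git -C "$work/clone" rev-parse --is-shallow-repository)" = "true" ] \
      && [ "$steps" -lt 20 ]; do
  git -C "$work/clone" fetch -q --deepen=1
  steps=$((steps + 1))
done

commits=$(git -C "$work/clone" rev-list --count HEAD)
echo "commits=$commits"
```

A larger --deepen value means fewer round trips but bigger individual transfers, so the step size is a trade-off against how unreliable the connection is.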

Answer 7

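1) You can initially download a single branch with only the latest commit (depth=1). This significantly reduces the size of the repo to download and still lets you work on the code base: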

git clone --depth <Number> <repository> --branch <branch name> --single-branch

Example:
git clone --depth 1 https://github.com/dundermifflin/dwightsecrets.git --branch scranton --single-branch

2) Later you can get all the commits (after this your repo will be in the same state as after a git clone):

git fetch --unshallow

Or, if that is still too much, get only the last 25 commits:

git fetch --depth=25

Another way: git clone is not resumable, but you can first git clone on a third-party server and then download the complete repo from there by a method that actually is resumable.
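One possible way to package a repository for such a transfer is `git bundle`, which writes the whole history into a single file that any resumable download tool can move. This is an assumption about how the idea could be carried out, not necessarily what the answer had in mind. The sketch uses a throwaway local repository in place of the third-party server, and the transfer step itself is left out.

```shell
set -e
work=$(mktemp -d)

# Throwaway repository standing in for the clone on the third-party server.
git init -q "$work/remote"
git -C "$work/remote" symbolic-ref HEAD refs/heads/main
for i in 1 2 3; do
  echo "rev $i" > "$work/remote/file.txt"
  git -C "$work/remote" add file.txt
  git -C "$work/remote" -c user.name=demo -c user.email=demo@example.com \
    commit -q -m "commit $i"
done

# On the server: pack all refs and their history into one file.
git -C "$work/remote" bundle create "$work/repo.bundle" --all

# (Transfer repo.bundle to the local machine with a resumable tool here.)

# Locally: a bundle file can be cloned like any other repository.
git clone -q "$work/repo.bundle" "$work/clone"
commits=$(git -C "$work/clone" rev-list --count HEAD)
echo "commits=$commits"
```

After cloning from the bundle, pointing origin at the real repository with `git remote set-url origin <repo_url>` lets later fetches go to the actual remote.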
