GIT clone all repositories in parallel, i.e. total time taken to clone all is close to what you'd take for the largest repo: fatal: index-pack failed

OK. I'm on Mac OS.

alias gcurl
alias gcurl='curl -s -H "Authorization: token IcIcv21a5b20681e7eb8fe7a86ced5f9dbhahaLOL" '

echo $IG_API_URL 
https://someinstance-git.mycompany.com/api/v3

Running the following lists all the organizations a user has access to. Note for new users: passing just $IG_API_URL here gives you all the REST endpoints you can use.

gcurl ${IG_API_URL}/user/orgs

Running the above gave me nice JSON output, which I piped into jq to pull out the info, and now I finally have the corresponding git URLs that I can use to clone a repo.
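
As a rough sketch of that jq step: this assumes the instance speaks the standard GitHub v3 API, where each repository object carries an ssh_url field, and that jq is installed. The helper name extract_ssh_urls, the org name and the output file are made up for illustration.

```shell
# Hypothetical helper: read a JSON array of repository objects on stdin
# (the shape returned by the GitHub v3 "list organization repositories"
# endpoint) and print one SSH clone URL per line.
extract_ssh_urls() {
  jq -r '.[].ssh_url'
}

# Assumed interactive usage with the gcurl alias and IG_API_URL shown above
# (someorg1 and repos.txt are placeholders; results are paginated, so large
# orgs need more than one request):
#   gcurl "${IG_API_URL}/orgs/someorg1/repos?per_page=100" | extract_ssh_urls >> repos.txt
```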

I created a master repo file:

git@someinstance-git.mycompany.com:someorg1/some-repo1.git
git@someinstance-git.mycompany.com:someorg1/some-repo2.git
git@someinstance-git.mycompany.com:someorg2/some-repo1.git
git@someinstance-git.mycompany.com:someorgN/some-repoM.git
...
....
some 1000+ such entries here in this file.

I created a small one-liner script (it reads the file line by line; I know that's sequential, but still) that runs git clone, and it works fine.

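What I hate, and am trying to find a better solution for, is:
1) It's sequential and slow (i.e. one repo at a time).

2) I want to clone all repositories within the time it takes to clone the largest one. That is, if repo A takes 3 seconds, B takes 20, C takes 3, and every other repo takes under 10 seconds, I'd like to know whether there is a way to clone all of them in about 20-30 seconds (versus 3+20+3+...+... seconds, which adds up to many minutes).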

To do that, I tried my poor man's approach of running the git clone step in the background, so that the loop could move through the lines faster.

git clone ${git_url_line} $$_${datetimestamp}_${git_repo_fetch_from_url} &

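Hey, the script finished very quickly, and running ps -eAf|egrep "ssh|git" showed a bunch of interesting stuff running. Coincidentally, one of the guys shouted :) that Icinga was showing some really high, cool metrics. I thought that was because of me, but I figured I could run N git clones against my Git instance without causing any network outage or weird behavior.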

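OK, things ran fine for a while and I started seeing a bunch of git clone output on screen. In a second session, I could see the folders getting populated nicely, until I finally saw what I wasn't expecting: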

Resolving deltas: 100% (3392/3392), done.
remote: Total 5050 (delta 0), reused 0 (delta 0), pack-reused 5050
Receiving objects: 100% (5050/5050), 108.50 MiB | 1.60 MiB/s, done.
Resolving deltas: 100% (1777/1777), done.
remote: Total 10691 (delta 0), reused 0 (delta 0), pack-reused 10691
Receiving objects: 100% (10691/10691), 180.86 MiB | 1.57 MiB/s, done.
Resolving deltas: 100% (5148/5148), done.
remote: Total 5994 (delta 6), reused 0 (delta 0), pack-reused 5968
Receiving objects: 100% (5994/5994), 637.66 MiB | 2.61 MiB/s, done.
Resolving deltas: 100% (3017/3017), done.
Checking out files: 100% (794/794), done.
packet_write_wait: Connection to 10.20.30.40 port 22: Broken pipe
fatal: The remote end hung up unexpectedly
fatal: early EOF
fatal: index-pack failed

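I suspect you're exhausting resources on either your local machine or the remote machine by starting ~1000 processes at once. You probably want to limit the number of processes started. One technique for that is to use xargs.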

If you have access to GNU xargs, it might look like this:

xargs --replace -P10 git clone {} < repos.txt
  • -P10 means "run up to 10 processes at a time"
  • --replace replaces {} with each line read from the input

If you're stuck with the more limited BSD xargs, such as on OS X (or you want more compatibility), you can use the more portable form:

xargs -I{} -P10 git clone {} < repos.txt

This form also works with GNU xargs.
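
A minimal sketch of the same bounded-parallelism idea wrapped in a function, so the limit can be tuned and the run rehearsed before hitting the server. The function name clone_all and the CLONE_CMD override are assumptions for illustration, not part of the answer above.

```shell
# Hypothetical wrapper around the portable xargs form.
#   $1 = file with one clone URL per line
#   $2 = maximum number of parallel clones (default 10)
# CLONE_CMD is an assumption of this sketch: set it to "echo git clone"
# for a dry run that only prints the commands instead of cloning.
clone_all() {
  repo_file=$1
  max_procs=${2:-10}
  # CLONE_CMD is left unquoted on purpose so it splits into command + args.
  xargs -I{} -P"$max_procs" ${CLONE_CMD:-git clone} {} < "$repo_file"
}

# Dry run with at most 5 parallel processes:
#   CLONE_CMD="echo git clone" clone_all repos.txt 5
```

Because the processes run concurrently, the printed lines may appear in any order.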

Thanks, Anthony.

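To run git clone in parallel (up to the given xargs -P limit), I tried various numbers (-P5, -P10, -P15, ..., -P100, ..., -P<Limit_number_as_per_ulimit>, -P<No.of.processes_a_user_can_have_at_a_given_time>). The conclusion is to stick with xargs -P5 to -P10, because larger values of -P<N> did not succeed every time (due to resource problems on the machine where I was running the command/script).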

If you increase -P (the N value), you may see errors like the following:

packet_write_wait: Connection to 10.20.30.40 port 22: Broken pipe
or
fatal: The remote end hung up unexpectedly
or
fatal: early EOF
or
fatal: index-pack failed
or
sign_and_send_pubkey: signing failed: agent refused operation
or
ssh: connect to host somegit-instance.mycompany.com port 22: Operation timed out
fatal: Could not read from remote repository.
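
Since these failures look intermittent, one option (not something tried in this thread, just a sketch) is to retry a failed clone a few times on top of keeping -P low. The helper name retry and its argument order are assumptions for illustration.

```shell
# Hypothetical helper: run a command up to $1 times, sleeping $2 seconds
# between attempts. Returns 0 on the first success, 1 if every attempt fails.
retry() {
  attempts=$1
  delay=$2
  shift 2
  n=1
  while :; do
    "$@" && return 0
    [ "$n" -ge "$attempts" ] && return 1
    n=$((n + 1))
    sleep "$delay"
  done
}

# Assumed usage: try a clone up to 3 times, waiting 5 seconds in between.
#   retry 3 5 git clone "$url" "$target_dir"
```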

Final script:

#!/bin/bash

# Variables
pattern="" # Pattern used to select entries from the master config, based on the user's argument; defaults to blank.

usage() {
 echo -e "\nUsage:\n------\ngit-clone-repos.parallel.sh [usage | help | <pattern>]\n"
 echo "git-clone-repos.parallel.sh \"github.mycompany.com\"             .................................... (This will re-clone every repository under every org in Git instance 'github.mycompany.com')"
 echo "git-clone-repos.parallel.sh \"github.mycompany.com:tools-ansible-some-org\"  ................ (This will re-clone every repository under org: 'tools-ansible-some-org' in Git instance 'github.mycompany.com')"
 echo "git-clone-repos.parallel.sh \"somegit-instance.mycompany.com:coolrepo-org/somerepo.git\"  .... (This will re-clone repo: 'somerepo' in org: 'coolrepo-org' in Git instance: 'somegit-instance.mycompany.com')"
 echo -e "\n\n"
}

# If help/usage as first arg, show usage help
if [[ ("$1" == "usage" || "$1" == "help") || $# -eq 0 ]]; then usage; exit 0; fi

# Set pattern from the first argument
pattern="$1"
mc_file=~/AKS/common/master-config.git-repos-ssh-urls.txt
echo "-- Master config file: $mc_file"; echo
echo "-- Pattern passed for fetching repos from master config file is: \"$pattern\""

# Create a workspace dir in PWD so that everything sits fresh in a new folder. Tweak it if you don't want it.
dir="$$_$(date +%s)"
mkdir "${dir}" && cd "${dir}" || exit 1

# First create a temp repo file filtered by pattern and for '@' lines only (i.e. ignoring commented out lines)
tmprepofile=$(mktemp)
grep "${pattern}" "${mc_file}" | grep '@' | cut -d':' -f3- > "${tmprepofile}"

# Git clone in parallel (xargs -P5 is optimal, -P10 can be used).
# Clone each repo under a different directory name so that all repos in any organization in any instance clone without conflict.
# The URL is passed to bash as $1 rather than spliced into the script text.
xargs -I{} -P10 bash -c 'url="$1"; git clone "$url" "$(echo "$url" | cut -d@ -f2 | sed "s#:#__#g;s#/#__#g;s#\.git##")"' _ {} < "${tmprepofile}"
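
To make the directory naming in that last line easier to follow, here is the same cut/sed pipeline pulled out into a small function. The name repo_dir_name is made up for illustration; the pipeline itself is the one from the script.

```shell
# Hypothetical helper mirroring the naming pipeline used in the script:
# drop everything up to '@', turn ':' and '/' into '__', strip ".git".
repo_dir_name() {
  echo "$1" | cut -d'@' -f2 | sed 's#:#__#g;s#/#__#g;s#\.git##'
}

repo_dir_name "git@github.mycompany.com:coolrepo-org/somerepo1.git"
# prints: github.mycompany.com__coolrepo-org__somerepo1
```

Note that the sed expression removes the first ".git" it finds, so a host name that itself contains ".git" would need the pattern anchored to the end of the string.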

The sample master config file used is:

#-- Sample Master Config file, which can be generated using GIT rest api - against a user's org to find all user org repositories (in my case) looks like:
## github coolrepo-org org/repogroup contains:
##-----------
github.mycompany.com:coolrepo-org:git@github.mycompany.com:coolrepo-org/somerepo1.git
github.mycompany.com:coolrepo-org:git@github.mycompany.com:coolrepo-org/somerepo2.git

## somegit-instance pipeline org/repogroup contains:
##-----------
somegit-instance.mycompany.com:pipeline:git@somegit-instance.mycompany.com:pipeline/shinynew-cool-pipeline.git

## !!!!! NO ORG ACCESS REPO ENTRIES BELOW !!!!! ##
## -----------------------------------------------
## somegit-instance Misc no access org but access at just repo level entries contains:
##----------- (appended to the master file at the end of master file generation script) ---------
somegit-instance.mycompany.com:someorg-org:git@somegit-instance.mycompany.com:someorg-org/somerepofooter.git
somegit-instance.mycompany.com:someorg-org:git@somegit-instance.mycompany.com:someorg-org/somereponav.git
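
For reference, this is roughly what the filter step of the script extracts from a file laid out like the sample above. The function name filter_repos is an assumption for illustration; the grep/cut pipeline is the one from the script.

```shell
# Hypothetical helper mirroring the filter step of the script.
#   $1 = pattern (treated by grep as a regular expression)
#   $2 = master config file
# Keeps matching lines that contain '@' (the sample's '#' comment lines
# have none), then drops the leading "instance:org:" fields so only the
# SSH clone URL remains.
filter_repos() {
  grep -- "$1" "$2" | grep '@' | cut -d':' -f3-
}

# Assumed usage against the sample file:
#   filter_repos "github.mycompany.com:coolrepo-org" master-config.git-repos-ssh-urls.txt
# which would print the two git@github.mycompany.com:coolrepo-org/... URLs.
```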