Git: clone all repositories in parallel, i.e. so that the total time taken is close to what the largest repo alone would take; fails with "fatal: index-pack failed"
OK. Mac OS.
alias gcurl
alias gcurl='curl -s -H "Authorization: token IcIcv21a5b20681e7eb8fe7a86ced5f9dbhahaLOL" '
echo $IG_API_URL
https://someinstance-git.mycompany.com/api/v3
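I ran the following to list all the organizations the user has access to.
Note for new users: requesting just $IG_API_URL returns all the REST endpoints you can use.
gcurl ${IG_API_URL}/user/orgs
Running the above gave me nice JSON output, which I piped into jq to pull out the information I needed, so now I finally have the corresponding Git URLs that I can use to clone each repo.
I created a master repo file: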
git@someinstance-git.mycompany.com:someorg1:some-repo1.git
git@someinstance-git.mycompany.com:someorg1:some-repo2.git
git@someinstance-git.mycompany.com:someorg2:some-repo1.git
git@someinstance-git.mycompany.com:someorgN:some-repoM.git
...
....
some 1000+ such entries here in this file.
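I created a small one-liner script (it reads the file line by line; I know it's sequential, but...) that runs git clone, and it worked fine.
What I hate about it, and am trying to find a better solution for, is:
1) It works sequentially and is slow (i.e. one repo at a time).
2) I want to clone all the repositories in roughly the time it takes to clone the largest one. I.e. if repo A takes 3 seconds, B takes 20, C takes 3 and all the other repos take less than 10 seconds, then I'm wondering whether there is a way to clone all of them in under 20-30 seconds (versus 3+20+3+...+...+... seconds, which adds up to many minutes).
To do that, I tried my poor man's approach of running the git clone step in the background, so that the loop could move on to the next line without waiting.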
git clone ${git_url_line} $$_${datetimestamp}_${git_repo_fetch_from_url} &
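Hey, the script finished very quickly, and running ps -eAf|egrep "ssh|git" showed some interesting stuff running. Coincidentally, one of the guys shouted that Icinga was showing some really high, cool-looking metrics :) I thought that was because of me, but I figured I could run N git clones against my Git instance without causing any network outages or other weird stuff.
OK, things ran successfully for a while and I started seeing a bunch of git clone output on my screen. In a second session, I saw the folders getting populated nicely, until I finally saw what I wasn't expecting: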
Resolving deltas: 100% (3392/3392), done.
remote: Total 5050 (delta 0), reused 0 (delta 0), pack-reused 5050
Receiving objects: 100% (5050/5050), 108.50 MiB | 1.60 MiB/s, done.
Resolving deltas: 100% (1777/1777), done.
remote: Total 10691 (delta 0), reused 0 (delta 0), pack-reused 10691
Receiving objects: 100% (10691/10691), 180.86 MiB | 1.57 MiB/s, done.
Resolving deltas: 100% (5148/5148), done.
remote: Total 5994 (delta 6), reused 0 (delta 0), pack-reused 5968
Receiving objects: 100% (5994/5994), 637.66 MiB | 2.61 MiB/s, done.
Resolving deltas: 100% (3017/3017), done.
Checking out files: 100% (794/794), done.
packet_write_wait: Connection to 10.20.30.40 port 22: Broken pipe
fatal: The remote end hung up unexpectedly
fatal: early EOF
fatal: index-pack failed
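I suspect you're exhausting resources on either your local machine or the remote machine by launching ~1000 processes at once. You'll probably want to limit the number of processes launched. One technique is to use xargs.
If you have access to GNU xargs, it might look something like this: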
xargs --replace -P10 git clone {} < repos.txt
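-P10 means "at most 10 processes at a time".
--replace replaces {} with the mapped argument (each input line).
If you're stuck with the crippled BSD xargs, such as on OS X (or want greater compatibility), you can use the more portable: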
xargs -I{} -P10 git clone {} < repos.txt
This form works with GNU xargs as well.
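Before pointing either form at a real server, it can help to preview what xargs will launch. The following is only a sketch of a dry run that swaps git clone for echo git clone; the function name preview_clones and the sample URLs are placeholders for illustration, not part of the answer above.

```shell
#!/bin/bash
# Dry run: read repo URLs on stdin and print the clone commands
# that xargs would launch, at most two at a time, without running git.
preview_clones() {
  xargs -I{} -P2 echo git clone {}
}

# Placeholder URLs for illustration only.
printf '%s\n' \
  'git@someinstance-git.mycompany.com:someorg1/some-repo1.git' \
  'git@someinstance-git.mycompany.com:someorg1/some-repo2.git' |
  preview_clones
```

Because the commands run in parallel, the printed lines may appear in a different order from the input.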
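Thanks, Anthony.
To run the git clones in parallel (up to the given xargs -P value), I tried various numbers (-P5, -P10, -P15, ..., -P100, ... -P<Limit_number_as_per_ulimit>, -P<No.of.processes_a_user_can_have_at_a_given_time>). The conclusion was to stick with xargs -P5 or -P10, because larger -P<N> values did not succeed every time (due to resource issues on the machine where I was running the command/script).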
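If xargs is not an option, a similar cap can be approximated in plain bash by launching the background jobs in batches and calling wait between batches. This is a rough sketch rather than what I actually used; run_in_batches is a hypothetical helper name, and echo stands in for git clone in the demo.

```shell
#!/bin/bash
# Run "$@ <line>" once per line of stdin, at most $1 jobs at a time.
# Uses only plain `wait`, so it also works on the bash 3.2 shipped with macOS.
run_in_batches() {
  local max_jobs=$1; shift
  local count=0 line
  while IFS= read -r line; do
    "$@" "$line" &
    count=$((count + 1))
    if [ "$count" -ge "$max_jobs" ]; then
      wait        # block until every job in this batch has finished
      count=0
    fi
  done
  wait            # wait for the final, possibly partial, batch
}

# Demo: echo stands in for `git clone`.
printf '%s\n' repo-a repo-b repo-c | run_in_batches 2 echo would-clone
```

The trade-off against xargs -P is that each batch waits for its slowest clone before the next batch starts, so xargs will usually keep the slots busier.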
If you increase -P (the N value), you may see errors like the following:
packet_write_wait: Connection to 10.20.30.40 port 22: Broken pipe
or
fatal: The remote end hung up unexpectedly
or
fatal: early EOF
or
fatal: index-pack failed
or
sign_and_send_pubkey: signing failed: agent refused operation
or
ssh: connect to host somegit-instance.mycompany.com port 22: Operation timed out
fatal: Could not read from remote repository.
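Since these failures showed up only some of the time, one option is to retry a failed clone a few times with a pause in between. The sketch below is an assumption on my part, not something the final script does; retry is a hypothetical helper, and the attempt count and delay are arbitrary.

```shell
#!/bin/bash
# Run a command up to $1 times, sleeping $2 seconds before the first
# retry and doubling the pause after each further failure.
retry() {
  local attempts=$1 delay=$2; shift 2
  local n=1
  until "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      echo "giving up after $n attempts: $*" >&2
      return 1
    fi
    sleep "$delay"
    delay=$((delay * 2))
    n=$((n + 1))
  done
}

# Demo with a command that succeeds immediately.
retry 3 0 true && echo "ok"
```

With a real clone it would be called as retry 3 5 git clone "$url" "$target_dir", where url and target_dir are whatever the surrounding script provides.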
Final script:
#!/bin/bash
# Variables
pattern=""; # Pattern used to select entries from the master config file, based on the user's argument; defaults to blank.
usage() {
echo -e "\nUsage:\n------\ngit-clone-repos.parallel.sh [usage | help | <pattern>]\n"
echo "git-clone-repos.parallel.sh \"github.mycompany.com\" .................................... (This will re-clone every repository under every org in Git instance 'github.mycompany.com')"
echo "git-clone-repos.parallel.sh \"github.mycompany.com:tools-ansible-some-org\" ................ (This will re-clone every repository under org: 'tools-ansible-some-org' in Git instance 'github.mycompany.com')"
echo "git-clone-repos.parallel.sh \"somegit-instance.mycompany.com:coolrepo-org/somerepo.git\" .... (This will re-clone repo: 'somerepo' in org: 'coolrepo-org' in Git instance: 'somegit-instance.mycompany.com')"
echo -e "\n\n"
}
# If help/usage is the first arg, or no arg was given, show usage help
if [[ ("$1" == "usage" || "$1" == "help") || $# -eq 0 ]]; then usage; exit 0; fi
# Set pattern
pattern="$1"
mc_file=~/AKS/common/master-config.git-repos-ssh-urls.txt
echo "-- Master config file: $mc_file"; echo
echo "-- Pattern passed for fetching repos from master config file is: \"$pattern\""
# Create a workspace dir in PWD so that everything sits fresh in a new folder. Tweak it if you don't want it.
dir="$$_$(date +%s)"
mkdir "${dir}" && cd "${dir}" || exit 1
# First create a temp repo file filtered by pattern and for '@' lines only (i.e. ignoring commented out lines)
tmprepofile=$(mktemp)
grep -- "${pattern}" "${mc_file}" | grep '@' | cut -d':' -f3- > "${tmprepofile}"
# Git clone in parallel mode (xargs -P5 was the most reliable for me, -P10 can be used).
# Clone each repo under a unique directory name so that repos from any organization in any instance clone without conflict.
# The URL is passed to bash as $1 instead of being spliced into the command string.
xargs -I{} -P10 bash -c 'git clone "$1" "$(echo "$1" | cut -d@ -f2 | sed "s#:#__#g;s#/#__#g;s#\.git\$##")"' _ {} < "${tmprepofile}"
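To make the renaming in the last line of the script easier to follow, here is the same cut and sed pipeline pulled out into a standalone function. repo_dir_name is a name I made up for this illustration, and the sed expression anchors .git to the end of the URL, which is a small assumption-driven tightening so that a host name containing ".git" is left alone.

```shell
#!/bin/bash
# Turn an SSH clone URL into a flat directory name:
# drop everything up to '@', replace ':' and '/' with '__',
# and strip a trailing '.git'.
repo_dir_name() {
  printf '%s\n' "$1" | cut -d'@' -f2 |
    sed -e 's#:#__#g' -e 's#/#__#g' -e 's#\.git$##'
}

repo_dir_name "git@github.mycompany.com:coolrepo-org/somerepo1.git"
# prints: github.mycompany.com__coolrepo-org__somerepo1
```

Including the host and org in the directory name is what lets two repos with the same name in different orgs or instances sit side by side.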
The sample master config file used looks like this:
#-- Sample Master Config file, which can be generated using GIT rest api - against a user's org to find all user org repositories (in my case) looks like:
## github coolrepo-org org/repogroup contains:
##-----------
github.mycompany.com:coolrepo-org:git@github.mycompany.com:coolrepo-org/somerepo1.git
github.mycompany.com:coolrepo-org:git@github.mycompany.com:coolrepo-org/somerepo2.git
## somegit-instance pipeline org/repogroup contains:
##-----------
somegit-instance.mycompany.com:pipeline:git@somegit-instance.mycompany.com:pipeline/shinynew-cool-pipeline.git
## !!!!! NO ORG ACCESS REPO ENTRIES BELOW !!!!! ##
## -----------------------------------------------
## somegit-instance Misc: no org-level access, only repo-level access; entries:
##----------- (appended to the master file at the end of master file generation script) ---------
somegit-instance.mycompany.com:someorg-org:git@somegit-instance.mycompany.com:someorg-org/somerepofooter.git
somegit-instance.mycompany.com:someorg-org:git@somegit-instance.mycompany.com:someorg-org/somereponav.git
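For reference, this is roughly how the pattern argument selects lines from a file in this format. It is a sketch of the grep and cut step only; filter_repos is a hypothetical wrapper name, and the sample lines are copied from the file above.

```shell
#!/bin/bash
# Read master-config lines (host:org:ssh-url) on stdin, keep those that
# match the pattern and contain '@' (which skips the '#' comment lines),
# and print only the SSH URL, i.e. field 3 onwards.
filter_repos() {
  local pattern=$1
  grep -- "$pattern" | grep '@' | cut -d':' -f3-
}

filter_repos 'github.mycompany.com:coolrepo-org' <<'EOF'
## github coolrepo-org org/repogroup contains:
github.mycompany.com:coolrepo-org:git@github.mycompany.com:coolrepo-org/somerepo1.git
github.mycompany.com:coolrepo-org:git@github.mycompany.com:coolrepo-org/somerepo2.git
somegit-instance.mycompany.com:pipeline:git@somegit-instance.mycompany.com:pipeline/shinynew-cool-pipeline.git
EOF
```

The pattern is treated as a regular expression, so the dots in a host name match any character; grep -F could be used instead if a literal match is wanted.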