Is there any difference between `git gc` and `git repack -ad; git prune`?

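Is there any difference between git gc and git repack -ad; git prune?
If so, what additional steps will git gc perform (or vice versa)?
Which one is better to use in regard to space optimization or safety?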

git help gc contains a few hints:

The optional configuration variable gc.rerereresolved indicates how long records of conflicted merge you resolved earlier are kept.

The optional configuration variable gc.rerereunresolved indicates how long records of conflicted merge you have not resolved are kept.

I believe none of this is done if you only run git repack -ad; git prune.
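As a rough illustration of what those two variables control, here is a minimal Python model of the expiry check. It is not Git's implementation: the function name and the record fields are hypothetical, and the 60-day and 15-day values are what I understand the documented defaults to be.

```python
from datetime import datetime, timedelta

# Assumed defaults (my reading of the git-config documentation):
# gc.rerereResolved = 60 days, gc.rerereUnresolved = 15 days.
DEFAULT_RESOLVED_DAYS = 60
DEFAULT_UNRESOLVED_DAYS = 15


def rerere_record_expired(last_touched, resolved, now,
                          resolved_days=DEFAULT_RESOLVED_DAYS,
                          unresolved_days=DEFAULT_UNRESOLVED_DAYS):
    """Return True if a (hypothetical) rerere record is older than its retention window."""
    keep_for = timedelta(days=resolved_days if resolved else unresolved_days)
    return now - last_touched > keep_for
```

With these assumed defaults, a 30-day-old record of an unresolved conflict would be dropped, while a 30-day-old record of a resolved conflict would be kept.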

Is there any difference between git gc and git repack -ad; git prune?

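The difference is that, by default, git gc is very conservative about which housekeeping tasks are needed. For example, it won't run git repack unless the number of loose objects in the repository exceeds a certain threshold (configurable via the gc.auto variable). In addition, git gc runs more tasks than just git repack and git prune.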

If yes, what additional steps will be done by git gc (or vice versa)?

According to the documentation, git gc runs:

  • git-prune
  • git-reflog
  • git-repack
  • git-rerere

More specifically, by looking at the source code of gc.c (lines 338-343)¹ we can see that it invokes at most the following commands:

  • pack-refs --all --prune
  • reflog expire --all
  • repack -d -l
  • prune --expire
  • worktree prune --expire
  • rerere gc
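To make the comparison with git repack -ad; git prune concrete, the list above can be written down as data. This is only a sketch: the helper names are hypothetical, and the two expiry values are assumptions (as far as I know they correspond to the defaults of gc.pruneExpire and gc.worktreePruneExpire).

```python
import shlex

# The manual alternative from the question.
MANUAL = ["git repack -a -d", "git prune"]


def gc_command_lines(prune_expire="2.weeks.ago", worktree_expire="3.months.ago"):
    """Return the subcommands listed above as shell-style command lines.

    The expiry arguments are assumed defaults, not values read from a repository.
    """
    steps = [
        ["pack-refs", "--all", "--prune"],
        ["reflog", "expire", "--all"],
        ["repack", "-d", "-l"],
        ["prune", "--expire", prune_expire],
        ["worktree", "prune", "--expire", worktree_expire],
        ["rerere", "gc"],
    ]
    return ["git " + " ".join(shlex.quote(arg) for arg in step) for step in steps]


def steps_only_in_gc():
    """Return the gc steps whose subcommand never appears in the manual sequence."""
    manual_subcommands = {line.split()[1] for line in MANUAL}
    return [line for line in gc_command_lines()
            if line.split()[1] not in manual_subcommands]
```

Calling steps_only_in_gc() leaves the pack-refs, reflog expire, worktree prune and rerere gc steps, which is the extra work the manual sequence skips.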

Depending on the number of packs (lines 121-126), it may run repack with the -A option instead (lines 203-212):

    /*
     * If there are too many loose objects, but not too many
     * packs, we run "repack -d -l". If there are too many packs,
     * we run "repack -A -d -l".  Otherwise we tell the caller
     * there is no need.
     */
    if (too_many_packs())
        add_repack_all_option();
    else if (!too_many_loose_objects())
        return 0;

Notice on lines 211-212 of the need_to_gc function that if there aren't enough loose objects in the repository, gc is not run at all.

This is further clarified in the documentation:

Housekeeping is required if there are too many loose objects or too many packs in the repository. If the number of loose objects exceeds the value of the gc.auto configuration variable, then all loose objects are combined into a single pack using git repack -d -l. Setting the value of gc.auto to 0 disables automatic packing of loose objects.

If the number of packs exceeds the value of gc.autoPackLimit, then existing packs (except those marked with a .keep file) are consolidated into a single pack by using the -A option of git repack.
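The rule described in these two paragraphs can be sketched as a small Python function. This is a simplified model, not Git's code: the function name is hypothetical, the counts are passed in rather than read from a repository, packs is meant to exclude packs marked with a .keep file, and the defaults of 6700 and 50 are what I understand gc.auto and gc.autoPackLimit to default to.

```python
DEFAULT_GC_AUTO = 6700         # assumed default of gc.auto
DEFAULT_AUTO_PACK_LIMIT = 50   # assumed default of gc.autoPackLimit


def auto_gc_repack_args(loose_objects, packs,
                        gc_auto=DEFAULT_GC_AUTO,
                        auto_pack_limit=DEFAULT_AUTO_PACK_LIMIT):
    """Return the repack invocation chosen in --auto mode, or None if nothing is needed."""
    if gc_auto <= 0:
        # gc.auto = 0 disables automatic housekeeping altogether.
        return None
    too_many_packs = auto_pack_limit > 0 and packs > auto_pack_limit
    too_many_loose = loose_objects > gc_auto
    if too_many_packs:
        # Consolidate existing packs into one.
        return ["repack", "-A", "-d", "-l"]
    if too_many_loose:
        # Only the loose objects need to be packed.
        return ["repack", "-d", "-l"]
    return None
```

For instance, 7000 loose objects with 3 packs selects repack -d -l, while 100 loose objects with 51 packs selects repack -A -d -l.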

As you can see, git gc strives to do the right thing based on the state of the repository.

Which one is better to use in regard to space optimization or safety?

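In general, it is better to run git gc --auto, because it does the least amount of work necessary to keep the repository in good shape: safely and without wasting too many resources.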

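However, keep in mind that a garbage collection may already be triggered automatically after certain commands, unless this behavior is disabled by setting the gc.auto configuration variable to 0.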

From the documentation:

--auto
With this option, git gc checks whether any housekeeping is required; if not, it exits without performing any work. Some git commands run git gc --auto after performing operations that could create many loose objects.

So, for most repositories, you don't need to explicitly run git gc very often, since it is already taken care of for you.


1. As of commit a0a1831 from 2016-08-08.

Note that git prune is run by git gc, and the former has evolved with Git 2.22 (Q2 2019):

"git prune" has learned to take advantage of reachability bitmaps when possible.

See commit cc80c95, commit c2bf473, commit fde67d6, commit d55a30b (14 Feb 2019) by Jeff King (peff).
(Merged by Junio C Hamano -- gitster -- in commit f7213a3, 07 Mar 2019)

prune: use bitmaps for reachability traversal

Pruning generally has to traverse the whole commit graph in order to see which objects are reachable.
This is the exact problem that reachability bitmaps were meant to solve, so let's use them (if they're available, of course).

(See also: reachability bitmaps.)

Here are timings on git.git:

Test                            HEAD^             HEAD
------------------------------------------------------------------------
5304.6: prune with bitmaps      3.65(3.56+0.09)   1.01(0.92+0.08) -72.3%

And on linux.git:

Test                            HEAD^               HEAD
--------------------------------------------------------------------------
5304.6: prune with bitmaps      35.05(34.79+0.23)   3.00(2.78+0.21) -91.4%

The tests show a pretty optimal case, as we'll have just repacked and should have pretty good coverage of all refs with our bitmaps.
But that's actually pretty realistic: normally prune is run via "gc" right after repacking.

Notes on the implementation: the change is actually in reachable.c, so it would improve reachability traversals by "reflog expire --stale-fix", as well.
Those aren't performed regularly, though (a normal "git gc" doesn't use --stale-fix), so they're not really worth measuring. There's a low chance of regressing that caller, since the use of bitmaps is totally transparent from the caller's perspective.
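To give an intuition for why bitmaps help here, the following is a toy Python model. It is not Git's on-disk bitmap format or its API: the object names are made up, plain integers stand in for the bitmaps, and I assume a bitmap of reachable objects has already been stored for each tip.

```python
def build_index(object_ids):
    """Assign each object id a bit position."""
    return {oid: bit for bit, oid in enumerate(object_ids)}


def to_bitmap(index, oids):
    """Encode a collection of object ids as an integer bitmap."""
    bitmap = 0
    for oid in oids:
        bitmap |= 1 << index[oid]
    return bitmap


def reachable_bitmap(tip_bitmaps, tips):
    """OR together the precomputed bitmaps of all tips; no graph walk is needed."""
    result = 0
    for tip in tips:
        result |= tip_bitmaps[tip]
    return result


def prunable(index, reachable, loose_oids):
    """Return the loose objects whose bit is not set in the reachable bitmap."""
    return [oid for oid in loose_oids if not (reachable >> index[oid]) & 1]
```

In this model the per-object check is a single bit test, which is where the saving over a full traversal would come from when the stored bitmaps cover the refs well.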

And:

See commit fe6f2b0 (18 Apr 2019) by Jeff King (peff).
(Merged by Junio C Hamano -- gitster -- in commit d1311be, 08 May 2019)

prune: lazily perform reachability traversal

The general strategy of "git prune" is to do a full reachability walk, then for each loose object see if we found it in our walk.
But if we don't have any loose objects, we don't need to do the expensive walk in the first place.

This patch postpones that walk until the first time we need to see its results.

Note that this is really a specific case of a more general optimization, which is that we could traverse only far enough to find the object under consideration (i.e., stop the traversal when we find it, then pick up again when asked about the next object, etc).
That could save us in some instances from having to do a full walk. But it's actually a bit tricky to do with our traversal code, and you'd need to do a full walk anyway if you have even a single unreachable object (which you generally do, if any objects are actually left after running git-repack).

So in practice this lazy-load of the full walk catches one easy but common case (i.e., you've just repacked via git-gc, and there's nothing unreachable).

The perf script is fairly contrived, but it does show off the improvement:

 Test                            HEAD^             HEAD
 -------------------------------------------------------------------------
 5304.4: prune with no objects   3.66(3.60+0.05)   0.00(0.00+0.00) -100.0%

and would let us know if we accidentally regress this optimization.

Note also that we need to take special care with prune_shallow(), which relies on us having performed the traversal.
So this optimization can only kick in for a non-shallow repository. Since this is easy to get wrong and is not covered by existing tests, let's add an extra test to t5304 that covers this case explicitly.
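The lazy strategy described in this commit message can be illustrated with a short Python sketch. The class and function names are hypothetical and the "walk" is just a callable returning a set, so this only shows the shape of the optimization, not Git's traversal code.

```python
class LazyReachability:
    """Defer an expensive reachability walk until the first lookup."""

    def __init__(self, walk):
        self._walk = walk            # callable returning the set of reachable ids
        self._reachable = None       # filled in on first use
        self.walks_performed = 0

    def is_reachable(self, oid):
        if self._reachable is None:
            self._reachable = self._walk()
            self.walks_performed += 1
        return oid in self._reachable


def prune(loose_objects, reachability):
    """Return the loose objects that the walk did not reach."""
    return [oid for oid in loose_objects if not reachability.is_reachable(oid)]
```

With no loose objects, prune never calls is_reachable, so the walk is skipped entirely; with any loose object, the walk runs exactly once.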

The commit message of "prune: use bitmaps for reachability traversal" (fde67d6) includes two further notes on the implementation:

  • The bitmap case could actually get away without creating a "struct object", and instead the caller could just look up each object id in the bitmap result. However, this would be a marginal improvement in runtime, and it would make the callers much more complicated.
    They'd have to handle both the bitmap and non-bitmap cases separately, and in the case of git-prune, we'd also have to tweak prune_shallow(), which relies on our SEEN flags.

  • Because we do create real object structs, we go through a few contortions to create ones of the right type.
    This isn't strictly necessary (lookup_unknown_object() would suffice), but it's more memory efficient to use the correct types, since we already know them.


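When reachability bitmaps are in effect (since Git 2.22 in 2019), the "do not lose recently created objects and those that are reachable from them" safety, which protects us from races, was disabled by mistake. This has been corrected with Git 2.32 (Q2 2021).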

See commit 2ba582b, commit 1e951c6 (28 Apr 2021) by Jeff King (peff).
(Merged by Junio C Hamano -- gitster -- in commit 6e08cbd, 07 May 2021)

prune: save reachable-from-recent objects with bitmaps

Reported-by: David Emett
Signed-off-by: Jeff King

We pass our prune expiration to mark_reachable_objects(), which will traverse not only the reachable objects, but consider any recent ones as tips for reachability; see d3038d2 ("prune: keep objects reachable from recent objects", 2014-10-15, Git v2.2.0-rc0 -- merge) for details.

However, this interacts badly with the bitmap code path added in fde67d6 ("prune: use bitmaps for reachability traversal", 2019-02-13, Git v2.22.0-rc0 -- merge listed in batch #2).
If we hit the bitmap-optimized path, we return immediately to avoid the regular traversal, accidentally skipping the "also traverse recent" code.

Instead, we should do an if-else for the bitmap versus regular traversal, and then follow up with the "recent" traversal in either case.
This reuses the "rev_info" for a bitmap and then a regular traversal, but that should work OK (the bitmap code clears the pending array in the usual way, just like a regular traversal would).
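The control-flow problem and its fix can be shown with a toy Python model. Everything here is hypothetical (the graph, the function names, and the use of a plain set as the "bitmap result"); it only mirrors the early return versus if-else structure described in the commit message.

```python
def walk(graph, tips, seen):
    """Mark every object reachable from the given tips."""
    stack = list(tips)
    while stack:
        oid = stack.pop()
        if oid in seen:
            continue
        seen.add(oid)
        stack.extend(graph.get(oid, ()))


def mark_reachable_buggy(graph, ref_tips, recent, bitmap_result=None):
    seen = set()
    if bitmap_result is not None:
        seen |= bitmap_result
        return seen                  # early return: recent objects are never walked
    walk(graph, ref_tips, seen)
    walk(graph, recent, seen)
    return seen


def mark_reachable_fixed(graph, ref_tips, recent, bitmap_result=None):
    seen = set()
    if bitmap_result is not None:
        seen |= bitmap_result        # bitmap path
    else:
        walk(graph, ref_tips, seen)  # regular traversal
    walk(graph, recent, seen)        # "recent" traversal runs in either case
    return seen
```

In the buggy variant, an object that is only reachable from a recently created object is missing from the result whenever the bitmap path is taken, so a prune based on that result would delete it.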