How can I make a copy of a branch with regenerated commit hash ids in Git?

I can copy a branch in several ways, but let's take this one:

git branch copyBranch

It creates a new branch in the same state as the current branch.

If you run git log, you can see that the commit hash ids are the same on both branches.
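To make the "same hash ids" observation concrete, here is a minimal sketch in a throwaway repository. The file name, commit message and the demo identity are assumptions made up for the illustration; only git branch copyBranch comes from the question.

```shell
set -e
# Throwaway identity so the commit works on an unconfigured machine (assumption).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/currentBranch   # name the initial branch
echo one > file.txt
git add file.txt
git commit -q -m "first commit"

# Copy the branch: this only creates a second name for the same commit.
git branch copyBranch

# Both names resolve to the same hash id.
current_id=$(git rev-parse currentBranch)
copy_id=$(git rev-parse copyBranch)
echo "currentBranch: $current_id"
echo "copyBranch:    $copy_id"
```

The two printed ids match because a branch is just a pointer to a commit, so copying the pointer does not create any new commit objects.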

I can use git replace --edit or git filter-branch to regenerate new ids in the copied branch.
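For reference, a rough sketch of what the filter-branch route could look like. It appends a marker line to every commit message on copyBranch, which is enough to give each commit a new hash id. The marker text, the three sample commits and the demo identity are assumptions for the illustration, not something the question prescribes.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the deprecation notice in newer Git

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/currentBranch
for name in A B C; do
    echo "$name" > "$name.txt"
    git add "$name.txt"
    git commit -q -m "commit $name"
done
git branch copyBranch

# Rewrite only copyBranch: append a marker to each commit message.
# A changed message means a changed commit object, hence a new hash id.
git filter-branch -f --msg-filter 'cat && echo "[archived copy]"' -- copyBranch >/dev/null

original_ids=$(git rev-list currentBranch)
rewritten_ids=$(git rev-list copyBranch)
echo "currentBranch ids:"; echo "$original_ids"
echo "copyBranch ids:";    echo "$rewritten_ids"
```

The file contents are untouched, so both branches still point at the same tree; only the commit ids differ. Note that filter-branch keeps a backup of the old ref under refs/original/.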

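But I would like to do this in a more elegant way. Maybe there is a simpler git command for this purpose.

In case you wonder why I want this:

I want to copy branches, associate them with specific tags, and regenerate the ids of every commit on the archived branch to avoid possible conflicts. Then I would just merge the original branch into another branch and close (delete) it.

So, is there a more elegant way to get this behavior?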

You can use this:

https://rtyley.github.io/bfg-repo-cleaner/

It is 10-720 times faster than filter-branch!

The BFG will update your commits and all branches and tags so they are clean, but it doesn't physically delete the unwanted stuff.

Examine the repo to make sure your history has been updated, and then use the standard git gc command to strip out the unwanted dirty data, which Git will now recognise as surplus to requirements.

You can do whatever you want with it, not just delete data.

How does The BFG do this faster than git-filter-branch?

git-filter-branch steps through every commit in your history, executing whatever shell scripts you gave it against the contents (the full file tree) of each commit (so you can write a bash script to, for instance, delete a file), and this gives you a crazy amount of power. Too much power.

Each commit you clean, only a small amount of data will have changed - but your bash script is running over the entire file tree of the commit. You're cleaning the same damn files over and over again. That is slow and, speaking broadly, totally freakin' redundant.

Remember that for a given set of file contents, that file will only be stored once in the Git DB. Remember that a folder containing files & sub-folders will only be stored once, if the files & sub-folders have not changed. Why clean those precise files more than once? Git is begging you not to repeat yourself.

This is the idea of The BFG: Clean a given Git object once. Remember the result: Store the 'dirty' id and the 'clean' id in a simple map, and every time you encounter an object (file or folder) while cleaning a commit, check its id to see if you've cleaned it before, and if you have, just use the cleaned object you stored from last time. Frequently, you get a big win and a massive sub-folder does not have to be cleaned, because you already have the Git-id of what it looks like when it's been purged.

This kind of structure is also very amenable to parallelism, so while you have to clean commits in order, you can still actually fire off a ton of parallel workers to clean their file contents, and get good use of all the CPU cores in your computer.
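The point about identical contents being stored only once can be checked with plain Git. This is only a sketch of the content addressing that the id map described above relies on, not of The BFG's own code; the directory and file names are made up for the example.

```shell
set -e
workdir=$(mktemp -d)
cd "$workdir"
git init -q
mkdir dir1 dir2
echo "same contents"  > dir1/a.txt
echo "same contents"  > dir2/b.txt
echo "other contents" > dir2/c.txt

# git hash-object prints the id Git would use to store each file.
id_a=$(git hash-object dir1/a.txt)
id_b=$(git hash-object dir2/b.txt)
id_c=$(git hash-object dir2/c.txt)
echo "a.txt: $id_a"
echo "b.txt: $id_b"
echo "c.txt: $id_c"
```

a.txt and b.txt get the same id even though their names and directories differ, while c.txt gets a different one. An id seen once therefore never has to be cleaned again, wherever it shows up in the history.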

If you are mainly interested in cleaning up large files, you can also try GitHub's large file solution: https://git-lfs.github.com/

As for me, I want a clean copy/archive of the branch with different ids, to leave room for different ways of working in the future.

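Apart from git branch copyBranch, you don't need to do anything else to create an "archive" branch.

Your "currentBranch" (the one you just "copied") will evolve independently: a rebase may change its SHA1s, but that has no effect on your copied branch.

Suppose you drop a commit from the current branch (through rebase --interactive). You would go from: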

A--B--C (currentBranch, copyBranch)

To (after git rebase -i):

A--B--C (copyBranch)
 \
  C' (currentBranch)

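See? Your current branch will have different SHA1s (for the rebased part).

There is no need to touch the SHA1s of copyBranch: they will stay "archived" (referenced by the copyBranch HEAD) and unchanged forever.
Your currentBranch, meanwhile, changes as needed.

Restoring from copyBranch will be trivial.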
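A small sketch of the scenario above, for anyone who wants to try it in a scratch repository. To keep it scriptable, it drops commit B with git rebase --onto instead of an interactive rebase, which gives the same A--C' result. The file names and the demo identity are assumptions for the illustration.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/currentBranch
for name in A B C; do
    echo "$name" > "$name.txt"
    git add "$name.txt"
    git commit -q -m "commit $name"
done

# Archive the branch before rewriting it.
git branch copyBranch
archived_id=$(git rev-parse copyBranch)

# Drop commit B: replay everything after B (just C) on top of A.
git rebase -q --onto HEAD~2 HEAD~1

rebased_id=$(git rev-parse currentBranch)        # C', a new SHA1
still_archived_id=$(git rev-parse copyBranch)    # still the original C

# Restoring: point currentBranch back at the archived commit.
git reset -q --hard copyBranch
restored_id=$(git rev-parse currentBranch)

echo "archived C:   $archived_id"
echo "rebased C':   $rebased_id"
echo "after reset:  $restored_id"
```

copyBranch keeps the original A--B--C chain alive the whole time, so the final reset simply moves currentBranch back onto it.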