Get output of git rev-list in machine-readable format

Question

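I am using git rev-list --all --format="%H%n%B" to retrieve all (reachable) commits of a git repository.

I need to be able to parse the resulting output into separate fields for the commit hash and the raw body.

-> Is there any robust way to format the output so that it can be parsed?

Although the commit hash has a fixed length, the raw body has an unknown number of lines, so some kind of delimiter is needed. I thought about wrapping the output in XML-like tags, e.g. --format="<record>%H%n%B</record>", but this has the obvious drawback that a raw body containing the string </record> would break the parser. Of course I could make the delimiters more complex to reduce the risk of someone putting them into a commit message, but what I really need is a character that technically cannot be part of the raw body. I tried using an ASCII control character as the record separator, "\x1F". However, it is not inserted into the output as intended; it is printed literally instead.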

Based on torek's reply (thanks!) I was able to create a small Python function:

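from subprocess import Popen, PIPE

def iter_commits(directory):
    """Yield one record per commit reachable from any ref in `directory`."""
    git_rev_list = Popen(['git', '-C', directory, 'rev-list', '--all'], stdout=PIPE)
    git_cat_file = Popen(['git', '-C', directory, 'cat-file', '--batch'],
                         stdin=git_rev_list.stdout, stdout=PIPE)
    git_rev_list.stdout.close()  # cat-file now owns the read end of the pipe
    while True:
        # Header line: b"<hash> <type> <size-in-bytes>\n"
        header = git_cat_file.stdout.readline().split()
        if len(header) != 3:
            break  # end of output (or an unexpected header line)
        hash_, type_, size = (field.decode() for field in header)
        # Read exactly <size> bytes; undecodable bytes are replaced, not fatal.
        content = git_cat_file.stdout.read(int(size)).decode(errors='replace')
        git_cat_file.stdout.readline()  # consume the LF that follows each object
        if type_ == 'commit':
            # _get_commit is my own helper (not shown here).
            yield _get_commit(hash_, content)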

Answer 1

You were on the right track with "\x1F", but it should be "%x1F", and then you are good to go.

From the git rev-list man page:

· %x00: print a byte from a hex code
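As a rough illustration of this suggestion, the sketch below asks rev-list to put a unit separator (%x1F) between hash and body and a record separator (%x1E) after each commit, then splits on those bytes. The names FORMAT, parse_records and list_commits are made up for this example, and it assumes that no commit message contains either control byte, which git does not guarantee.

```python
import subprocess

US = "\x1f"  # ASCII unit separator: placed between hash and body
RS = "\x1e"  # ASCII record separator: placed after each commit

# Hypothetical format string for this sketch: hash, US, raw body, RS.
FORMAT = "%H%x1F%B%x1E"

def parse_records(raw: bytes):
    """Split rev-list output produced with FORMAT into (hash, body) pairs."""
    records = []
    for chunk in raw.decode("utf-8", errors="replace").split(RS):
        if US not in chunk:
            continue  # leftover newline after the last record
        head, body = chunk.split(US, 1)
        # rev-list prints a "commit <hash>" line before each formatted entry;
        # taking the last token of the head skips that line when present.
        records.append((head.split()[-1], body))
    return records

def list_commits(directory):
    """Run git rev-list in `directory` and parse its output."""
    raw = subprocess.run(
        ["git", "-C", directory, "rev-list", "--all", "--format=" + FORMAT],
        check=True, stdout=subprocess.PIPE).stdout
    return parse_records(raw)
```

Calling list_commits('/path/to/git/repo') should then return a list of (hash, body) tuples; a message that itself contains one of the two separator bytes would still be split in the wrong place.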

Answer 2

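To insert an ASCII separator control character via the format, use %x1F, not \x1F. (Strictly speaking, 0x1F is the unit separator, US; the record separator, RS, is 0x1E, i.e. %x1E.)

In general, the best approach is to retrieve the bodies separately, since %B can expand to literally anything and there is no escaping or protection available. It is usually easy to run git log --no-walk --pretty=format:%B on one commit at a time; it is just slow.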
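A minimal sketch of that one-commit-at-a-time approach might look like the following. The function names body_command and fetch_body are invented for this example; the git arguments are the ones quoted above, plus -C to select the repository.

```python
import subprocess

def body_command(directory, commit_hash):
    """Build the git invocation that prints only the raw body of one commit."""
    return ["git", "-C", directory, "log", "--no-walk",
            "--pretty=format:%B", commit_hash]

def fetch_body(directory, commit_hash):
    """Spawn one git process for one commit: simple, but slow on big histories."""
    result = subprocess.run(body_command(directory, commit_hash),
                            check=True, stdout=subprocess.PIPE)
    # The whole of stdout is the body, so no delimiter is needed at all.
    return result.stdout.decode("utf-8", errors="replace")
```

Because each call returns exactly one body, nothing in the message can be mistaken for a delimiter; the cost is one process per commit.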

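To speed this up, you can use git cat-file --batch or similar, which does provide a simple way to parse the data in a program: each object is preceded by its size. Commit objects are also easy to parse, because the %B equivalent is just "everything after the first two adjacent newlines". So, instead of: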

git rev-list --all --format=something-tricky | ...

you can use:

git rev-list --all | git cat-file --batch | ...

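and modify the expected input format to expect a sequence of <hash> <type> <size-in-bytes> LF <bytes>. Alternatively, add a format directive to git cat-file to drop the object type (but I would keep it, since it lets you tell commits apart from annotated tags).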
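The two parsing rules described in this answer (a size-prefixed object stream, and the message being everything after the first blank line) can be sketched roughly as follows. The names iter_batch_objects and commit_message are hypothetical, and the reader works on any binary file-like object, so it can be fed the stdout pipe of git cat-file --batch or a test buffer. It also assumes each object is followed by one extra LF, as the code in the question does.

```python
import io

def iter_batch_objects(stream):
    """Yield (hash, type, content_bytes) from `git cat-file --batch` output."""
    while True:
        header = stream.readline().split()
        if len(header) != 3:
            return  # end of stream, or a header this sketch does not handle
        oid, kind, size = header
        content = stream.read(int(size))  # exactly <size-in-bytes> bytes
        stream.read(1)                    # the LF that follows each object
        yield oid.decode(), kind.decode(), content

def commit_message(content: bytes) -> str:
    """Return everything after the first two adjacent newlines of a commit."""
    _headers, _sep, message = content.partition(b"\n\n")
    return message.decode("utf-8", errors="replace")
```

Since the size comes from the header, a message containing </record>, control characters, or anything else is read back unchanged; keeping the type field lets the caller skip annotated tags by checking for 'commit'.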


git

git-rev-list