ELF, Build-ID, is there a utility to recompute it?

Question

I found this useful feature in ELF binaries: the Build ID. "It ... is (normally) the SHA1 hash over all code sections in the ELF image." It can be read with the GNU utilities:

$ readelf -n /bin/bash
...
Displaying notes found at file offset 0x00000274 with length 0x00000024:
  Owner                 Data size   Description
  GNU                  0x00000014   NT_GNU_BUILD_ID (unique build ID bitstring)
    Build ID: 54967822da027467f21e65a1eac7576dec7dd821
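For reference, the stored value that readelf prints can also be read with a few lines of Python. This is a minimal sketch, not what readelf does internally. It handles only 64-bit little-endian ELF files and ignores extended section numbering. It walks the section headers looking for `SHT_NOTE` sections and returns the descriptor of the first note owned by `GNU` with type `NT_GNU_BUILD_ID` (3). The helper names `iter_notes` and `find_build_id` are hypothetical. The sketch only reads the stored bytes and does not recompute anything.

```python
import struct
from typing import Iterator, Optional, Tuple

SHT_NOTE = 7          # section type of note sections
NT_GNU_BUILD_ID = 3   # note type of the GNU build ID


def iter_notes(buf: bytes) -> Iterator[Tuple[bytes, int, bytes]]:
    """Yield (owner name, note type, descriptor) for each note in buf."""
    pos = 0
    while pos + 12 <= len(buf):
        namesz, descsz, ntype = struct.unpack_from("<III", buf, pos)
        pos += 12
        name = buf[pos:pos + namesz].rstrip(b"\0")
        pos += (namesz + 3) & ~3          # name is padded to 4 bytes
        desc = buf[pos:pos + descsz]
        pos += (descsz + 3) & ~3          # descriptor is padded too
        yield name, ntype, desc


def find_build_id(data: bytes) -> Optional[str]:
    """Return the stored GNU build ID as hex, or None if there is none.

    Sketch only: accepts 64-bit (EI_CLASS=2) little-endian (EI_DATA=1) ELF.
    """
    if len(data) < 64 or data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("not a 64-bit little-endian ELF file")
    (e_shoff,) = struct.unpack_from("<Q", data, 0x28)
    e_shentsize, e_shnum = struct.unpack_from("<HH", data, 0x3A)
    for i in range(e_shnum):
        sh = struct.unpack_from("<IIQQQQIIQQ", data, e_shoff + i * e_shentsize)
        sh_type, sh_offset, sh_size = sh[1], sh[4], sh[5]
        if sh_type != SHT_NOTE:
            continue
        for name, ntype, desc in iter_notes(data[sh_offset:sh_offset + sh_size]):
            if name == b"GNU" and ntype == NT_GNU_BUILD_ID:
                return desc.hex()
    return None
```

On a 64-bit x86 Linux system, `find_build_id(open("/bin/bash", "rb").read())` should return the same hex string that `readelf -n` prints for that file.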

I wonder whether there is an easy way to recompute the Build ID yourself, e.g. to check that it has not been corrupted?

Answer 1

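The build ID is not a hash of the program but a unique identifier for the build, and is to be treated as just a "unique blob". At least at some point it was defined as a hash of a timestamp and the absolute file path, but that is not guaranteed to be stable either.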

Answer 2

I wonder if there is an easy way to recompute Build ID yourself?

No, there isn't, by design.

The page you linked to itself links to the original description of what build-id is and what it is for. That page says:

But I'd like to specify it explicitly as being a unique identifier good
only for matching, not any kind of checksum that can be verified against 
the contents.

(There are external general means for content verification, and I don't 
think debuginfo association needs to do that.)

Another complication is that the linker can take any of:

--build-id
--build-id=sha1
--build-id=md5
--build-id=0xhexstring

So the build ID is not necessarily a sha1 sum to begin with.
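A consequence is that a build ID carries no marker of which style produced it. Here is a minimal sketch of the only check that is possible, assuming just the styles listed above. The descriptor length narrows the candidates but never rules out an explicit `0xhexstring`. `possible_styles` is a hypothetical helper. A matching length says nothing about whether the bytes are actually a hash of anything.

```python
from typing import List

# Descriptor lengths, in bytes, of the hash-based styles listed above.
HASH_STYLE_LENGTHS = {"sha1": 20, "md5": 16}


def possible_styles(build_id_hex: str) -> List[str]:
    """List the --build-id styles whose output length matches this value.

    Only the length can be compared; an explicit 0xhexstring may have
    any length, so it is always a candidate.
    """
    nbytes = len(bytes.fromhex(build_id_hex))
    styles = [s for s, n in HASH_STYLE_LENGTHS.items() if n == nbytes]
    styles.append("0xhexstring")
    return styles
```

For the bash build ID in the question (40 hex digits, i.e. 20 bytes), `possible_styles` returns `['sha1', '0xhexstring']`.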

Answer 3

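So, I got an answer from Mark. Since it is up-to-date information, I'm posting it here, but basically you are right. There is indeed no tool that computes the Build-ID. The intention of the Build-ID is neither (1) to identify the file contents nor even (2) to identify the executable (code) sections. Instead it is meant to (3) capture the "semantic meaning" of the build, which is the part that is hard to formalize. (The numbers are for self-reference.)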

Quoting the email:

-- "Is there a user tool recomputing the build-id from the file itself, to check if it's not corrupted/compromised somehow etc?" If you have time, maybe you could post an answer there?

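Sorry, I don't have a Stack Overflow account. But the answer is: no, there is no such tool, because the build-id calculation is unspecified. It just has to be universally unique. Even the precise length of the build-id is unspecified. A build-id might be calculated in various ways, using different hash algorithms, to get a universally unique value. Also, even if you knew how it was originally created, not all the data needed to recalculate it might (still) be in the ELF file.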

Apparently, the intentions of Build-ID changed since the Fedora Feature page was written about it. And people's opinions diverge on what it is now. Maybe in your answer you could include status of Build-ID and what it is now as well?

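I think things aren't phrased very precisely. If a tool changes the build that created the ELF file, so that it is no longer a "semantically identical" binary, then it should get a new (recalculated) build-id. But if a tool changes something about the file and the result is still a "semantically identical" binary, then the build-id stays the same.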

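What isn't precisely defined is what "semantically identical binary" means. The intention is that it captures everything a build was made from. So if the source files used to produce a binary are different, you would expect different build-ids, even if the binary code produced happens to be the same.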

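That is why a file's build-id is calculated with a hash algorithm over both the (allocated) code sections and the debuginfo sections. The debuginfo sections contain references to the source file names.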

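But if you then, for example, strip the debuginfo (and put it into a separate file), that doesn't change the build-id (the file was still created from the same build).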

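This is also why, even if you knew the precise hash algorithm used to calculate the build-id, you might not be able to recalculate it: you might be missing some of the original data that was fed into the hash algorithm.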
Feel free to share this answer with others.

Cheers,

Mark
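Mark's points about debuginfo can be illustrated with a toy model. This is a hypothetical scheme, not the algorithm any real linker uses. A "file" is a dict that maps section names to bytes. The ID is SHA-1 over a fixed, assumed list of sections (`HASHED_SECTIONS`). The computed ID is stored in a `.note.gnu.build-id` entry that is not itself hashed. The model shows two things from the email. First, identical code built from differently named sources gets different IDs. Second, after the debuginfo is stripped, the stored ID survives, but recomputing it from what remains no longer matches.

```python
import hashlib
from typing import Dict, Iterable

# Assumed inputs of the toy hash: code plus debuginfo sections.
HASHED_SECTIONS = (".text", ".debug_info", ".debug_str")


def toy_build_id(sections: Dict[str, bytes],
                 hashed: Iterable[str] = HASHED_SECTIONS) -> str:
    """SHA-1 over the selected sections in a fixed order (toy model only)."""
    h = hashlib.sha1()
    for name in hashed:
        if name in sections:
            data = sections[name]
            h.update(name.encode() + b"\0")            # section name
            h.update(len(data).to_bytes(8, "little"))  # length prefix
            h.update(data)                             # section contents
    return h.hexdigest()


def strip_debug(sections: Dict[str, bytes]) -> Dict[str, bytes]:
    """Drop .debug* sections, as strip would; all other sections are kept."""
    return {n: d for n, d in sections.items() if not n.startswith(".debug")}


code = b"\x55\x48\x89\xe5\x5d\xc3"  # identical machine code in both builds
build_a = {".text": code, ".debug_str": b"src/a.c\0"}
build_b = {".text": code, ".debug_str": b"src/b.c\0"}

id_a = toy_build_id(build_a)
id_b = toy_build_id(build_b)
print(id_a != id_b)  # True: different source names, same code

# "Link" build A: store its ID in a note section that is not hashed.
build_a[".note.gnu.build-id"] = bytes.fromhex(id_a)
print(toy_build_id(build_a) == id_a)  # True: the full file still recomputes

stripped = strip_debug(build_a)
stored = stripped[".note.gnu.build-id"].hex()
print(stored == id_a)                    # True: the stored ID survives
print(toy_build_id(stripped) == stored)  # False: the hashed debuginfo is gone
```

Including debuginfo in the hash makes the ID track the build inputs rather than only the code. The same choice makes independent verification impossible once the debuginfo has been split off.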

Also, for those interested in debuginfo (Linux performance and tracing, anyone?), he mentioned a couple of projects for managing it on Fedora:
