How do I fetch infohashes and torrent metadata from DHT nodes without knowing the infohash?

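According to the DefCon talk Crawling BitTorrent DHTs for Fun, BitTorrent DHT nodes can be crawled to rebuild a torrent site from scratch overnight, even if every other torrent site and its backups were shut down by an adversary.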

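In Kademlia, peers learn the infohash from torrent sites, which maintain an index of magnet links for each torrent. A peer makes a get_peers request to get the list of peers that are currently downloading and seeding the torrent. The DHT nodes whose node IDs are closest to the infohash return the metadata to the querying peer.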

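Either way, I need the infohash to query peers. So how can someone build a torrent site overnight without having any infohashes? The only possible way I can think of is an exhaustive key search: randomly generate 160-bit infohashes and start querying peers, but that would take a very long time.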
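To put a rough number on "a very long time", here is a back-of-the-envelope sketch. The count of live torrents and the guess rate are made-up assumptions chosen only for illustration, not measurements of the real DHT.

```python
# Rough estimate of how long blind guessing of infohashes would take.
# Both ASSUMED_* figures are illustrative assumptions, not measurements.
KEYSPACE = 2 ** 160                     # number of possible 160-bit infohashes
ASSUMED_LIVE_TORRENTS = 10 ** 7         # hypothetical number of infohashes in the DHT
ASSUMED_GUESSES_PER_SECOND = 10 ** 6    # hypothetical guess rate of one crawler
SECONDS_PER_YEAR = 365 * 24 * 3600


def expected_years_to_first_hit(live=ASSUMED_LIVE_TORRENTS,
                                rate=ASSUMED_GUESSES_PER_SECOND):
    """Expected time until one uniformly random guess matches any live infohash."""
    # Each guess succeeds with probability live / KEYSPACE, so the expected
    # number of guesses before the first hit is KEYSPACE / live.
    expected_guesses = KEYSPACE / live
    return expected_guesses / rate / SECONDS_PER_YEAR


if __name__ == "__main__":
    print(f"{expected_years_to_first_hit():.2e} years until the first hit")
```

Even with generous assumptions, the result is many orders of magnitude beyond "overnight", which is why random guessing is not a practical way to discover infohashes.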

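Is there any existing remote procedure call in Kademlia, such as get_infohash or get_metadata, that allows a DHT node to query neighboring nodes for infohashes? That seems to be the only way to learn infohashes directly from DHT nodes.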

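It is called DHT Infohash Indexing. This BitTorrent Enhancement Proposal (BEP) is being considered for standardization. The extension enables DHT nodes to retrieve a sample of the infohashes that other nodes currently have in their storage.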

From the author, in BEP 51:

DHT indexing already is possible and done in practice by passively observing get_peers queries. But that approach is inefficient, favoring indexers with lots of unique IP addresses at their disposal. It also incentivizes bad behavior such as spoofing node IDs and attempting to pollute other nodes' routing tables.

With this extension a single node should be able to survey the entire DHT within a few hours without having to resort to non-compliant behavior.

Since it cannot be directly used to search for specific torrents it is not expected that average clients actually use this RPC, they only need to support replying to it. Instead the intended use is that a few specialized indexers in the network use it as building block to create and curate a database of available torrents and then make it available to end users through other means, e.g. as a web service or through torrent feeds.

Message Format

Request:

{
    "a":
    {
        "id": <20 byte id of sending node (string)>,
        "target": <20 byte ID for nodes>,
    },
    "t": <transaction-id (string)>,
    "y": "q",
    "q": "sample_infohashes"
}
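As a minimal sketch of how the request above could be serialized, the following Python builds the bencoded query bytes. The helper names `bencode` and `build_sample_infohashes_query` and the default transaction ID are hypothetical choices for this example and do not come from the BEP; the tiny encoder only covers the types this message needs.

```python
def bencode(value):
    """Minimal bencoder for ints, byte/ASCII strings, lists and dicts."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("ascii")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Bencoded dictionaries store their keys as byte strings in sorted order.
        items = [(k.encode("ascii") if isinstance(k, str) else k, v)
                 for k, v in value.items()]
        items.sort(key=lambda kv: kv[0])
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def build_sample_infohashes_query(node_id, target, transaction_id=b"aa"):
    """Build the bytes of a sample_infohashes query (hypothetical helper)."""
    if len(node_id) != 20 or len(target) != 20:
        raise ValueError("node_id and target must each be 20 bytes")
    return bencode({
        "t": transaction_id,
        "y": "q",
        "q": "sample_infohashes",
        "a": {"id": node_id, "target": target},
    })


if __name__ == "__main__":
    import os
    # Random IDs purely for demonstration.
    print(build_sample_infohashes_query(os.urandom(20), os.urandom(20)))
```

Like other DHT queries, the resulting bytes would be sent as a single UDP datagram to the node being surveyed; the networking part is left out here so the sketch stays runnable offline.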

Response:

{
    "r":
    {
        "id": <20 byte id of sending node (string)>,
        "interval": <the subset refresh interval in seconds (integer)>,
        "nodes": <nodes close to 'target'>,
        "num": <number of infohashes in storage (integer)>,
        "samples": <subset of stored infohashes, N × 20 bytes (string)>
    },
    "t": <transaction-id (string)>,
    "y": "r"
}

As usual, additional fields may be defined by other BEPs.
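On the receiving side, the `samples` string has to be cut into 20-byte infohashes, and `nodes` into individual contacts. The sketch below assumes the response has already been bdecoded, and that `nodes` uses the usual IPv4 compact node format of the DHT protocol (20-byte node ID, 4-byte address, 2-byte big-endian port). The function names are hypothetical.

```python
import ipaddress
import struct


def split_samples(samples):
    """Split the 'samples' byte string into a list of 20-byte infohashes."""
    if len(samples) % 20 != 0:
        raise ValueError("samples length must be a multiple of 20")
    return [samples[i:i + 20] for i in range(0, len(samples), 20)]


def split_compact_nodes(nodes):
    """Split IPv4 compact node info into (node_id, ip, port) tuples.

    Assumes 26 bytes per node: 20-byte ID, 4-byte IPv4 address,
    2-byte port in network byte order.
    """
    if len(nodes) % 26 != 0:
        raise ValueError("nodes length must be a multiple of 26")
    result = []
    for i in range(0, len(nodes), 26):
        node_id = nodes[i:i + 20]
        ip = str(ipaddress.IPv4Address(nodes[i + 20:i + 24]))
        (port,) = struct.unpack("!H", nodes[i + 24:i + 26])
        result.append((node_id, ip, port))
    return result


if __name__ == "__main__":
    # Hand-made example values, not real DHT data.
    demo_samples = b"\xaa" * 20 + b"\xbb" * 20
    demo_nodes = b"\x01" * 20 + bytes([127, 0, 0, 1]) + (6881).to_bytes(2, "big")
    print([h.hex() for h in split_samples(demo_samples)])
    print(split_compact_nodes(demo_nodes))
```

An indexer could then feed the returned nodes back in as new query targets and wait at least `interval` seconds before asking the same node again, since the sample is not expected to change sooner.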