How do DHT torrent indexing sites scrape infoHashes efficiently?

I am interested in how DHT torrent indexing sites work. I have a working infoHashes scraper written with a Node.js lib. At first I tried to run it behind NAT, but it was not efficient; then I moved to a BSD server with a public IP and things got much better. In many public articles on this topic I read that the best solution is to run several virtual DHT nodes to scrape infoHashes faster. I have code that starts several DHT node instances, each running with a unique node ID and its own port.

My Node.js code:

"use strict"

const DHT = require('bittorrent-dht')
const crypto = require('crypto');

let DHTnodeID = []
for(let i = 1; i<=10; i++){
  DHTnodeID.push({[i]:crypto.createHash('sha1').update(`myDHTnodeLocal${i}`).digest('hex')}) //Give each node unique hash ID
}

let dhtOpt =  {
  nodeId: '',      // 160-bit DHT node ID (Buffer or hex string, default: randomly generated)
  //bootstrap: [],   // bootstrap servers (default: router.bittorrent.com:6881, router.utorrent.com:6881, dht.transmissionbt.com:6881)
  host: false,     // host of local peer, if specified then announces get added to local table (String, disabled by default)
  concurrency: 16, // k-rpc option to specify maximum concurrent UDP requests allowed (Number, 16 by default)
  //hash: Function,  // custom hash function to use (Function, SHA1 by default),
  //krpc: krpc(),     // optional k-rpc instance
  //timeBucketOutdated: 900000, // check buckets every 15min
  //maxAge: Infinity  // optional setting for announced peers to time out
}

var dhtNodes = []
for(let i = 1; i<=DHTnodeID.length; i++){
  dhtOpt.nodeId = DHTnodeID[i-1][String(i)]
  dhtNodes.push(new DHT(dhtOpt))
}

let port = 6881 //run 10 DHT nodes
for(let item of dhtNodes){  
  item.listen(port, listenFce)
  item.on('ready', readyFce)
  item.on('announce', announceFce)

  port++
}

Then I found a university research project with the following statement:

The most obvious approach to increasing throughput is using several DHT nodes instead of one. Using several ports on a single IP address was not considered a viable option due to IP-address based filtering against potential DoS attacks. Instead the indexer is designed to run on several hosts or on a multihomed host. Individual instances synchronize their indexing activity through a shared relational database that stores discovered infohashes and the current processing stage for each .torrent file.

By Aaron Grunthal - University of Applied Sciences Esslingen
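
If I understand that design correctly, the coordination between instances would look roughly like this (just my own sketch, assuming PostgreSQL via the `pg` package and a table infohashes(infohash TEXT PRIMARY KEY, stage TEXT); the names are made up):

// Sketch of the shared-database coordination described in the quote above.
// Assumes PostgreSQL via the `pg` package; table and column names are invented.
const { Pool } = require('pg')
const pool = new Pool({ connectionString: process.env.DATABASE_URL })

// Every indexer instance records what it discovers; the PRIMARY KEY plus
// ON CONFLICT DO NOTHING makes duplicate discoveries across hosts harmless.
async function recordInfohash (infohashHex) {
  await pool.query(
    "INSERT INTO infohashes (infohash, stage) VALUES ($1, 'discovered') ON CONFLICT (infohash) DO NOTHING",
    [infohashHex]
  )
}

// An instance claims the next infohash whose .torrent still needs to be fetched.
async function claimNextInfohash () {
  const { rows } = await pool.query(
    "UPDATE infohashes SET stage = 'fetching' WHERE infohash = (SELECT infohash FROM infohashes WHERE stage = 'discovered' LIMIT 1 FOR UPDATE SKIP LOCKED) RETURNING infohash"
  )
  return rows.length ? rows[0].infohash : null
}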

If the above statement is true, does it mean that my 10 DHT node instances will be considered a DoS attack, and could I be penalized somehow? If that is true, how do those websites (DHT torrent indexing sites) deal with this problem? Is there any possibility to run an efficient infoHash scraper with one public IP on one server? Obviously, the more instances I run, the more hashes I get, but the statement above worries me. Thank you very much.

If the above statement is true, does it mean that my 10 node DHT instances will be considered a DoS attack and can I be penalized somehow?

That depends on the implementation quality of the other nodes in the network. Advanced implementations apply various sanitizing strategies to keep malicious peers out of their routing tables. One such strategy is to allow only one routing table entry per IP address.
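
As an illustration only (hypothetical code, not taken from any real client), such a per-IP filter boils down to something like this:

// Hypothetical sketch of the "one routing table entry per IP address" strategy
class PerIpRoutingFilter {
  constructor () {
    this.entriesByIp = new Map()   // IP address -> node entry
  }

  // node = { id: Buffer, host: String, port: Number }
  add (node) {
    if (this.entriesByIp.has(node.host)) {
      return false                 // a second node from the same IP is rejected
    }
    this.entriesByIp.set(node.host, node)
    return true
  }

  remove (node) {
    this.entriesByIp.delete(node.host)
  }
}

Against nodes that apply such a filter, your 10 instances behind a single public IP can collectively occupy at most one routing table slot, so stacking more local instances quickly stops paying off.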

If that is true, how do those websites (DHT torrent indexing sites) deal with this problem?

They may run malicious nodes that try to get into more routing tables than a normal node would, but that is counteracted by the above-mentioned sanitizing strategies and thus is an unreliable (and abusive to the ecosystem) strategy. They may also operate from multiple IP addresses, as mentioned in your quote.

Is there any possibility to run an efficient infoHash scraper with one public IP on one server?

BEP 51 enables efficient indexing from a single host.
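
Concretely, BEP 51 defines a `sample_infohashes` KRPC query that asks a remote node for a random sample of the infohashes it is currently storing, plus more nodes to continue the crawl with. A rough sketch of such a query over raw UDP, assuming the `bencode` npm package and a remote node that actually supports BEP 51 (the address below is only a placeholder):

'use strict'

// Rough sketch of a BEP 51 sample_infohashes query sent as a raw KRPC/UDP message
const dgram = require('dgram')
const crypto = require('crypto')
const bencode = require('bencode')

const socket = dgram.createSocket('udp4')
const myNodeId = crypto.randomBytes(20)    // this node's 160-bit ID
const target = crypto.randomBytes(20)      // point in the keyspace to sample around

function sampleInfohashes (host, port) {
  const query = bencode.encode({
    t: crypto.randomBytes(2),              // transaction ID
    y: 'q',
    q: 'sample_infohashes',
    a: { id: myNodeId, target: target }
  })
  socket.send(query, port, host)
}

socket.on('message', (buf) => {
  let res
  try { res = bencode.decode(buf) } catch (err) { return }
  if (!res.r || !res.r.samples) return     // not a BEP 51 reply
  const samples = res.r.samples            // concatenated 20-byte infohashes
  for (let i = 0; i + 20 <= samples.length; i += 20) {
    console.log(samples.slice(i, i + 20).toString('hex'))
  }
  // res.r.num      - how many infohashes the node stores in total
  // res.r.interval - how long to wait before sampling this node again
  // res.r.nodes    - compact node info to continue crawling the keyspace
})

// Example call - replace with a DHT node you have learned about that supports BEP 51
sampleInfohashes('203.0.113.1', 6881)

You then repeat the query against the nodes returned in the `nodes` field, walking the keyspace, instead of passively waiting for announce traffic.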