MongoDB draining shard but balancer not running? (removeShard taking too much time)

Question

I am trying to downsize a sharded cluster that currently has 8 shards to one with 4 shards.

I started with the 8th shard and tried to remove it first.

db.adminCommand( { removeShard : "rs8" } );
----
{
    "msg" : "draining ongoing",
    "state" : "ongoing",
    "remaining" : {
        "chunks" : NumberLong(1575),
        "dbs" : NumberLong(0)
    },
    "note" : "you need to drop or movePrimary these databases",
    "dbsToMove" : [ ],
    "ok" : 1
}

So there are 1575 chunks to migrate to the rest of the cluster.
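Re-running removeShard on a shard that is already draining returns the current status, so it can be polled to track progress. Here is a minimal sketch of a helper that condenses such a response. The names `toNum` and `drainProgress` are made up for this example, and it assumes the response has the shape shown above.

```javascript
// Convert a shell NumberLong/Long or a plain number to a JS number.
function toNum(v) {
  if (v !== null && typeof v === "object" && typeof v.toNumber === "function") {
    return v.toNumber();
  }
  return Number(v);
}

// Condense a removeShard response into { state, chunks, dbs }.
// Assumption: "remaining" may be absent (for example once draining completes).
function drainProgress(res) {
  const remaining = res.remaining || {};
  return {
    state: res.state,
    chunks: toNum(remaining.chunks || 0),
    dbs: toNum(remaining.dbs || 0)
  };
}

// In a mongo shell connected to a mongos (needs a live cluster):
//   printjson(drainProgress(db.adminCommand({ removeShard: "rs8" })));
```

Logging the `chunks` value once an hour or so gives a real migration rate instead of a guess.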

However, sh.isBalancerRunning() returns false, and the output of sh.status() looks like this:

  ...
  ...

  active mongoses:
        "3.4.10" : 16
  autosplit:
        Currently enabled: yes
  balancer:
        Currently enabled:  yes
        Currently running:  no
NaN
        Failed balancer rounds in last 5 attempts:  0
        Migration Results for the last 24 hours: 
                59 : Success
                1 : Failed with error 'aborted', from rs8 to rs1
                1 : Failed with error 'aborted', from rs2 to rs6
                1 : Failed with error 'aborted', from rs8 to rs5
                4929 : Failed with error 'aborted', from rs2 to rs7
                1 : Failed with error 'aborted', from rs8 to rs2
                506 : Failed with error 'aborted', from rs8 to rs7
                1 : Failed with error 'aborted', from rs2 to rs3
...

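So the balancer is enabled but not running. But there is a draining shard (rs8) being removed, so I would expect the balancer to be running continuously, right? It is not, as the output above shows.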
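For anyone debugging the same symptom: as far as I know, sh.isBalancerRunning() only reports whether a balancing round is in progress at that instant, so `false` between rounds is not by itself proof that the balancer is stuck. The aborted migrations are the more telling signal. The sketch below tallies migration outcomes from config.changelog-style entries, similar to the "Migration Results" summary above. The function name `tallyMigrations` is made up, and the field names (`what`, `details.from`, `details.to`, `details.note`, `details.errmsg`) are assumptions about the changelog format that should be checked against your own cluster.

```javascript
// Tally migration outcomes from changelog-style documents.
// Assumed entry shape: { what: "moveChunk.from", details: { from, to, note, errmsg } }
function tallyMigrations(entries) {
  const counts = {};
  for (const e of entries) {
    if (e.what !== "moveChunk.from" || !e.details) continue; // skip splits etc.
    const d = e.details;
    const key = d.note === "success"
      ? "Success"
      : "Failed with error '" + (d.errmsg || d.note) + "', from " + d.from + " to " + d.to;
    counts[key] = (counts[key] || 0) + 1;
  }
  return counts;
}

// In a mongo shell connected to a mongos (needs a live cluster):
//   sh.getBalancerState();    // is the balancer enabled?
//   sh.isBalancerRunning();   // is a round in progress right now?
//   const since = new Date(Date.now() - 24 * 3600 * 1000);
//   const entries = db.getSiblingDB("config").changelog
//       .find({ what: "moveChunk.from", time: { $gt: since } }).toArray();
//   printjson(tallyMigrations(entries));
```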

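And the process is taking an unbelievably long time: over almost a full day, the number of remaining chunks dropped by only 10, from 1575 to 1565! At this rate it will take months to shrink the sharded cluster from 8 shards to 4!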

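It seems MongoDB itself does not stop writing to the draining shard, so perhaps new chunks are being created there almost as fast as existing ones are migrated away?

Any help is much appreciated!
Thanks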

Answer 1

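Edit

Great, after a whole month the process finished and I have a 4-shard cluster! The trick I describe below cut down the time it would otherwise have taken, but honestly this was the slowest thing I have ever done.

OK, so to answer my own question here:

I could not get the automatic balancing to run at the speed I wanted. Each day I saw only about 5 to 7 chunks being migrated (which means the whole process would have taken years!).

To work around this, I used the moveChunk command manually.

What I basically did was: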

while 'can still sample':
    // Sample the 8th shard for 100 documents
    db.col.aggregate([{$sample: {size: 100}}])

    For every document:
        // Move the chunk containing this document to one of the shards
        // that will remain (NUM is 1..4)
        sh.moveChunk(namespace, {shardKey: value}, `rs${NUM}`);
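The loop above could be sketched in the mongo shell roughly as follows. This is only an illustration under several assumptions: the namespace `mydb.col`, the shard key field `userId`, and the host `rs8-host:27018` are hypothetical, the shard key is assumed to be ranged (for a hashed shard key the moveChunk documentation points to `bounds` rather than `find`), and targets are chosen round-robin, which the original does not specify.

```javascript
// Shards that will remain in the cluster.
const TARGETS = ["rs1", "rs2", "rs3", "rs4"];

// Pick a destination shard round-robin.
function pickTarget(i, targets) {
  return targets[i % targets.length];
}

// Build the moveChunk admin command for the chunk that contains `doc`.
function buildMoveChunk(ns, shardKeyField, doc, target) {
  const find = {};
  find[shardKeyField] = doc[shardKeyField];
  return { moveChunk: ns, find: find, to: target };
}

// In a mongo shell connected to a mongos (needs a live cluster):
//   const shard = new Mongo("rs8-host:27018");  // direct connection, hypothetical host
//   const docs = shard.getDB("mydb").col
//       .aggregate([{ $sample: { size: 100 } }]).toArray();
//   docs.forEach(function (doc, i) {
//     const cmd = buildMoveChunk("mydb.col", "userId", doc, pickTarget(i, TARGETS));
//     printjson(db.adminCommand(cmd));          // sent through the mongos
//   });
```

Several sampled documents will often fall into the same chunk, so expect far fewer than 100 distinct chunks per sample.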

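So I manually moved chunks from the 8th shard to the first 4 shards. One drawback: because the balancer has to stay enabled and only one shard can be draining at any moment, some of the migrated chunks get automatically migrated again to shards 5-7, which I also want to remove later, and that makes the process take longer. Is there a solution to this?

Since the 8th shard is draining, the balancer no longer fills it, and the whole process is now much faster, at about 350-400 chunks per day. So hopefully each shard will take about 5 days at most, and the whole resize about 20 days!

This is the fastest I could make it go. I would appreciate any other answers or strategies for doing this downsizing better.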
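The estimates in this thread are simple division, which can be made explicit. This sketch assumes a constant migration rate and no new chunks appearing on the draining shard, and the helper name `daysToDrain` is made up for the example.

```javascript
// Days needed to move `chunks` chunks at `chunksPerDay`, rounded up.
function daysToDrain(chunks, chunksPerDay) {
  return Math.ceil(chunks / chunksPerDay);
}

daysToDrain(1565, 10);   // 157 days at the balancer's observed rate
daysToDrain(1575, 350);  // 5 days with manual moveChunk (slow end)
daysToDrain(1575, 400);  // 4 days with manual moveChunk (fast end)
```

If the other three shards to be removed hold a similar number of chunks (an assumption, their counts are not given here), four shards at up to 5 days each gives the roughly 20 days quoted above.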

sharding

mongodb