How to increase 'db.collection.insert()' batch insertion speed of 100K MongoDB objects

On my Ubuntu server I have a Ruby on Rails application that relies on MongoDB. I usually use Mongoid to insert objects into the DB, but when inserting large numbers of objects I compile a huge array of hashes and insert it with the mongo shell method db.collection.insert():

ObjectName.collection.insert([{_id: BSON::ObjectId('5671329e4368725951010000'), name: "foo"}, {_id: BSON::ObjectId('567132c94368725951020000'), name: "bar"}])

The bulk insertion time is a bottleneck for me: bulk inserting 150,000 objects, for example, takes 23 seconds. Is it possible to allocate resources in a way that makes the bulk insertion faster?
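
For reference, the timing can be reproduced with something like the following (a minimal sketch: the docs array and the loop that builds it stand in for the real precompiled array of 150,000 hashes, and ObjectName is the model from the snippet above):

require 'benchmark'

# Hypothetical sketch of how the 23-second figure can be measured; the docs
# array stands in for the real precompiled array of 150,000 hashes.
docs = Array.new(150_000) { |i| { _id: BSON::ObjectId.new, name: "obj_#{i}" } }

seconds = Benchmark.realtime { ObjectName.collection.insert(docs) }
puts "inserted #{docs.size} documents in #{seconds.round(2)}s"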

You can try using the mongoid gem:

batch = [{_id: BSON::ObjectId('5671329e4368725951010000'), name: "foo"}, {_id: BSON::ObjectId('567132c94368725951020000'), name: "bar"}]

Post.collection.insert(batch) # assuming Post is the model
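
Note that on newer Mongoid versions (5 and later, which sit on top of the 2.x mongo driver) the collection no longer exposes insert; the equivalent call there is insert_many. A minimal sketch, reusing the batch array above:

# Mongoid 5+: Post.collection returns a Mongo::Collection from the 2.x driver,
# which provides insert_many instead of insert.
Post.collection.insert_many(batch)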

Or you can go through the Ruby MongoDB driver:

require 'mongo'

# Legacy (1.x) Ruby driver: connect and grab the target collection.
mongo_client = Mongo::MongoClient.new
coll = mongo_client['test_db']['test_collection']

# Queue every hash from the batch array above into one ordered bulk operation,
# then send it to the server with a single execute call.
bulk = coll.initialize_ordered_bulk_op
batch.each do |hash|
  bulk.insert(hash)
end
bulk.execute
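
The snippet above targets the legacy 1.x Ruby driver (Mongo::MongoClient). On the current 2.x driver the equivalent would look roughly like this (a sketch only; the local address and the test_db/test_collection names are assumptions carried over from above):

require 'mongo'

client = Mongo::Client.new(['127.0.0.1:27017'], database: 'test_db')
coll   = client[:test_collection]

# Wrap each hash in an :insert_one operation and send them as one ordered batch.
ops = batch.map { |hash| { insert_one: hash } }
coll.bulk_write(ops, ordered: true)

# Or, for plain inserts, insert_many batches the documents for you.
coll.insert_many(batch)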

If you want to do the same thing through the mongo shell, you can look into Bulk Insert.

To scale as the data grows, you can use sharding:

Sharding is the process of storing data records across multiple machines and is MongoDB’s approach to meeting the demands of data growth. As the size of the data increases, a single machine may not be sufficient to store the data nor provide an acceptable read and write throughput. Sharding solves the problem with horizontal scaling. With sharding, you add more machines to support data growth and the demands of read and write operations.

The two scaling approaches differ:

Vertical scaling adds more CPU and storage resources to increase capacity. Scaling by adding capacity has limitations: high performance systems with large numbers of CPUs and large amount of RAM are disproportionately more expensive than smaller systems. Additionally, cloud-based providers may only allow users to provision smaller instances. As a result there is a practical maximum capability for vertical scaling. Sharding, or horizontal scaling, by contrast, divides the data set and distributes the data over multiple servers, or shards. Each shard is an independent database, and collectively, the shards make up a single logical database.
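
As a rough illustration, sharding is enabled through admin commands issued against a mongos router. A sketch only, assuming an already running sharded cluster; the mongos address, the test_db/test_collection names, and the hashed _id shard key are all assumptions:

require 'mongo'

# Connect to a mongos router (address is an assumption), then run the
# standard enableSharding / shardCollection admin commands.
router = Mongo::Client.new(['mongos-host:27017'], database: 'admin')
router.database.command(enableSharding: 'test_db')
router.database.command(shardCollection: 'test_db.test_collection',
                        key: { _id: 'hashed' })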