How to save 1 million records to MongoDB asynchronously?
I want to save 1 million records to MongoDB using JavaScript, like this:
for (var i = 0; i < 1000000; i++) {
  model = buildModel(i);
  db.save(model, function(err, done) {
    console.log('cool');
  });
}
I tried it; it saved ~160 records, then hung for 2 minutes, and then exited. Why?
It fails because you are not waiting for an asynchronous call to complete before moving on to the next iteration. This means you are building up a "stack" of unresolved operations until this causes a problem. What's the name of this site again? Get the picture?
So this is not the way to proceed with "Bulk" insertions. Fortunately the underlying MongoDB driver has already thought about this, aside from the callback issue mentioned earlier. There is in fact a "Bulk API" available to make this a whole lot better. And that assumes you already pulled the native driver as the db object; but I prefer just using the .collection accessor from the model, together with the "async" module, to keep everything clear:
var bulk = Model.collection.initializeOrderedBulkOp();
var counter = 0;

async.whilst(
  // Iterator condition
  function() { return counter < 1000000 },
  // Do this in the iterator
  function(callback) {
    counter++;
    var model = buildModel(counter);
    bulk.insert(model);
    // Send the queued inserts to the server every 1000 documents
    if ( counter % 1000 == 0 ) {
      bulk.execute(function(err, result) {
        // Start a fresh batch for the next 1000 inserts
        bulk = Model.collection.initializeOrderedBulkOp();
        callback(err);
      });
    } else {
      callback();
    }
  },
  // When all is done
  function(err) {
    if ( counter % 1000 != 0 ) {
      // Flush any inserts still queued in the final partial batch
      bulk.execute(function(err, result) {
        console.log("inserted some more");
        console.log("I'm finished now");
      });
    } else {
      console.log("I'm finished now");
    }
  }
);
The difference there is using "asynchronous" callback methods on completion rather than just building up a stack, but also employing the "Bulk Operations API" to mitigate the asynchronous write calls by submitting everything in batches of 1000 entries.
This not only avoids "building up a stack" of function execution like your own example code, but also performs efficient "wire" transactions: instead of sending every write as an individual statement, the writes are broken up into manageable "batches" for server commitment.
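For comparison, here is a minimal sketch of the same batching idea using insertMany(), which more recent versions of the native driver expose on collections. It assumes the same native-driver db handle and buildModel() helper from the question, plus a hypothetical 'models' collection name; each insertMany() call submits one batch as a single wire operation:

// Insert 1,000,000 documents in batches of 1000 using insertMany().
// `db`, buildModel() and the 'models' collection name are assumptions
// carried over from the question.
var collection = db.collection('models');
var BATCH_SIZE = 1000;
var TOTAL = 1000000;

function insertFrom(start, done) {
  if (start >= TOTAL) return done(null);
  var batch = [];
  var end = Math.min(start + BATCH_SIZE, TOTAL);
  for (var i = start; i < end; i++) {
    batch.push(buildModel(i));
  }
  // insertMany() sends the whole batch to the server in one operation
  collection.insertMany(batch, function(err, result) {
    if (err) return done(err);
    // Only continue once the previous batch is acknowledged,
    // so no stack of unresolved operations builds up
    insertFrom(end, done);
  });
}

insertFrom(0, function(err) {
  if (err) return console.error(err);
  console.log('All ' + TOTAL + ' documents inserted');
});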
You should use something like Async's eachLimit:
// Create an array of numbers 0-999999
var models = new Array(1000000);
for (var i = models.length - 1; i >= 0; i--)
  models[i] = i;

// Iterate over the array performing a MongoDB save operation for each item
// while never performing more than 20 parallel saves at the same time
async.eachLimit(models, 20, function iterator(model, next){
  // Build a model and save it to the DB, call next when finished
  db.save(buildModel(model), next);
}, function done(err){
  if (err) { // An error occurred while trying to save a model to the DB
    console.error(err);
  } else { // All 1,000,000 models have been saved to the DB
    console.log('Successfully saved ' + models.length + ' models to MongoDB.');
  }
});
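If allocating the million-element index array up front is a concern, recent versions of the async module also provide timesLimit(), which generates the indices on the fly with the same concurrency cap. A minimal sketch, reusing the question's db.save() and buildModel():

// Save 1,000,000 models with at most 20 saves in flight,
// without materializing an index array first.
async.timesLimit(1000000, 20, function(i, next) {
  db.save(buildModel(i), next);
}, function(err) {
  if (err) {
    console.error(err);
  } else {
    console.log('Successfully saved 1,000,000 models to MongoDB.');
  }
});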