Mongo find query takes 2 minutes

Question

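One of my collections has about 75,000 documents.

The total database size is about 45 GB.
Of the 75k documents, about 45k are roughly 900 KB each (about 42 GB in total); the rest are about 120 KB each.

Each document is mapped to a custId ObjectId from another collection and has a timestamp; both fields are indexed.

Now I need to fetch the documents for a specific custId from the last month. The count is about 5,500 documents. This custId only has small documents of about 120 KB each.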
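
As a rough sanity check on these figures, the result set alone is large. The sketch below only does arithmetic on the numbers quoted in the question (5,500 documents of about 120 KB each, fetched in 2 minutes); treating 1 KB as 1,000 bytes is an assumption made for round numbers.

```javascript
// Back-of-the-envelope estimate using only the figures quoted in the question.
// Assumption: 1 MB = 1,000 KB, for round numbers.
const matchingDocs = 5500;      // documents for this custId in the last month
const docSizeKB = 120;          // approximate size of each matching document
const elapsedSeconds = 2 * 60;  // observed query time

const payloadMB = (matchingDocs * docSizeKB) / 1000;
const throughputMBps = payloadMB / elapsedSeconds;

console.log(`payload ~ ${payloadMB} MB`);
console.log(`throughput ~ ${throughputMBps.toFixed(1)} MB/s`);
```

So even with an ideal index, roughly 660 MB has to be read and sent to the client, which suggests that returning only the needed fields may matter as well.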

Here is my query:

db.mycollection.find(
{
    custId:ObjectId("CUST_OBJECT_ID_HERE"),
    timestamp:{$gte:one_month_ago_date, $lt:current_date}
}).sort({timestamp:-1})

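The query still takes 2 minutes to fetch all the records. Is that because of the number of documents, or because of the size of the larger documents? Is there any way to fix this?

Note: The query takes 2 minutes when run from Node.js. If I run it in the mongo shell it returns quickly, but that is probably because it only fetches the first 50 records. When I append .count() to the query in the mongo shell, it takes 2 minutes to return.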
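
The query above uses one_month_ago_date and current_date without showing how they are built. A minimal sketch of one way to compute them follows; the helper name monthRange and its parameter are hypothetical, not taken from the original post.

```javascript
// Hypothetical helper: builds the timestamp range used in the query above.
// The names monthRange and now are illustrative, not from the original post.
function monthRange(now) {
  const start = new Date(now.getTime());
  // Caveat: when the previous month lacks this day of the month
  // (for example March 31), setUTCMonth rolls over into the next month.
  start.setUTCMonth(start.getUTCMonth() - 1);
  return { $gte: start, $lt: now };
}

const range = monthRange(new Date("2017-05-15T14:20:04.393Z"));
console.log(range.$gte.toISOString(), range.$lt.toISOString());
```

The returned object can be used directly as the value of the timestamp field in the find() filter.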

Update:
Index details:

"wiredTiger" : {
    "nindexes" : 3,
    "totalIndexSize" : 2396160,
    "indexSizes" : {
        "_id_" : 1138688,
        "custId_1" : 598016,
        "timestamp_1" : 659456
    }
}

Explain output (with sort):

{
    "queryPlanner" : {
        "plannerVersion" : 1,
        "namespace" : "mydb.mycollection",
        "indexFilterSet" : false,
        "parsedQuery" : {
            "$and" : [
                {
                    "custId" : {
                        "$eq" : ObjectId("CUST_OBJECT_ID_HERE")
                    }
                },
                {
                    "timestamp" : {
                        "$lt" : ISODate("2017-05-15T14:20:04.393Z")
                    }
                },
                {
                    "timestamp" : {
                        "$gte" : ISODate("2017-04-15T14:20:04.393Z")
                    }
                }
            ]
        },
        "winningPlan" : {
            "stage" : "FETCH",
            "filter" : {
                "custId" : {
                    "$eq" : ObjectId("CUST_OBJECT_ID_HERE")
                }
            },
            "inputStage" : {
                "stage" : "IXSCAN",
                "keyPattern" : {
                    "timestamp" : 1
                },
                "indexName" : "timestamp_1",
                "isMultiKey" : false,
                "isUnique" : false,
                "isSparse" : false,
                "isPartial" : false,
                "indexVersion" : 1,
                "direction" : "backward",
                "indexBounds" : {
                    "timestamp" : [
                        "(new Date(1494858004393), new Date(1492266004393)]"
                    ]
                }
            }
        },
        "rejectedPlans" : [
            {
                "stage" : "SORT",
                "sortPattern" : {
                    "timestamp" : -1
                },
                "inputStage" : {
                    "stage" : "SORT_KEY_GENERATOR",
                    "inputStage" : {
                        "stage" : "FETCH",
                        "filter" : {
                            "$and" : [
                                {
                                    "timestamp" : {
                                        "$lt" : ISODate("2017-05-15T14:20:04.393Z")
                                    }
                                },
                                {
                                    "timestamp" : {
                                        "$gte" : ISODate("2017-04-15T14:20:04.393Z")
                                    }
                                }
                            ]
                        },
                        "inputStage" : {
                            "stage" : "IXSCAN",
                            "keyPattern" : {
                                "custId" : 1
                            },
                            "indexName" : "custId_1",
                            "isMultiKey" : false,
                            "isUnique" : false,
                            "isSparse" : false,
                            "isPartial" : false,
                            "indexVersion" : 1,
                            "direction" : "forward",
                            "indexBounds" : {
                                "custId" : [
                                    "[ObjectId('CUST_OBJECT_ID_HERE'), ObjectId('CUST_OBJECT_ID_HERE')]"
                                ]
                            }
                        }
                    }
                }
            }
        ]
    },
    "serverInfo" : {
        "host" : "test-machine",
        "port" : 27017,
        "version" : "3.2.12",
        "gitVersion" : "REMOVED_BY_OP"
    },
    "ok" : 1
}
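
To make the winning plan above easier to read, here is a minimal sketch that walks the plan tree and lists its stages from the top down. The object is a trimmed copy of the winningPlan from the explain output; the helper name describePlan is hypothetical, and it only follows a single inputStage chain.

```javascript
// Walks an explain() plan tree and lists each stage from the top down.
// The object below is a trimmed copy of the winningPlan shown above.
const winningPlan = {
  stage: "FETCH",
  filter: { custId: { $eq: "CUST_OBJECT_ID_HERE" } },
  inputStage: {
    stage: "IXSCAN",
    keyPattern: { timestamp: 1 },
    indexName: "timestamp_1",
    direction: "backward",
  },
};

// Hypothetical helper name; follows the single inputStage chain only.
function describePlan(plan) {
  const steps = [];
  for (let node = plan; node; node = node.inputStage) {
    let label = node.stage;
    if (node.indexName) label += `(${node.indexName})`;
    if (node.filter) label += ` filter on ${Object.keys(node.filter).join(",")}`;
    steps.push(label);
  }
  return steps;
}

console.log(describePlan(winningPlan).join(" <- "));
```

Read this way, the plan locates every document in the date range through timestamp_1 and checks custId only after fetching each document, so documents belonging to other customers in that range, including the large ones, are read as well.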

Answer 1

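That is what indexes are for!

Create an index on timestamp and custId (a compound index using both is the most efficient) and you should be fine. Since you sort by timestamp, put timestamp first in the compound index (order matters).

Here is the code to create a compound index in Mongo (using Mongoose):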

const mongoose = require('mongoose');
const Schema = mongoose.Schema;

const userSchema = new Schema({
    //...
});

userSchema.index({timestamp: 1, custId: 1});

mongoose.model('User', userSchema);
module.exports = userSchema;

Answer 2

Try this index:

db.mycollection.createIndex({custId:1,timestamp:1}, {background:true})
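
To illustrate why key order in a compound index can matter for this query, here is a toy model over synthetic data. It is a sketch under simplifying assumptions, not MongoDB's actual scan algorithm: all data and names are made up, and it ignores optimizations such as skipping ahead within an index.

```javascript
// Toy model of index scans over synthetic data; not MongoDB's real algorithm.
// All data and names here are made up for illustration.
const docs = [];
for (let i = 0; i < 1000; i++) {
  docs.push({ custId: i % 10, timestamp: i });
}

const wanted = { custId: 3, from: 500, to: 600 }; // timestamp in [from, to)
const inRange = (d) => d.timestamp >= wanted.from && d.timestamp < wanted.to;
const isCust = (d) => d.custId === wanted.custId;

const rangeCount = docs.filter(inRange).length;
const matchCount = docs.filter((d) => isCust(d) && inRange(d)).length;

// { timestamp: 1 }: every key in the range is examined and its document
// fetched, because custId can only be checked on the document itself.
const tsOnly = { keysExamined: rangeCount, docsFetched: rangeCount };

// { timestamp: 1, custId: 1 }: in this simplified model up to every key in
// the range is examined, but custId is in the key, so only matches are fetched.
const tsFirst = { keysExamined: rangeCount, docsFetched: matchCount };

// { custId: 1, timestamp: 1 }: the scan starts at the custId and covers only
// that customer's keys within the range.
const custFirst = { keysExamined: matchCount, docsFetched: matchCount };

console.log("timestamp_1:", tsOnly);
console.log("timestamp_1_custId_1:", tsFirst);
console.log("custId_1_timestamp_1:", custFirst);
```

In this model both compound indexes avoid fetching other customers' documents, and putting the equality field (custId) first also keeps the number of index keys examined down to the matching entries.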

Answer 3

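The answers above are completely correct. Just adding my 2 cents. The answer depends heavily on how much memory you have available, and on whether the information you need to return has to be "real time" or can be cached in some way.

MongoDB is notorious for its memory usage. (I like MongoDB, but memory is its Achilles' heel.) Secondly, as mentioned above, anything you can do to improve the query before running it is a big win in terms of time, reads, and core usage. When it comes to document stores, you may (or will) find that a properly configured Redis cache also goes a long way toward cutting response times.

Obviously this takes memory, and in your case it requires balance (including load balancing). It is the right mix of memory, speed, and disk usage (even with SSDs) that will help you balance these query requests against the requirements of your system.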
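
The caching idea above can be sketched as a cache-aside lookup. This is only an illustration under stated assumptions: a Map stands in for Redis, fetchFromDb is a hypothetical placeholder for the real MongoDB query, and the 60-second TTL is an arbitrary choice. A real version would be asynchronous.

```javascript
// Cache-aside sketch. A Map stands in for Redis here; fetchFromDb is a
// hypothetical placeholder for the real (asynchronous) MongoDB query.
const cache = new Map();
const TTL_MS = 60 * 1000; // assumed freshness window

let dbCalls = 0;
function fetchFromDb(custId) {
  dbCalls++;
  return [{ custId, note: "placeholder result" }];
}

function getDocs(custId, now) {
  const hit = cache.get(custId);
  if (hit && now - hit.storedAt < TTL_MS) return hit.value; // fresh entry
  const value = fetchFromDb(custId); // miss or expired: query, then store
  cache.set(custId, { value, storedAt: now });
  return value;
}

getDocs("c1", 0);          // miss: goes to the database
getDocs("c1", 1000);       // served from the cache
getDocs("c1", TTL_MS + 1); // entry expired: goes to the database again
console.log("database calls:", dbCalls);
```

Whether this helps depends on how stale the results are allowed to be, which is the "real time" trade-off mentioned above.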

Hope that helps.

